Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2


Amazon EMR on EKS is a deployment option that enables you to run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS) easily. It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option elects Amazon EKS as its underlying compute to orchestrate containerized Spark applications with better price performance.

AWS continually innovates to provide choice and better price-performance for our customers, and the third-generation Graviton processor is the next step in the journey. Amazon EMR on EKS now supports Amazon Elastic Compute Cloud (Amazon EC2) C7g, the latest AWS Graviton3 instance family. On a single EKS cluster, we measured EMR runtime for Apache Spark performance by comparing the C7g and C6g families across selected instance sizes of 4XL, 8XL, and 12XL. We're excited to observe a maximum 19% performance gain over the sixth-generation C6g Graviton2 instances, which leads to a 15% cost reduction.

In this post, we discuss the performance test results that we observed while running the same EMR Spark runtime on different Graviton-based EC2 instance types.

For some use cases, such as the benchmark test, running a data pipeline that requires a mix of CPU types for granular-level cost efficiency, or migrating an existing application from Intel to Graviton-based instances, we usually spin up different clusters that host separate types of processors, such as x86_64 vs. arm64. However, Amazon EMR on EKS has made it easier. In this post, we also provide guidance on running Spark with multiple CPU architectures in a common EKS cluster, so that we can save significant time and effort on setting up a separate cluster to isolate the workloads.

Infrastructure innovation

AWS Graviton3 is the latest generation of AWS-designed Arm-based processors, and C7g is the first Graviton3 instance in AWS. The C family is designed for compute-intensive workloads, including batch processing, distributed analytics, data transformations, log analysis, and more. Additionally, C7g instances are the first in the cloud to feature DDR5 memory, which provides 50% higher memory bandwidth compared to DDR4 memory, to enable high-speed access to data in memory. All these innovations are well-suited for big data workloads, especially the in-memory processing framework Apache Spark.

The following table summarizes the technical specifications for the tested instance types:

Instance Name | vCPUs | Memory (GiB) | EBS-Optimized Bandwidth (Gbps) | Network Bandwidth (Gbps) | On-Demand Hourly Rate
c6g.4xlarge | 16 | 32 | 4.75 | Up to 10 | $0.544
c7g.4xlarge | 16 | 32 | Up to 10 | Up to 15 | $0.58
c6g.8xlarge | 32 | 64 | 9 | 12 | $1.088
c7g.8xlarge | 32 | 64 | 10 | 15 | $1.16
c6g.12xlarge | 48 | 96 | 13.5 | 20 | $1.632
c7g.12xlarge | 48 | 96 | 15 | 22.5 | $1.74

These instances are all built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations. The Nitro System offloads the CPU virtualization, storage, and networking functions to dedicated hardware and software, delivering performance that's nearly indistinguishable from bare metal. In particular, C7g instances include support for Elastic Fabric Adapter (EFA), which becomes the standard on this instance family. It allows our applications to communicate directly with network interface cards, providing lower and more consistent latency. Additionally, these are all Amazon EBS-optimized instances, and C7g provides higher dedicated bandwidth for EBS volumes, which can result in better I/O performance and quicker read/write operations in Spark.
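Looking back at the hourly rates in the table, C7g costs slightly more per hour than C6g at each size, so a job has to finish faster on C7g before it becomes cheaper. The following is a minimal sketch of that break-even arithmetic. It assumes the same number of instances on both families and billing proportional to runtime; the function name break_even_speedup_pct is hypothetical.

```python
# On-Demand hourly rates (USD) from the table: (C6g, C7g) per instance size.
rates = {
    "4xlarge": (0.544, 0.58),
    "8xlarge": (1.088, 1.16),
    "12xlarge": (1.632, 1.74),
}

def break_even_speedup_pct(c6g_rate, c7g_rate):
    """Runtime reduction (%) at which the C7g EC2 cost equals the C6g EC2 cost."""
    return (1 - c6g_rate / c7g_rate) * 100

for size, (c6g_rate, c7g_rate) in rates.items():
    premium = (c7g_rate / c6g_rate - 1) * 100
    needed = break_even_speedup_pct(c6g_rate, c7g_rate)
    print(f"{size}: price premium {premium:.1f}%, break-even runtime reduction {needed:.1f}%")
```

Any runtime reduction beyond the break-even value translates into lower EC2 cost under these assumptions.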

Performance test results

To quantify performance, we ran TPC-DS benchmark queries for Spark at a 3 TB scale. These queries are derived from TPC-DS standard SQL scripts, and the test results are not comparable to other published TPC-DS benchmark results. Apart from the benchmark standards, a single Amazon EMR 6.6 Spark runtime (compatible with Apache Spark version 3.2.0) was used as the data processing engine across six different managed node groups on an EKS cluster: C6g_4, C7g_4, C6g_8, C7g_8, C6g_12, and C7g_12. These groups are named after the instance type to distinguish the underlying compute resources. Each group can automatically scale between 1 and 30 nodes within its corresponding instance type. By architecting the EKS cluster this way, we can run and compare our experiments in parallel, each hosted in a single node group, that is, an isolated compute environment on a common EKS cluster. It also makes it possible to run an application with multiple CPU architectures on a single cluster. Check out the sample EKS cluster configuration and benchmark job examples for more details.

We measure the Graviton performance and cost improvements using two calculations: total query runtime and geometric mean of the total runtime. The following table shows the results for equivalent sized C6g and C7g instances and the same Spark configurations.

Benchmark Attributes | 12 XL | 8 XL | 4 XL
Job parallelism (spark.executor.cores*spark.executor.instances) | 188 cores (4*47) | 188 cores (4*47) | 188 cores (4*47)
spark.executor.memory | 6 GB | 6 GB | 6 GB
Number of EC2 instances | 5 | 7 | 16
EBS volume | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk
Provisioned IOPS per volume | 6400 | 6400 | 6400
Total query runtime on C6g (sec) | 2099 | 2098 | 2042
Total query runtime on C7g (sec) | 1728 | 1738 | 1660
Total runtime improvement with C7g | 18% | 17% | 19%
Geometric mean query time on C6g (sec) | 9.74 | 9.88 | 9.77
Geometric mean query time on C7g (sec) | 8.40 | 8.32 | 8.08
Geometric mean improvement with C7g | 13.8% | 15.8% | 17.3%
EMR on EKS memory usage cost on C6g (per run) | $0.28 | $0.28 | $0.28
EMR on EKS vCPU usage cost on C6g (per run) | $1.26 | $1.25 | $1.24
Total cost per benchmark run on C6g (EC2 + EKS cluster + EMR price) | $6.36 | $6.02 | $6.52
EMR on EKS memory usage cost on C7g (per run) | $0.23 | $0.23 | $0.22
EMR on EKS vCPU usage cost on C7g (per run) | $1.04 | $1.03 | $0.99
Total cost per benchmark run on C7g (EC2 + EKS cluster + EMR price) | $5.49 | $5.23 | $5.54
Estimated cost reduction with C7g | 13.7% | 13.2% | 15%
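The two metrics in the table can be reproduced with a few lines of Python. The sketch below is an illustration only: the helper names geometric_mean and improvement_pct are hypothetical, and the per-query runtimes in sample_runtimes are made-up numbers, not benchmark data. The totals and geometric means passed to improvement_pct are taken from the table.

```python
import math

def geometric_mean(runtimes):
    """Geometric mean of per-query runtimes (seconds), computed via logarithms."""
    return math.exp(sum(math.log(t) for t in runtimes) / len(runtimes))

def improvement_pct(baseline, candidate):
    """Percentage reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline * 100

# Hypothetical per-query runtimes, only to show how the metric is computed.
sample_runtimes = [2.0, 8.0, 4.0]
print(f"sample geometric mean: {geometric_mean(sample_runtimes):.2f} sec")

# Values from the table: (C6g, C7g) per instance size.
total_runtime = {"12XL": (2099, 1728), "8XL": (2098, 1738), "4XL": (2042, 1660)}
geomean_time = {"12XL": (9.74, 8.40), "8XL": (9.88, 8.32), "4XL": (9.77, 8.08)}

for size in total_runtime:
    total_gain = improvement_pct(*total_runtime[size])
    geomean_gain = improvement_pct(*geomean_time[size])
    print(f"{size}: total runtime {total_gain:.0f}%, geometric mean {geomean_gain:.1f}%")
```

The geometric mean is less sensitive to a handful of long-running queries than the total runtime, which is a common reason to report both.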

The total number of cores and memory are identical across all benchmarked instances, and four provisioned IOPS SSD disks were attached to each EBS-optimized instance for optimal disk I/O performance. To allow for comparison, these configurations were intentionally chosen to match the settings in other EMR on EKS benchmarks. For C5 instances based on x86_64 Intel CPUs, check out the previous benchmark blog post, Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads.

The table indicates that C7g instances deliver a consistent performance improvement compared to equivalent C6g Graviton2 instances. Our test results showed a 17–19% improvement in total query runtime for the selected instance sizes, and a 13.8–17.3% improvement in geometric mean. On cost, we observed a 13.2–15% cost reduction on C7g performance tests compared to C6g while running the 104 TPC-DS benchmark queries.
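As a rough cross-check of the cost figures, the total cost per run can be approximated from the numbers in the two tables. This is a sketch under stated assumptions rather than the exact method used for the benchmark: it assumes every EC2 instance is billed at the On-Demand rate for the full query runtime, and it assumes an EKS cluster fee of $0.10 per hour, which is not stated in this post. The function name cost_per_run is hypothetical.

```python
EKS_CLUSTER_RATE = 0.10  # USD per hour; an assumption, not given in the post

def cost_per_run(runtime_sec, instances, hourly_rate, emr_memory_cost, emr_vcpu_cost):
    """Approximate cost of one benchmark run: EC2 + EKS cluster + EMR on EKS usage."""
    hours = runtime_sec / 3600
    ec2 = instances * hours * hourly_rate
    eks = hours * EKS_CLUSTER_RATE
    return ec2 + eks + emr_memory_cost + emr_vcpu_cost

# 4XL column values from the tables above.
c6g = cost_per_run(2042, 16, 0.544, 0.28, 1.24)
c7g = cost_per_run(1660, 16, 0.58, 0.22, 0.99)

print(f"C6g 4XL: ${c6g:.2f}")
print(f"C7g 4XL: ${c7g:.2f}")
print(f"Estimated reduction: {(c6g - c7g) / c6g * 100:.1f}%")
```

Under these assumptions the results land within about a cent of the table values. Small differences are expected because the EMR on EKS usage costs in the table are already rounded.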

Data shuffle in a Spark workload

Generally, big data frameworks schedule computation tasks on different nodes in parallel to achieve optimal performance. To proceed with its computation, a node must have the results of computations from upstream. This requires moving intermediate data from multiple servers to the nodes where the data is required, which is termed shuffling data. In many Spark workloads, data shuffle is an inevitable operation, so it plays an important role in performance assessments. This operation may involve a high rate of disk I/O and network data transmission, and may burn a significant amount of CPU cycles.

If your workload is I/O bound or bottlenecked by current data shuffle performance, one recommendation is to benchmark on improved hardware. Overall, C7g offers better EBS and network bandwidth compared to equivalent C6g instance types, which may help you optimize performance. Therefore, in the same benchmark test, we captured the following additional information, which is broken down into per-instance-type network and I/O improvements.

Based on the TPC-DS query test result, this graph illustrates the percentage increases of data shuffle operations in four categories: maximum disk read and write, and maximum network received and transmitted. In comparison to C6g instances, the disk read performance improved by 25–45%, whereas the disk write performance increase was 34–47%. On the network throughput comparison, we observed an increase of 21–36%.

Run an Amazon EMR on EKS job with multiple CPU architectures

If you're evaluating migrating to Graviton instances for Amazon EMR on EKS workloads, we recommend testing the Spark workloads based on your real-world use cases. If you need to run workloads across multiple processor architectures, for example to test the performance of Intel and Arm CPUs, follow the walkthrough in this section to get started with some concrete ideas.

Build a single multi-arch Docker image

To build a single multi-arch Docker image (x86_64 and arm64), complete the following steps:

  1. Get the Docker Buildx CLI extension. Docker Buildx is a CLI plugin that extends the Docker command to support the multi-architecture feature. Upgrade to the latest Docker Desktop or manually download the CLI binary. For more details, check out Working with Buildx.
  2. Validate the version after the installation:
    docker buildx version

  3. Create a new builder that gives access to the new multi-architecture features (you only need to perform this task once):
    docker buildx create --name mybuilder --use

  4. Log in to your own Amazon ECR registry:
    # Assumes AWS_REGION is already set in your shell
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    ECR_URL=$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
    aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_URL

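  5. Get the EMR Spark base image from AWS:
    # Assumes SRC_ECR_URL is set to the registry that hosts the EMR on EKS base image
    # Log in to the source registry first, then pull the base image
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $SRC_ECR_URL
    docker pull $SRC_ECR_URL/spark/emr-6.6.0:latest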

  6. Build and push a custom Docker image.

In this case, we build a single Spark benchmark utility Docker image on top of Amazon EMR 6.6. It supports both Intel and Arm processor architectures:

  • linux/amd64 – x86_64 (also known as AMD64 or Intel 64)
  • linux/arm64 – Arm
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t $ECR_URL/eks-spark-benchmark:emr6.6 \
-f docker/benchmark-util/Dockerfile \
--build-arg SPARK_BASE_IMAGE=$SRC_ECR_URL/spark/emr-6.6.0:latest \
--push .

Submit Amazon EMR on EKS jobs with and without Graviton

For our first example, we submit a benchmark job to the Graviton3 node group that spins up c7g.4xlarge instances.

The following is not a complete script. Check out the full version of the example on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c7-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "": "C7g_4"

In the following example, we run the same job on non-Graviton C5 instances with Intel 64 CPUs. The full version of the script is available on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c5-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "": "C5_4"

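The two job submissions above differ only in the job name and in the node group they target, while both use the same multi-arch image. The sketch below builds the configuration-overrides JSON in Python so that the target node group becomes a parameter. Note that the property key used for node placement is blank in the snippets above. This sketch assumes Spark's spark.kubernetes.node.selector.[labelKey] property combined with the eks.amazonaws.com/nodegroup label that EKS applies to managed node groups. That pairing is an assumption and is not confirmed by this post; the function name build_overrides and the registry URL are placeholders.

```python
import json

# Assumed node label for EKS managed node groups (not stated in the post).
NODEGROUP_LABEL = "eks.amazonaws.com/nodegroup"

def build_overrides(ecr_url, node_group):
    """Return the configuration-overrides JSON string for one benchmark job."""
    overrides = {
        "applicationConfiguration": [{
            "classification": "spark-defaults",
            "properties": {
                "spark.kubernetes.container.image": f"{ecr_url}/eks-spark-benchmark:emr6.6",
                # Assumed property key; the original snippets leave it blank.
                f"spark.kubernetes.node.selector.{NODEGROUP_LABEL}": node_group,
            },
        }]
    }
    return json.dumps(overrides, indent=2)

# Placeholder registry URL; the same image is used for both node groups.
for group in ("C7g_4", "C5_4"):
    print(build_overrides("111122223333.dkr.ecr.us-east-1.amazonaws.com", group))
```

Generating the JSON this way avoids hand-editing nested quotes when the same job is submitted to several node groups.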

In May 2022, the Graviton3 instance family was made available to Amazon EMR on EKS. After running the performance-optimized EMR Spark runtime on the selected latest Arm-based Graviton3 instances, we observed up to a 19% performance increase and up to 15% cost savings compared to C6g Graviton2 instances. Because Amazon EMR on EKS offers 100% API compatibility with open-source Apache Spark, you can quickly step into the evaluation process with no application changes.

If you're wondering how much performance gain you can achieve with your use case, try out the benchmark solution or the EMR on EKS Workshop. You can also contact your AWS Solutions Architects, who can be of assistance alongside your innovation journey.

About the author

Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interest are open-source frameworks and automation, data engineering, and DataOps.

