AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Typically, these data integration jobs have varying degrees of priority and time sensitivity. For example, non-urgent workloads such as pre-production, testing, and one-time data loads often don’t require fast job startup times or consistent runtimes on dedicated resources.
Today, we’re pleased to announce the general availability of a new AWS Glue job run class called Flex. Flex allows you to optimize your costs for non-urgent or non-time-sensitive data integration workloads such as pre-production jobs, testing, and one-time data loads. With Flex, AWS Glue jobs run on spare compute capacity instead of dedicated hardware. The start times and runtimes of jobs using Flex can vary because spare compute resources aren’t always readily available and can be reclaimed during the run of a job.
Regardless of the run option used, AWS Glue jobs have the same capabilities, including access to custom connectors, the visual authoring interface, job scheduling, and Glue Auto Scaling. With the Flex execution option, customers can optimize the costs of their data integration workloads by configuring the execution option based on each workload’s requirements, using the standard execution option for time-sensitive workloads and Flex for non-urgent workloads. The Flex execution class is available for AWS Glue 3.0 Spark jobs.
In this post, we provide more details about AWS Glue Flex jobs and how to enable Flex capacity.
How do you use Flexible capacity?
The AWS Glue jobs API now supports an additional parameter called execution-class, which lets you choose STANDARD or FLEX when running the job. To use Flex, you simply set the parameter to FLEX.
To enable Flex via the AWS Glue Studio console, complete the following steps:
- On the AWS Glue Studio console, while authoring a job, navigate to the Job details tab.
- Select Flex Execution.
- Set an appropriate value for the Job Timeout parameter (defaults to 120 minutes for Flex jobs).
- Save the job.
- After finalizing all other details, choose Run to run the job with Flex capacity.
On the Runs tab, you should be able to see FLEX listed under Execution class.
You can also enable Flex via the AWS Command Line Interface (AWS CLI).
You can set the --execution-class setting in the start-job-run API, which lets you start a particular run of an AWS Glue job with Flex capacity.
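As a rough illustration, the same per-run setting can be sketched in Python with boto3, whose start_job_run call accepts ExecutionClass and Timeout. The helper names and the job name below are hypothetical, and the 120-minute default simply mirrors the console default mentioned above. boto3 is imported lazily so that the argument builder can be tried without AWS credentials.

```python
from typing import Any, Dict

# The two execution classes named in this post.
VALID_EXECUTION_CLASSES = {"STANDARD", "FLEX"}


def build_start_job_run_args(job_name: str,
                             execution_class: str = "FLEX",
                             timeout_minutes: int = 120) -> Dict[str, Any]:
    """Build keyword arguments for boto3's glue.start_job_run (hypothetical helper)."""
    if execution_class not in VALID_EXECUTION_CLASSES:
        raise ValueError(
            f"execution_class must be one of {sorted(VALID_EXECUTION_CLASSES)}"
        )
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return {
        "JobName": job_name,
        "ExecutionClass": execution_class,
        "Timeout": timeout_minutes,  # minutes
    }


def start_flex_run(job_name: str, timeout_minutes: int = 120) -> str:
    """Start one run of an existing job on Flex capacity and return its run ID."""
    import boto3  # lazy import: only needed when actually calling AWS

    glue = boto3.client("glue")
    args = build_start_job_run_args(job_name, "FLEX", timeout_minutes)
    return glue.start_job_run(**args)["JobRunId"]


if __name__ == "__main__":
    # "my-flex-job" is a placeholder job name.
    print(build_start_job_run_args("my-flex-job"))
```

Only this one run uses Flex; the job’s default execution class is left unchanged.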
You can also set --execution-class in the create-job API. This sets the default run class of all the runs of this job to FLEX.
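A minimal sketch of the job-level default, again assuming boto3: the dictionary below holds keyword arguments for create_job. The helper name, job name, role ARN, and script location are placeholders, and the worker type and worker count are illustrative choices rather than recommendations from this post.

```python
from typing import Any, Dict


def build_create_job_args(name: str,
                          role_arn: str,
                          script_location: str,
                          number_of_workers: int = 10,
                          timeout_minutes: int = 120) -> Dict[str, Any]:
    """Build keyword arguments for boto3's glue.create_job with Flex as the default class."""
    if number_of_workers <= 0 or timeout_minutes <= 0:
        raise ValueError("number_of_workers and timeout_minutes must be positive")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",      # Flex is available for AWS Glue 3.0 Spark jobs
        "WorkerType": "G.1X",      # illustrative choice
        "NumberOfWorkers": number_of_workers,
        "Timeout": timeout_minutes,  # minutes
        "ExecutionClass": "FLEX",    # default class for every run of this job
    }


if __name__ == "__main__":
    # All identifiers below are placeholders.
    args = build_create_job_args(
        name="my-flex-job",
        role_arn="arn:aws:iam::123456789012:role/MyGlueRole",
        script_location="s3://my-bucket/scripts/job.py",
    )
    print(args)
    # To create the job for real (requires credentials):
    #   import boto3
    #   boto3.client("glue").create_job(**args)
```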
The following are more details about the relevant parameters:
- --execution-class – The enum string that specifies whether a job should run with FLEX or STANDARD capacity. The default is STANDARD.
- --timeout – Specifies the time (in minutes) the job will run before it’s moved into a TIMEOUT state.
When should you use Flexible capacity?
The Flex execution class is ideal for reducing the costs of time-insensitive workloads. For example:
- Nightly ETL jobs, or jobs that run over weekends for processing workloads
- One-time bulk data ingestion jobs
- Jobs running in test environments or pre-production workloads
- Time-insensitive workloads where it’s acceptable to have variable start and end times
In comparison, the standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources. In addition, jobs that have downstream dependencies are better served by the standard execution class.
What is the typical life-cycle of a Flexible capacity Job?
When a start-job-run API call is issued with the execution-class set to FLEX, AWS Glue begins to request compute resources. If no resources are available immediately upon issuing the API call, the job moves into a WAITING state. No billing occurs at this point.
As soon as the job is able to acquire compute resources, the job moves to a RUNNING state. At this point, even if not all of the requested compute resources are available, the job begins running on whatever hardware is present. As more Flex capacity becomes available, AWS Glue adds it to the job, up to the maximum value specified by Number of workers.
At this point, billing begins. You’re charged only for the compute resources that are running at any given time, and only for the duration that they ran.
While the job is running, if Flex capacity is reclaimed, AWS Glue continues running the job on the existing compute resources while it tries to meet the shortfall by requesting more resources. If capacity is reclaimed, billing for that capacity is halted as well. Billing for new capacity starts when it’s provisioned again. If the job completes successfully, the job’s state moves to SUCCEEDED. If the job fails due to user or system errors, the job’s state transitions to FAILED. If the job is unable to complete before the time specified by the --timeout parameter, whether due to a lack of compute capacity or due to issues with the AWS Glue job script, the job goes into a TIMEOUT state.
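The life-cycle described above can be observed with a simple polling loop. The following is a sketch only: wait_for_job_run is a hypothetical helper that takes any zero-argument function returning the current state, so it can be exercised with a fake sequence of states. Against AWS, that function could wrap boto3’s get_job_run and read the JobRunState field. STOPPED and ERROR come from the Glue API’s state list rather than from this post.

```python
import time
from typing import Callable, List

# States after which a run makes no further progress.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMEOUT", "STOPPED", "ERROR"}


def wait_for_job_run(get_state: Callable[[], str],
                     poll_seconds: float = 30.0,
                     max_polls: int = 1000) -> List[str]:
    """Poll until a terminal state; return the distinct states seen, in order."""
    seen: List[str] = []
    for _ in range(max_polls):
        state = get_state()
        if not seen or seen[-1] != state:
            seen.append(state)  # record each transition once
        if state in TERMINAL_STATES:
            return seen
        time.sleep(poll_seconds)
    raise RuntimeError("job run did not reach a terminal state within max_polls")


if __name__ == "__main__":
    # Fake run: no capacity at first, then the job runs and succeeds.
    fake_states = iter(["WAITING", "WAITING", "RUNNING", "RUNNING", "SUCCEEDED"])
    print(wait_for_job_run(lambda: next(fake_states), poll_seconds=0))
    # ['WAITING', 'RUNNING', 'SUCCEEDED']

    # With boto3 (requires credentials; job name and run ID are placeholders):
    #   import boto3
    #   glue = boto3.client("glue")
    #   get_state = lambda: glue.get_job_run(
    #       JobName="my-flex-job", RunId="jr_...")["JobRun"]["JobRunState"]
```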
Flexible job runs rely on the availability of non-dedicated compute capacity in AWS, which in turn depends on several factors, such as the Region and Availability Zone, time of day, day of the week, and the number of DPUs required by a job.
A parameter of particular importance for Flex jobs is the --timeout value. Flex jobs can take longer to run than standard jobs, especially if capacity is reclaimed while the job is running. As a result, selecting a timeout value that’s appropriate for your workload is critical. Choose a timeout value such that the total cost of the Flex job run doesn’t exceed that of a standard job run. If the value is set too high, the job can wait too long, trying to acquire capacity that isn’t available. If the value is set too low, the job times out, even if capacity is available and the job execution is proceeding correctly.
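One way to reason about the cost ceiling is a break-even calculation. The sketch below assumes the Flex run is billed for the same number of workers as the standard run for its whole duration, which is the most expensive case, so the break-even runtime is the standard runtime scaled by the ratio of the two DPU-hour rates. The function name is hypothetical, and the rates in the example are illustrative placeholders; check the AWS Glue pricing page for the rates in your Region.

```python
import math


def max_flex_runtime_minutes(standard_runtime_minutes: float,
                             standard_rate_per_dpu_hour: float,
                             flex_rate_per_dpu_hour: float) -> float:
    """Longest Flex runtime that costs no more than the standard run.

    Assumes equal worker counts billed for the full duration of both runs.
    """
    if (standard_runtime_minutes <= 0
            or standard_rate_per_dpu_hour <= 0
            or flex_rate_per_dpu_hour <= 0):
        raise ValueError("all inputs must be positive")
    return (standard_runtime_minutes
            * standard_rate_per_dpu_hour / flex_rate_per_dpu_hour)


if __name__ == "__main__":
    # Placeholder rates in USD per DPU-hour, for illustration only.
    limit = max_flex_runtime_minutes(60, 0.44, 0.29)
    # Round down to whole minutes for the --timeout value.
    print(math.floor(limit))  # about 91 minutes for these inputs
```

The result is an upper bound for the timeout, not a recommendation; a job that normally finishes in 60 minutes on standard capacity may still warrant a shorter timeout if waiting that long is unacceptable.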
How are Flex capability jobs billed?
Flex jobs are billed per worker at the Flex DPU-hour rates. This means that you’re billed only for the capacity that actually ran during the execution of the job, for the duration that it ran.
For example, if you ran an AWS Glue Flex job with 10 workers, and AWS Glue was only able to acquire five workers, you’re billed for only five workers, and only for the duration that those workers ran. If, during the job run, two out of those five workers are reclaimed, then billing for those two workers is stopped, while billing for the remaining three workers continues. If provisioning for the two reclaimed workers is successful during the job run, billing for those two starts again.
For more information on Flex pricing, refer to AWS Glue pricing.
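The arithmetic behind this example can be sketched as follows. The durations, the one-DPU-per-worker figure, and the rate are made-up inputs for illustration, flex_cost is a hypothetical helper, and the sketch ignores any billing granularity or minimum billing duration that AWS applies.

```python
from typing import Iterable


def flex_cost(worker_minutes: Iterable[float],
              dpus_per_worker: float,
              flex_rate_per_dpu_hour: float) -> float:
    """Cost of a Flex run given the minutes each acquired worker actually ran."""
    minutes = list(worker_minutes)
    if any(m < 0 for m in minutes):
        raise ValueError("worker durations must be non-negative")
    dpu_hours = sum(minutes) / 60.0 * dpus_per_worker
    return dpu_hours * flex_rate_per_dpu_hour


if __name__ == "__main__":
    # 10 workers requested, 5 acquired, job lasts 60 minutes.
    # Three workers run the whole time. Two are reclaimed after 20 minutes,
    # then re-provisioned for the last 10 minutes: 20 + 10 = 30 minutes each.
    durations = [60, 60, 60, 30, 30]
    # Placeholder rate (USD per DPU-hour), assuming 1 DPU per worker.
    cost = flex_cost(durations, dpus_per_worker=1, flex_rate_per_dpu_hour=0.29)
    print(round(cost, 2))  # 240 worker-minutes = 4 DPU-hours, about 1.16
```

The five workers that were never acquired contribute nothing to the total, and the reclaimed workers are counted only for the minutes they were actually running.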
Conclusion
This post discusses the new AWS Glue Flex job execution class, which allows you to optimize costs for non-time-sensitive ETL workloads and test environments.
You can start using Flex capacity for your existing and new workloads today. However, note that the Flex class is not supported for Python Shell jobs, AWS Glue streaming jobs, or AWS Glue ML jobs.
For more information on AWS Glue Flex jobs, refer to the latest documentation.
Special thanks to everyone who contributed to the launch: Parag Shah, Sampath Shreekantha, Yinzhi Xi, and Jessica Cheng.
About the authors
Aniket Jiddigoudar is a Big Data Architect on the AWS Glue team.
Vaibhav Porwal is a Senior Software Development Engineer on the AWS Glue team.
Sriram Ramarathnam is a Software Development Manager on the AWS Glue team.