If you want to run analytics in a serverless cloud environment, Amazon Web Services reckons it can help you out, all while reducing your operating costs and simplifying deployments.
As is typical for Amazon, the cloud giant previewed this EMR Serverless platform – EMR once meaning Elastic MapReduce – at its Re:Invent conference in December, and only opened the service to the public this week.
AWS is no stranger to serverless with products like Lambda. However, its EMR offering specifically targets analytics workloads, such as those using Apache Spark, Hive, and Presto.
Amazon’s existing EMR platform already supports deployments on VPC clusters running in EC2, Kubernetes clusters in EKS, and on-prem deployments running on Outposts. While these options provide greater control over the application and compute resources, they also require the user to manually configure and manage the cluster.
What’s more, the compute and memory resources needed for many data analytics workloads are subject to change depending on the complexity and volume of the data being processed, according to Amazon.
EMR Serverless promises to eliminate this complexity by automatically provisioning and scaling compute resources to meet the demands of open-source workloads. As more or fewer resources are required to accommodate changing data volumes, the platform automatically adds or removes workers. This, Amazon says, ensures that compute resources aren’t underutilized or over-committed. And customers are only charged for the time and number of workers required to complete the job.
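To make the pay-per-use claim concrete, here is a rough sketch of the arithmetic, assuming a simplified model in which a job is billed for the vCPU-hours and memory GB-hours its workers consume. The function name and both rates are hypothetical placeholders rather than Amazon's published pricing, and the model ignores details such as storage charges or minimum billing increments.

```python
def estimate_job_cost(workers, vcpus_per_worker, memory_gb_per_worker,
                      runtime_seconds, vcpu_hour_rate, gb_hour_rate):
    """Estimate the cost of one job under a simplified pay-per-use model."""
    hours = runtime_seconds / 3600
    # Resource-hours scale linearly with both worker count and runtime.
    vcpu_hours = workers * vcpus_per_worker * hours
    gb_hours = workers * memory_gb_per_worker * hours
    return vcpu_hours * vcpu_hour_rate + gb_hours * gb_hour_rate


# Hypothetical rates, for illustration only (not real AWS prices).
VCPU_HOUR_RATE = 0.05
GB_HOUR_RATE = 0.005

# Same job duration, but the second run scaled out to twice the workers.
small = estimate_job_cost(10, 4, 16, 1800, VCPU_HOUR_RATE, GB_HOUR_RATE)
large = estimate_job_cost(20, 4, 16, 1800, VCPU_HOUR_RATE, GB_HOUR_RATE)
print(f"10 workers for 30 minutes: ${small:.2f}")
print(f"20 workers for 30 minutes: ${large:.2f}")
```

Under this model, a job that scales out to twice the workers for the same duration costs twice as much, and no workers means no charge, which is the behaviour Amazon is describing.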
Customers can further control costs by specifying a minimum and maximum number of workers and the virtual CPUs and memory allocated to each worker. Each application is fully isolated and runs within a secure instance.
According to Amazon, these capabilities make the platform ideal for a number of data pipeline, shared cluster, and interactive data workloads.
By default, EMR Serverless applications are configured to start when jobs are submitted and stop after the application has been idle for more than 15 minutes. However, customers can also pre-initialize workers to reduce the time required to start a job.
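As an illustration of how these knobs might be set, the sketch below builds a request for the `create_application` call on boto3's `emr-serverless` client. The helper name, application name, release label, and worker sizes are assumptions made for the example, and the field names reflect our reading of the boto3 documentation, so check them against the current API reference before relying on them.

```python
def build_application_request(name, release_label, executor_count,
                              worker_cpu="4vCPU", worker_memory="16GB",
                              max_cpu="200vCPU", max_memory="800GB",
                              idle_timeout_minutes=15):
    """Build keyword arguments for an EMR Serverless Spark application."""
    worker = {"cpu": worker_cpu, "memory": worker_memory}
    return {
        "name": name,
        "releaseLabel": release_label,
        "type": "SPARK",
        # Pre-initialized workers that are kept ready to cut job start-up time.
        "initialCapacity": {
            "DRIVER": {"workerCount": 1, "workerConfiguration": dict(worker)},
            "EXECUTOR": {"workerCount": executor_count,
                         "workerConfiguration": dict(worker)},
        },
        # Upper bound on the total resources the application may scale to.
        "maximumCapacity": {"cpu": max_cpu, "memory": max_memory},
        # Start on job submission; stop after the idle timeout elapses.
        "autoStartConfiguration": {"enabled": True},
        "autoStopConfiguration": {"enabled": True,
                                  "idleTimeoutMinutes": idle_timeout_minutes},
    }


# Hypothetical application name and release label.
request = build_application_request("nightly-etl", "emr-6.6.0", executor_count=4)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("emr-serverless").create_application(**request)
```

Keeping the request as a plain dictionary means the settings can be reviewed or unit-tested without touching an AWS account.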
EMR Serverless also supports shared applications using Amazon’s identity and access management roles. This enables multiple tenants to submit jobs using a common pool of workers, the company explained in a release.
At launch, EMR Serverless supports applications built using the Apache Spark and Hive frameworks.
Regardless of how the application is deployed, workloads are managed centrally from Amazon’s EMR Studio. The control plane allows customers to spin up new workloads, submit jobs, and review diagnostics data. The service also integrates with Amazon S3 object storage, enabling Spark and Hive logs to be saved for review.
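Jobs can be submitted programmatically as well as from the console. The following sketch assembles a request for the `start_job_run` call on boto3's `emr-serverless` client, pairing a Spark script with an IAM execution role and an S3 location for logs. The helper name, application ID, role ARN, and bucket paths are all made-up placeholders, and the request shape is our reading of the boto3 documentation rather than anything spelled out in Amazon's announcement.

```python
def build_spark_job_request(application_id, execution_role_arn,
                            entry_point, log_uri, arguments=None):
    """Build keyword arguments for submitting a Spark job with S3 logging."""
    if not log_uri.startswith("s3://"):
        raise ValueError("log_uri must be an s3:// location")
    return {
        "applicationId": application_id,
        # IAM role the job assumes; tenants sharing an application
        # can each submit jobs under their own role.
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(arguments or []),
            }
        },
        # Persist driver and executor logs to S3 for later review.
        "configurationOverrides": {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        },
    }


# Placeholder identifiers, for illustration only.
job = build_spark_job_request(
    application_id="00example123",
    execution_role_arn="arn:aws:iam::123456789012:role/example-job-role",
    entry_point="s3://example-bucket/scripts/etl.py",
    log_uri="s3://example-bucket/logs/",
    arguments=["--date", "2022-06-01"],
)

# With AWS credentials configured, the job could be submitted like this:
#   import boto3
#   boto3.client("emr-serverless").start_job_run(**job)
```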
EMR Serverless is available now in Amazon’s Northern Virginia, Oregon, Ireland, and Tokyo regions. ®