Running Spark on Amazon Web Services (AWS)

When you search the net for ways of running Apache Spark on AWS infrastructure, you are most likely to be redirected to the documentation of AWS EMR (Elastic MapReduce), Amazon's Hadoop distribution tailored to the AWS cloud environment. It's quite an easy way to deploy your data pipelines, but bootstrapping a huge cluster just to perform a simple ad-hoc analysis can be a cumbersome task. As the saying goes:

"To a man with a hammer, everything looks like a nail." :)

and we fell into this trap with EMR once.

The article below describes two other ways of running Apache Spark jobs on AWS-managed infrastructure, AWS Glue and AWS Fargate, which we use in our clients' data warehousing projects. It outlines the key differences between these methods in terms of flexibility and pricing, and shows why there is no place for a "one service fits all" approach in the AWS world.

Check it out!


Last updated: 18 December 2019

Written by

Mariusz Strzelecki

Data Engineer
