Moving Data with Mage on AWS ECS

What’s Mage all about?

According to mage.ai: “Mage is an open-source data pipeline tool for transforming and integrating data. Mage is a modern replacement for Airflow.” My first blog post, ‘Airflow is not an ETL tool…’, raised additional questions about data orchestration and data orchestration tools. Is there such a thing as a pure data orchestrator that uses “different tools and technologies together to move data between systems”? Are data orchestrators just connectors that link different tools to extract, transform, and load data? Why do people use the terms ‘data pipeline’ and ‘data orchestration’ interchangeably? Are they the same thing?

While researching the topic online, I came across Upsolver’s article ‘Introducing SQLake: Data Pipelines Without Manual Orchestration’, which in my opinion merges these two concepts beautifully: “Simply put, every data pipeline is composed of two parts: Transformation code and orchestration. If you run daily batches, orchestration is relatively simple and no one cares if a batch takes hours to run since you can schedule it for the middle of the night. However, delivering data every hour or minute means you have many more batches. Suddenly auto-healing and performance become crucial, forcing data engineers to dedicate most of their time to manually building Direct Acyclic Graphs (DAGs), in tools like Apache Airflow, with dozens to hundreds of steps that map all success and failure modes across multiple data stores, address dependencies, and maintain temporary data copies that are required for processing.” This paragraph helped me understand why Mage uses the terms ‘data pipeline’ and ‘a modern replacement for Airflow’ in the same breath…
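To make the “transformation code plus orchestration” split concrete, here is a minimal sketch in plain Python. It is not Mage’s or Airflow’s API: the step names, the `STEPS` and `DEPENDENCIES` dictionaries, and the `run_pipeline` helper are all made up for illustration, and the only library used is `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# --- Transformation code: ordinary functions (hypothetical example steps). ---
# Each step receives a dict holding the results of the steps it depends on.
def extract(_upstream):
    return [1, 2, 3]

def transform(upstream):
    return [x * 10 for x in upstream["extract"]]

def load(upstream):
    return sum(upstream["transform"])

STEPS = {"extract": extract, "transform": transform, "load": load}

# --- Orchestration: a DAG mapping each step to the steps it depends on. ---
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_pipeline(steps, dependencies):
    """Run every step once, after all of its dependencies have finished."""
    results = {}
    # static_order() yields step names so that dependencies come first.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies[name]}
        results[name] = steps[name](upstream)
    return results

results = run_pipeline(STEPS, DEPENDENCIES)
print(results["load"])  # prints 60
```

Everything the quote lists as hard (retries, failure modes, temporary data copies) is deliberately left out here; that missing part is what tools like Airflow and Mage are meant to take care of.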

Containerize everything

As I was just getting started with Mage, I ran it in Docker on my computer, but the next step was to bring Mage and my pipelines to the cloud, since Mage currently doesn’t have a cloud edition. There are several options to configure and deploy Mage: (1) deploy an AWS ECS cluster with Terraform; (2) run it on an AWS EC2 instance; (3) deploy an AWS ECS cluster manually. As this was a side project to simulate the deployment of the Mage application, I opted to create the ECS cluster manually.

Read the complete article here.
