AWS Batch is an AWS service that lets you create and run pipeline jobs on a schedule or on demand. Through the AWS Batch console, you can build, configure, and launch pipeline jobs without managing the underlying infrastructure.

AWS Batch lets you create and run jobs from its console, and it can also run jobs from prebuilt or customized Docker images. With a single Docker image and a small script, you can kick off multiple pipeline jobs periodically or on specific time schedules.

In this article, you will learn about AWS Batch and how to create and kick off AWS Batch pipeline jobs using Docker images.

What are AWS Batch Jobs?

AWS Batch is a fully managed batch computing service that makes it easy to run and scale batch jobs on the AWS Cloud.

Key Components of AWS Batch

  • Jobs: Units of work (scripts, executables, Docker containers) submitted to AWS Batch. They run on Fargate or EC2, with parameters defined in a job definition.
  • Job Definitions: Blueprints for jobs, specifying resources (memory, CPU), IAM roles, container properties, and environment variables.
  • Job Queues: Hold jobs until scheduled. Associated with one or more compute environments, allowing for priority-based scheduling.
  • Compute Environments: Sets of compute resources (Fargate, EC2) used to run jobs. Can be managed (AWS handles) or unmanaged (you manage).

How to Initiate and Launch AWS Batch Pipeline Jobs

With AWS Batch, you can run pipeline jobs without installing and configuring batch computing tools or server clusters, so you can spend more time evaluating data and addressing problems. The steps below walk through initiating and launching a pipeline job with AWS Batch.

1. Prerequisites

  • To kick off pipeline jobs using AWS Batch, you have to satisfy certain prerequisites. If this is your first time using AWS Batch, make sure you have a valid job queue and compute environment in AWS Batch.
  • You can follow this official documentation to learn how to create a job queue and compute environment in AWS Batch. In addition, you should have a working Docker environment to build and register the Docker image, which you will use in later steps to create pipeline jobs.
  • You should also have the AWS CLI (Command Line Interface) installed to run commands for accessing AWS services. Refer to this documentation to learn how to install and configure the AWS CLI.
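Before moving on, it can help to confirm that the two command-line tools used in this walkthrough are actually on your PATH. The snippet below is a minimal sketch of such a check; the function names have_cmd and check_prereqs are made up for this example, and the check only looks for the aws and docker executables, not for valid credentials or a running Docker daemon.

```shell
# Return 0 if the named command is available on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line per required tool; return 1 if any tool is missing.
check_prereqs() {
  missing=0
  for tool in aws docker; do
    if have_cmd "$tool"; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_prereqs || echo "Install the missing tools before continuing."
```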

2. Building the Fetch and Run Docker image

The fetch-and-run Docker image contains a simple script that reads certain environment variables and then uses the AWS CLI to download and execute the job script (or zip file). To get the image source, visit the “aws-batch-helpers” GitHub repository and download the source code.

Then, navigate to the “fetch-and-run” folder after unzipping the downloaded file. You can also get the most recent version by cloning the GitHub repository. Inside the “fetch-and-run” folder, you can find two files: Dockerfile and fetch_and_run.sh.
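To make the idea behind fetch_and_run.sh more concrete, here is a simplified sketch of the pattern. It is not the script from the repository: it only handles the script file type, and FETCH_CMD is a hypothetical hook added for this example so that the logic can be tried locally with a plain cp instead of aws s3 cp.

```shell
# Simplified sketch of the fetch-and-run pattern. This is NOT the script
# shipped in the aws-batch-helpers repository.
# FETCH_CMD is a hypothetical hook added for local experiments; when it is
# unset, the job script is downloaded with "aws s3 cp".
fetch_and_run_sketch() (
  : "${BATCH_FILE_TYPE:?BATCH_FILE_TYPE must be set}"
  : "${BATCH_FILE_S3_URL:?BATCH_FILE_S3_URL must be set}"

  if [ "$BATCH_FILE_TYPE" != "script" ]; then
    echo "this sketch only handles BATCH_FILE_TYPE=script" >&2
    exit 2
  fi

  # Download the job script into a private temporary file.
  tmpfile=$(mktemp) || exit 1
  ${FETCH_CMD:-aws s3 cp} "$BATCH_FILE_S3_URL" "$tmpfile" || exit 1
  chmod u+x "$tmpfile"

  # The first argument names the script; the rest are passed through to it.
  if [ "$#" -gt 0 ]; then shift; fi
  status=0
  "$tmpfile" "$@" || status=$?
  rm -f "$tmpfile"
  exit "$status"
)
```

The function body runs in a subshell, so a failed download or a missing variable ends only the sketch and not your interactive shell.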

  • Initially, you have to build the “fetch-and-run” Docker image by executing the Docker command given below from inside the “fetch-and-run” folder.
docker build -t awsbatch/fetch_and_run .
  • After executing the above command, you will get output that ends with lines resembling the following.

Step 6/6 : ENTRYPOINT /usr/local/bin/fetch_and_run.sh
 ---> Running in aa454b301d37
 ---> fe753d94c372

Removing intermediate container aa454b301d37
Successfully built 9aa226c28efc
  • You can confirm whether the Docker image was built successfully by executing the command given below. After executing it, you can see the newly created Docker image in the list.
docker images

3. Creating an ECR repository

In the next step, you have to create an ECR repository, which allows you to store, manage, and delete Docker images. You can store the newly created “fetch-and-run” Docker image there and set access permissions so that AWS Batch can retrieve it while executing pipeline jobs.

  • Initially, navigate to the ECR console and click on “Create Repository.”
  • Then, enter the name of the ECR repository as “awsbatch/fetch_and_run” and click on “Next Step.”
AWS Batch Jobs: Build, tag and push Docker image
  • You have now successfully created an ECR repository.

4. Pushing the Docker Image to the ECR repository

In the next step, you have to push the Docker image to the newly created “awsbatch/fetch_and_run” repository. Execute the following commands to authenticate Docker with ECR, tag the image, and push it to the repository. Replace the AWS account number and region in the commands with your own account and region.

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 012345678901.dkr.ecr.us-east-1.amazonaws.com
docker tag awsbatch/fetch_and_run:latest 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run:latest
docker push 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run:latest
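Because the same long image URI appears in both the tag and push commands, it can be convenient to build it once from its parts. The sketch below assumes a hypothetical helper named ecr_image_uri and reuses the placeholder account number and region from this article; substitute your own values.

```shell
# Build the fully qualified ECR image URI from its parts.
# $1 = AWS account ID, $2 = region, $3 = repository name, $4 = tag
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values from this article; replace them with your own.
ACCOUNT_ID="012345678901"
REGION="us-east-1"
REPO="awsbatch/fetch_and_run"

IMAGE_URI=$(ecr_image_uri "$ACCOUNT_ID" "$REGION" "$REPO" latest)
echo "$IMAGE_URI"

# With the URI in a variable, the tag and push commands become:
#   docker tag "$REPO:latest" "$IMAGE_URI"
#   docker push "$IMAGE_URI"
```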

5. Creating a Pipeline Job Script and Uploading It to S3

  • In the next step, you have to create the job script that the “fetch and run” image, which you already created and registered in ECR, will download and execute. Initially, create a file called “myjob.sh” with the following sample content.
#!/bin/bash
date
echo "Args: $@"
env
echo "This is my simple test job!."
echo "jobId: $AWS_BATCH_JOB_ID"
echo "jobQueue: $AWS_BATCH_JQ_NAME"
echo "computeEnvironment: $AWS_BATCH_CE_NAME"
sleep $1
date
echo "bye bye!!"
  • After saving the file, upload the script to your S3 bucket by executing the following command.
aws s3 cp myjob.sh s3://<bucket>/myjob.sh

6. Creating an IAM Role

Because the fetch and run image fetches the job script from Amazon S3 when executed as an AWS Batch job, you need an IAM role that allows the AWS Batch job to read from S3.

AWS Batch Jobs: create role
  • Navigate to the IAM console and choose Roles. Then, click on Create New Role. In the “Select type of trusted entity” section, choose AWS service.
AWS Batch Jobs: Choose the service for the role
  • Now, select “Elastic Container Service,” as shown in the above image.
  • In the “Select your use case” section, select Elastic Container Service Task, and click on “Next: Permissions.”
AWS Batch Jobs: attach permission policies
  • Now, you are redirected to the Attach Policy page. In the search bar, type “AmazonS3ReadOnlyAccess” as shown in the above image. Then select the “AmazonS3ReadOnlyAccess” policy checkbox and click on “Next: Review.”
  • Now, enter batchJobRole as the name of your new role and choose Create Role. The new role’s details are then displayed, as shown in the above image.
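The console steps above can also be approximated from the AWS CLI. The following is a sketch under a few assumptions: the helper names trust_policy and create_batch_job_role are invented for this example, and the trust policy simply allows the ECS tasks service to assume the role, which corresponds to the Elastic Container Service Task use case chosen in the console. The function that calls AWS is defined but not invoked, since it needs credentials with IAM permissions.

```shell
# Print a trust policy that lets ECS tasks assume the role.
trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role and attach the managed read-only S3 policy.
# Not called automatically: it requires AWS credentials with IAM permissions.
create_batch_job_role() {
  aws iam create-role \
    --role-name batchJobRole \
    --assume-role-policy-document "$(trust_policy)" &&
  aws iam attach-role-policy \
    --role-name batchJobRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
}

# Show the policy document that would be sent.
trust_policy
```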

7. Creating a Job Definition

You have now created all of the resources needed to build a pipeline job in AWS Batch. Now, pull them all together in a job definition that you can use to run one or more AWS Batch jobs.

AWS Batch Jobs: create job definition
  • Navigate to the AWS Batch console and choose the Job Definitions menu on the left side panel.
  • Now, you can find the “Create a job definition” section on the right side, as shown in the above image.
  • Then, in the Job Definition field, enter “fetch_and_run.” 
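  • In the Container image field, enter the URI of the image in the ECR repository. For this case, the URI is 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run.
  • For the job role, select batchJobRole, the IAM role created in the previous step, so that the job can read the script from S3.
  • You can leave the Command field blank. For the vCPUs and Memory fields, enter 1 and 500 (MiB), respectively.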
  • After filling in all the necessary fields, click on Create job definition.
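For reference, roughly the same job definition can be registered from the AWS CLI. This is a sketch, not the only way to do it: the helper names container_properties and register_fetch_and_run are made up here, the role ARN assumes the batchJobRole from the previous step in the placeholder account, and the resource values assume an EC2-backed compute environment. The function that calls AWS is defined but not invoked.

```shell
# Build the containerProperties JSON for the job definition.
# $1 = image URI, $2 = job role ARN
container_properties() {
  printf '{"image":"%s","jobRoleArn":"%s","resourceRequirements":[{"type":"VCPU","value":"1"},{"type":"MEMORY","value":"500"}]}\n' "$1" "$2"
}

# Register the job definition. Not called automatically: it requires AWS
# credentials with AWS Batch permissions.
register_fetch_and_run() {
  aws batch register-job-definition \
    --job-definition-name fetch_and_run \
    --type container \
    --container-properties "$(container_properties "$1" "$2")"
}

# Show the JSON that would be sent, using the placeholder values from this article.
container_properties \
  "012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run" \
  "arn:aws:iam::012345678901:role/batchJobRole"
```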

8. Running a Pipeline Job

This phase requires you to submit and run a job that uses the fetch and run image to download and execute the job script.

AWS Batch Jobs: submit AWS batch job
  • In the AWS Batch console, click on the Jobs menu in the left side panel and select Submit Job.
  • In the Job name field, enter “script_test.”
  • Then, select the newly created fetch_and_run job definition from the dropdown menu in the Job definition field.
  • In the Job Queue field, select the first-run-job-queue from the dropdown menu.
  • In the Command section, enter “[myjob.sh,60]” and click on Validate Command. The first value is the script name, and 60 is passed to the script as the number of seconds to sleep.
AWS Batch Jobs: validate and submit job
  • Now, you have to add Key and Value to the Environment Variables section, as shown in the above image.
    • Key=BATCH_FILE_TYPE, Value=script
    • Key=BATCH_FILE_S3_URL, Value=s3://<bucket>/myjob.sh. Don’t forget to use the correct URL for your file.
  • After filling in all the necessary fields, click on the “Submit Job” button.
  • Now, confirm whether the job ran successfully by checking its final status in the console.
  • As shown in the above image, you can find the status of the job as “SUCCEEDED,” which confirms that the job has completed successfully.
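The same submission can be sketched with the AWS CLI, which is handy when you want to submit jobs from a script. The helper names container_overrides and submit_script_test are hypothetical, and the job name, queue, and job definition are the ones used in this article. The function that calls AWS is defined but not invoked.

```shell
# Build the containerOverrides JSON for a fetch-and-run job.
# $1 = S3 URL of the job script, $2 = script name, $3 = seconds to sleep
container_overrides() {
  printf '{"command":["%s","%s"],"environment":[{"name":"BATCH_FILE_TYPE","value":"script"},{"name":"BATCH_FILE_S3_URL","value":"%s"}]}\n' "$2" "$3" "$1"
}

# Submit the job. Not called automatically: it requires AWS credentials
# with AWS Batch permissions. $1 = S3 URL of the job script
submit_script_test() {
  aws batch submit-job \
    --job-name script_test \
    --job-queue first-run-job-queue \
    --job-definition fetch_and_run \
    --container-overrides "$(container_overrides "$1" myjob.sh 60)"
}

# Show the overrides that would be sent; replace <bucket> with your bucket name.
container_overrides "s3://<bucket>/myjob.sh" myjob.sh 60
```

After submitting, the job status can be polled with aws batch describe-jobs --jobs followed by the job ID returned by submit-job.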

By following the above-mentioned steps, you have successfully created and executed a pipeline job using AWS Batch.

Conclusion

In this article, you learned about AWS Batch and how to create and kick off pipeline jobs with it. This article mainly focused on creating a single job definition and job using AWS Batch. However, you can also run as many jobs as you need with the same job definition by uploading your job scripts to Amazon S3 and calling “SubmitJob” with the appropriate environment variables.

FAQ

1. What is AWS Batch for?

AWS Batch is a service that enables you to run batch computing workloads, such as processing large datasets, modeling simulations, and rendering graphics, without managing the underlying infrastructure.

2. What is the difference between AWS Batch and Lambda?

AWS Batch is designed for running large-scale batch jobs with complex requirements, while AWS Lambda is a serverless service for short-duration, event-driven tasks. Batch can handle long-running workloads, while Lambda has a maximum execution time of 15 minutes.

3. How long can an AWS Batch job run?

AWS Batch does not impose a time limit on jobs by default, so a job runs until its work completes. You can optionally configure a timeout in the job definition or at submission so that jobs running longer than expected are terminated.

Ishwarya M
Technical Content Writer, Hevo Data

Ishwarya is a skilled technical writer with over 5 years of experience. Having worked extensively with B2B SaaS companies in the data industry, she channels her passion for data science into producing informative content that helps individuals understand the complexities of data integration and analysis.