Kubernetes is an open-source system, built on top of 15 years of experience running production workloads at Google in tandem with the best ideas and practices from the community.
A Kubernetes Batch Job is a predefined set of processing actions that the user submits to the system to be carried out with little or no interaction from the user.
This article walks through setting up and running Kubernetes Batch Jobs. It also gives an overview of Kubernetes and its key features.
Kubernetes Architecture
Kubernetes defines a set of building blocks (referred to as “primitives”) that work together to deploy, maintain, and scale applications based on CPU, memory, or custom metrics. Kubernetes is a loosely coupled, extensible container platform that can handle a variety of workloads. The Kubernetes API is used by internal components, extensions, and containers that run on Kubernetes. The platform takes control of computing and storage resources by defining them as Objects, which can then be managed.
It automates the deployment and management of cloud-native applications on-premises and in the cloud. It distributes application workloads across a Kubernetes cluster and handles dynamic container networking requirements automatically. Kubernetes also provides resiliency by allocating storage and persistent volumes to running containers, scaling automatically, and continuously maintaining the desired state of applications.
What is Kubernetes Batch Job?
A job creates one or more Pods and will keep retrying their execution until a certain number of them have been completed successfully. The Job keeps track of successful pod completions as they happen. The task (i.e. Job) is completed when a certain number of successful completions is reached. When you delete a Job, the Pods it created will be deleted as well. Suspending a Job causes all active Pods to be deleted until the Job is resumed.
To create batch workloads, Kubernetes provides two workload resources: the Job object and the CronJob object. A Job object creates one or more Pods and retries their execution until a specified number of them terminate successfully. CronJob objects, like crontab entries, run Jobs on a cron schedule.
To run one Pod to completion, create one Job object. If the first Pod fails or is deleted (for example, due to a node hardware failure or a node reboot), the Job object creates a new one.
To execute and manage a batch task on your cluster, you can use a Kubernetes Job. You can specify the maximum number of Pods that should run in parallel as well as the number of Pods that should complete their tasks before the Job is finished.
A Job can also be used to run multiple Pods at the same time. CronJob is a better option if you want to run a job on a schedule.
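The retry-until-N-completions behavior described above can be sketched as a toy Python model (an illustration of the semantics only, not the actual Kubernetes controller code):

```python
import random

def run_job(completions, run_pod):
    """Toy model of a Job controller: keep (re)creating pods
    until `completions` of them have succeeded."""
    succeeded = 0
    attempts = 0
    while succeeded < completions:
        attempts += 1
        if run_pod():        # each call models one pod running to completion or failing
            succeeded += 1   # the Job tracks successful pod completions
    return attempts

random.seed(0)
# A flaky "pod" that succeeds only about half the time; the Job keeps
# retrying until 3 successes have been recorded, so attempts >= 3.
attempts = run_job(3, lambda: random.random() > 0.5)
print(attempts)
```

The `completions` parameter here corresponds to `.spec.completions`; the model ignores parallelism and backoff limits for brevity.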
Hevo Data, a No-code Data Pipeline, helps integrate data from various databases with 150+ other sources and load it in a data warehouse of your choice. It provides a consistent & reliable solution to manage data in real-time and always has analysis-ready data in your desired destination. Check out what makes Hevo amazing:
- Easy Integration: Connect and migrate data without any coding.
- Auto-Schema Mapping: Automatically map schemas to ensure smooth data transfer.
- In-Built Transformations: Transform your data on the fly with Hevo’s powerful transformation capabilities.
- Load Events in Batches: Events can be loaded in batches in certain data warehouses.
SIGN UP HERE FOR A 14-DAY FREE TRIAL
Understanding Kubernetes Batch Job
What is a CronJob?
CronJobs are used to schedule Kubernetes batch jobs, in the same way that cron schedules automated tasks on a Linux or UNIX system.
Cron jobs are useful for creating recurring tasks like backups and emails. Individual tasks can also be scheduled for a specific time with Cron jobs, for example, if you want to schedule a job for a low-activity period.
Cron jobs have their own limitations and idiosyncrasies. For example, in certain circumstances, a single cron job can create multiple jobs. Jobs should therefore be idempotent.
In Kubernetes v1.21, CronJob was promoted to general availability. If you’re using an older version of Kubernetes, make sure you’re looking at the documentation for that version, as the batch/v1 CronJob API isn’t supported by older Kubernetes releases.
Creating a CronJob
- Creating a cron job requires a config file. The following CronJob manifest prints the current time and a hello message every minute:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
- Use the following command to run the example CronJob:
kubectl create -f https://k8s.io/examples/application/job/cronjob.yaml
- The result looks like this:
cronjob.batch/hello created
- Get the status of the cron job after you’ve created it with this command:
kubectl get cronjob hello
- The result looks like this:
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        <none>          10s
- The cron job has not yet scheduled or run any Kubernetes batch jobs, as evidenced by the output of the command. In about one minute, the job will be created:
kubectl get jobs --watch
- The result looks like this:
NAME               COMPLETIONS   DURATION   AGE
hello-4111706356   0/1                      0s
hello-4111706356   0/1           0s         0s
hello-4111706356   1/1           5s         5s
- You’ve now seen one running Kubernetes batch job that the “hello” cron job has scheduled. Now, look at the cron job again to see if the Kubernetes batch job was scheduled:
kubectl get cronjob hello
- The result looks like this:
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        50s             75s
- The cron job hello should have successfully scheduled a job at the LAST SCHEDULE time. There are currently no active Kubernetes batch jobs, indicating that they have either finished or failed.
- Locate the pods that the last scheduled Kubernetes batch job created and examine one of them.
# Replace "hello-4111706356" with the job name in your system
pods=$(kubectl get pods --selector=job-name=hello-4111706356 --output=jsonpath={.items[*].metadata.name})
kubectl logs $pods
- The result looks like this:
Fri Feb 22 11:02:09 UTC 2019
Hello from the Kubernetes cluster
Writing a CronJob Specification
- A cron job, like all other Kubernetes batch jobs and configurations, requires the apiVersion, kind, and metadata fields.
- A .spec section is also required in cron job configuration.
Schedule
- The .spec.schedule field is a mandatory field. It accepts a Cron format string as the schedule time for its Kubernetes batch jobs to be created and executed, such as 0 * * * * or @hourly.
- The format also includes extended “Vixie cron” step values. As the FreeBSD manual explains:
- In addition to ranges, step values can be used. Following a range with /<number> specifies skips of the number’s value through the range. For example, 0-23/2 in the hours field specifies command execution every other hour (the alternative in the V7 standard is 0,2,4,6,8,10,12,14,16,18,20,22). Steps are also permitted after an asterisk, so to say “every two hours” you can use */2.
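The range and step syntax can be illustrated with a small Python helper that expands a single cron field into the values it matches (a simplified sketch; real cron parsers also handle comma-separated lists, month/day names, and macros like @hourly):

```python
def expand_cron_field(field, lo, hi):
    """Expand one cron field ('*', a single value, or 'a-b',
    optionally with '/step') into the list of matching values.
    Simplified: comma-separated lists are not handled."""
    if "/" in field:
        rng, step = field.split("/")
    else:
        rng, step = field, "1"
    if rng == "*":
        start, end = lo, hi
    elif "-" in rng:
        start, end = map(int, rng.split("-"))
    else:
        start = end = int(rng)
    return list(range(start, end + 1, int(step)))

# "0-23/2" in the hours field: every other hour.
print(expand_cron_field("0-23/2", 0, 23))  # [0, 2, 4, ..., 22]
# "*/2" over the hours field produces the same schedule.
print(expand_cron_field("*/2", 0, 23))     # [0, 2, 4, ..., 22]
```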
Job Template
The template for the Kubernetes batch job is .spec.jobTemplate, and it is required. It has the same schema as a Job, except that it is nested and has no apiVersion or kind.
Starting Deadline
- The field .spec.startingDeadlineSeconds is optional. It represents the time limit in seconds for starting the job if it misses its scheduled time for any reason. After the deadline has passed, the cron job does not start the job; jobs that miss their deadline in this way count as failed jobs. If this field is left unset, the Kubernetes batch jobs have no deadline.
- The CronJob controller measures the time between when a job is expected to be created and now if the .spec.startingDeadlineSeconds field is set (not null). If the difference exceeds that threshold, the execution will be skipped.
- When set to 200, for example, a Kubernetes batch job can be created for up to 200 seconds after the actual schedule.
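The skip decision amounts to a one-line check, modeled here in Python (an illustrative sketch of the rule, not the controller’s actual code):

```python
def should_skip(scheduled_time, now, starting_deadline_secs):
    """A missed run is skipped once more than startingDeadlineSeconds
    have elapsed since its scheduled time. None means no deadline."""
    if starting_deadline_secs is None:
        return False
    return (now - scheduled_time) > starting_deadline_secs

# With startingDeadlineSeconds: 200, a run can still start up to
# 200 seconds after its scheduled time (times here are epoch seconds).
print(should_skip(1000, 1150, 200))  # 150s late -> still runs (False)
print(should_skip(1000, 1250, 200))  # 250s late -> skipped (True)
```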
Concurrency Policy
The .spec.concurrencyPolicy field is optional. It describes how this cron job handles concurrent executions of a job. The spec may specify only one of the following concurrency policies:
- Allow (default): The cron job allows multiple jobs to run simultaneously.
- Forbid: The cron job does not allow concurrent runs; if a new job run is needed but the previous one hasn’t been completed yet, the new job run is skipped.
- Replace: The cron job replaces the currently running job run with a new job run when it’s time for a new job run and the previous job run hasn’t finished yet.
It’s important to note that the concurrency policy only applies to jobs created by the same cron job. If there are multiple cron jobs, they are always allowed to run at the same time.
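The three policies can be summarized as a small decision function (a sketch of the semantics described above, not controller code):

```python
def next_action(policy, previous_still_running):
    """What a CronJob does when a new scheduled run comes due,
    depending on concurrencyPolicy and the previous run's state."""
    if not previous_still_running:
        return "start new run"
    if policy == "Allow":
        return "start new run alongside the old one"
    if policy == "Forbid":
        return "skip the new run"
    if policy == "Replace":
        return "cancel the old run, start the new one"
    raise ValueError("unknown concurrencyPolicy: " + policy)

print(next_action("Forbid", previous_still_running=True))   # skip the new run
print(next_action("Replace", previous_still_running=True))  # cancel the old run, start the new one
```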
Suspend
The .spec.suspend field is also optional. If it is set to true, all subsequent executions are suspended. This option does not affect executions that have already started. By default, it is set to false.
Jobs History Limits
Optional fields include .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit. These fields specify the number of completed and failed jobs that should be saved. They are set to 3 and 1 respectively by default. Setting the limit to 0 means that no jobs of that type will be kept after they finish.
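The pruning behavior amounts to keeping only the most recent N finished Jobs of each kind, which can be sketched as (an illustration, not the controller’s code):

```python
def prune_history(finished_jobs, limit):
    """Keep only the `limit` most recent finished jobs.
    Input list is ordered oldest first; limit=0 keeps nothing."""
    if limit == 0:
        return []
    return finished_jobs[-limit:]

# Defaults: the 3 most recent successful and 1 most recent failed
# job are retained.
successes = ["s1", "s2", "s3", "s4", "s5"]
failures = ["f1", "f2"]
print(prune_history(successes, 3))  # ['s3', 's4', 's5']
print(prune_history(failures, 1))   # ['f2']
```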
Running Job through Coarse Parallel Processing
You’ll run a Kubernetes Job with multiple parallel worker processes in this example.
Each pod in this example takes one unit of work from a task queue, completes it, deletes it from the queue, and then exits.
The following is a summary of the steps in this example:
- Start a Message Queue Service: RabbitMQ is used in this example, but you could use another. In practice, you’d create a message queue service once and then reuse it for a variety of tasks.
- Create a Queue, and Fill it with Messages: Each message denotes a specific task that must be completed. A message is an integer in this example, on which we will perform a lengthy computation.
- Start a Job that Works on Tasks from the Queue: Several pods are started by the job. Each pod selects one task from the message queue, processes it, and then repeats the process until the queue is empty.
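The coarse-grained pattern — one queue message per Pod, one completion per message — can be sketched with Python’s standard queue module standing in for RabbitMQ (a local toy model, not the actual example code):

```python
import queue

def coarse_worker(tasks):
    """Each 'pod' takes exactly one item, processes it, and exits."""
    item = tasks.get_nowait()
    return "processed " + item

tasks = queue.Queue()
for f in ["apple", "banana", "cherry"]:
    tasks.put(f)

# One pod per work item, so the Job's completion count equals
# the number of items in the queue.
results = [coarse_worker(tasks) for _ in range(3)]
print(results)        # ['processed apple', 'processed banana', 'processed cherry']
print(tasks.empty())  # True
```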
Starting a Message Queue Service
- Although RabbitMQ is used in this example, you can adapt it to use any AMQP-type message service.
- In practice, you could set up a message queue service once in a cluster and then reuse it for multiple jobs and long-running services.
- To get RabbitMQ up and running, do the following:
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.3/examples/celery-rabbitmq/rabbitmq-service.yaml
service "rabbitmq-service" created
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.3/examples/celery-rabbitmq/rabbitmq-controller.yaml
replicationcontroller "rabbitmq-controller" created
Testing the Message Queue Service
- Now you can experiment with the message queue. You’ll create a temporary interactive pod, install some tools, and try out some queue operations.
- Create a temporary interactive Pod.
# Create a temporary interactive container
kubectl run -i --tty temp --image ubuntu:18.04
Waiting for pod default/temp-loe07 to be running, status is Pending, pod ready: false
... [ previous line repeats several times .. hit return when it stops ] ...
- Your pod name and command prompt will be unique to you. Then, to work with message queues, install the amqp-tools package.
# Install some tools
root@temp-loe07:/# apt-get update
.... [ lots of output ] ....
root@temp-loe07:/# apt-get install -y curl ca-certificates amqp-tools python dnsutils
.... [ lots of output ] ....
- You’ll create a docker image later that includes these packages.
- The next step is to check whether you can discover the rabbitmq service:
# Note the rabbitmq-service has a DNS name, provided by Kubernetes:
root@temp-loe07:/# nslookup rabbitmq-service
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: rabbitmq-service.default.svc.cluster.local
Address: 10.0.147.152
# Your address will vary.
- The previous step may not work if Kube-DNS is not configured properly. An env var can also be used to get the service IP:
# env | grep RABBIT | grep HOST
RABBITMQ_SERVICE_SERVICE_HOST=10.0.147.152
# Your address will vary.
- You’ll then test whether you can create a queue, as well as publish and consume messages.
# In the next line, rabbitmq-service is the hostname where the rabbitmq-service
# can be reached. 5672 is the standard port for rabbitmq.
root@temp-loe07:/# export BROKER_URL=amqp://guest:guest@rabbitmq-service:5672
# If you could not resolve "rabbitmq-service" in the previous step,
# then use this command instead:
# root@temp-loe07:/# BROKER_URL=amqp://guest:guest@$RABBITMQ_SERVICE_SERVICE_HOST:5672
# Now create a queue:
root@temp-loe07:/# /usr/bin/amqp-declare-queue --url=$BROKER_URL -q foo -d
foo
# Publish one message to it:
root@temp-loe07:/# /usr/bin/amqp-publish --url=$BROKER_URL -r foo -p -b Hello
# And get it back.
root@temp-loe07:/# /usr/bin/amqp-consume --url=$BROKER_URL -q foo -c 1 cat && echo
Hello
root@temp-loe07:/#
- In the last command, the amqp-consume tool takes one message (-c 1) from the queue and passes it to the standard input of an arbitrary command. Here, the program cat prints out the characters read from standard input, and the echo appends a trailing newline to make the example readable.
Filling the Queue with Tasks
- Fill the queue with “tasks” now. Your tasks, in this case, are strings to be printed.
- In practice, the content of the messages might be:
- names of the files that must be processed
- extra flags to the program
- configuration parameters to a simulation
- frame numbers of a scene to be rendered
- In practice, if there is a large amount of data that all Pods of the Job need in read-only mode, you will typically put it on a shared file system like NFS and mount it read-only on all Pods, or have the program in the Pod read data natively from a cluster file system like HDFS.
- You will use the amqp command-line tools to create and fill the queue in your example. In practice, you could use an amqp client library to write a program to fill the queue.
/usr/bin/amqp-declare-queue --url=$BROKER_URL -q job1 -d
job1
for f in apple banana cherry date fig grape lemon melon
do
/usr/bin/amqp-publish --url=$BROKER_URL -r job1 -p -b $f
done
Create an Image
- You’re now ready to create an image that will be used as a job.
- The amqp-consume utility will be used to read the message from the queue and run our program. Here’s a quick example program:
#!/usr/bin/env python
# Prints the first line of standard input and sleeps for 10 seconds.
import sys
import time
print("Processing " + sys.stdin.readlines()[0])
time.sleep(10)
- Grant permission for the script to run:
chmod +x worker.py
- Now build an image. If you’re working in the source tree, change the directory to examples/job/work-queue-1. If not, create a temporary directory, change to it, and download the Dockerfile and worker.py. In either case, build the image with the following command:
docker build -t job-wq-1 .
- Tag your app image with your username and push it to the Docker Hub using the commands below. <username> should be replaced with your Hub username.
docker tag job-wq-1 <username>/job-wq-1
docker push <username>/job-wq-1
- If you’re using Google Container Registry, tag your app image with your project ID and push it there. <project> should be replaced with your project ID.
docker tag job-wq-1 gcr.io/<project>/job-wq-1
gcloud docker -- push gcr.io/<project>/job-wq-1
Defining a Job
- A job definition is shown here. Make a copy of the Job manifest, save it as ./job.yaml, and edit the image to match the name you used.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-wq-1
spec:
  completions: 8
  parallelism: 2
  template:
    metadata:
      name: job-wq-1
    spec:
      containers:
      - name: c
        image: gcr.io/<project>/job-wq-1
        env:
        - name: BROKER_URL
          value: amqp://guest:guest@rabbitmq-service:5672
        - name: QUEUE
          value: job1
      restartPolicy: OnFailure
- In this case, each pod completes one item from the queue before exiting. As a result, the Job’s completion count equals the number of work items completed. For example, you set .spec.completions: 8 because there are 8 items in the queue.
Running the Job
- So, here’s how to run the job:
kubectl apply -f ./job.yaml
- Now, wait a few moments before checking on the job.
kubectl describe jobs/job-wq-1
Name: job-wq-1
Namespace: default
Selector: controller-uid=41d75705-92df-11e7-b85e-fa163ee3c11f
Labels: controller-uid=41d75705-92df-11e7-b85e-fa163ee3c11f
job-name=job-wq-1
Annotations: <none>
Parallelism: 2
Completions: 8
Start Time: Wed, 06 Sep 2017 16:42:02 +0800
Pods Statuses: 0 Running / 8 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=41d75705-92df-11e7-b85e-fa163ee3c11f
job-name=job-wq-1
Containers:
c:
Image: gcr.io/causal-jigsaw-637/job-wq-1
Port:
Environment:
BROKER_URL: amqp://guest:guest@rabbitmq-service:5672
QUEUE: job1
Mounts: <none>
Volumes: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
───────── ──────── ───── ──── ───────────── ────── ────── ───────
27s 27s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-hcobb
27s 27s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-weytj
27s 27s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-qaam5
27s 27s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-b67sr
26s 26s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-xe5hj
15s 15s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-w2zqe
14s 14s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-d6ppa
14s 14s 1 {job } Normal SuccessfulCreate Created pod: job-wq-1-p17e0
All of your pods were successful.
Running Job through Fine Parallel Processing
You will run a Kubernetes Job with multiple parallel worker processes in a pod in this example.
In this example, as each pod is created, it takes one unit of work from a task queue, processes it, and then repeats the process until the queue is empty.
The steps in this example are summarised as follows:
- Start a Storage Service to Hold the Work Queue: You’re storing our work items in Redis in this example. RabbitMQ was used in the preceding example. Because AMQP lacks a good way for clients to detect when a finite-length work queue is empty, you use Redis and a custom work-queue client library in this example. In practice, you’d create a store like Redis once and then reuse it for things like job queues and other tasks.
- Create a Queue, and Fill it with Messages: Each message represents a single task that must be completed. A message in this example is an integer on which we will perform a lengthy computation.
- Start a Job that Works on Tasks from the Queue: Several pods are started by the Job. Each pod selects a task from the message queue, processes it, and then repeats the process until the queue is empty.
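The fine-grained pattern differs from the coarse one in that each worker keeps looping until the queue is drained. A toy Python model (with a plain list standing in for Redis, and the lease/complete cycle collapsed into a pop):

```python
def fine_worker(work_queue, worker_id):
    """Each 'pod' keeps leasing items until the queue is empty,
    then exits successfully to signal that the work is done."""
    done = []
    while work_queue:
        item = work_queue.pop(0)  # lease one item from the queue
        done.append(item)         # ...process it, then mark it complete
    return (worker_id, done)

work = ["apple", "banana", "cherry", "date"]
wid, processed = fine_worker(work, "pod-1")
print(processed)    # a single worker drains the whole queue
print(work == [])   # True: nothing left for other workers
```

In the real example, the lease has a timeout so that an item is returned to the queue if a worker dies mid-task; that failure handling is omitted here.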
Starting Redis
- For the sake of simplicity, you will only run one instance of Redis in this example.
- The Redis pod and service manifests can also be downloaded directly from the Kubernetes examples.
Filling the Queue with Tasks
- Fill the queue with “tasks” now. Your tasks, in this case, are strings to be printed.
- To run the Redis CLI, create a temporary interactive pod.
kubectl run -i --tty temp --image redis --command "/bin/sh"
Waiting for pod default/redis2-c7h78 to be running, status is Pending, pod ready: false
Hit enter for command prompt
- Now press enter to launch the Redis CLI and create a list with some tasks.
# redis-cli -h redis
redis:6379> rpush job2 "apple"
(integer) 1
redis:6379> rpush job2 "banana"
(integer) 2
redis:6379> rpush job2 "cherry"
(integer) 3
redis:6379> rpush job2 "date"
(integer) 4
redis:6379> rpush job2 "fig"
(integer) 5
redis:6379> rpush job2 "grape"
(integer) 6
redis:6379> rpush job2 "lemon"
(integer) 7
redis:6379> rpush job2 "melon"
(integer) 8
redis:6379> rpush job2 "orange"
(integer) 9
redis:6379> lrange job2 0 -1
1) "apple"
2) "banana"
3) "cherry"
4) "date"
5) "fig"
6) "grape"
7) "lemon"
8) "melon"
9) "orange"
- Your work queue will thus be the list with the key job2.
- Note: If your Kube DNS isn’t configured correctly, you might need to change the first step of the above block to redis-cli -h $REDIS_SERVICE_HOST.
Create an Image
- You’re now ready to create an image for the Job to run.
- To read messages from the message queue, you’ll use a python worker program and a Redis client.
- The rediswq.py library is a simple Redis Work Queue Client library.
- The work queue client library is used by the “worker” program in each Pod of the Job to get work. It’s as follows:
#!/usr/bin/env python
import time
import rediswq

host = "redis"
# Uncomment next two lines if you do not have Kube-DNS working.
# import os
# host = os.getenv("REDIS_SERVICE_HOST")

q = rediswq.RedisWQ(name="job2", host=host)
print("Worker with sessionID: " + q.sessionID())
print("Initial queue state: empty=" + str(q.empty()))
while not q.empty():
    item = q.lease(lease_secs=10, block=True, timeout=2)
    if item is not None:
        itemstr = item.decode("utf-8")
        print("Working on " + itemstr)
        time.sleep(10)  # Put your actual work here instead of sleep.
        q.complete(item)
    else:
        print("Waiting for work")
print("Queue empty, exiting")
- You could also get the worker.py, rediswq.py, and Dockerfile files and build the image yourself:
docker build -t job-wq-2 .
Push the Image
- Tag your app image with your username and push it to the Docker Hub using the commands below. The username should be replaced with your Hub username.
docker tag job-wq-2 <username>/job-wq-2
docker push <username>/job-wq-2
- You must either push to a public repository or configure your cluster so that your private repository can be accessed.
- If you’re using Google Container Registry, tag your app image with your project ID and push it to GCR. Replace <project> with the ID of your project.
docker tag job-wq-2 gcr.io/<project>/job-wq-2
gcloud docker -- push gcr.io/<project>/job-wq-2
Defining a Job
- The following is the job description:
apiVersion: batch/v1
kind: Job
metadata:
  name: job-wq-2
spec:
  parallelism: 2
  template:
    metadata:
      name: job-wq-2
    spec:
      containers:
      - name: c
        image: gcr.io/myproject/job-wq-2
      restartPolicy: OnFailure
- Replace gcr.io/myproject with your path in the job template.
- In this case, each pod works on several items from the queue and then exits when there are no more items left. Since the workers themselves detect when the work queue is empty, and the Job controller does not know about the work queue, the workers signal when they are done by exiting successfully. Once any worker exits with success, the controller knows the work is complete, and the Pods will exit soon. As a result, you leave the Job’s completion count (.spec.completions) unset; the Job controller then waits for the remaining Pods to finish as well.
Running the Job
- So, here’s how you run a job:
kubectl apply -f ./job.yaml
- Now, wait a few moments before checking on the job.
kubectl describe jobs/job-wq-2
Name: job-wq-2
Namespace: default
Selector: controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
Labels: controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
job-name=job-wq-2
Annotations: <none>
Parallelism: 2
Completions: <unset>
Start Time: Mon, 11 Jan 2016 17:07:59 -0800
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
job-name=job-wq-2
Containers:
c:
Image: gcr.io/exampleproject/job-wq-2
Port:
Environment: <none>
Mounts: <none>
Volumes: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33s 33s 1 {job-controller } Normal SuccessfulCreate Created pod: job-wq-2-lglf8
kubectl logs pods/job-wq-2-7r7b2
Worker with sessionID: bbd72d0a-9e5c-4dd6-abf6-416cc267991f
Initial queue state: empty=False
Working on banana
Working on date
Working on lemon
Conclusion
This article explains the Kubernetes Batch Job extensively. In addition, it covers Kubernetes and its key features. Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of desired destinations with a few clicks. With its strong integration with 150+ sources (including 40+ free sources), Hevo allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready, so that you can focus on your key business needs and perform insightful analysis using BI tools.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
FAQs
1. What are batch jobs for?
Batch jobs are used to process large volumes of data or perform repetitive tasks automatically. They run without manual intervention, typically scheduled to run at specific times or intervals.
2. What is the difference between cron job and batch job?
A cron job is a type of scheduled task that runs at regular intervals, like every day or week, based on a time schedule. A batch job, on the other hand, is a job that processes a set of tasks or data in a single run, often kicked off automatically when conditions are met.
3. What is job batch in Kubernetes?
In Kubernetes, a batch job is a task that runs to completion and then stops. It’s used to handle tasks like data processing or cleanup, ensuring that the job completes successfully, even if it needs to retry.
Harshitha is a dedicated data analysis fanatic with a strong passion for data, software architecture, and technical writing. Her commitment to advancing the field motivates her to produce comprehensive articles on a wide range of topics within the data industry.