Spring Batch Scheduling: A Comprehensive Guide 101

Batch Processing, Data Processing, Spring • April 27th, 2022

As data volumes grow, organizations increasingly rely on bulk processing: automated, often complex operations over massive datasets that run without user interaction.

To handle this, organizations use batch processing frameworks. Spring Batch can help you develop robust batch processing applications and run them automatically on a schedule.

This detailed article explains Spring Batch Scheduling with a relevant example. In addition to that, it also explains Batch Processing and Spring Batch extensively.

Prerequisites Required for Spring Batch Scheduling

Basics of batch processing

What is Batch Processing?

Batch Processing was pioneered by Herman Hollerith, who used the method in the tabulating machine he invented in the 19th century. This device, which was capable of counting and sorting data organized on punched cards, became the forerunner of the modern computer. The cards, as well as the information on them, could be collected and processed in batches. With this innovation, large amounts of data could be processed more quickly and accurately than with manual entry methods.

Batch processing is an efficient way of running a large number of iterative data jobs. With the right amount of computing resources present, the batch method allows you to process data with little to no user interaction. 

After you have collected and stored your data, the batch processing method allows you to process it during an event called a “batch window“. It provides an efficient workflow layout by prioritizing processing tasks and completing the given data jobs when it makes the most sense. 

Batch Processing is a technique for consistently processing large amounts of data. The batch method allows users to process data with little or no user interaction when computing resources are available.

Batch Processing has become popular due to its numerous benefits for enterprise data management. It has several advantages for businesses:

  • Efficiency: When computing or other resources are readily available, Batch Processing allows a company to process jobs. Companies can schedule batch processes for jobs that aren’t as urgent and prioritize time-sensitive jobs. Batch systems can also run in the background to reduce processor stress.
  • Simplicity: Batch Processing, in comparison to Stream Processing, is a less complex system that does not require special hardware or system support. For data input, it requires less maintenance.
  • Improved Data Quality: Batch Processing reduces the chances of errors by automating most or all components of a processing job and minimizing user interaction. This improves precision and accuracy, resulting in higher data quality.
  • Faster Business Intelligence: Batch Processing allows companies to process large volumes of data quickly, resulting in faster Business Intelligence. Batch Processing reduces processing time and ensures that data is delivered on time because many records can be processed at once. And, because multiple jobs can be handled at the same time, business intelligence is available faster than ever before.

The fundamental concepts of batch processing should be familiar and comfortable to any batch developer. The diagram below depicts a simplified version of the batch reference architecture, which has been proven over many years and platforms. It defines the key concepts and terms used by Spring Batch when it comes to batch processing.

Figure: Batch processing example

A batch process is typically encapsulated by a Job consisting of multiple steps, as shown in our batch processing example. A single ItemReader, ItemProcessor, and ItemWriter are usually present in each Step. A JobLauncher executes a Job and a JobRepository stores metadata about configured and executed jobs.

Each Job can have multiple JobInstances, each of which is defined by its own set of JobParameters that are used to initiate a batch job. A JobExecution is the name given to each run of a JobInstance. Each JobExecution usually keeps track of what happened during a run, including current and exit statuses, start and end times, and so on.
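The relationship between a Job, its JobInstances, and their JobExecutions can be sketched in plain Java. This is only an illustrative model, not Spring Batch code: JobRegistrySketch and its methods are hypothetical names, and it ignores details such as execution status. In practice, a second execution of the same JobInstance typically happens when an earlier run did not complete.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of Job / JobInstance / JobExecution.
// JobRegistrySketch is a hypothetical name, not a Spring Batch class.
class JobRegistrySketch {
    // One entry per JobInstance; the value lists that instance's executions.
    private final Map<String, List<String>> executionsByInstance = new LinkedHashMap<>();

    // A JobInstance is identified by the job name plus its parameters.
    static String instanceKey(String jobName, Map<String, String> params) {
        // TreeMap sorts the parameters so their order does not change the key.
        return jobName + new TreeMap<>(params);
    }

    // Every launch records a new execution under the matching instance
    // and returns how many executions that instance now has.
    int launch(String jobName, Map<String, String> params) {
        List<String> executions = executionsByInstance
            .computeIfAbsent(instanceKey(jobName, params), k -> new ArrayList<>());
        executions.add("execution-" + (executions.size() + 1));
        return executions.size();
    }

    // Number of distinct JobInstances seen so far.
    int instanceCount() {
        return executionsByInstance.size();
    }
}
```

Launching the same job name with the same parameters adds an execution to the existing instance, while a new parameter value creates a new instance.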

A Step is a separate, distinct phase of a batch job, and each Job consists of one or more Steps. Just as each run of a Job is a JobExecution, each attempt to execute a Step is a StepExecution. A StepExecution keeps track of current and exit statuses, start and end times, and so on, as well as references to its Step and JobExecution.

An ExecutionContext is a collection of key-value pairs that contain information about StepExecution or JobExecution. The ExecutionContext is saved by Spring Batch, which is useful if you need to restart a batch job (e.g., when a fatal error has occurred, etc.). All that is required is for any object to be shared between steps to be placed in the context, and the framework will handle the rest. The previous ExecutionContext’s values are restored from the database and applied after a restart.
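To make the restart behavior concrete, here is a minimal plain-Java sketch of the idea. It assumes a simple Map stands in for the persisted ExecutionContext; RestartSketch, POSITION_KEY, and the failAt parameter are hypothetical names used only for illustration, and Spring Batch's own persistence is not shown.

```java
import java.util.List;
import java.util.Map;

// Models a restartable step: progress is saved as a key-value pair,
// and a restart resumes from the saved position.
class RestartSketch {
    static final String POSITION_KEY = "items.processed";

    // Processes items from the saved position onward.
    // Throws at index failAt to simulate a fatal error (use -1 for no failure).
    static void runStep(List<String> items, Map<String, Integer> context,
                        List<String> output, int failAt) {
        int start = context.getOrDefault(POSITION_KEY, 0);
        for (int i = start; i < items.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at item " + i);
            }
            output.add(items.get(i).toUpperCase());
            context.put(POSITION_KEY, i + 1); // save progress after each item
        }
    }
}
```

If the first run fails partway through, calling runStep again with the same context continues with the first unprocessed item, so nothing is processed twice.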

All of this persistence is made possible by the JobRepository mechanism in Spring Batch. For JobLauncher, Job, and Step instantiations, it offers CRUD operations. A JobExecution is obtained from the repository once a Job is launched, and StepExecution and JobExecution instances are persisted in the repository during the execution process.

What is Spring Batch?

Spring Batch is an open-source framework for batch processing, first announced in 2007. It is used in many enterprise applications to build batch processing systems that are both reliable and lightweight. Spring Batch provides reusable functions for processing large volumes of data, including logging, transaction management, job processing statistics, job restart, skip, and resource management.

Spring Batch is an open-source batch processing framework. It’s a simple, all-in-one solution for creating reliable batch applications, which are common in modern enterprise systems. Spring Batch extends the Spring Framework’s POJO-based development approach.

Spring Batch includes reusable functions for logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management, which are all important in processing large volumes of records. It also includes more advanced technical services and features that, through optimization and partitioning techniques, will enable extremely high-volume and high-performance batch jobs. Simple as well as complex, high-volume batch jobs can use the framework to process large amounts of data in a highly scalable manner.

Architecture of Spring Batch

The Spring Batch architecture mainly consists of three components – Application, Batch Core, and Batch Infrastructure. All batch Jobs and custom code written by Spring Batch developers are contained in the Application layer. The Batch Core contains the essential runtime classes for launching and controlling batch jobs.

  • Application: All of the jobs and code written with the Spring Batch framework are contained in this component.
  • Batch Core: All of the API classes required to control and launch a Batch Job are contained in this component.
  • Batch Infrastructure: The readers, writers, and services used by both the application and Batch core components are contained in this component.

The Batch Core also contains the implementations of JobLauncher, Job, and Step. Every job run in Spring Batch starts with the JobLauncher, which launches a Job made up of one or more Steps. A Step typically consists of three components: ItemReader, ItemProcessor, and ItemWriter. These components read, process, and write data for a particular source or destination. Both the Application and Batch Core layers are built on the common Batch Infrastructure layer, which provides the common readers, writers, and services.

  • Job: A job is the batch process that will be executed in a Spring Batch application. It runs uninterrupted from beginning to end. This task is further broken down into steps (or a job contains steps).
  • Step: A step is a self-contained part of a job that contains the information needed to define and complete the task (its part). Each step is made up of an ItemReader, an ItemProcessor (optional), and an ItemWriter, as shown in the diagram. One or more steps may be included in a job.
  • Readers, Writers, and Processors: An item reader reads data from a specific source into a Spring Batch application, whereas an item writer writes data from the Spring Batch application to a specific destination. An item processor is a class that contains the code for processing the data read into the Spring Batch application. If the application reads “n” records, the processor’s code runs once for each record.
  • JobRepository: For the JobLauncher, Job, and Step implementations, a Job repository in Spring Batch provides Create, Retrieve, Update, and Delete (CRUD) operations.
  • JobLauncher: JobLauncher is an interface for launching a Spring Batch job with a set of parameters. Its default implementation is the class SimpleJobLauncher.
  • JobInstance: When you run a job, a JobInstance is created, which represents the job’s logical run. The name of the job and the parameters passed to it while it is running are used to distinguish each job instance.
  • JobExecution and StepExecution: A JobExecution and a StepExecution represent a single execution of a job and of a step, respectively. They contain information about that run, such as its start and end times.
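The read, process, and write flow of a Step can be sketched in plain Java as follows. This is a simplified illustration rather than the Spring Batch API: ChunkStepSketch is a hypothetical name, an Iterator stands in for the ItemReader, a Function for the ItemProcessor, and the returned list of chunks for whatever the ItemWriter would write. The chunkSize parameter is an assumption reflecting that processed items are commonly written in groups.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Reads items one at a time, processes each, and writes them in chunks.
class ChunkStepSketch {
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>(); // stands in for the writer's destination
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            // The processor runs once for every record that is read.
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                written.add(chunk);          // "write" a full chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);              // write the final partial chunk
        }
        return written;
    }
}
```

With five input records and a chunk size of two, the sketch produces two full chunks and one final chunk holding the remaining record.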

Understanding Spring Batch Scheduling

To understand Spring Batch Scheduling, look at the concepts below:

Spring Batch Scheduling: Create Spring Batch Jobs

To create the Job with Java configuration, you need Spring Boot 2, Spring Batch 4, and the H2 database. The Job in this example is built from two simple Tasklets, MyTaskOne.java and MyTaskTwo.java, which are shown in the following sections.

Add the spring-boot-starter-batch dependency to pull in Spring Batch and its reusable functions for processing large amounts of data. You also need a database because Spring Batch stores job metadata in a job repository. The H2 in-memory database is used here because it works seamlessly with Spring Batch. Add both dependencies to the pom.xml file, as shown below.

<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
 
  <groupId>com.howtodoinjava</groupId>
  <artifactId>App</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
 
  <name>App</name>
  <url>http://maven.apache.org</url>
 
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.3.RELEASE</version>
  </parent>
 
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
 
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
      <groupId>com.h2database</groupId>
      <artifactId>h2</artifactId>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
 
  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
 
  <repositories>
    <repository>
      <id>repository.spring.release</id>
      <name>Spring GA Repository</name>
      <url>http://repo.spring.io/release</url>
    </repository>
  </repositories>
</project>

Spring Batch Scheduling: Add Tasklets

You need to add tasks to your job in Spring Batch using Tasklets. Include the below code in the file MyTaskOne.java.

package com.howtodoinjava.demo.tasks;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTaskOne implements Tasklet {

  // Called by the step; returning FINISHED tells Spring Batch the tasklet is done.
  @Override
  public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception
  {
    System.out.println("MyTaskOne start..");

    // ... your code

    System.out.println("MyTaskOne done..");
    return RepeatStatus.FINISHED;
  }
}

Similarly, create the second Tasklet, MyTaskTwo.java, using the below code.

package com.howtodoinjava.demo.tasks;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTaskTwo implements Tasklet {

  // Called by the step; returning FINISHED tells Spring Batch the tasklet is done.
  @Override
  public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception
  {
    System.out.println("MyTaskTwo start..");

    // ... your code

    System.out.println("MyTaskTwo done..");
    return RepeatStatus.FINISHED;
  }
}
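
Both Tasklets return RepeatStatus.FINISHED, which signals that the work is complete. A Tasklet that returns CONTINUABLE instead is called again until it reports FINISHED. The plain-Java sketch below illustrates that calling pattern only; TaskletLoopSketch and its Status enum are hypothetical stand-ins and not Spring Batch classes.

```java
import java.util.function.Supplier;

// Illustrates the "call until FINISHED" contract of a tasklet.
class TaskletLoopSketch {
    // Local stand-in for the two outcomes a tasklet can report.
    enum Status { CONTINUABLE, FINISHED }

    // Calls the task until it reports FINISHED and returns the number of calls.
    static int runUntilFinished(Supplier<Status> task) {
        int calls = 0;
        Status status;
        do {
            status = task.get();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }
}
```

A task that always reports FINISHED, like the two Tasklets above, is therefore called exactly once per step execution.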

Spring Batch Scheduling: Spring Batch Configuration

You have to define all the job-related configuration and execution logic in the BatchConfig.java file using the below code.

package com.howtodoinjava.demo.config;
 
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
 
import com.howtodoinjava.demo.tasks.MyTaskOne;
import com.howtodoinjava.demo.tasks.MyTaskTwo;
 
@Configuration
@EnableBatchProcessing
public class BatchConfig {
   
  @Autowired
  private JobBuilderFactory jobs;
 
  @Autowired
  private StepBuilderFactory steps;
   
  @Bean
  public Step stepOne(){
      return steps.get("stepOne")
              .tasklet(new MyTaskOne())
              .build();
  }
   
  @Bean
  public Step stepTwo(){
      return steps.get("stepTwo")
              .tasklet(new MyTaskTwo())
              .build();
  }  
   
  @Bean
  public Job demoJob(){
      return jobs.get("demoJob")
              .incrementer(new RunIdIncrementer())
              .start(stepOne())
              .next(stepTwo())
              .build();
  }
}
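
The start(...).next(...) chain above defines a simple sequential flow: stepTwo runs only after stepOne completes, and a failed step ends the job. As a rough plain-Java illustration of that ordering (SequentialJobSketch is a hypothetical name, and a BooleanSupplier returning true or false stands in for a step succeeding or failing):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Runs named steps in order and stops at the first failure.
class SequentialJobSketch {
    // Pass a map with predictable iteration order (for example a LinkedHashMap).
    // Returns the names of the steps that were actually executed.
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            executed.add(step.getKey());
            if (!step.getValue().getAsBoolean()) {
                break; // a failed step ends the job; later steps are skipped
            }
        }
        return executed;
    }
}
```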

Your job named ‘demoJob’ is now configured and ready to be executed. To run it automatically at startup, have App.java implement CommandLineRunner. Inside its run() method, JobLauncher starts the execution of the given job with its parameters.

package com.howtodoinjava.demo;
 
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
 
@SpringBootApplication
public class App implements CommandLineRunner
{
  @Autowired
  JobLauncher jobLauncher;
   
  @Autowired
  Job job;
   
  public static void main(String[] args)
  {
    SpringApplication.run(App.class, args);
  }
 
  @Override
  public void run(String... args) throws Exception
  {
    JobParameters params = new JobParametersBuilder()
          .addString("JobID", String.valueOf(System.currentTimeMillis()))
          .toJobParameters();
    jobLauncher.run(job, params);
  }
}

Notice the console logs to check the automatic execution of jobs, as shown below.

o.s.b.c.l.support.SimpleJobLauncher  	: Job: [SimpleJob: [name=demoJob]] launched with
the following parameters: [{JobID=1530697766768}]
 
o.s.batch.core.job.SimpleStepHandler 	: Executing step: [stepOne]
MyTaskOne start..
MyTaskOne done..
 
o.s.batch.core.job.SimpleStepHandler 	: Executing step: [stepTwo]
MyTaskTwo start..
MyTaskTwo done..
 
o.s.b.c.l.support.SimpleJobLauncher  	: Job: [SimpleJob: [name=demoJob]] completed with
the following parameters: [{JobID=1530697766768}] and the following status: [COMPLETED]

Spring Boot runs configured Jobs automatically at startup by default. You can disable this auto-run by setting the spring.batch.job.enabled property in the application.properties file, as shown below.

spring.batch.job.enabled=false

Spring Batch Scheduling: Spring Batch Jobs Scheduling 

You can execute Spring Batch Jobs periodically on a fixed schedule by passing a cron expression to Spring's TaskScheduler. The cron expression describes the details of the schedule. The example below schedules the demoJob configured above.

Scheduling a Spring Batch Job takes two steps:

  • Enable scheduling with the @EnableScheduling annotation.
  • Create a method annotated with @Scheduled, provide the recurrence details, and add the job execution logic inside this method.

Spring Batch Scheduling Jobs:

package com.howtodoinjava.demo;
 
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
 
@SpringBootApplication
@EnableScheduling
public class App
{
  @Autowired
  JobLauncher jobLauncher;
   
  @Autowired
  Job job;
   
  public static void main(String[] args)
  {
    SpringApplication.run(App.class, args);
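  }

  // Cron fields: second, minute, hour, day of month, month, day of week.
  // This expression fires at second 0 of every minute.
  @Scheduled(cron = "0 */1 * * * ?")
  public void perform() throws Exception
  {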
    JobParameters params = new JobParametersBuilder()
        .addString("JobID", String.valueOf(System.currentTimeMillis()))
        .toJobParameters();
    jobLauncher.run(job, params);
  }
}

Once the application starts, the job runs at the start of every minute, so the first run happens within a minute of startup.

In the console log, you can see the below status:

MyTaskOne start..
MyTaskOne done.. 
 
MyTaskTwo start..
MyTaskTwo done..
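
To see why the first run happens within a minute of startup, consider how the next fire time follows from the cron expression used above, which fires at second 0 of every minute. The sketch below computes that time with java.time only. It is a hand-written illustration for this one schedule, not a cron parser, and EveryMinuteSketch is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Next fire time for a schedule that fires at second 0 of every minute.
class EveryMinuteSketch {
    // Returns the first whole-minute boundary strictly after "now".
    static LocalDateTime nextRun(LocalDateTime now) {
        return now.truncatedTo(ChronoUnit.MINUTES).plusMinutes(1);
    }
}
```

For example, an application started at 12:00:30 would see its first run at 12:01:00, and every run after that follows one minute later.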

Spring Batch Scheduling: Triggering Spring Batch Jobs

Use a class named SpringBatchScheduler to configure the scheduling of Spring Batch Jobs. Its launchJob() method is registered as a scheduled task. To control whether the scheduled job fires, guard the launch with a conditional flag so that the job runs only when the flag is set to true, as in the below code.

private AtomicBoolean enabled = new AtomicBoolean(true);

private AtomicInteger batchRunCounter = new AtomicInteger(0);

@Scheduled(fixedRate = 2000)
public void launchJob() throws Exception {
    if (enabled.get()) {
        Date date = new Date();
        JobExecution jobExecution = jobLauncher()
          .run(job(), new JobParametersBuilder()
            .addDate("launchDate", date)
            .toJobParameters());
        batchRunCounter.incrementAndGet();
    }
}

Here, batchRunCounter counts the launches that have run. Integration tests use it to verify that the batch job has been stopped.
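
The flag-and-counter pattern can be tried out without Spring. The following plain-Java sketch is an illustration under stated assumptions: GuardedLauncherSketch and its stop(), start(), and runCount() methods are hypothetical names, and a Runnable stands in for the call to jobLauncher.run(...).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// A scheduled method guarded by a flag, with a counter for verification.
class GuardedLauncherSketch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);
    private final AtomicInteger batchRunCounter = new AtomicInteger(0);
    private final Runnable job; // stand-in for launching the batch job

    GuardedLauncherSketch(Runnable job) {
        this.job = job;
    }

    // A scheduler would call this at a fixed rate; the flag decides whether the job fires.
    void launchJob() {
        if (enabled.get()) {
            job.run();
            batchRunCounter.incrementAndGet();
        }
    }

    void stop()  { enabled.set(false); }
    void start() { enabled.set(true); }

    int runCount() { return batchRunCounter.get(); }
}
```

While the flag is false, the scheduler still invokes launchJob(), but the job does not fire and the counter stays unchanged, which is what a test can assert on.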

Spring Batch Scheduling: How to Stop Spring Batch Jobs

With the conditional flag mentioned in the above code, you can trigger the scheduled Spring Batch job with the scheduled task. Similarly, you can also stop the Spring Batch Jobs to save your resources in two different ways shown below.

  • Scheduler Post Processor: Since you are scheduling a method with the @Scheduled annotation, a bean post-processor, ScheduledAnnotationBeanPostProcessor, is registered first. A bean is a Java object created by the Spring Framework when the application starts and managed by the Spring container. A bean post-processor applies custom modifications to new bean instances created by Spring.

You can explicitly call postProcessBeforeDestruction() to destroy the given scheduled bean, as shown below.

@Test
public void stopJobSchedulerWhenSchedulerDestroyed() throws Exception {
    ScheduledAnnotationBeanPostProcessor bean = context
      .getBean(ScheduledAnnotationBeanPostProcessor.class);
    SpringBatchScheduler schedulerBean = context
      .getBean(SpringBatchScheduler.class);
    await().untilAsserted(() -> Assert.assertEquals(
      2, 
      schedulerBean.getBatchRunCounter().get()));
    bean.postProcessBeforeDestruction(
      schedulerBean, "SpringBatchScheduler");
    await().atLeast(3, SECONDS);

    Assert.assertEquals(
      2, 
      schedulerBean.getBatchRunCounter().get());
}

  • Scheduled Future: Another way to stop the scheduler is to cancel its ScheduledFuture. A ScheduledFuture is the handle returned when a task is scheduled, and cancelling it prevents any further runs of that task.

The below code keeps the ScheduledFuture of each scheduled task so that the future of the batch job scheduler can be cancelled later.

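// Futures of scheduled tasks, keyed by the bean that owns the scheduled method.
// Assumed declaration: the snippet uses scheduledTasks without showing where it is defined.
private final Map<Object, ScheduledFuture<?>> scheduledTasks = new IdentityHashMap<>();

@Bean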
public TaskScheduler poolScheduler() {
    return new CustomTaskScheduler();
}

private class CustomTaskScheduler 
  extends ThreadPoolTaskScheduler {

    //

    @Override
    public ScheduledFuture<?> scheduleAtFixedRate(
      Runnable task, long period) {
        ScheduledFuture<?> future = super
          .scheduleAtFixedRate(task, period);

        ScheduledMethodRunnable runnable = (ScheduledMethodRunnable) task;
        scheduledTasks.put(runnable.getTarget(), future);

        return future;
    }
}


// Iterate the Future and cancel the Future of your batch job scheduler.
public void cancelFutureSchedulerTasks() {
    scheduledTasks.forEach((k, v) -> {
        if (k instanceof SpringBatchScheduler) {
            v.cancel(false);
        }
    });
}
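
The effect of cancelling a future can also be seen with the JDK's own scheduler. The sketch below is a minimal, Spring-free illustration: CancelFutureSketch is a hypothetical name, and a ScheduledExecutorService stands in for the TaskScheduler used above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Schedules a repeating task and stops it through its ScheduledFuture.
class CancelFutureSketch {
    // Returns true if the future reports that it was cancelled.
    static boolean scheduleThenCancel() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> future = executor.scheduleAtFixedRate(
                () -> System.out.println("job tick"), 0, 2, TimeUnit.SECONDS);
            // false = do not interrupt a run already in progress; no further runs start.
            future.cancel(false);
            return future.isCancelled();
        } finally {
            executor.shutdownNow(); // release the scheduler thread so the JVM can exit
        }
    }
}
```

Passing false to cancel(), as the code above also does, lets a run that has already started finish normally instead of interrupting it.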

Conclusion

In this article, you learned about Spring Batch Jobs and how to schedule, trigger, and stop them. The example Job was built from Tasklets and scheduled using Spring's scheduler and a cron expression. Spring Batch also provides advanced technical services that enable extremely high-volume, high-performance batch Jobs through optimization and partitioning techniques.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
