Spring Boot is an open-source Java framework for building stand-alone, production-ready Spring applications, and it is widely used to build microservices. It relies on auto-configuration and prebuilt starter dependencies, so an application runs with very little manual setup while remaining fully configurable.

In this article, you will learn about Batch Processing in Spring Boot. You will also gain a holistic understanding of Batch Processing and its key features, the Spring Boot framework and its key features, and the need for Spring Boot Batch configuration.

It also provides a step-by-step guide to configuring Spring Boot for batch processing and to creating an application that uses Batch Processing in Spring Boot.

How to Configure Spring Boot for Batch Processing?

Follow the steps below to configure Spring Boot for batch processing:

1) Batch Processing in Spring Boot: pom.xml

The pom.xml file lists the Spring Boot Batch and database dependencies. Spring Batch needs a database to store its batch metadata, such as job instances, job executions, and step executions, and the batch dependency provides all the classes required for batch execution.

The spring-boot-starter-batch dependency pulls in all the jars that Spring Batch needs, and the h2 dependency adds the H2 database driver to the application. You can use the following pom.xml as a reference:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.5.2</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.yawintutor</groupId>
	<artifactId>SpringBootBatch2</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>SpringBootBatch2</name>
	<description>Demo project for Spring Boot</description>
	<properties>
		<java.version>11</java.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>
 
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.batch</groupId>
			<artifactId>spring-batch-test</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
		</dependency>		
	</dependencies>
 
	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
 
</project>

2) Spring Boot Main class

The standard Spring Boot main class is used to start the batch application. It needs no additional annotations or configuration. You can use the class below.

package com.yawintutor;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringBootBatch2Application {

	public static void main(String[] args) {
		SpringApplication.run(SpringBootBatch2Application.class, args);
	}
}

3) Application properties

Two application properties need to be added to the configuration:

  • spring.datasource.url sets the URL used to connect to the database. Here it points to a file-based H2 database named DB in the application's working directory.
  • spring.batch.initialize-schema=ALWAYS makes Spring Boot create the Spring Batch metadata tables when the application starts. From Spring Boot 2.5 onward, this property is deprecated in favor of spring.batch.jdbc.initialize-schema.

spring.datasource.url=jdbc:h2:file:./DB
spring.batch.initialize-schema=ALWAYS

4) Start Spring Boot Application

With these changes in place, you can now launch the Spring Boot application.

It should start without errors. If any database errors appear, resolve them before continuing.

5) ItemReader Implementation

The reader class implements the ItemReader interface and contains the data-reading code. Spring Batch calls its read method repeatedly to fetch one item at a time; returning null tells the framework that there is no more input.

The following code snippet defines the reader class.

package com.yawintutor;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;

public class MyCustomReader implements ItemReader<String>{

	private String[] stringArray = { "Zero", "One", "Two", "Three", "Four", "Five" };

	private int index = 0;

	@Override
	public String read() throws Exception, UnexpectedInputException,
			ParseException, NonTransientResourceException {
		if (index >= stringArray.length) {
			return null;
		}
		
		String data = index + " " + stringArray[index];
		index++;
		System.out.println("MyCustomReader    : Reading data    : "+ data);
		return data;
	}

}

6) ItemProcessor Implementation

The data-processing class implements the ItemProcessor interface and contains the processing code. Spring Batch calls its process method once for each item that was read.

The following code snippet defines the processing class.

package com.yawintutor;

import org.springframework.batch.item.ItemProcessor;

public class MyCustomProcessor implements ItemProcessor<String, String> {

	@Override
	public String process(String data) throws Exception {
		System.out.println("MyCustomProcessor : Processing data : "+data);
		data = data.toUpperCase();
		return data;
	}

}

7) ItemWriter Implementation

The writer class implements the ItemWriter interface and contains the code that writes the data after it has been processed. Spring Batch calls its write method with a list holding all the items of one chunk.

The following code snippet defines the writer class.

package com.yawintutor;

import java.util.List;

import org.springframework.batch.item.ItemWriter;

public class MyCustomWriter implements ItemWriter<String> {

	@Override
	public void write(List<? extends String> list) throws Exception {
		for (String data : list) {
			System.out.println("MyCustomWriter    : Writing data    : " + data);
		}
		System.out.println("MyCustomWriter    : Writing data    : completed");
	}
}

8) Spring Boot Batch Configurations

The batch configuration class defines the batch job and its steps. The JobBuilderFactory class creates the job, and the StepBuilderFactory class creates the step.

The job runs its steps, and each step is assembled from an ItemReader, an ItemProcessor, and an ItemWriter. The chunk size set on the step (chunk(1) here) controls how many items are read and processed before they are written together in one transaction.

You can refer to the following code snippet:

package com.yawintutor;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfig {
	@Autowired
	public JobBuilderFactory jobBuilderFactory;

	@Autowired
	public StepBuilderFactory stepBuilderFactory;

	@Bean
	public Job createJob() {
		return jobBuilderFactory.get("MyJob")
				.incrementer(new RunIdIncrementer())
				.flow(createStep()).end().build();
	}
	
	@Bean
	public Step createStep() {
		return stepBuilderFactory.get("MyStep")
				.<String, String> chunk(1)
				.reader(new MyCustomReader())
				.processor(new MyCustomProcessor())
				.writer(new MyCustomWriter())
				.build();
	}	
}
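To make the chunk(1) setting concrete, here is a minimal plain-Java sketch of the read-process-write cycle that a chunk-oriented step performs. It is not Spring Batch code: ChunkLoopSketch, run, and arrayReader are hypothetical names, and the real framework adds transactions, retries, and restart bookkeeping on top of this loop. Items are read until the chunk is full or the reader returns null, each item is processed, and the whole chunk is then passed to the writer in one call. With chunk(1), MyCustomWriter receives a one-item list each time, which is why it prints "completed" after every item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical plain-Java model of a chunk-oriented step; NOT the Spring Batch API.
class ChunkLoopSketch {

    // Runs reader -> processor -> writer and returns what each writer call received.
    static <I, O> List<List<O>> run(Supplier<I> reader, Function<I, O> processor, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<List<O>> writerCalls = new ArrayList<>();
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read up to chunkSize items; a null item means the input is exhausted
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            // 2. Process every item of the chunk; a null result filters that item out
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Hand the whole chunk to the writer in a single call
            if (!outputs.isEmpty()) {
                writerCalls.add(outputs);
            }
        }
        return writerCalls;
    }

    // Reader that mimics MyCustomReader: returns the items in order, then null.
    static Supplier<String> arrayReader(String... data) {
        Iterator<String> it = Arrays.asList(data).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        String[] items = { "0 Zero", "1 One", "2 Two", "3 Three", "4 Four", "5 Five" };
        Function<String, String> upper = String::toUpperCase;
        // chunk(1): prints 6, one writer call per item
        System.out.println(run(arrayReader(items), upper, 1).size());
        // chunk(4): prints [[0 ZERO, 1 ONE, 2 TWO, 3 THREE], [4 FOUR, 5 FIVE]]
        System.out.println(run(arrayReader(items), upper, 4));
    }
}
```

Raising the chunk size reduces the number of writer calls (and, in Spring Batch, transactions) at the cost of holding more items in memory per chunk.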

9) Spring Boot Batch Schedulers

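A Spring Batch job is run through the JobLauncher. In this example, a Spring scheduler calls the JobLauncher at regular intervals: with @Scheduled(fixedDelay = 5000, initialDelay = 5000), the first launch happens 5 seconds after startup, and each later launch starts 5 seconds after the previous one has finished. Note that, by default, Spring Boot also runs every Job bean once at startup (spring.batch.job.enabled=true).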

package com.yawintutor;

import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class SchedulerConfig {

	@Autowired
	JobLauncher jobLauncher;

	@Autowired
	Job job;

	SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");

	// fixedDelay: wait 5 s after the previous run finishes before starting the next one
	@Scheduled(fixedDelay = 5000, initialDelay = 5000)
	public void scheduleByFixedDelay() throws Exception {
		System.out.println("Batch job starting");
		// A unique "time" parameter makes every launch a new job instance
		JobParameters jobParameters = new JobParametersBuilder()
				.addString("time", format.format(Calendar.getInstance().getTime())).toJobParameters();
		jobLauncher.run(job, jobParameters);
		System.out.println("Batch job executed successfully\n");
	}
}
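Why does the scheduler add a fresh time parameter on every launch? Spring Batch identifies a job instance by the job name plus its identifying parameters, and JobLauncher.run does not apply the job's RunIdIncrementer by itself. If the same parameters were reused, Spring Batch would refuse to run the already completed instance again and throw JobInstanceAlreadyCompleteException. The plain-Java sketch below models only that rule; InMemoryJobRepositorySketch and launch are hypothetical names, and the sketch assumes every launch completes successfully (real Spring Batch allows failed instances to be restarted).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of "a completed job instance cannot run again"; NOT the Spring Batch API.
// Assumes every launch completes successfully.
class InMemoryJobRepositorySketch {

    // One entry per completed job instance: job name + its identifying parameters
    private final Set<String> completedInstances = new HashSet<>();

    // Returns true if the launch may proceed, false where Spring Batch would
    // reject it with JobInstanceAlreadyCompleteException.
    boolean launch(String jobName, Map<String, String> parameters) {
        // TreeMap sorts the parameters so the key does not depend on insertion order
        String instanceKey = jobName + new TreeMap<>(parameters);
        // Set.add returns false when the same instance has already completed
        return completedInstances.add(instanceKey);
    }

    public static void main(String[] args) {
        InMemoryJobRepositorySketch repository = new InMemoryJobRepositorySketch();
        // Example timestamps in the scheduler's "yyyy-MM-dd HH:mm:ss.S" format
        Map<String, String> first = Map.of("time", "2021-07-22 16:22:57.293");
        Map<String, String> second = Map.of("time", "2021-07-22 16:23:02.301");
        System.out.println(repository.launch("MyJob", first));  // prints true
        System.out.println(repository.launch("MyJob", first));  // prints false: same instance
        System.out.println(repository.launch("MyJob", second)); // prints true: new instance
    }
}
```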

10) Batch Processing in Spring Boot: Start Spring Boot Application

The batch configuration is now complete, so you can start the application. The scheduler launches the batch job, which runs its step to completion.

The log should look similar to the output below, with one line for each read, process, and write call.

Because the chunk size is 1, the read-process-write cycle repeats once per item until the reader returns null. At that point the step, and then the job, completes.

2021-07-22 23:00:55.897  INFO 39139 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] launched with the following parameters: [{run.id=3, time=2021-07-22 16:22:57.293}]
2021-07-22 23:00:55.927  INFO 39139 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [MyStep]
MyCustomReader    : Reading data    : 0 Zero
MyCustomProcessor : Processing data : 0 Zero
MyCustomWriter    : Writing data    : 0 ZERO
MyCustomWriter    : Writing data    : completed
MyCustomReader    : Reading data    : 1 One
MyCustomProcessor : Processing data : 1 One
MyCustomWriter    : Writing data    : 1 ONE
MyCustomWriter    : Writing data    : completed
MyCustomReader    : Reading data    : 2 Two
MyCustomProcessor : Processing data : 2 Two
MyCustomWriter    : Writing data    : 2 TWO
MyCustomWriter    : Writing data    : completed
MyCustomReader    : Reading data    : 3 Three
MyCustomProcessor : Processing data : 3 Three
MyCustomWriter    : Writing data    : 3 THREE
MyCustomWriter    : Writing data    : completed
MyCustomReader    : Reading data    : 4 Four
MyCustomProcessor : Processing data : 4 Four
MyCustomWriter    : Writing data    : 4 FOUR
MyCustomWriter    : Writing data    : completed
MyCustomReader    : Reading data    : 5 Five
MyCustomProcessor : Processing data : 5 Five
MyCustomWriter    : Writing data    : 5 FIVE
MyCustomWriter    : Writing data    : completed
2021-07-22 23:00:55.954  INFO 39139 --- [           main] o.s.batch.core.step.AbstractStep         : Step: [MyStep] executed in 27ms
2021-07-22 23:00:55.958  INFO 39139 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=MyJob]] completed with the following parameters: [{run.id=3, time=2021-07-22 16:22:57.293}] and the following status: [COMPLETED] in 42ms

BONUS PROJECT: Let’s Create a Basic Application Using Batch Processing in Spring Boot

We’ll create a job that reads a CSV file, transforms it with a custom processor, and stores the final results in an in-memory database.

The job imports data from a coffee list. The steps to create the application using Batch Processing in Spring Boot are as follows:

Step 1: Batch Processing in Spring Boot: Maven Dependencies

To create a basic batch-driven application using Spring Boot, you need to first add the spring-boot-starter-batch to your pom.xml file.

You can go through the following code snippet:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
    <version>2.4.0</version>
</dependency>

You also need to add the org.hsqldb dependency, which provides the in-memory HSQLDB database that stores the results. You can go through the following code snippet:


<dependency>
    <groupId>org.hsqldb</groupId>
    <artifactId>hsqldb</artifactId>
    <version>2.5.1</version>
    <scope>runtime</scope>
</dependency>

Step 2: Defining a Spring Batch Job

  • First, you need to define the entry point of your application. For this, you can follow the below code snippet.
@SpringBootApplication
public class SpringBootBatchProcessingApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(SpringBootBatchProcessingApplication.class, args);
    }
}
  • Define the application configuration properties in the src/main/resources/application.properties file.
file.input=coffee-list.csv

The file.input property points to the input coffee list, coffee-list.csv, which is loaded from the classpath (for example, src/main/resources). Because it is a flat CSV file, Spring Batch can read it without any additional conversion.

Each line of the list describes one coffee through its characteristics, such as brand and origin. The list uses the following format:

Blue Mountain,Jamaica,Fruity
Lavazza,Colombia,Strong
Folgers,America,Smokey
  • Now, you need to add an SQL script to create a table named “coffee” to store the data. The name of the SQL script is schema-all.sql.
DROP TABLE coffee IF EXISTS;
 
CREATE TABLE coffee  (
    coffee_id BIGINT IDENTITY NOT NULL PRIMARY KEY,
    brand VARCHAR(20),
    origin VARCHAR(20),
    characteristics VARCHAR(30)
);

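Spring Boot runs this script automatically at startup when it is placed in src/main/resources: by default it executes schema-${platform}.sql scripts, and the platform defaults to all, so schema-all.sql is picked up for the embedded HSQLDB database.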

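  • Next, create a domain class to hold each coffee item. The Coffee class has three properties: brand, origin, and characteristics. It also needs a no-argument constructor and setters, because the BeanWrapperFieldSetMapper used by the reader creates an empty Coffee and then sets each property, plus a toString method for readable log output.
public class Coffee {

    private String brand;
    private String origin;
    private String characteristics;

    // Required by BeanWrapperFieldSetMapper, which instantiates Coffee before setting fields
    public Coffee() {
    }

    public Coffee(String brand, String origin, String characteristics) {
        this.brand = brand;
        this.origin = origin;
        this.characteristics = characteristics;
    }

    public String getBrand() { return brand; }
    public void setBrand(String brand) { this.brand = brand; }
    public String getOrigin() { return origin; }
    public void setOrigin(String origin) { this.origin = origin; }
    public String getCharacteristics() { return characteristics; }
    public void setCharacteristics(String characteristics) { this.characteristics = characteristics; }

    @Override
    public String toString() {
        return "Coffee [brand=" + brand + ", origin=" + origin + ", characteristics=" + characteristics + "]";
    }
}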
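Conceptually, the delimited reader configured in Step 3 splits each CSV line on commas and assigns the tokens, in order, to the names brand, origin, and characteristics. Here is a plain-Java sketch of that mapping, assuming simple unquoted lines like the sample above. CsvLineMapperSketch, CoffeeRow, and mapLine are hypothetical names; the real FlatFileItemReader also handles quoting, line skipping, and error reporting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of mapping delimited lines to objects; NOT the Spring Batch API.
class CsvLineMapperSketch {

    // Minimal stand-in for the Coffee domain class
    static final class CoffeeRow {
        final String brand;
        final String origin;
        final String characteristics;

        CoffeeRow(String brand, String origin, String characteristics) {
            this.brand = brand;
            this.origin = origin;
            this.characteristics = characteristics;
        }

        @Override
        public String toString() {
            return "Coffee [brand=" + brand + ", origin=" + origin + ", characteristics=" + characteristics + "]";
        }
    }

    // Splits one line on commas and maps the tokens by position to the three field names
    static CoffeeRow mapLine(String line) {
        // limit -1 keeps trailing empty fields so the column count stays accurate
        String[] tokens = line.split(",", -1);
        // This sketch rejects lines that do not have exactly three fields
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but found " + tokens.length + ": " + line);
        }
        return new CoffeeRow(tokens[0], tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        // The sample lines from coffee-list.csv
        List<String> lines = List.of("Blue Mountain,Jamaica,Fruity", "Lavazza,Colombia,Strong", "Folgers,America,Smokey");
        List<CoffeeRow> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(mapLine(line));
        }
        // First line printed: Coffee [brand=Blue Mountain, origin=Jamaica, characteristics=Fruity]
        rows.forEach(System.out::println);
    }
}
```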

Step 3: Job Configuration

  • Start with a standard Spring @Configuration class.
  • Annotate the class with @EnableBatchProcessing. This annotation registers a set of useful beans for running jobs, including the JobBuilderFactory and StepBuilderFactory factories used to build job and step configurations.
  • Finally, inject the previously declared file.input property with @Value.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    
    @Value("${file.input}")
    private String fileInput;
    
    // ...
}
  • Next, define the reader and writer beans in the configuration. The reader looks for a file named coffee-list.csv on the classpath and maps each line to a Coffee object; the writer inserts each Coffee into the coffee table through JDBC.
@Bean
public FlatFileItemReader<Coffee> reader() {
    return new FlatFileItemReaderBuilder<Coffee>().name("coffeeItemReader")
      .resource(new ClassPathResource(fileInput))
      .delimited()
      .names(new String[] { "brand", "origin", "characteristics" })
      // Map each named token onto the matching Coffee property
      .fieldSetMapper(new BeanWrapperFieldSetMapper<Coffee>() {{
          setTargetType(Coffee.class);
      }})
      .build();
}

@Bean
public JdbcBatchItemWriter<Coffee> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Coffee>()
      // Bind :brand, :origin and :characteristics to the Coffee getters
      .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
      .sql("INSERT INTO coffee (brand, origin, characteristics) VALUES (:brand, :origin, :characteristics)")
      .dataSource(dataSource)
      .build();
}
  • Now, you need to add the actual job steps and configuration as given below.
@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    return jobBuilderFactory.get("importUserJob")
      .incrementer(new RunIdIncrementer())
      .listener(listener)
      .flow(step1)
      .end()
      .build();
}
 
@Bean
public Step step1(JdbcBatchItemWriter<Coffee> writer) {
    return stepBuilderFactory.get("step1")
      .<Coffee, Coffee> chunk(10)
      .reader(reader())
      .processor(processor())
      .writer(writer)
      .build();
}
 
@Bean
public CoffeeItemProcessor processor() {
    return new CoffeeItemProcessor();
}

The step is configured to write up to ten records at a time with the chunk(10) declaration. It reads the coffee data with the reader bean defined by the reader() method.

Each coffee item is then passed to a custom processor, where you can apply custom business logic. Finally, the writer inserts each coffee item into the database.

The job definition is contained in importUserJob. Its RunIdIncrementer adds an incremented run.id parameter, so each run started through the incrementer creates a new job instance. A JobCompletionNotificationListener is also registered so that you are notified when the job completes.
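As a rough illustration of the incrementer: Spring Batch's RunIdIncrementer takes the previous run's parameters, reads run.id (treating a missing value as 0), and sets it to that value plus one while keeping the other parameters. The sketch below models only that rule in plain Java; RunIdSketch and next are hypothetical names that operate on a plain Map rather than on Spring Batch's JobParameters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the run.id increment rule; NOT the Spring Batch API.
class RunIdSketch {

    static final String KEY = "run.id";

    // Returns a copy of the previous parameters with run.id set to previous + 1 (or 1 if absent)
    static Map<String, Object> next(Map<String, Object> previous) {
        Map<String, Object> params = new HashMap<>(previous);
        Object current = params.get(KEY);
        long id = (current == null) ? 1L : Long.parseLong(current.toString()) + 1;
        params.put(KEY, id);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> first = next(new HashMap<>());
        Map<String, Object> second = next(first);
        System.out.println(first.get(KEY));  // prints 1
        System.out.println(second.get(KEY)); // prints 2
    }
}
```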

Step 4: The Custom Coffee Processor

The custom processor defined in the previous step looks something like this.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class CoffeeItemProcessor implements ItemProcessor<Coffee, Coffee> {

    private static final Logger LOGGER = LoggerFactory.getLogger(CoffeeItemProcessor.class);

    @Override
    public Coffee process(final Coffee coffee) throws Exception {
        // Upper-case every field and return a new Coffee instance
        String brand = coffee.getBrand().toUpperCase();
        String origin = coffee.getOrigin().toUpperCase();
        String characteristics = coffee.getCharacteristics().toUpperCase();

        Coffee transformedCoffee = new Coffee(brand, origin, characteristics);
        LOGGER.info("Converting ( {} ) into ( {} )", coffee, transformedCoffee);

        return transformedCoffee;
    }
}

The ItemProcessor interface lets you apply specific business logic during job execution. The CoffeeItemProcessor class accepts a Coffee object as input and returns a new Coffee whose properties are all converted to uppercase.

Step 5: Job Completion

You can write a JobCompletionNotificationListener that provides feedback when the job finishes. In the following snippet, afterJob queries the coffee table and logs every stored row once the job status is COMPLETED.

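@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    private static final Logger LOGGER = LoggerFactory.getLogger(JobCompletionNotificationListener.class);
    private final JdbcTemplate jdbcTemplate;

    public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            LOGGER.info("!!! JOB FINISHED! Time to verify the results");
            String query = "SELECT brand, origin, characteristics FROM coffee";
            jdbcTemplate.query(query, (rs, row) -> new Coffee(rs.getString(1), rs.getString(2), rs.getString(3)))
              .forEach(coffee -> LOGGER.info("Found < {} > in the database.", coffee));
        }
    }
}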

Step 6: Running the Job

Now you can run the job. When it succeeds, every coffee item is stored in the database, and the listener logs output similar to the following:

...
17:41:16.336 [main] INFO  c.b.b.JobCompletionNotificationListener -
  !!! JOB FINISHED! Time to verify the results
17:41:16.336 [main] INFO  c.b.b.JobCompletionNotificationListener -
  Found < Coffee [brand=BLUE MOUNTAIN, origin=JAMAICA, characteristics=FRUITY] > in the database.
17:41:16.337 [main] INFO  c.b.b.JobCompletionNotificationListener -
  Found < Coffee [brand=LAVAZZA, origin=COLOMBIA, characteristics=STRONG] > in the database.
17:41:16.337 [main] INFO  c.b.b.JobCompletionNotificationListener -
  Found < Coffee [brand=FOLGERS, origin=AMERICA, characteristics=SMOKEY] > in the database.
…

Before wrapping up, let’s cover some basics.

What is the Need for Spring Boot Batch Configuration?

  • Spring Boot batch processing automates the processing of large amounts of data without human intervention, which makes such workloads simpler to run.
  • Further, Spring Batch organizes the work into jobs made up of steps. Each step typically reads, processes, and writes data, and the step's chunk size controls how many items are handled in each transaction.
  • Spring Boot Batch includes reusable functions such as logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management that are necessary when processing large volumes of records.
  • It also provides more advanced technical services and features that, through optimization and partitioning techniques, will enable extremely high-volume and high-performance batch jobs.
  • Simple as well as complex, high-volume batch jobs can use the framework to process large amounts of data in a highly scalable manner.

Conclusion

In this article, you have learned about Batch Processing in Spring Boot. This article also provided information on Batch Processing and its key features, the Spring Boot framework and its key features, and the need for Spring Boot Batch configuration.

It also provided a step-by-step guide to configuring Spring Boot for batch processing and to creating an application that uses Batch Processing in Spring Boot.

Manisha Jena
Research Analyst, Hevo Data

Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.
