Spring Boot is an open-source Java web framework commonly used to build microservices. It provides prebuilt, auto-configured components so that you can stand up a fully configurable, production-ready environment with minimal setup.
In this article, you will learn about Batch Processing in Spring Boot. You will also gain a holistic understanding of Batch Processing and its key features, the Spring Boot framework and its key features, and the need for Spring Boot Batch configuration.
It also provides a step-by-step guide to configuring Spring Boot and to creating an application that uses Batch Processing in Spring Boot.
What is Spring Boot?
Spring Boot is an open-source Java framework that simplifies the creation of standalone, production-grade Spring applications. It’s built on top of the Spring Framework and aims to reduce development time and effort by providing features like auto-configuration, embedded servers, and opinionated defaults. This allows developers to focus more on writing business logic and less on configuring the underlying infrastructure.
Essentially, Spring Boot makes it easier and faster to build and deploy Spring-based applications, making it a popular choice for developing microservices, web applications, and cloud-native applications.
Hevo Data streamlines batch processing by automating data extraction, transformation, and loading. Effortlessly move large datasets from various sources to your data warehouse or data lake. Hevo’s no-code platform empowers teams to:
- Integrate data from 150+ sources (60+ free sources).
- Simplify data mapping and transformations using features like drag-and-drop.
- Easily migrate different data types like CSV, JSON, etc., with the auto-mapping feature.
Experience Hevo and see why 2000+ data professionals, including customers such as Thoughtspot and Postman, have rated us 4.3/5 on G2.
Get Started with Hevo for Free
You can follow this step-by-step guide to configure Spring Boot for batch processing:
1) Batch Processing in Spring Boot: Pom.xml
The Spring Boot batch and database dependencies are declared in the pom.xml file. Spring Batch requires a database to store batch-related metadata, such as job and step execution details. The batch dependencies provide all the classes required for batch execution.
The spring-boot-starter-batch dependency pulls in all jars relevant to Spring Batch, and the h2 dependency adds the H2 database driver to the application. You can take the following reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.5.2</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.yawintutor</groupId>
<artifactId>SpringBootBatch2</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>SpringBootBatch2</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>11</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
- Defines a Maven project for a Spring Boot application with batch processing capabilities.
- Sets spring-boot-starter-parent as the parent, inheriting Spring Boot's core configuration.
- Declares dependencies for Spring Batch, testing utilities, and an embedded H2 database.
- Sets the Java version to 11 for compatibility with the codebase.
- Configures the spring-boot-maven-plugin to package and run the application easily with Maven commands.
2) Spring Boot Main class
The default Spring Boot main class is used to start the Spring Boot batch. No additional annotations or configuration are required on the main class. You can consider the class below.
package com.yawintutor;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class SpringBootBatch2Application {
public static void main(String[] args) {
SpringApplication.run(SpringBootBatch2Application.class, args);
}
}
- Defines the Spring Boot application class SpringBootBatch2Application.
- The @SpringBootApplication annotation enables Spring Boot's auto-configuration and component scanning.
- The main method is the entry point to run the application: SpringApplication.run() starts the Spring Boot application context and launches the app.
3) Application properties
Two more properties should be added to the application configuration:
- The first property sets the database URL used to connect to the database.
- The second property tells Spring Boot Batch to create its metadata tables when the application starts.
spring.datasource.url=jdbc:h2:file:./DB
spring.batch.initialize-schema=ALWAYS
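One optional setting, not part of the original configuration, is worth knowing about: by default, Spring Boot also launches every configured job once at application startup. If you drive the job from a scheduler (as in step 9 below), you may want to disable that automatic launch so the job only runs when the scheduler fires:

```properties
# Optional: prevent Spring Boot from running the job automatically at startup
spring.batch.job.enabled=false
```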
4) Start Spring Boot Application
After completing all of the preceding changes, you can launch the Spring Boot application.
It should now start normally. However, if there are any database errors, they should be resolved before proceeding.
5) ItemReader Implementation
The reader class is defined by implementing the ItemReader interface and should contain the data-reading code. Spring Batch reads the data by calling the read() method repeatedly until it returns null.
You can go through the following code snippet for defining the reader class.
package com.yawintutor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
public class MyCustomReader implements ItemReader<String>{
private String[] stringArray = { "Zero", "One", "Two", "Three", "Four", "Five" };
private int index = 0;
@Override
public String read() throws Exception, UnexpectedInputException,
ParseException, NonTransientResourceException {
if (index >= stringArray.length) {
return null;
}
String data = index + " " + stringArray[index];
index++;
System.out.println("MyCustomReader : Reading data : "+ data);
return data;
}
}
- MyCustomReader implements the ItemReader<String> interface to read data for batch processing.
- It defines a stringArray containing values ("Zero", "One", etc.) and an index to track the current position.
- The read() method returns a concatenated string of the index and the value from stringArray.
- The method prints the data being read and increments the index until all elements are read, returning null when the end is reached.
6) ItemProcessor Implementation
You can define the data-processing class using the ItemProcessor interface. This class should contain the data-processing code. Spring Batch invokes the process() method on each item read.
You can refer to the following data processing class in the given code snippet.
package com.yawintutor;
import org.springframework.batch.item.ItemProcessor;
public class MyCustomProcessor implements ItemProcessor<String, String> {
@Override
public String process(String data) throws Exception {
System.out.println("MyCustomProcessor : Processing data : "+data);
data = data.toUpperCase();
return data;
}
}
- MyCustomProcessor implements the ItemProcessor<String, String> interface to process data in batch jobs.
- The process() method takes an input string (data), prints the processing step, and converts it to uppercase.
- The processed, uppercase version of the input string is then returned.
7) ItemWriter Implementation
The writer class is defined by implementing the ItemWriter interface and should contain the code that writes the data after it has been processed. Spring Batch passes each processed chunk of items to the write() method.
You can follow the given writer class in the code snippet.
package com.yawintutor;
import java.util.List;
import org.springframework.batch.item.ItemWriter;
public class MyCustomWriter implements ItemWriter<String> {
@Override
public void write(List<? extends String> list) throws Exception {
for (String data : list) {
System.out.println("MyCustomWriter : Writing data : " + data);
}
System.out.println("MyCustomWriter : Writing data : completed");
}
}
- MyCustomWriter implements the ItemWriter<String> interface to handle writing data in batch jobs.
- The write() method takes a list of strings (list) as input and iterates through each item.
- For each item in the list, it prints a message indicating the data is being written.
- After writing all items in the chunk, it prints a message indicating the writing process is complete.
8) Spring Boot Batch Configurations
The Spring Boot batch configuration file specifies the batch job and batch steps. The JobBuilderFactory class builds a batch job, and the StepBuilderFactory class builds a batch step.
The job executes the batch steps, and each step wires together the ItemReader, ItemProcessor, and ItemWriter components. This configuration defines how the batch should be run.
You can refer to the following code snippet:
package com.yawintutor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class BatchConfig {
@Autowired
public JobBuilderFactory jobBuilderFactory;
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Bean
public Job createJob() {
return jobBuilderFactory.get("MyJob")
.incrementer(new RunIdIncrementer())
.flow(createStep()).end().build();
}
@Bean
public Step createStep() {
return stepBuilderFactory.get("MyStep")
.<String, String> chunk(1)
.reader(new MyCustomReader())
.processor(new MyCustomProcessor())
.writer(new MyCustomWriter())
.build();
}
}
- BatchConfig is a Spring Batch configuration class, annotated with @Configuration and @EnableBatchProcessing to enable batch processing.
- It defines two beans, createJob() and createStep(), to set up a batch job and step.
- createJob() creates a job named "MyJob", attaches a RunIdIncrementer so each run gets a fresh run ID, and runs the step.
- createStep() creates a step "MyStep" that reads, processes, and writes data in chunks of size 1 using the custom reader, processor, and writer.
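The chunk-oriented loop the step performs can be illustrated with a framework-free sketch. This is not Spring Batch code; it simply mimics what the read-process-write cycle does for a given chunk size, using a processor function in place of the ItemProcessor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ChunkLoopSketch {
    // Mimics one step execution: read items until exhausted, process each,
    // and hand completed chunks to the writer (collected here for inspection).
    static List<List<String>> runStep(List<String> source, int chunkSize,
                                      Function<String, String> processor) {
        List<List<String>> written = new ArrayList<>();
        int index = 0;
        List<String> chunk = new ArrayList<>();
        while (index < source.size()) {          // a null from read() would end the loop
            String item = source.get(index++);   // read()
            chunk.add(processor.apply(item));    // process()
            if (chunk.size() == chunkSize) {     // chunk full -> write()
                written.add(new ArrayList<>(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                  // flush the final partial chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<String>> out = runStep(List.of("Zero", "One"), 1, String::toUpperCase);
        System.out.println(out); // with chunk size 1, each item is its own chunk
    }
}
```

With chunk(1) as configured above, the writer is invoked once per item, which is why the sample log in step 10 prints a "completed" line after every record.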
9) Spring Boot Batch Schedulers
Spring Boot batch schedulers run automatically and invoke the batch job via the JobLauncher class. In this example, a Spring scheduler launches the batch at a regular interval.
package com.yawintutor;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
@Configuration
@EnableScheduling
public class SchedulerConfig {
@Autowired
JobLauncher jobLauncher;
@Autowired
Job job;
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");
@Scheduled(fixedDelay = 5000, initialDelay = 5000)
public void scheduleByFixedRate() throws Exception {
System.out.println("Batch job starting");
JobParameters jobParameters = new JobParametersBuilder()
.addString("time", format.format(Calendar.getInstance().getTime())).toJobParameters();
jobLauncher.run(job, jobParameters);
System.out.println("Batch job executed successfully");
}
}
- SchedulerConfig is a configuration class that enables scheduling with the @EnableScheduling annotation.
- The JobLauncher and Job are autowired to launch and execute the batch job.
- A SimpleDateFormat formats the current timestamp, which is passed as a job parameter. Because Spring Batch will not re-run a completed job instance with identical parameters, the unique timestamp lets the same job run repeatedly.
- The @Scheduled annotation schedules the job to run every 5 seconds, with an initial delay of 5 seconds.
- The scheduleByFixedRate() method launches the job with a fresh timestamp parameter each time it fires.
10) Batch Processing in Spring Boot: Start Spring Boot Application
The Spring Boot batch configuration is now complete, and you can start the application. The scheduler will launch the batch job and run through all of its steps and tasks.
The log will look something like the output below, showing the read, process, and write tasks.
The read-process-write cycle repeats for each item until the reader returns null, at which point the step completes. Because the chunk size is 1, the writer logs a "completed" message after every record.
2021-07-22 23:00:55.897 INFO 39139 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=MyJob]] launched with the following parameters: [{run.id=3, time=2021-07-22 16:22:57.293}]
2021-07-22 23:00:55.927 INFO 39139 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [MyStep]
MyCustomReader : Reading data : 0 Zero
MyCustomProcessor : Processing data : 0 Zero
MyCustomWriter : Writing data : 0 ZERO
MyCustomWriter : Writing data : completed
MyCustomReader : Reading data : 1 One
MyCustomProcessor : Processing data : 1 One
MyCustomWriter : Writing data : 1 ONE
MyCustomWriter : Writing data : completed
MyCustomReader : Reading data : 2 Two
MyCustomProcessor : Processing data : 2 Two
MyCustomWriter : Writing data : 2 TWO
MyCustomWriter : Writing data : completed
MyCustomReader : Reading data : 3 Three
MyCustomProcessor : Processing data : 3 Three
MyCustomWriter : Writing data : 3 THREE
MyCustomWriter : Writing data : completed
MyCustomReader : Reading data : 4 Four
MyCustomProcessor : Processing data : 4 Four
MyCustomWriter : Writing data : 4 FOUR
MyCustomWriter : Writing data : completed
MyCustomReader : Reading data : 5 Five
MyCustomProcessor : Processing data : 5 Five
MyCustomWriter : Writing data : 5 FIVE
MyCustomWriter : Writing data : completed
2021-07-22 23:00:55.954 INFO 39139 --- [ main] o.s.batch.core.step.AbstractStep : Step: [MyStep] executed in 27ms
2021-07-22 23:00:55.958 INFO 39139 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=MyJob]] completed with the following parameters: [{run.id=3, time=2021-07-22 16:22:57.293}] and the following status: [COMPLETED] in 42ms
- The batch job MyJob is launched with the parameters run.id=3 and time=2021-07-22 16:22:57.293.
- The job executes the step MyStep, where data is read, processed, and written using the custom reader, processor, and writer components.
- Each piece of data ("Zero", "One", etc.) is converted to uppercase and then written to the output, with each transformation logged.
- The job completes with a status of COMPLETED in 42ms, after executing MyStep in 27ms.
BONUS PROJECT: Let’s Create a Basic Application Using Batch Processing in Spring Boot
We’ll create a job that reads a CSV file, transforms it with a custom processor, and stores the final results in an in-memory database.
The job will import data from a coffee list. The steps to create the application using Batch Processing in Spring Boot are as follows:
Step 1: Batch Processing in Spring Boot: Maven Dependencies
To create a basic batch-driven application using Spring Boot, you need to first add the spring-boot-starter-batch to your pom.xml file.
You can go through the following code snippet:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
<version>2.4.0</version>
</dependency>
You also need to add the org.hsqldb dependency. For this, you can go through the following code snippet:
<dependency>
<groupId>org.hsqldb</groupId>
<artifactId>hsqldb</artifactId>
<version>2.5.1</version>
<scope>runtime</scope>
</dependency>
Step 2: Defining a Spring Batch Job
- First, you need to define the entry point of your application. For this, you can follow the below code snippet.
@SpringBootApplication
public class SpringBootBatchProcessingApplication {
public static void main(String[] args) {
SpringApplication.run(SpringBootBatchProcessingApplication.class, args);
}
}
- Define the application configuration properties in the src/main/resources/application.properties file. The file.input property holds the location of your input coffee list:
file.input=coffee-list.csv
The input is a flat CSV file, which Spring can handle without any additional customization. Each line in the list describes one coffee through characteristics such as brand and origin. The format of the list:
Blue Mountain,Jamaica,Fruity
Lavazza,Colombia,Strong
Folgers,America,Smokey
- Now, you need to add an SQL script to create a table named “coffee” to store the data. The name of the SQL script is schema-all.sql.
DROP TABLE coffee IF EXISTS;
CREATE TABLE coffee (
coffee_id BIGINT IDENTITY NOT NULL PRIMARY KEY,
brand VARCHAR(20),
origin VARCHAR(20),
characteristics VARCHAR(30)
);
Spring Boot will automatically run this script on startup.
- Next, create a domain class to hold the coffee items. The Coffee class has three properties: brand, origin, and characteristics.
public class Coffee {
private String brand;
private String origin;
private String characteristics;
public Coffee(String brand, String origin, String characteristics) {
this.brand = brand;
this.origin = origin;
this.characteristics = characteristics;
}
// getters and setters
}
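The accessors are elided above. A typical completion might look like the following sketch. Two details here are assumptions on my part: a no-argument constructor, which BeanWrapperFieldSetMapper needs in order to instantiate the target type when the reader maps each CSV line, and a toString() override formatted to match the log entries shown in step 6 (e.g. Coffee [brand=..., origin=..., characteristics=...]):

```java
public class Coffee {
    private String brand;
    private String origin;
    private String characteristics;

    // No-arg constructor: assumed requirement for BeanWrapperFieldSetMapper
    public Coffee() {
    }

    public Coffee(String brand, String origin, String characteristics) {
        this.brand = brand;
        this.origin = origin;
        this.characteristics = characteristics;
    }

    public String getBrand() { return brand; }
    public void setBrand(String brand) { this.brand = brand; }

    public String getOrigin() { return origin; }
    public void setOrigin(String origin) { this.origin = origin; }

    public String getCharacteristics() { return characteristics; }
    public void setCharacteristics(String characteristics) { this.characteristics = characteristics; }

    @Override
    public String toString() {
        // Format assumed from the sample log output in step 6
        return "Coffee [brand=" + brand + ", origin=" + origin
                + ", characteristics=" + characteristics + "]";
    }
}
```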
Step 3: Job Configuration
- You begin with a standard Spring @Configuration class in this step.
- After that, you can annotate your class with @EnableBatchProcessing. This annotation provides access to a variety of useful beans that help with jobs, saving time. It will also give you access to some useful factories that will come in handy when creating job configurations and job steps.
- You can include a reference to the previously declared file.input property in the final section of the initial configuration.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
@Autowired
public JobBuilderFactory jobBuilderFactory;
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Value("${file.input}")
private String fileInput;
// ...
}
- You need to define a reader bean in the configuration. The reader bean below looks for a file named coffee-list.csv and parses each line into a Coffee object:
@Bean
public FlatFileItemReader<Coffee> reader() {
return new FlatFileItemReaderBuilder<Coffee>().name("coffeeItemReader")
.resource(new ClassPathResource(fileInput))
.delimited()
.names(new String[] { "brand", "origin", "characteristics" })
.fieldSetMapper(new BeanWrapperFieldSetMapper<Coffee>() {{
setTargetType(Coffee.class);
}})
.build();
}
@Bean
public JdbcBatchItemWriter<Coffee> writer(DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<Coffee>()
.itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
.sql("INSERT INTO coffee (brand, origin, characteristics) VALUES (:brand, :origin, :characteristics)")
.dataSource(dataSource)
.build();
}
- Now, you need to add the actual job steps and configuration as given below.
@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
return jobBuilderFactory.get("importUserJob")
.incrementer(new RunIdIncrementer())
.listener(listener)
.flow(step1)
.end()
.build();
}
@Bean
public Step step1(JdbcBatchItemWriter<Coffee> writer) {
return stepBuilderFactory.get("step1")
.<Coffee, Coffee> chunk(10)
.reader(reader())
.processor(processor())
.writer(writer)
.build();
}
@Bean
public CoffeeItemProcessor processor() {
return new CoffeeItemProcessor();
}
The step is first configured to write up to ten records at a time using the chunk(10) declaration. The coffee data can then be read using the reader bean, which is configured using the reader method.
Following that, you send each coffee item to a custom processor where you can apply some custom business logic. Finally, you use the writer to add each coffee item to the database.
The job definition is contained in importUserJob. It uses a RunIdIncrementer to generate a unique run ID for each execution. You also attach a JobCompletionNotificationListener to be notified when the job completes.
Step 4: The Custom Coffee Processor
The custom processor defined in the previous step looks something like this.
public class CoffeeItemProcessor implements ItemProcessor<Coffee, Coffee> {
private static final Logger LOGGER = LoggerFactory.getLogger(CoffeeItemProcessor.class);
@Override
public Coffee process(final Coffee coffee) throws Exception {
String brand = coffee.getBrand().toUpperCase();
String origin = coffee.getOrigin().toUpperCase();
String characteristics = coffee.getCharacteristics().toUpperCase();
Coffee transformedCoffee = new Coffee(brand, origin, characteristics);
LOGGER.info("Converting ( {} ) into ( {} )", coffee, transformedCoffee);
return transformedCoffee;
}
}
The ItemProcessor interface allows you to apply specific business logic during job execution. The CoffeeItemProcessor class is defined, which accepts a Coffee object as input and converts all of its properties to uppercase.
Step 5: Job Completion
You can write a JobCompletionNotificationListener that provides feedback when the job finishes. Its afterJob() method looks like the following code snippet.
@Override
public void afterJob(JobExecution jobExecution) {
if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
LOGGER.info("!!! JOB FINISHED! Time to verify the results");
String query = "SELECT brand, origin, characteristics FROM coffee";
jdbcTemplate.query(query, (rs, row) -> new Coffee(rs.getString(1), rs.getString(2), rs.getString(3)))
.forEach(coffee -> LOGGER.info("Found < {} > in the database.", coffee));
}
}
Step 6: Running the Job
With everything in place, you can now run the job. If it runs successfully, each coffee item is stored in the database, and the log looks like this:
...
17:41:16.336 [main] INFO c.b.b.JobCompletionNotificationListener -
!!! JOB FINISHED! Time to verify the results
17:41:16.336 [main] INFO c.b.b.JobCompletionNotificationListener -
Found < Coffee [brand=BLUE MOUNTAIN, origin=JAMAICA, characteristics=FRUITY] > in the database.
17:41:16.337 [main] INFO c.b.b.JobCompletionNotificationListener -
Found < Coffee [brand=LAVAZZA, origin=COLOMBIA, characteristics=STRONG] > in the database.
17:41:16.337 [main] INFO c.b.b.JobCompletionNotificationListener -
Found < Coffee [brand=FOLGERS, origin=AMERICA, characteristics=SMOKEY] > in the database.
…
Before wrapping up, let’s cover some basics.
What is the Need for Spring Boot Batch Configuration?
- Spring Boot batch processing automates the processing of large amounts of data without the need for human intervention, making the work simpler and less error-prone.
- A Spring Boot batch job is made up of steps, and each step performs tasks such as reading, processing, and writing. The chunk configuration controls how items are grouped during execution.
- Spring Boot Batch includes reusable functions such as logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management that are necessary when processing large volumes of records.
- It also provides more advanced technical services and features that, through optimization and partitioning techniques, will enable extremely high-volume and high-performance batch jobs.
- Simple as well as complex, high-volume batch jobs can use the framework to process large amounts of data in a highly scalable manner.
Conclusion
In this article, you have learned about Batch Processing in Spring Boot. The article covered Batch Processing and its key features, the Spring Boot framework and its key features, and the need for Spring Boot Batch configuration.
It also provided a step-by-step guide to configuring Spring Boot for batch processing and to creating an application that uses Batch Processing in Spring Boot. Effortlessly move large datasets from various sources to your data warehouse or data lake. Sign up for a 14-day free trial and let Hevo handle the complexity, allowing you to focus on data analysis and insights.
FAQs
1. What is Spring Batch processing?
Spring Batch is a framework for building robust, scalable batch processing applications, including reading, processing, and writing large volumes of data.
2. How to run a batch file in Spring Boot?
To run a batch job in Spring Boot, define a Job and a JobLauncher bean, then use CommandLineRunner to trigger the job during application startup.
3. What is the batch processing process?
Batch processing involves executing tasks or jobs in bulk without user interaction, typically in sequential steps: input data, processing logic, and output results.
Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.