As data collection grows, organizations rely on bulk processing to handle large amounts of data effectively. Bulk processing covers automated operations as well as the complex processing of large datasets, all without user interaction. As a result, organizations use batch processing frameworks such as Spring Batch to simplify their workflows and handle billions of events every day.
Spring Batch can aid in the development of robust batch processing applications and even manage event-driven operations via Spring Batch Jobs.
In this article, you will learn about Spring Batch Jobs. You will also gain a holistic understanding of Batch Processing and its key features, the Spring Framework, and Spring Batch, and follow a step-by-step guide to developing a Spring Boot application with Spring Batch Jobs.
Prerequisites
- Installation of JDK (Java Development Kit)
- Basic knowledge of Spring Boot
What is Batch Processing?
Batch processing is an efficient way of running a large number of iterative data jobs. With the right computing resources in place, the batch method lets you process data with little to no user interaction. After you’ve collected and saved your data, you can process it during an event known as a “batch window.” It offers an efficient workflow by prioritizing processing tasks and completing data jobs when it makes the most sense. Read about the fundamental concepts and terms used in Spring Batch Parallel Processing.
Key Features of Batch Processing
Batch Processing has become popular due to its numerous benefits for enterprise data management. It has several advantages for businesses:
- Efficiency: Batch Processing allows a company to process jobs when computing or other resources are readily available. Companies can schedule batch processes for non-urgent tasks and prioritize time-sensitive tasks. Batch systems can also run in the background in order to reduce processor stress.
- Simplicity: When compared to Stream Processing, Batch Processing is a less complex system that does not require any special hardware or system support. It requires less maintenance for data input.
- Faster Business Intelligence: Batch Processing enables businesses to process large volumes of data quickly, resulting in faster, more efficient Business Intelligence. Many records can be processed at once, which reduces processing time and ensures that data is delivered on time. And since numerous jobs can be handled simultaneously, Business Intelligence becomes available sooner than before.
- Improved Data Quality: By automating most or all components of a processing job and minimizing user interaction, batch processing reduces the likelihood of errors. Precision and accuracy are improved to achieve a higher level of data quality.
Compare batch processing vs. stream processing and pick the right approach for your data needs.
Hevo Data, a No-code Data Pipeline, helps integrate data from various databases with 150+ other sources and load it in a data warehouse of your choice. It provides a consistent & reliable solution to manage data in real-time and always has analysis-ready data in your desired destination. Check out what makes Hevo amazing:
- Load Events in Batches: Events can be loaded in batches in certain data warehouses.
- Easy Integration: Connect and migrate data without any coding.
- Auto-Schema Mapping: Automatically map schemas to ensure smooth data transfer.
- In-Built Transformations: Transform your data on the fly with Hevo’s powerful transformation capabilities.
Get Started with Hevo for Free
What is Spring Framework?
The Spring Framework is a free and open-source application framework that provides infrastructure support for Java developers. Spring is one of the most popular Java Enterprise Edition (Java EE) frameworks that assist developers in creating high-performance applications using simple Java objects (POJOs). It offers a comprehensive programming and configuration model for modern Java-based enterprise applications running on any platform.
Spring’s infrastructure support at the application level is a key component: Spring focuses on the “plumbing” of enterprise applications so that teams can focus on business logic without being tethered to specific deployment environments.
What is Spring Batch?
Spring Batch is a lightweight, all-in-one batch framework that facilitates the development of dependable batch applications that are critical to enterprise systems’ day-to-day operations. Spring Batch includes reusable features such as Logging/Tracing, Transaction Management, Task Processing Statistics, Job Restart, and Resource Management, which are required when processing large amounts of data. It also encompasses more advanced technical features and solutions that will enable exceptionally high-performance batch jobs through optimization and partitioning approaches.
Spring Batch is the de facto standard for batch processing on the Java Virtual Machine (JVM). Its implementation of common batch patterns, such as chunk-based processing and partitioning, lets you create high-performing, scalable batch applications that are resilient enough for your most mission-critical processes. Spring Boot provides an additional level of production-grade features to let you speed up the development of your batch processes. Dive into Spring Batch’s scheduling capabilities to develop robust batch-processing applications.
Spring Batch Jobs: Developing a Spring Boot Application
In this example, a Spring Boot application is developed that reads data from a CSV file and stores it in an SQL Database.
The steps involved in developing a Spring Boot application in the process of setting up Spring Batch jobs are as follows:
1) Spring Batch Jobs: Application Setup
The steps are as follows:
- Step 1: On your browser, navigate to the Spring Initializr.
- Step 2: Set the name of your project as per your choice. You can name it “springbatch”.
- Step 3: Add Spring Web, H2 Database, Lombok, Spring Batch, and Spring Data JPA as the project dependencies.
- Step 4: Click on the “Generate” button to download the project zip file.
- Step 5: After downloading, decompress the file and open it in your IDE.
2) Spring Batch Jobs: Data Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named domain.
- Step 2: Create a class named Customer in the domain package you created, then add the following code to it.
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EntityListeners;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import org.springframework.data.jpa.domain.support.AuditingEntityListener;

@Entity(name = "person")
@Getter // Lombok annotation to generate getters for the fields
@Setter // Lombok annotation to generate setters for the fields
@AllArgsConstructor // Lombok annotation to generate a constructor with all of the fields in the class
@NoArgsConstructor // Lombok annotation to generate an empty constructor for the class
@EntityListeners(AuditingEntityListener.class)
public class Customer {

    @Id // Sets the id field as the primary key in the database table
    @Column(name = "id") // Sets the column name for the id property
    @GeneratedValue(strategy = GenerationType.AUTO) // States that the id field should be autogenerated
    private Long id;

    @Column(name = "last_name")
    private String lastName;

    @Column(name = "first_name")
    private String firstName;

    // Returns firstName and lastName when an object of the class is logged
    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }
}
The class above includes an id field for the database’s primary key, as well as lastName and firstName fields read from the CSV file used in this example (data.csv).
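The tutorial does not show the input file itself, so here is a hypothetical data.csv matching the two columns the reader is configured with later (firstName first, then lastName). Place it in src/main/resources so the classpath-based reader can find it; note there is no header row, since the reader is not configured to skip any lines:

john,doe
jane,smith
mary,jane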
3) Spring Batch Jobs: Repository Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named repositories.
- Step 2: Create an interface named CustomerRepository in the previously created repositories package and add the following code.
import org.springframework.data.jpa.repository.JpaRepository;

// The interface extends JpaRepository, which provides the CRUD operation methods
// (the Customer entity is imported from the domain package created earlier)
public interface CustomerRepository extends JpaRepository<Customer, Long> {
}
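Because JpaRepository supplies ready-made CRUD methods (save, findAll, findById, delete, and so on), the interface body can stay empty. As a quick illustration only, a CommandLineRunner could exercise the repository directly; this class and its name are hypothetical and not part of the tutorial:

import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Hypothetical demo component (not part of the tutorial) that exercises
// the inherited CRUD methods at application startup
@Component
public class CustomerRepositoryDemo implements CommandLineRunner {

    private final CustomerRepository customerRepository;

    public CustomerRepositoryDemo(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    @Override
    public void run(String... args) {
        Customer customer = new Customer(); // empty constructor from @NoArgsConstructor
        customer.setFirstName("Jane");      // setters generated by @Setter
        customer.setLastName("Doe");
        customerRepository.save(customer);  // inherited from JpaRepository
        customerRepository.findAll().forEach(c -> System.out.println(c));
    }
}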
4) Spring Batch Jobs: Processor
The steps are as follows:
- Step 1: In the root project package, create a new package named processor.
- Step 2: Create a new Java file named CustomerProcessor in the processor package. Then, add the following code into it.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class CustomerProcessor implements ItemProcessor<Customer, Customer> {

    // Creates a logger
    private static final Logger logger = LoggerFactory.getLogger(CustomerProcessor.class);

    // This method transforms data from one form to another
    @Override
    public Customer process(final Customer customer) throws Exception {
        final String firstName = customer.getFirstName().toUpperCase();
        final String lastName = customer.getLastName().toUpperCase();
        // Creates a new Customer instance; note that @AllArgsConstructor follows
        // the field declaration order (id, lastName, firstName)
        final Customer transformedCustomer = new Customer(customer.getId(), lastName, firstName);
        // Logs the customer entity to the application logs
        logger.info("Converting (" + customer + ") into (" + transformedCustomer + ")");
        return transformedCustomer;
    }
}
The class here converts data from one form to another. ItemProcessor<I, O> accepts input data (I), transforms it, and returns the result as output data (O).
The Customer entity is declared as both the input and output in this case. As a result, the data form is preserved.
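If the output type needed to differ from the input type, the same interface would handle it. A minimal sketch (hypothetical, not used in this tutorial) that flattens a Customer into a plain CSV line:

import org.springframework.batch.item.ItemProcessor;

// Hypothetical processor (not part of the tutorial) with different input
// and output types: it turns a Customer entity into a CSV-formatted String
public class CustomerToLineProcessor implements ItemProcessor<Customer, String> {

    @Override
    public String process(final Customer customer) throws Exception {
        return customer.getFirstName() + "," + customer.getLastName();
    }
}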
5) Spring Batch Jobs: Configuration Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named config. All of our configurations will be included in this package.
- Step 2: Create a new Java file named BatchConfiguration in the config package. Then add the following code into it.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.data.RepositoryItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;
import org.springframework.core.io.ClassPathResource;

@Configuration // Informs Spring that this class contains configurations
@EnableBatchProcessing // Enables batch processing for the application
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Lazy
    public CustomerRepository customerRepository;

    // Reads the data.csv file and creates an instance of the Customer entity for each line in the file
    @Bean
    public FlatFileItemReader<Customer> reader() {
        return new FlatFileItemReaderBuilder<Customer>()
                .name("customerReader")
                .resource(new ClassPathResource("data.csv"))
                .delimited()
                .names(new String[]{"firstName", "lastName"})
                .fieldSetMapper(new BeanWrapperFieldSetMapper<>() {{
                    setTargetType(Customer.class);
                }})
                .build();
    }

    // Creates the writer, configuring the repository and the method that will be used to save the data into the database
    @Bean
    public RepositoryItemWriter<Customer> writer() {
        RepositoryItemWriter<Customer> writer = new RepositoryItemWriter<>();
        writer.setRepository(customerRepository);
        writer.setMethodName("save");
        return writer;
    }

    // Creates an instance of CustomerProcessor, which converts one data form to another; in our case the data form is maintained
    @Bean
    public CustomerProcessor processor() {
        return new CustomerProcessor();
    }

    // Batch jobs are built from steps. A step contains the reader, the processor, and the writer.
    @Bean
    public Step step1(ItemReader<Customer> itemReader, ItemWriter<Customer> itemWriter)
            throws Exception {
        return this.stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(5)
                .reader(itemReader)
                .processor(processor())
                .writer(itemWriter)
                .build();
    }

    // Defines the job, which saves the data from the .csv file into the database
    @Bean
    public Job customerUpdateJob(JobCompletionNotificationListener listener, Step step1)
            throws Exception {
        return this.jobBuilderFactory.get("customerUpdateJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(step1)
                .build();
    }
}
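A note on chunk(5): in chunk-oriented processing, items are read and passed through the processor one at a time, but written in groups of five, with each chunk committed in a single transaction. Tuning this value trades memory usage against transaction overhead.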
- Step 3: Create a new Java class named JobCompletionNotificationListener in the config package and add the following code.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    // Creates an instance of the logger
    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

    private final CustomerRepository customerRepository;

    @Autowired
    public JobCompletionNotificationListener(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // The callback from Spring Batch's JobExecutionListenerSupport class that is executed when the batch process completes
    @Override
    public void afterJob(JobExecution jobExecution) {
        // When the batch process completes, the customers in the database are retrieved and written to the application logs
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("!!! JOB COMPLETED! verify the results");
            customerRepository.findAll()
                    .forEach(customer -> log.info("Found <" + customer + "> in the database."));
        }
    }
}
6) Spring Batch Jobs: Controller Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named controllers.
- Step 2: Create a Java class named BatchController in the previously created controllers package and add the following code.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.batch.core.repository.JobRestartException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping(path = "/batch") // Root path
public class BatchController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    // Accepts a GET request to invoke the batch process and returns the message "Batch Process started!!" as the response
    @GetMapping(path = "/start") // Start batch process path
    public ResponseEntity<String> startBatch() {
        // A unique "startAt" parameter is added so each request launches a new job instance
        JobParameters parameters = new JobParametersBuilder()
                .addLong("startAt", System.currentTimeMillis())
                .toJobParameters();
        try {
            jobLauncher.run(job, parameters);
        } catch (JobExecutionAlreadyRunningException | JobRestartException
                | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) {
            e.printStackTrace();
        }
        return new ResponseEntity<>("Batch Process started!!", HttpStatus.OK);
    }
}
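Note that Spring Batch identifies job instances by their parameters, so the startAt timestamp makes every request launch a fresh job instance. Without a changing parameter, a second request would be rejected with a JobInstanceAlreadyCompleteException once the first run completed.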
7) Spring Batch Jobs: Application Configuration
The steps are as follows:
- Step 1: Add the following code to the application.properties file in the resource directory.
# Sets the server port from where we can access our application
server.port=8080
# Disables our batch process from automatically running on application startup
spring.batch.job.enabled=false
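Optionally (an addition, not part of the original setup), you can also enable the H2 web console in the same application.properties file to inspect the person table after the job runs:

# Optional: expose the H2 web console at http://localhost:8080/h2-console
spring.h2.console.enabled=true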
8) Spring Batch Jobs: Testing
The steps are as follows:
- Step 1: To begin the batch process, open Postman and send a GET request to http://localhost:8080/batch/start.
- Step 2: From the application logs, you can see that the batch process starts running after you send the GET request.
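If you prefer the command line to Postman, the equivalent request can be sent with curl (assuming the default port 8080 configured earlier):

curl http://localhost:8080/batch/start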
Conclusion
In this article, you have learned about Spring Batch Jobs. This article also provided information on Batch Processing, its key features, Spring framework, Spring Batch, and a step-by-step guide to developing a Spring Boot application through Spring Batch Jobs.
Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of desired destinations with a few clicks.
Hevo Data, with its strong integration with 150+ Data Sources (including 60+ Free Sources), allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready.
Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You may also have a look at the pricing, which will assist you in selecting the best plan for your requirements.
Share your experience of understanding Spring Batch Jobs in the comment section below! We would love to hear your thoughts.
FAQs
What is the difference between a step and a job in Spring Batch?
In Spring Batch, a job is the overall task or process that you want to accomplish, while a step is a smaller, specific part of that job. Each job is made up of multiple steps, with each step handling a specific function or task within the job.
What are Spring Batch jobs?
Spring Batch jobs are automated, repeatable tasks or processes often used for data processing tasks like reading, transforming, and writing data from one source to another (e.g., from a database to a file).
What is an example of a batch job?
An example of a batch job is a payroll processing job, which reads employee data, calculates salaries, and writes the results to a payroll file.
Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.