As data volumes grow, organisations rely on batch processing to handle large amounts of data efficiently. Batch processing covers automated operations as well as the complex processing of large datasets without the need for user interaction. As a result, organisations use batch processing frameworks such as Spring Batch to simplify the overall workflow and handle billions of events every day.
Spring Batch helps you develop robust batch processing applications and manage event-driven operations through Spring Batch Jobs.
In this article, you will learn about Spring Batch Jobs. You will also get an overview of Batch Processing and its key features, the Spring Framework, and Spring Batch, followed by a step-by-step guide to developing a Spring Boot application with Spring Batch Jobs.
Prerequisites
- An installed JDK (Java Development Kit)
- Basic knowledge of Spring Boot
What is Batch Processing?
Herman Hollerith, an American inventor who invented the first tabulating machine, used the Batch Processing method for the first time in the 19th century. This device, which was capable of counting and sorting data organized on punched cards, became the forerunner of the modern computer. The cards, as well as the information on them, could then be collected and processed in batches. Large amounts of data could be processed more quickly and accurately with this innovation than with manual entry methods.
Batch processing is an efficient way of running a large number of iterative data jobs. With the right amount of computing resources present, the batch method allows you to process data with little to no user interaction.
After you’ve collected and stored your data, you can process it in batches during a scheduled period known as a “batch window.” This gives you an efficient workflow by prioritizing processing tasks and completing data jobs when it makes the most sense.
Batch Processing is a method for processing large amounts of data in a consistent manner. When computing resources are available, the batch method allows users to process data with little or no user interaction.
The Spring Batch Reference Architecture is depicted in a simplified form in the diagram below. A batch operation is usually encapsulated by a Job, which is made up of several Steps. Each Step typically employs a single ItemReader, ItemProcessor, and ItemWriter. A JobLauncher executes a Job, and metadata about the jobs that have been configured and executed is saved in a JobRepository. This architecture defines the fundamental concepts and terms used by Spring Batch.
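The reader, processor, and writer roles described above can be sketched in plain Java. This is only an illustrative model, not the Spring Batch API: the class `ChunkStepSketch`, the method `runStep`, and the sample names are hypothetical. It assumes a step reads items one at a time, transforms each one, and hands them to the writer in groups of a fixed size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of a single chunk-oriented step (hypothetical, not Spring Batch code).
final class ChunkStepSketch {

    // Reads items one by one, processes each, and passes them to the writer
    // in groups of at most chunkSize. Returns the number of writer calls.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int writes = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // hand over a copy of the full chunk
                chunk.clear();
                writes++;
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // final, partially filled chunk
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        List<String> names = List.of("ann", "bob", "cy", "dee", "eli", "fay", "gus");
        List<List<String>> written = new ArrayList<>();
        Function<String, String> upperCase = s -> s.toUpperCase();
        Consumer<List<String>> collect = written::add;
        int writes = runStep(names.iterator(), upperCase, collect, 5);
        // prints: 2 writes: [[ANN, BOB, CY, DEE, ELI], [FAY, GUS]]
        System.out.println(writes + " writes: " + written);
    }
}
```

Grouping the output this way means the writer is called once per chunk rather than once per item, which is the idea behind the chunk size that appears in the step configuration later in this article.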
Key Features of Batch Processing
Batch Processing has become popular due to its numerous benefits for enterprise data management. It has several advantages for businesses:
- Efficiency: Batch Processing allows a company to process jobs when computing or other resources are readily available. Companies can schedule batch processes for non-urgent tasks and prioritize time-sensitive tasks. Batch systems can also run in the background in order to reduce processor stress.
- Simplicity: When compared to Stream Processing, Batch Processing is a less complex system that does not require any special hardware or system support. It requires less maintenance for data input.
- Faster Business Intelligence: Batch Processing lets businesses process large volumes of data quickly, which results in faster and more efficient Business Intelligence. Many records can be processed at once, which reduces processing time and helps ensure that data is delivered on time. Because numerous jobs can be handled simultaneously, Business Intelligence is also available sooner.
- Improved Data Quality: By automating most or all components of a processing job and minimizing user interaction, batch processing reduces the likelihood of errors. Precision and accuracy are improved to achieve a higher level of data quality.
What is Spring Framework?
The Spring Framework is a free and open-source application framework that provides infrastructure support for Java developers. It is one of the most popular frameworks for enterprise Java and helps developers create high-performance applications using plain old Java objects (POJOs). It offers a comprehensive programming and configuration model for modern Java-based enterprise applications running on any platform.
A key element of Spring is its infrastructure support at the application level: Spring focuses on the “plumbing” of enterprise applications so that teams can focus on application-level business logic without being tethered to specific deployment environments.
What is Spring Batch?
Spring Batch is a lightweight, all-in-one batch framework that facilitates the development of dependable batch applications that are critical to enterprise systems’ day-to-day operations. Spring Batch includes reusable features including Logging/Tracing, Transaction Management, Task Processing Statistics, Job Restart, and Resource Management, which are required when processing large amounts of data. It also encompasses more advanced technical features and solutions that will enable exceptionally high-performance batch jobs through optimization and partitioning approaches.
Spring Batch is the de facto standard for batch processing on the Java Virtual Machine (JVM). Its implementation of common batch patterns, such as chunk-based processing and partitioning, lets you create high-performing, scalable batch applications that are resilient enough for your most mission-critical processes. Spring Boot provides an additional level of production-grade features to let you speed up the development of your batch processes.
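To make the partitioning pattern mentioned above more concrete, here is a minimal plain-Java sketch. It does not use Spring Batch's own partitioning support; `PartitionSketch`, `split`, and `parallelSum` are hypothetical names, and summing integers stands in for real per-record work. The assumption is simply that a large input is cut into contiguous ranges and each range is processed on its own worker thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration of partitioned processing (hypothetical, not Spring Batch code).
final class PartitionSketch {

    // Splits the indices [0, total) into at most `partitions` contiguous
    // ranges, each returned as {fromInclusive, toExclusive}.
    static List<int[]> split(int total, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be at least 1");
        }
        List<int[]> ranges = new ArrayList<>();
        int size = (total + partitions - 1) / partitions; // ceiling division
        for (int from = 0; from < total; from += size) {
            ranges.add(new int[] {from, Math.min(from + size, total)});
        }
        return ranges;
    }

    // Processes every range on its own worker thread and combines the results.
    static long parallelSum(int[] data, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int[] range : split(data.length, partitions)) {
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = range[0]; i < range[1]; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // waits for that partition to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Partitioned run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(parallelSum(data, 3));
    }
}
```

With ten items and three partitions, `split` yields the ranges 0 to 4, 4 to 8, and 8 to 10, so the last partition is smaller than the others.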
Spring Batch Jobs: Developing a Spring Boot Application
In this example, a Spring Boot application is developed that reads data from a CSV file and stores it in an SQL Database.
Developing a Spring Boot application that runs Spring Batch jobs involves the following steps:
1) Spring Batch Jobs: Application Setup
The steps are as follows:
- Step 1: In your browser, navigate to the Spring Initializr.
- Step 2: Set a project name of your choice, for example “springbatch”.
- Step 3: Add Spring Web, H2 Database, Lombok, Spring Batch, and Spring Data JPA as the project dependencies.
- Step 4: Click the “Generate” button. A project zip file is generated and downloaded.
- Step 5: After downloading, you can decompress the downloaded file and open it in your IDE.
2) Spring Batch Jobs: Data Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named domain.
- Step 2: Create a class named Customer in the domain package you created. Then add the following code to it.
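// javax.persistence applies to Spring Boot 2.x; on Spring Boot 3.x use jakarta.persistence instead
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EntityListeners;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import org.springframework.data.jpa.domain.support.AuditingEntityListener;

@Entity(name = "person")
@Getter // Lombok annotation to generate getters for the fields
@Setter // Lombok annotation to generate setters for the fields
@AllArgsConstructor // Lombok annotation to generate a constructor with all of the fields, in declaration order
@NoArgsConstructor // Lombok annotation to generate an empty constructor for the class
@EntityListeners(AuditingEntityListener.class)
public class Customer {
    @Id // Sets the id field as the primary key in the database table
    @Column(name = "id") // Sets the column name for the id property
    @GeneratedValue(strategy = GenerationType.AUTO) // States that the id field should be autogenerated
    private Long id;

    @Column(name = "last_name")
    private String lastName;

    @Column(name = "first_name")
    private String firstName;

    // Returns firstName and lastName when an object of the class is logged
    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }
}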
The class above includes an id field for the database’s primary key, as well as lastName and firstName fields populated from the CSV file used in this example (data.csv).
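The article does not show the contents of data.csv, so the sketch below assumes that each row holds a first name and a last name separated by a comma (for example `Jane,Doe`, a made-up row). It illustrates in plain Java how one delimited line maps onto those two fields. `CsvMappingSketch` and `CustomerRow` are hypothetical stand-ins, and the simple `split` does not handle quoted fields the way a full CSV reader would.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of mapping delimited lines to objects (hypothetical helper).
final class CsvMappingSketch {

    // Simplified stand-in for the Customer entity (no JPA, no Lombok).
    static final class CustomerRow {
        final String firstName;
        final String lastName;

        CustomerRow(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Maps one "firstName,lastName" line to a CustomerRow by position.
    static CustomerRow mapLine(String line) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty fields
        if (tokens.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 fields but found " + tokens.length + ": " + line);
        }
        return new CustomerRow(tokens[0].trim(), tokens[1].trim());
    }

    // Maps every non-blank line and skips blank ones.
    static List<CustomerRow> mapAll(List<String> lines) {
        List<CustomerRow> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(mapLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the real data.csv may differ.
        for (CustomerRow row : mapAll(List.of("Jane,Doe", "John,Smith"))) {
            System.out.println(row.firstName + " " + row.lastName);
        }
    }
}
```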
3) Spring Batch Jobs: Repository Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named repositories.
- Step 2: Create an interface named CustomerRepository in the previously created repositories package and add the following code.
import org.springframework.data.jpa.repository.JpaRepository;

// The interface extends JpaRepository, which provides the CRUD operation methods
public interface CustomerRepository extends JpaRepository<Customer, Long> {
}
4) Spring Batch Jobs: Processor
The steps are as follows:
- Step 1: In the root project package, create a new package named processor.
- Step 2: Create a new Java file named CustomerProcessor in the processor package. Then, add the following code into it.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class CustomerProcessor implements ItemProcessor<Customer, Customer> {
    // Creates a logger
    private static final Logger logger = LoggerFactory.getLogger(CustomerProcessor.class);

    // Transforms each Customer by upper-casing its first and last names.
    @Override
    public Customer process(final Customer customer) throws Exception {
        final String firstName = customer.getFirstName().toUpperCase();
        final String lastName = customer.getLastName().toUpperCase();
        // Creates a new Customer. The Lombok all-args constructor takes the fields in
        // declaration order (id, lastName, firstName); a null id lets JPA generate the key.
        final Customer transformedCustomer = new Customer(null, lastName, firstName);
        // Logs the original and the transformed customer to the application logs
        logger.info("Converting ({}) into ({})", customer, transformedCustomer);
        return transformedCustomer;
    }
}
The class here converts data from one form to another. The ItemProcessor<I, O> accepts input data (I), transforms it, and returns the result as output data (O).
The Customer entity is declared as both the input and the output in this case. As a result, the type of the data is preserved and only the name values change (they are converted to upper case).
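The input and output types of a processor do not have to match. The sketch below models the "one item in, one item out" contract with a hypothetical `Processor<I, O>` interface (a stand-in, not Spring Batch's ItemProcessor) and shows one processor that keeps the type and one that changes it. The `Name` class and both processors are illustrative assumptions.

```java
import java.util.Locale;

// Plain-Java illustration of the input/output contract of a processor (hypothetical types).
final class ProcessorSketch {

    // Hypothetical stand-in for the ItemProcessor<I, O> contract.
    interface Processor<I, O> {
        O process(I item);
    }

    // Minimal value class used as the item type.
    static final class Name {
        final String first;
        final String last;

        Name(String first, String last) {
            this.first = first;
            this.last = last;
        }
    }

    // Same input and output type: the values change, the type does not.
    static final Processor<Name, Name> UPPER_CASE =
            n -> new Name(n.first.toUpperCase(Locale.ROOT), n.last.toUpperCase(Locale.ROOT));

    // Different input and output types: a Name goes in, a String comes out.
    static final Processor<Name, String> TO_LABEL = n -> n.last + ", " + n.first;

    public static void main(String[] args) {
        Name upper = UPPER_CASE.process(new Name("jane", "doe"));
        System.out.println(TO_LABEL.process(upper)); // prints: DOE, JANE
    }
}
```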
5) Spring Batch Jobs: Configuration Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named config. All of our configurations will be included in this package.
- Step 2: Create a new Java file named BatchConfiguration in the config package. Then add the following code into it.
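import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.data.RepositoryItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;
import org.springframework.core.io.ClassPathResource;

// Note: JobBuilderFactory and StepBuilderFactory belong to Spring Batch 4.x; they are deprecated in Spring Batch 5.
@Configuration // Informs Spring that this class contains configurations
@EnableBatchProcessing // Enables batch processing for the application
public class BatchConfiguration {
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Lazy
    public CustomerRepository customerRepository;

    // Reads the data.csv file and creates an instance of the Customer entity for each row.
    @Bean
    public FlatFileItemReader<Customer> reader() {
        return new FlatFileItemReaderBuilder<Customer>()
                .name("customerReader")
                .resource(new ClassPathResource("data.csv"))
                .delimited()
                .names(new String[]{"firstName", "lastName"})
                .fieldSetMapper(new BeanWrapperFieldSetMapper<Customer>() {{
                    setTargetType(Customer.class);
                }})
                .build();
    }

    // Creates the writer, configuring the repository and the method used to save the data into the database
    @Bean
    public RepositoryItemWriter<Customer> writer() {
        RepositoryItemWriter<Customer> iwriter = new RepositoryItemWriter<>();
        iwriter.setRepository(customerRepository);
        iwriter.setMethodName("save");
        return iwriter;
    }

    // Creates an instance of CustomerProcessor, which upper-cases the names and keeps the Customer type.
    @Bean
    public CustomerProcessor processor() {
        return new CustomerProcessor();
    }

    // Batch jobs are built from steps. A step contains the reader, the processor and the writer.
    // chunk(5) means items are written in groups of five.
    @Bean
    public Step step1(ItemReader<Customer> itemReader, ItemWriter<Customer> itemWriter)
            throws Exception {
        return this.stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(5)
                .reader(itemReader)
                .processor(processor())
                .writer(itemWriter)
                .build();
    }

    // Defines the job that saves the data from the .csv file into the database.
    @Bean
    public Job customerUpdateJob(JobCompletionNotificationListener listener, Step step1)
            throws Exception {
        return this.jobBuilderFactory.get("customerUpdateJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(step1)
                .build();
    }
}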
- Step 3: Create a new Java class named JobCompletionNotificationListener in the config package and add the following code.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
    // Creates an instance of the logger
    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);
    private final CustomerRepository customerRepository;

    @Autowired
    public JobCompletionNotificationListener(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // The callback method from the Spring Batch JobExecutionListenerSupport class that is executed when the batch process is completed
    @Override
    public void afterJob(JobExecution jobExecution) {
        // When the batch process is completed, the customers in the database are retrieved and written to the application logs
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("!!! JOB COMPLETED! verify the results");
            customerRepository.findAll()
                    .forEach(customer -> log.info("Found ({}) in the database.", customer));
        }
    }
}
6) Spring Batch Jobs: Controller Layer
The steps are as follows:
- Step 1: In the root project package, create a new package named controllers.
- Step 2: Create a Java class named BatchController in the previously created controllers package and add the following code.
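import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.batch.core.repository.JobRestartException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping(path = "/batch") // Root path
public class BatchController {
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job job;

    // Accepts a GET request, launches the batch job, and responds with "Batch Process started!!" if the launch succeeds.
    @GetMapping(path = "/start") // Start batch process path
    public ResponseEntity<String> startBatch() {
        // The timestamp parameter makes every launch a new job instance
        JobParameters parameters = new JobParametersBuilder()
                .addLong("startAt", System.currentTimeMillis()).toJobParameters();
        try {
            jobLauncher.run(job, parameters);
        } catch (JobExecutionAlreadyRunningException | JobRestartException
                | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) {
            e.printStackTrace();
            return new ResponseEntity<>("Batch Process failed to start", HttpStatus.INTERNAL_SERVER_ERROR);
        }
        return new ResponseEntity<>("Batch Process started!!", HttpStatus.OK);
    }
}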
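The controller passes the current time as the "startAt" job parameter. Spring Batch identifies a job instance by the job name together with its identifying parameters, and the launcher can reject a second run of an instance that has already completed (the JobInstanceAlreadyCompleteException caught above). The plain-Java model below sketches that idea. `JobInstanceSketch` and `launch` are hypothetical, and a real launcher throws an exception rather than returning false.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of job instance identity (hypothetical, not Spring Batch code).
final class JobInstanceSketch {

    // Keys of the job instances that have already completed.
    private final Set<String> completed = new HashSet<>();

    // Returns true if the job ran as a new instance, or false if an instance
    // with the same name and parameters had already completed.
    boolean launch(String jobName, Map<String, Long> params) {
        // A sorted copy gives the same key regardless of parameter order.
        String key = jobName + "|" + new TreeMap<>(params);
        return completed.add(key);
    }

    public static void main(String[] args) {
        JobInstanceSketch launcher = new JobInstanceSketch();
        Map<String, Long> first = Map.of("startAt", 1000L);
        System.out.println(launcher.launch("customerUpdateJob", first)); // prints: true
        System.out.println(launcher.launch("customerUpdateJob", first)); // prints: false
        // A new timestamp gives a new instance, so the job runs again.
        System.out.println(launcher.launch("customerUpdateJob", Map.of("startAt", 2000L))); // prints: true
    }
}
```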
7) Spring Batch Jobs: Application Configuration
The steps are as follows:
- Step 1: Add the following properties to the application.properties file in the src/main/resources directory.
# Sets the server port from where we can access our application
server.port=8080
# Disables our batch process from automatically running on application startup
spring.batch.job.enabled=false
8) Spring Batch Jobs: Testing
The steps are as follows:
- Step 1: To begin the batch process, open Postman and send a GET request to http://localhost:8080/batch/start.
- Step 2: Check the application logs. They show that the batch process starts running after you send the GET request.
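If you prefer not to use Postman, the same GET request can be sent from Java. The sketch below uses the JDK's built-in HTTP client (Java 11 or later). The class name `StartBatchClient` and the helper `buildStartRequest` are hypothetical, and the base URL assumes the application is running locally on port 8080 as configured above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal client for the /batch/start endpoint (hypothetical helper class).
final class StartBatchClient {

    // Builds the GET request for the batch start endpoint.
    static HttpRequest buildStartRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/batch/start"))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // Requires the Spring Boot application to be running on localhost:8080.
        HttpResponse<String> response = client.send(
                buildStartRequest("http://localhost:8080"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```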
Conclusion
In this article, you have learned about Spring Batch Jobs. This article also provided information on Batch Processing, its key features, Spring framework, Spring Batch, and a step-by-step guide to developing a Spring Boot application through Spring Batch Jobs.
Hevo Data, a No-code Data Pipeline, provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of destinations with a few clicks.
Hevo Data with its strong integration with 100+ Data Sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready. Hevo also allows integrating data from non-native sources using Hevo’s in-built REST API & Webhooks Connector. You can then focus on your key business needs and perform insightful analysis using BI tools.
Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.