Processes are usually run in two forms: Batch Processing and Stream Processing. Batch Processing is a proven solution for organizations that need to run multiple jobs simultaneously without losing efficiency, while ensuring better resource management.
Mulesoft is a reliable and flexible platform for connecting systems and creating APIs. It also supports Batch Processing, which helps improve the performance of those connections.
This article provides a comprehensive guide to Batch Processing, Mulesoft, and the steps to set up Mulesoft Batch Processing.
What is Batch Processing?
Batch Processing is a methodology pioneered by Herman Hollerith in the 19th century, the period when machines were first used to process data in the form of punched cards. Hollerith's tabulating machine performed operations on punch cards in batches, which enabled large amounts of data to be processed swiftly and far more accurately than manual entry methods.
Batch Processing is a technique that is very effective for processing large amounts of data. It performs operations without requiring user interaction to start each job.
Users collect the data and store it for Batch Processing. This stored data is then processed during the "Batch Window" in fixed-size batches, which allows jobs to be prioritized and completed efficiently.
Batch Processing therefore proves to be a very efficient way of running a large number of iterative jobs.
Key Benefits of Batch Processing
- Efficiency: Batch Processing allows jobs to be scheduled based on priority. This improves overall efficiency and reduces the load on the processor.
- Simplicity: Batch Processing is relatively simple compared to Stream Processing, as it does not require any special systems.
- Better Data Quality: Batch Processing reduces errors by minimizing the need for user intervention, improving accuracy and precision.
- Swift Business Intelligence: Batch Processing can handle large volumes of data, enabling faster Business Intelligence. Processing multiple records at once reduces overall processing time.
Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo's automated, No-code platform empowers you with everything you need for a smooth data replication experience.
Check out what makes Hevo amazing:
- Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
- Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making.
- Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
- Scalable Infrastructure: Hevo has in-built integrations for 150+ sources (with 60+ free sources) that can help you scale your data infrastructure as required.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!
What is Mulesoft?
Mulesoft is an integration platform used for connecting SaaS platforms and enterprise applications. Salesforce acquired Mulesoft in 2018 to enable seamless integrations and accelerate business decisions.
Mulesoft provides APIs to connect various applications. When these APIs are placed over a system, they offer a way to interact with other systems without knowing the details of the underlying connection. A System API acts as an intermediary between applications on your Cloud platform and On-premise systems, translating system-specific languages so that communication happens through a simple exchange of data.
Mulesoft also assigns different responsibilities to the different API layers in the system. Process APIs work on the extracted data and perform logical operations on it, while Experience APIs format that data so it is compatible with different consuming systems. Mulesoft is efficient both at creating APIs and at using APIs for integration.
Key Features of Mulesoft
Mulesoft offers a feature-rich suite to tackle many use cases. A few key features are mentioned below.
- Faster Delivery: Mulesoft has prebuilt connectors and APIs that let applications start, and finish, faster. It also provides accelerators that aid API development, and APIs can be reused to reduce effort in future development.
- Better Security: Mulesoft provides security governance and applies it at different levels of the API life cycle. The platform complies with common security requirements, making it easier to create and deploy APIs securely.
- Reliable Operations: The Mulesoft platform provides reliable and scalable operations, with tools such as monitoring and analytics that support better decisions based on performance metrics.
- Future Ready: The Mulesoft solution is adaptable and flexible, with a plug-and-play architecture that improves versatility and keeps it ready for newer technologies and application developments.
The Architecture of Mulesoft Batch Processing
Mulesoft Batch Processing works on the concept of Batch Jobs. A Batch Job is a scope that splits a large message into individual records, which Mule then processes asynchronously.
A Batch Job is similar to a Flow, except that a Batch Job works on records while a Flow works on messages. A Batch Job comprises one or more Batch Steps that define the operations to be performed on the data.
After the records have passed through all the Batch Steps, the Batch Job instance reports the results, indicating the success or failure of the batch and of each step in the process.
The Mulesoft Batch Processing Architecture has Two Main Components:
1) Data Splitting Mechanism
Batch Jobs split messages implicitly in Mulesoft Batch Processing. Mule recognizes splittable payloads such as Java iterables and arrays, and JSON and XML payloads, and splits them into records. Any other format is not splittable by Mule, so you must transform it into one of the formats mentioned above before the records can be split for Mulesoft Batch Processing, as the sketch below illustrates.
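Assuming a JSON payload that wraps the records in an items field (the flow name and field name below are hypothetical, not from the source), a small DataWeave transform can expose the array before the message reaches the batch job:
<flow name="prepareAndBatchFlow">
    <!-- Extract the array from the wrapping JSON object so the batch job can split it -->
    <ee:transform>
        <ee:message>
            <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload.items]]></ee:set-payload>
        </ee:message>
    </ee:transform>
    <!-- Hand the now-splittable collection to a batch job defined elsewhere -->
    <batch:execute name="Batch3"/>
</flow>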
2) Handling Errors
Record-level failures are handled within Batch Jobs. Batch Steps have the provision to add or remove variables on individual records while processing, which allows errors arising from those variables to be handled per record rather than failing the whole job. A failure-tolerance sketch follows.
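One concrete control for record-level failures is the max-failed-records attribute, which tells Mule how many record failures to tolerate before stopping the whole job instance. The sketch below assumes the Mule 3-style attribute spelling that matches the article's examples; the job name is illustrative:
<!-- Tolerate up to 10 failed records before the job instance is stopped;
     a value of -1 would continue processing regardless of failures -->
<batch:job name="TolerantBatch" max-failed-records="10">
    <batch:process-records>
        <batch:step name="Step1">
            <logger level="INFO" message="Processing a record"/>
        </batch:step>
    </batch:process-records>
</batch:job>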
What is the Difference between a Batch Job and a Batch Job Instance?
Batch Job: This is a scope element of a Mulesoft application in which the message payload is processed as a job. A Batch Job consists of three phases: Load and Dispatch, Process, and On Complete.
Batch Job Instance: This is the occurrence created each time a Batch Job executes. A Batch Job Instance is created in the Load and Dispatch phase of the Batch Job and is identified by a string job instance ID. This ID can be passed to external systems to track or correlate the job.
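As a small sketch of using that ID, Mule 4 exposes it inside the job as the batchJobInstanceId variable (treat the exact variable name as an assumption, particularly on Mule 3; the step name is illustrative):
<batch:step name="CorrelateStep">
    <!-- Log the instance ID so an external system can correlate this run -->
    <logger level="INFO" message="#['Running batch job instance $(vars.batchJobInstanceId)']"/>
</batch:step>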
What are the Phases of Mulesoft Batch Processing?
Mulesoft Batch Processing has three different phases:
- Data Loading and Data Dispatch.
- Record Processing.
- Batch Job Completion.
1. Data Loading and Data Dispatch
This phase of Mulesoft Batch Processing is invoked implicitly. The platform performs all the backend work required to create a Batch Job instance. In this phase, Mule converts the serialized payload into a collection of records to be processed as a batch. Mule completes two things in this phase:
- It splits the message using DataWeave and creates a job instance ID, which is used for further processing.
- It creates a queue and associates it with the new job instance. The queue name is prefixed with BSQ, and you can see it in Anypoint Runtime Manager.
As soon as this phase completes, execution of the flow continues and the Process phase is invoked.
2. Record Processing
The Process phase of Mulesoft Batch Processing is asynchronous. In this phase, Mule starts taking records from the queue to build batches of a predefined block size. Mule sends these batches to their corresponding Batch Steps, which process them asynchronously. A Batch Step runs multiple records in parallel, after which the records are sent to the Stepping Queue, where the next Batch Step takes over. This continues until every record has passed through every Batch Step.
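As an illustrative sketch, the block size can be tuned on the batch job itself; the job name and value below are assumptions, and the attribute spelling follows the Mule 3-style syntax used by the article's other examples:
<!-- Pull records from the queue in blocks of 200 instead of the default 100;
     larger blocks trade memory for fewer queue round trips -->
<batch:job name="TunedBatch" block-size="200">
    <batch:process-records>
        <batch:step name="Step1">
            <logger level="INFO" message="Processing a record from a 200-record block"/>
        </batch:step>
    </batch:process-records>
</batch:job>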
An aggregator is a tool that changes the behaviour of the Stepping Queue.
The aggregator can be configured with a fixed size. In this case, the processed records are sent to the aggregator, which buffers them until that size is reached, after which the records are sent to the Stepping Queue.
Alternatively, the aggregator can be configured for streaming. In this case, the Batch Step sends processed records to the aggregator, which keeps accepting records until all records of the current Batch Step have been processed and aggregated, and only then moves them to the Stepping Queue.
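A minimal sketch of both configurations, assuming Mule 4's batch:aggregator element; the step names and logger messages are illustrative:
<!-- Fixed-size aggregation: buffer 100 processed records, then release them together -->
<batch:step name="FixedSizeStep">
    <batch:aggregator size="100">
        <logger level="INFO" message="Releasing a block of 100 aggregated records"/>
    </batch:aggregator>
</batch:step>
<!-- Streaming aggregation: receive all records of the step as one continuous stream -->
<batch:step name="StreamingStep">
    <batch:aggregator streaming="true">
        <logger level="INFO" message="Aggregating the full record stream"/>
    </batch:aggregator>
</batch:step>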
Mule continues to process the batches and checks whether each record succeeds or fails. It skips the failed records and continues with the next ones. At the end of the Process phase, the Batch Job instance completes.
3. Batch Job Completion
This is the final phase, where the runtime can be configured to create a report of the records that were processed. This phase provides insight into which records failed and helps in addressing issues with the input data. The skeleton below shows where the On Complete phase sits within a Batch Job.
<batch:job name="Batch3">
<batch:process-records>
<batch:step name="Step1">
<batch:record-variable-transformer/>
<ee:transform/>
</batch:step>
<batch:step name="Step2">
<logger/>
<http:request/>
</batch:step>
</batch:process-records>
<batch:on-complete>
<logger/>
</batch:on-complete>
</batch:job>
After all the Batch Steps have executed, the result is stored as a BatchJobResult object. Since processing is asynchronous, the result is not sent back to a caller.
You have two options for working with the output:
- If you use DataWeave to create a report in the On Complete phase, details such as failed records, successful records, the number of errors, and the causes of those errors can all be added to the report (see the sketch after this list).
- If you reference the BatchJobResult anywhere else in the Mule application, you can view its metadata, the number of failed records, and a few other details.
If the On Complete phase is left empty, the job simply completes without reporting whether the Batch Job failed or succeeded.
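As a minimal reporting sketch, assuming the Mule 4 BatchJobResult payload is available in the On Complete phase with its commonly documented fields (treat the exact field names as assumptions):
<batch:on-complete>
    <!-- Summarize the job instance using fields of the BatchJobResult payload -->
    <logger level="INFO"
            message="#['Batch done: $(payload.successfulRecords) succeeded, $(payload.failedRecords) failed of $(payload.totalRecords) records in $(payload.elapsedTimeInMillis) ms']"/>
</batch:on-complete>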
How to Trigger Batch Jobs in Mulesoft?
Batch Jobs can be triggered using:
- Batch Reference Message Processor: This processor references a Batch Job from within a Mule Flow in the same application.
- One-way Message Source: This can be placed at the beginning of a Batch Job to trigger it with an inbound, unidirectional message.
The first example below uses the Batch Reference Message Processor: a flow receives a message through an HTTP listener, transforms it, and then hands it to the Batch2 job via batch:execute.
<batch:job name="Batch2">
<batch:process-records>
<batch:step name="Step1">
<batch:record-variable-transformer/>
<data-mapper:transform/>
</batch:step>
<batch:step name="Step2">
<logger level="INFO" doc:name="Logger"/>
<http:request/>
</batch:step>
</batch:process-records>
<batch:on-complete>
<logger level="INFO" doc:name="Logger"/>
</batch:on-complete>
</batch:job>
<flow name="batchtest1Flow1">
<http:listener/>
<data-mapper:transform/>
<batch:execute name="Batch2"/>
</flow>
A one-way message source in the input phase enables the Batch Job itself to trigger Batch Processing. When it receives data from an external source or service, the message source initiates Batch Processing, beginning with any preparation you may have configured in the input phase. The example below leverages poll functionality to regularly fetch data from Salesforce.
<batch:job name="Batch1">
<batch:input>
<poll>
<sfdc:authorize/>
</poll>
</batch:input>
<batch:process-records>
<batch:step name="Step1">
<batch:record-variable-transformer/>
<data-mapper:transform/>
</batch:step>
<batch:step name="Step2">
<logger/>
<http:request/>
</batch:step>
</batch:process-records>
<batch:on-complete>
<logger/>
</batch:on-complete>
</batch:job>
Strategy to Schedule Batch Jobs in Mulesoft
A Scheduling Strategy in Mulesoft Batch Processing controls the execution of Batch Jobs. An application can define many Batch Jobs, and each Batch Job has its own Scheduling Strategy.
- ORDERED_SEQUENTIAL: This is the default strategy. If several job instances are executable at the same time, they run one at a time, ordered by their creation timestamps.
- ROUND_ROBIN: This strategy helps when resources are scarce and must be shared. Round-robin executes the available job instances in turn, making better use of the available resources, as the sketch after this list shows.
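A minimal configuration sketch, assuming the Mule 3-style scheduling-strategy attribute that matches the article's other examples (the job name is illustrative):
<!-- Share batch resources across job instances in round-robin fashion
     instead of the default ORDERED_SEQUENTIAL order -->
<batch:job name="SharedResourceBatch" scheduling-strategy="ROUND_ROBIN">
    <batch:process-records>
        <batch:step name="Step1">
            <logger level="INFO" message="Processing a record"/>
        </batch:step>
    </batch:process-records>
</batch:job>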
Limitations of Mulesoft Batch Processing
- Mulesoft Batch Processing does not support the use of Business Events.
- Insight does not support visibility into Mulesoft Batch Processing.
- Mulesoft Batch Processing does not support job-instance-wide transactions.
Conclusion
Batch Processing helps in better resource management as well as efficient processing of jobs. It is less complex than streaming alternatives and delivers reliable results. Mulesoft is a solution that allows organizations to connect to different sources with ease, and it supports Batch Processing to run Batch Jobs just as easily. This article gave a comprehensive overview of Batch Processing, Mulesoft Batch Processing, and related aspects.
There are various Data Sources that organizations leverage to capture a variety of valuable data points. But transferring data from these sources into a Data Warehouse for holistic analysis is a hectic task. It requires you to code and maintain complex functions to achieve a smooth flow of data. An Automated Data Pipeline helps solve this issue, and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline with 150+ pre-built integrations that you can choose from. Try a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.
FAQs
1. What are the key phases in a Mulesoft Batch Job?
A Batch Job runs in three phases: Load and Dispatch, Process, and On Complete. Older Mule 3 applications also support an optional Input phase before Load and Dispatch.
2. What is the Batch Aggregator in Mulesoft?
It groups processed records within a Batch Step, either in fixed-size blocks or as a stream, so they can be handled together in bulk.
3. Can Batch Jobs be scheduled in Mulesoft?
Yes, you can schedule them using the Scheduler module or external triggers.
Arsalan is a research analyst at Hevo and a data science enthusiast with over two years of experience in the field. He completed his B.Tech in Computer Science with a specialization in Artificial Intelligence and finds joy in sharing his knowledge with data practitioners. His interest in data analysis and architecture has driven him to write nearly a hundred articles on various topics related to the data industry.