You may have heard of batch processing, where data is collected over a long period (anywhere from several minutes to several hours) and then processed in one go. NEFT payments are an example: payment requests are registered over time and then executed in bulk at fixed intervals (typically every half hour). At the other end of the spectrum is stream processing, where each item is processed as soon as it arrives at the server. UPI payments are a good example. As you may have guessed, batch processing has high throughput, but high latency as well. Stream processing, on the other hand, has low latency but, depending on the application, low throughput as well.
Micro-batching is a middle ground between batch processing and stream processing that balances latency and throughput, and it can be the ideal option for several use cases. It aims to increase server throughput by processing tasks in small batches while keeping the latency at the client’s end much lower than full batch processing would.
In this article, we will see how micro-batching works. We will also look at applications of micro-batching and how to determine whether it is the ideal solution for your application.
How does Micro Batching work?
In micro-batching, a server typically waits for a short duration of time (this can be milliseconds or several seconds), before executing a batch operation. The duration of time it waits is called the batch cycle, and the number of tasks within a cycle is called the batch size. The system can have an upper limit on the batch size as well.
For example, suppose a system has a batch cycle of 1 second and a batch size limit of 64. If fewer than 64 tasks accumulate within the second, processing still starts when the second is up. Conversely, if the system is being bombarded with tasks and 64 tasks accumulate in 200 milliseconds, the system won’t wait out the rest of the second but will start processing the tasks immediately. The behavior of a micro-batching system can, of course, vary depending on how you’ve programmed it and what rules you’ve set for it.
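The size-or-time rule from this example can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `MicroBatcher` class, its method names, and the injectable `clock` parameter are all hypothetical, and the sketch assumes the batch cycle starts when the first task of a batch arrives (a fixed wall-clock interval would be an equally valid reading).

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Collects tasks and flushes when the size limit or the batch cycle is reached."""

    def __init__(
        self,
        process_batch: Callable[[List[Any]], None],
        max_batch_size: int = 64,
        batch_cycle_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._process_batch = process_batch
        self._max_batch_size = max_batch_size
        self._batch_cycle_seconds = batch_cycle_seconds
        self._clock = clock  # injectable so the timing logic can be tested
        self._pending: List[Any] = []
        self._cycle_started_at: Optional[float] = None

    def submit(self, task: Any) -> None:
        """Add a task; flush at once if the batch size limit is reached."""
        if not self._pending:
            # Assumption: the cycle starts when the first task of a batch arrives.
            self._cycle_started_at = self._clock()
        self._pending.append(task)
        if len(self._pending) >= self._max_batch_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once the cycle has elapsed."""
        if not self._pending:
            return
        if self._clock() - self._cycle_started_at >= self._batch_cycle_seconds:
            self.flush()

    def flush(self) -> None:
        """Hand all pending tasks to the batch handler and start over."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._cycle_started_at = None
        self._process_batch(batch)
```

A server would call `submit` for every incoming task and `tick` from a timer or event loop. The time check in `tick` is what keeps a partial batch from waiting indefinitely when traffic is low.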
Some micro-batching systems follow a variable-duration batch cycle: a new batch starts immediately after the previous one finishes. The batch cycle is therefore variable, determined by the time it takes to execute the tasks already accumulated, and each batch consists only of the tasks received while the previous one was being processed.
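A variable-duration cycle could look roughly like the sketch below, which uses Python's standard `queue` module. The function names and the `max_rounds` parameter are hypothetical; `max_rounds` exists only to keep the example finite, whereas a real server would loop until shutdown.

```python
import queue
from typing import Any, Callable, List


def drain(task_queue: "queue.Queue[Any]") -> List[Any]:
    """Take everything currently in the queue without blocking."""
    batch: List[Any] = []
    while True:
        try:
            batch.append(task_queue.get_nowait())
        except queue.Empty:
            return batch


def run_variable_cycle(
    task_queue: "queue.Queue[Any]",
    process_batch: Callable[[List[Any]], None],
    max_rounds: int,
) -> None:
    """Process back-to-back batches; each holds what arrived during the previous one."""
    for _ in range(max_rounds):
        # Block until at least one task exists, then grab everything else waiting.
        first = task_queue.get()
        batch = [first] + drain(task_queue)
        process_batch(batch)
```

Because there is no fixed timer, the batch size adapts to the load on its own: the longer a batch takes to process, the more tasks pile up for the next one.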
When does Micro Batching make sense?
Micro batching makes sense when you require quicker responses than batch processing, but can wait for a short duration (i.e., it is okay if the response is not immediate). This is from the client’s perspective.
From the server’s perspective, micro-batching makes sense when processing tasks in a batch is much more efficient (in terms of computational resources like power, memory, wear and tear, and also in terms of time), than processing each task independently. Thus, if you have an API server that primarily addresses GET requests requiring lookups from a small table, then micro-batching won’t make sense.
However, if your server gets a lot of log data from the clients that it needs to add to a database, it can be much more efficient to insert several rows simultaneously into the database rather than inserting each row independently. In this case, the server can wait for the accumulation of the logs for the duration of the batch cycle, and then insert all the accumulated logs into the database in one go.
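As a rough illustration of the log-ingestion case, the sketch below inserts all accumulated rows in a single transaction using Python's built-in `sqlite3` module. The `logs` table, its columns, and the `insert_logs` helper are assumptions made for the example; the same idea applies to any database that supports multi-row inserts.

```python
import sqlite3
from typing import List, Tuple

LogRow = Tuple[str, str, str]  # (timestamp, level, message)


def insert_logs(conn: sqlite3.Connection, rows: List[LogRow]) -> int:
    """Insert all accumulated log rows in one transaction; returns the row count."""
    if not rows:
        return 0
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO logs (ts, level, message) VALUES (?, ?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, level TEXT, message TEXT)")

# Logs accumulated during one batch cycle (sample data for illustration).
pending = [
    ("2024-01-01T00:00:00", "INFO", "service started"),
    ("2024-01-01T00:00:01", "WARN", "slow response"),
    ("2024-01-01T00:00:01", "INFO", "request served"),
]
inserted = insert_logs(conn, pending)
```

The gain comes from paying the per-transaction overhead once per batch rather than once per row.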
Applications of Micro Batching
Micro-batching helps systems that handle a variable workload and meet the criteria discussed in the previous section. Listed below are a few application areas:
- Database and File Ingestion: Writing data to a database or to a filesystem on a disk is much more efficient when done in large chunks or blocks. Not only is the overhead of each write cycle high, but some Flash and EEPROM memory chips (especially on embedded systems) endure only a limited number of write cycles, and writing data for each task individually can exhaust them quickly.
- Large Database Lookup: Getting items from a large database can be time-consuming and computationally heavy. Instead of scanning the database for each query, queries can be clubbed together (especially if they all request records based on a specific field, say id) and a single combined query can be run on the database.
- Web Analytics: If you run a website, you may want your analytics to be granular. However, you may not want seconds-level granularity. If you are a simple blogger, even day-level granularity will do (batch processing). However, if you run an e-commerce or some other high-traffic website, you may need minutes-level granularity (micro-batching), especially to understand whether a UI or UX change caused a significant drop in purchases and should be reverted.
- IoT: Say you run a telematics service wherein users can see the live location of their vehicle on the app, along with stats like runtime, kilometers traveled, etc. Now, typically, a user will be fine with the update to the stats (and perhaps even the location) happening every 1-2 seconds. A millisecond-level update might be overwhelming for the app as well, and greatly increase the cost of the analytics service which is ultimately passed down to the user.
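To make the large database lookup case above concrete, the following sketch answers several id lookups with one combined query, again using `sqlite3`. The `items` table and the `lookup_many` helper are hypothetical. Databases typically cap the number of bound parameters per statement, which is one more reason to keep an upper limit on the batch size.

```python
import sqlite3
from typing import Dict, Iterable


def lookup_many(conn: sqlite3.Connection, ids: Iterable[int]) -> Dict[int, str]:
    """Answer several id lookups with one query instead of one query per id."""
    unique_ids = sorted(set(ids))  # duplicate requests share a single lookup
    if not unique_ids:
        return {}
    # Only "?" placeholders are interpolated; the values are bound safely.
    placeholders = ",".join("?" for _ in unique_ids)
    rows = conn.execute(
        f"SELECT id, name FROM items WHERE id IN ({placeholders})", unique_ids
    ).fetchall()
    return {row_id: name for row_id, name in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(1, "bolt"), (2, "nut"), (3, "washer")]
)

# Ids requested by different clients during one batch cycle (sample data).
results = lookup_many(conn, [3, 1, 3, 42])
```

Each waiting client then reads its own answer from the returned dictionary; an id that is missing from it (42 here) simply was not found.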
Tools for Micro Batching
Perhaps the most popular tool is Apache Spark Streaming which, despite what its name suggests, is a micro-batch processing extension of the Spark API. Vertica also offers support for micro-batching.
However, tools aside, what matters is an understanding of the concept of micro-batching and an analysis of whether it is required. Once these things are clear, it is very much possible to modify your own server-side scripts to use micro-batch processing.
Best Practices for Micro Batching
Here are a few best practices for micro-batching:
- First, determine whether you require micro-batching. You may be better served by stream processing if your priority is real-time responses, and by batch processing if the freshness of data is not a great concern. Refer to the application examples above to understand scenarios where micro-batching may be preferable.
- Adjust your batch cycle time so that the latency seen by the clients doesn’t cross an uncomfortable level and, at the same time, the server throughput doesn’t fall too much. This can be achieved by trial and error.
- Always have a time cutoff in your algorithm. If you don’t start processing until a certain batch size is reached, the latency can increase greatly in low-traffic scenarios.
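Before tuning the batch cycle by trial and error on a live system, it can help to get a feel for the trade-off with a toy model. The sketch below is only that: it assumes tasks arrive at a constant rate, that every batch costs a fixed overhead plus a per-task cost, and that a task waits half a cycle on average before its batch starts. The function names and all the numbers are made up for illustration; real measurements should replace them.

```python
from typing import List, Tuple


def evaluate_cycle(
    cycle_s: float,
    arrival_rate: float,
    fixed_cost_s: float,
    per_task_cost_s: float,
) -> Tuple[float, float]:
    """Return (average latency in seconds, server utilization) for one cycle length."""
    tasks_per_batch = arrival_rate * cycle_s
    batch_time_s = fixed_cost_s + per_task_cost_s * tasks_per_batch
    # A task waits half a cycle on average, then the whole batch is processed.
    avg_latency_s = cycle_s / 2 + batch_time_s
    # Fraction of each cycle the server spends working; above 1.0 it falls behind.
    utilization = batch_time_s / cycle_s
    return avg_latency_s, utilization


def sweep(cycles_s: List[float]) -> List[Tuple[float, float, float]]:
    """Evaluate candidate cycles under an assumed load:
    100 tasks/s, 50 ms overhead per batch, 1 ms per task."""
    return [(c, *evaluate_cycle(c, 100.0, 0.05, 0.001)) for c in cycles_s]


for cycle, latency, utilization in sweep([0.05, 0.1, 0.5, 1.0]):
    status = "overloaded" if utilization > 1.0 else "ok"
    print(f"cycle={cycle:.2f}s latency={latency:.3f}s utilization={utilization:.2f} {status}")
```

In this model, shortening the cycle lowers latency but spends a growing share of server time on per-batch overhead, until the server can no longer keep up. Lengthening it does the opposite. The cycle to pick is the shortest one that still leaves comfortable headroom.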
We saw what micro batching is, and how it compares to batching and streaming. We also saw how it works and when it makes sense to use it. Some application examples were presented to further clarify situations where micro batching makes sense. Finally, a couple of tools for micro-batching were discussed. I hope this article provided you with the required overview of micro-batching. Thanks for reading.