What is Micro Batching: A Comprehensive Guide 101

You may have heard of batch processing, where data to be processed is collected over a long period (anywhere from several minutes to several hours) and then processed in one go. NEFT payments are an example: payment requests are registered over time and then executed in bulk at fixed intervals (typically every half hour). At the other end of the spectrum is stream processing, where each item is processed as soon as it arrives at the server.

Micro batching is a middle ground between batch processing and stream processing that balances latency and throughput, and it can be the ideal option for several use cases. It aims for higher server throughput than stream processing by handling tasks in small batches, while keeping the latency at the client’s end far lower than that of conventional batch processing.

As you would have guessed, batch processing has high throughput, but high latency as well. Stream processing, of which UPI payments are a good example, has low latency, but, depending on the application, low throughput as well. In this article, we will see how micro batching works. We will also look at applications of micro batching and how to determine if it is the ideal solution for your application.

How does Micro Batching work?

In micro-batching, a server typically waits for a short duration of time (this can be milliseconds or several seconds), before executing a batch operation. The duration of time it waits is called the batch cycle, and the number of tasks within a cycle is called the batch size. The system can have an upper limit on the batch size as well.

For example, suppose a system has a batch cycle of 1 second and a batch size limit of 64. If fewer than 64 tasks accumulate within a second, processing still starts when the second is up. Conversely, if the system is being bombarded with tasks and 64 of them accumulate in 200 milliseconds, the system doesn’t wait out the full second but starts processing immediately. The behavior of a micro batching system can, of course, change depending on how you’ve programmed it and what rules you’ve set for it.
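The two triggers in this example (cycle elapsed, or size limit reached) can be sketched in a few lines of Python. This is a minimal illustration and not a production implementation: the class name `MicroBatcher`, its methods, and the choice to start the cycle when the first task of a batch arrives are all assumptions made for the sketch. Timestamps are passed in explicitly so the behavior is easy to follow; in a real server you would pass something like `time.monotonic()` and call `tick` from a timer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class MicroBatcher:
    """Flushes on whichever comes first: the size limit or the end of the cycle."""
    process: Callable[[List[Any]], None]  # hypothetical handler for one batch
    batch_cycle: float = 1.0              # seconds to wait before flushing
    max_batch_size: int = 64              # upper limit on tasks per batch
    _pending: List[Any] = field(default_factory=list)
    _cycle_start: float = 0.0

    def submit(self, task: Any, now: float) -> None:
        """Queue a task; flush right away if the size limit is reached."""
        if not self._pending:
            self._cycle_start = now  # assumption: cycle starts with the first waiting task
        self._pending.append(task)
        if len(self._pending) >= self.max_batch_size:
            self._flush()

    def tick(self, now: float) -> None:
        """Call periodically; flushes once the batch cycle has elapsed."""
        if self._pending and now - self._cycle_start >= self.batch_cycle:
            self._flush()

    def _flush(self) -> None:
        batch, self._pending = self._pending, []
        self.process(batch)


batches = []
batcher = MicroBatcher(process=batches.append, batch_cycle=1.0, max_batch_size=64)

# Low traffic: only 3 tasks arrive, and the batch is processed when the cycle ends.
for i in range(3):
    batcher.submit(i, now=0.1 * i)
batcher.tick(now=1.0)

# High traffic: 64 tasks arrive within about 200 ms and are processed immediately.
for i in range(64):
    batcher.submit(i, now=2.0 + i * 0.003)

batch_sizes = [len(b) for b in batches]
print(batch_sizes)
```

The first batch is released by the time cutoff and the second by the size limit, which is the behavior described in the example above.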

Some micro batching systems follow a variable-duration batch cycle: a new batch starts immediately after the previous one finishes. The batch cycle is therefore determined by the time it takes to execute the tasks already accumulated, and each batch consists only of the tasks received while the previous batch was being processed.
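A variable-duration cycle can be pictured as a loop that drains whatever is waiting, processes it, and then drains again. The sketch below simulates this under stated assumptions: `drain` and `run_variable_cycles` are hypothetical helper names, and the arrivals that would normally come from clients while a batch is being processed are faked with a predefined list.

```python
from collections import deque
from typing import Any, Callable, Deque, List


def drain(queue: Deque[Any]) -> List[Any]:
    """Remove and return everything currently waiting in the queue."""
    batch = list(queue)
    queue.clear()
    return batch


def run_variable_cycles(queue: Deque[Any],
                        process: Callable[[List[Any]], None],
                        max_cycles: int) -> List[int]:
    """Process back-to-back batches; returns the size of each batch."""
    sizes = []
    for _ in range(max_cycles):
        batch = drain(queue)
        if not batch:
            break  # a real server would wait for new tasks here instead of stopping
        process(batch)  # tasks arriving during this call form the next batch
        sizes.append(len(batch))
    return sizes


queue = deque(range(5))
# Simulated arrivals while each successive batch is being processed.
incoming = [[10, 11, 12], [20], []]


def process(batch: List[Any]) -> None:
    queue.extend(incoming.pop(0))


sizes = run_variable_cycles(queue, process, max_cycles=10)
print(sizes)
```

Here the batch sizes shrink because fewer tasks arrive during each successive batch; under heavier load the batches, and hence the cycles, would grow instead.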

When does Micro Batching make sense?

Micro batching makes sense when you require quicker responses than batch processing, but can wait for a short duration (i.e., it is okay if the response is not immediate). This is from the client’s perspective.

From the server’s perspective, micro-batching makes sense when processing tasks in a batch is much more efficient (in terms of computational resources like power, memory, wear and tear, and also in terms of time), than processing each task independently. Thus, if you have an API server that primarily addresses GET requests requiring lookups from a small table, then micro-batching won’t make sense.

However, if your server gets a lot of log data from the clients that it needs to add to a database, it can be much more efficient to insert several rows simultaneously into the database rather than inserting each row independently. In this case, the server can wait for the accumulation of the logs for the duration of the batch cycle, and then insert all the accumulated logs into the database in one go.
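As a rough illustration of the log example, the sketch below writes a batch of accumulated log rows with a single `executemany` call inside one transaction, using Python’s built-in `sqlite3` module. The table name `logs`, its columns, and the helper `insert_batch` are assumptions made for the example; the actual saving depends on your database and driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (client_id INTEGER, message TEXT)")


def insert_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Insert all accumulated rows in one transaction instead of one commit per row."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO logs (client_id, message) VALUES (?, ?)", rows
        )


# Logs accumulated over one batch cycle (made-up data).
accumulated = [(i % 4, f"event {i}") for i in range(100)]
insert_batch(conn, accumulated)

row_count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(row_count)
```

The point of the design is that the per-write overhead (transaction, commit, disk sync) is paid once per batch rather than once per log line.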

Applications of Micro Batching

Micro-batching helps systems that deal with a variable workload and meet the criteria discussed in the section above. Listed below are a few application areas:

Database and File Ingestion: Writing data to a database or a filesystem on a disk is much more efficient when done in large chunks or blocks. Not only are the overheads high for each write cycle, but some Flash and EEPROM memory chips (especially on embedded systems) support only a limited number of write cycles, and writing data for each task individually can exhaust them quickly.
Large Database Lookup: Getting items from a large database can be time-consuming and computationally heavy. Instead of scanning the database for each query, queries can be clubbed together (especially if they all request records based on a specific field, say id) and a single combined query can be run on the database.
Web Analytics: If you run a website, you may want your analytics to be granular. However, you may not want seconds-level granularity. If you are a simple blogger, even day-level granularity will do (batch processing). However, if you run an e-commerce or some other high-traffic website, then you may need minutes-level granularity (micro-batching), especially to understand whether a UI or UX change led to a significant drop in purchases and should be reversed.
IoT: Say you run a telematics service wherein users can see the live location of their vehicle on the app, along with stats like runtime, kilometers traveled, etc. Typically, a user will be fine with the stats (and perhaps even the location) updating every 1-2 seconds. Millisecond-level updates might overwhelm the app and would greatly increase the cost of the analytics service, which is ultimately passed down to the user.
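To make the large database lookup case concrete, here is a minimal sketch that answers several pending id lookups with one combined query, again using `sqlite3`. The `users` table and the `lookup_batch` helper are hypothetical. Keep in mind that databases cap the number of parameters allowed in one statement, so very large batches would need to be split.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 101)]
)


def lookup_batch(conn: sqlite3.Connection, ids: list) -> dict:
    """Resolve many id lookups with a single combined IN (...) query."""
    unique_ids = sorted(set(ids))  # duplicate requests share one lookup
    if not unique_ids:
        return {}
    placeholders = ", ".join("?" for _ in unique_ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", unique_ids
    )
    found = dict(rows)  # each row is an (id, name) pair
    return {i: found.get(i) for i in ids}  # ids with no match map to None


# Four lookups that accumulated during one batch cycle, answered by one query.
results = lookup_batch(conn, [7, 42, 7, 999])
print(results)
```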

Tools for Micro Batching

Perhaps the most popular tool is Apache Spark Streaming, which, despite its name, is a micro-batch processing extension of the Spark API. Vertica also offers support for micro-batching.

However, tools aside, what matters is an understanding of the concept of micro-batching and an analysis of whether it is required. Once these things are clear, it is very much possible to modify your own server-side scripts to use micro-batch processing.

Best Practices for Micro Batching

Here are a few best practices for micro batching:

First, determine whether you require micro-batching. You may be better served by stream processing if your priority is real-time responses, and by batch processing if the freshness of data is not a great concern. Refer to the application examples above to understand scenarios where micro batching may be preferable.
Adjust your batch cycle time so that the latency seen by the clients doesn’t cross an uncomfortable level and, at the same time, the server throughput doesn’t fall too much. This can be achieved by trial and error.
Always have a time cutoff in your algorithm. If you don’t start processing until a certain batch size is reached, the latency can increase greatly in low-traffic scenarios.
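Before resorting to trial and error, a back-of-the-envelope estimate can give you a starting point for the batch cycle. The sketch below assumes tasks arrive at a steady, evenly spread rate and ignores processing time, so treat the numbers as rough guides only; the function name `estimate_cycle` and its parameters are made up for this illustration. Under those assumptions, a task waits at most one full window and, on average, half of it.

```python
def estimate_cycle(rate_per_sec: float, batch_cycle: float, max_batch_size: int) -> dict:
    """Rough batch size and queueing wait for a given cycle (steady arrivals assumed)."""
    expected = rate_per_sec * batch_cycle
    if expected >= max_batch_size:
        # The size limit is reached before the cycle ends, so batches flush early.
        window = max_batch_size / rate_per_sec
        size = float(max_batch_size)
    else:
        # The time cutoff triggers first.
        window = batch_cycle
        size = expected
    return {"batch_size": size, "worst_wait": window, "avg_wait": window / 2}


# Low traffic: 20 tasks per second with a 0.5 s cycle.
quiet = estimate_cycle(rate_per_sec=20, batch_cycle=0.5, max_batch_size=64)

# High traffic: 320 tasks per second fill a batch of 64 in about 200 ms.
busy = estimate_cycle(rate_per_sec=320, batch_cycle=1.0, max_batch_size=64)

print(quiet)
print(busy)
```

If the estimated wait is already uncomfortable for your clients, shorten the cycle; if the estimated batch size is too small to give the server any benefit, lengthen it, and then verify with real traffic.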

Conclusion

We saw what micro batching is, and how it compares to batching and streaming. We also saw how it works and when it makes sense to use it. Some application examples were presented to further clarify situations where micro batching makes sense. Finally, a couple of tools for micro-batching were discussed. I hope this article provided you with the required overview of micro-batching. Thanks for reading.

FAQs about Micro Batching

1. What are micro batches?

Micro batches are small groups of data items that are collected over a short interval (often capped at a maximum size) and processed together, rather than being processed one item at a time or in large bulk.

2. What is the micro-batch method?

The micro-batch method is a data processing technique where incoming data is grouped into small batches and processed at short intervals (e.g., every few seconds), or sooner if a batch size limit is reached.

3. What is the difference between streaming and micro-batch?

Streaming processes data continuously in real time, handling each item as it arrives to keep latency low. Micro-batch processing instead collects items into small batches at short intervals, accepting slightly higher latency in exchange for higher throughput.

Yash Sanghvi Technical Content Writer, Hevo Data

Yash is a trusted expert in the data industry, recognized for his role in driving online success for companies through data integration and analysis. With a Dual Degree (B. Tech + M. Tech) in Mechanical Engineering from IIT Bombay, Yash brings strategic vision and analytical rigor to every project. He is proficient in Python, R, TensorFlow, and Hadoop, using these tools to develop predictive models and optimize data workflows.