A happy man working upon batches of bread, not unlike a powerful computer processing a large batch of data
Do you need to load large files into a database every night? Hoping to run that big load in parallel, or partitioned, so it finishes quickly?
Receiving thousands of new files every day that you must transform and load into your NoSQL document database?
Find yourself transforming a large volume of database data into aggregate reports, or for billing, every week?
Then, my friend, welcome to batch processing. And for all your batch processing needs, consider: Spring Batch!
But first, what exactly is batch processing? All of the above scenarios (creating aggregate reports on a schedule, loading files, weekly billing) are classic examples of batch processing, stereotypes even. A number of elements are common to them. First, batch processing normally has no user interaction. A batch process kicks off like a faithful alarm clock: it is either scheduled to start at a specific time or triggered by a specific event (an incoming file, a message sent between two servers, etc). Second, there typically isn’t any slick, flashy activity indicator to display progress; large batch processes can take hours, and often nobody cares to watch the progress of a batch process except the individual charged with keeping it running smoothly behind the scenes. Finally, many batch processes run off-hours when there’s little other activity, though that isn’t true of all of them.
Batch processing is nothing new. Batch processing on mainframes has been around since the 1950s, and on punch cards before that.
A mainframe in the 1950s for batch processing social security records
So if batch processing is nothing new, there are plenty of tools available for it! We have a saying at RIIS: shake a tree for a technical problem you’d like to solve, and 20 tools will fall out. Every tool will tell you how great it is, how it’s different or better than the other solutions, and how “powerful” yet “simple” it is, and solutions from for-profit organizations can be biased. So how would you know if Spring Batch is the best choice for your batch processing needs, compared to other tools?
Let’s consider some of the primary benefits Spring Batch gives us compared to writing our own batch processing system, and against similar batch processing tools:
- Job history gets persisted in a database. Know when a job ran, how long it took, how many items it processed, whether the run succeeded or hit an error, and the throughput
- A number of inputs & outputs are supported: fixed width files, delimited files, files with headers & footers, relational databases, NoSQL databases, message queues, and more
- Batch processes can be rerun
- The number of items processed at a time can be tweaked, and because you don’t have unlimited hardware, you’ll need to do this. You can’t just load 1,000,000 lines into memory all at once; you’ll need to process those items in “chunks” (say 10,000 items at a time). Spring Batch easily lends itself to this
- Large volume batches can be run in parallel & partitioned for fastest performance
- There’s a web application tool that lets you view how jobs ran, start new jobs, re-run failed jobs, and cancel running jobs
- Popular enough that people author hardcopy books about it
- In active development
- Integrates tightly with other tools such as Spring Boot & Spring Integration, which may give more lift as an overall solution for backend needs
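To make the “chunks” idea from the list above a bit more concrete, here is a minimal sketch in plain Java. It is not the Spring Batch API; the class `ChunkSketch` and the method `processInChunks` are hypothetical names made up for this illustration. It only shows the general shape of the technique: read items one at a time, transform each one, and write them out in groups of a fixed size so the whole input never has to sit in memory at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration of chunk-oriented processing in plain Java.
// This is NOT the Spring Batch API; every name here is invented for the sketch.
final class ChunkSketch {

    // Reads items one at a time, transforms each one, and hands the results
    // to the writer in lists of at most chunkSize items.
    // Returns the number of chunks that were written.
    static <I, O> int processInChunks(Iterable<I> reader,
                                      Function<I, O> processor,
                                      Consumer<List<O>> writer,
                                      int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        for (I item : reader) {
            chunk.add(processor.apply(item));
            if (chunk.size() == chunkSize) {
                writer.accept(chunk);
                chunksWritten++;
                chunk = new ArrayList<>(chunkSize); // start a fresh chunk
            }
        }
        if (!chunk.isEmpty()) { // flush the final, possibly smaller, chunk
            writer.accept(chunk);
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        Function<String, String> toUpper = line -> line.toUpperCase();
        Consumer<List<String>> printer = chunk -> System.out.println("writing " + chunk);

        int chunks = processInChunks(lines, toUpper, printer, 2);
        System.out.println(chunks + " chunks written");
    }
}
```

With five input lines and a chunk size of 2, the demo writes two full chunks and one final chunk holding the leftover item. A real batch framework layers much more on top of this loop (transactions per chunk, restart bookkeeping, error handling), so treat the sketch as a picture of the idea rather than of any particular implementation.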
Some things to consider when weighing if Spring Batch is right for you:
- Runs on Java 6, Java 7, and Java 8. You’ll need to author Java source code to glue everything together.
- There is no visual editor or IDE to author how the batch process works, unlike other tools. You will need to code to make this work.
- Other batch processing tools come with full “Enterprise Suites” and plenty of applications for you to explore in Windows. Typically these tools are designed for a large organization to put all, or many, of its batch jobs into one product. We’re talking production support, on-site training, and possibly spending hundreds of thousands of US dollars here
- What you’re doing may be so small and simple that any, or many, batch processing tools would be over-architecture
This concludes our introduction to Spring Batch & batch processing. The subsequent blog post Jumping Two Feet In Deep With Spring Batch gives a deeper dive into the nitty gritty details of how Spring Batch works.