⚡ TL;DR
A production-ready Spring Batch extension for dynamic clustering using only the database — no Kafka, no RabbitMQ, no coordination servers. Version 1.0.0 is now released on Maven Central!
🚀 Why I Built This
Distributed batch processing often requires complex infrastructure — think messaging systems, ZooKeeper, or centralized schedulers. In many real-world deployments, especially in financial institutions, this adds cost, risk, and tight operational coupling.
To solve this, I built a lightweight, pluggable Spring Batch extension that enables cluster-aware partitioned step execution using nothing but a shared database as the coordination layer.
🛠️ Key Features
✅ Cluster Coordination Using DB Only
Each node registers, heartbeats, and participates in job execution using lightweight database tables — no brokers required.
✅ Dynamic Partition Assignment
Partitions are assigned and rebalanced at runtime based on node availability — perfect for ephemeral cloud-native deployments.
✅ Failover & Recovery
If a node dies mid-job, remaining nodes detect the loss and reassign unfinished partitions.
✅ Pluggable Coordination Tables
Custom schemas (`BATCH_NODES`, `BATCH_PARTITIONS`, `BATCH_JOB_COORDINATION`) provide fine-grained visibility and control.
✅ Zero External Dependencies
Built purely on Spring Batch 5.x, Spring JDBC, and standard transaction semantics.
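The failover feature above boils down to heartbeat timestamps going stale in a shared table. As a rough illustration of that idea only — not the library's actual API, and all names here are hypothetical — liveness detection reduces to comparing a node's last recorded heartbeat against a configured timeout:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: each node periodically writes a heartbeat timestamp to
// a BATCH_NODES row; peers treat a node as dead once that timestamp is older
// than the timeout, and then reassign its unfinished partitions.
public class LivenessCheck {

    // Returns true when the node's last heartbeat is older than the timeout.
    public static boolean isDead(Instant lastHeartbeat, Instant now, Duration timeout) {
        return Duration.between(lastHeartbeat, now).compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:00:10Z");
        Instant fresh = Instant.parse("2024-01-01T00:00:07Z"); // 3s old
        Instant stale = Instant.parse("2024-01-01T00:00:00Z"); // 10s old
        Duration timeout = Duration.ofSeconds(5);
        System.out.println(isDead(fresh, now, timeout)); // false
        System.out.println(isDead(stale, now, timeout)); // true
    }
}
```

Because the heartbeat lives in the same transactional database as the job metadata, no separate failure detector is needed.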
📦 Released Artifacts
```xml
<dependency>
    <groupId>io.github.jchejarla</groupId>
    <artifactId>clustering-core</artifactId>
    <version>1.0.0</version>
</dependency>
```
Available now on Maven Central.
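For Gradle builds, the same coordinates translate to:

```groovy
implementation 'io.github.jchejarla:clustering-core:1.0.0'
```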
🔍 Architecture Overview
- Master Node — Automatically elected (no ZooKeeper!) based on the job launcher.
- Coordinator Tables — Shared DB tables used to track:
  - Active nodes
  - Step partition states
  - Ongoing executions
- PartitionHandler — Custom implementation that uses SQL to dynamically assign work.
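To make the dynamic-assignment idea concrete, here is a minimal sketch in plain Java (no Spring, all names hypothetical) of one plausible strategy — round-robin distribution of partition indices across the currently active nodes. Calling it again with a shrunken node list yields the rebalanced layout, which is the essence of runtime reassignment:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of cluster-aware partition assignment: spread
// partition indices round-robin over the nodes currently registered in the
// coordination table. Re-running with the surviving node list after a failure
// produces the rebalanced assignment.
public class PartitionAssignment {

    public static Map<String, List<Integer>> assign(List<String> activeNodes, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String node : activeNodes) {
            assignment.put(node, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            assignment.get(activeNodes.get(p % activeNodes.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three nodes, eight partitions
        System.out.println(assign(List.of("node-a", "node-b", "node-c"), 8));
        // node-b drops out: the same call rebalances across the survivors
        System.out.println(assign(List.of("node-a", "node-c"), 8));
    }
}
```

The actual library derives the active-node list from its coordination tables; this sketch only shows the shape of the computation.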
✅ Example Code: see the examples directory on GitHub for ready-to-run Spring Boot projects demonstrating cluster partitioning in action.
💡 Use Cases
- Spring Batch jobs in Kubernetes (nodes scale up/down)
- FinTech ETL pipelines where messaging systems are overkill
- On-prem enterprise environments with restricted tech stacks
- Batch workloads on serverless compute (like AWS Fargate or Google Cloud Run)
🧪 Example: Dynamic Step Execution
```java
@Bean
public Step partitionedStep(JobRepository jobRepository) {
    // Spring Batch 5.x: StepBuilderFactory is deprecated; build the step
    // directly with a StepBuilder and the JobRepository.
    return new StepBuilder("partitionedStep", jobRepository)
            .partitioner("workerStep", customPartitioner())
            .partitionHandler(clusterAwarePartitionHandler())
            .build();
}
```
The `clusterAwarePartitionHandler` is where the DB magic happens.
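The `customPartitioner()` bean is user-supplied: its only job is to split the work into named partitions, each carrying its own parameters. As a hedged sketch of what such a partitioner typically computes — plain maps stand in for Spring's `ExecutionContext`, and the id-range scheme and names are illustrative, not the library's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative range partitioner: splits an id range [minId, maxId] into
// gridSize partitions, each described by its own min/max bounds. A real
// Spring Batch Partitioner would put these bounds into ExecutionContexts.
public class RangePartitions {

    public static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (int i = 0; i < gridSize; i++) {
            long from = minId + (long) i * size;
            if (from > maxId) break;                   // fewer ids than slots
            long to = Math.min(from + size - 1, maxId);
            Map<String, Long> ctx = new LinkedHashMap<>();
            ctx.put("minId", from);
            ctx.put("maxId", to);
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Split ids 1..100 into 4 partitions of 25 ids each
        System.out.println(partition(1, 100, 4));
    }
}
```

Each named partition then becomes a unit of work that the cluster-aware handler can hand to whichever node is available.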
📈 What’s Next
🔹 Monitoring endpoints
🔹 Retry policies and smarter failure detection
🔹 Integration with Spring Boot Actuator
🔹 Auto-configuration starter (`spring-boot-starter-batch-clustered`)
🙌 Contribute or Follow
💬 Issues and PRs welcome — especially around test cases and Kubernetes integrations.
🧠 Behind the Scenes
This project was born from practical needs in building scalable ETL pipelines in a large financial services ecosystem. By removing the messaging layer, we reduced infra cost and simplified failover handling — while maintaining full reliability.
👋 Final Thoughts
If you're tired of spinning up Kafka just to run partitioned jobs — this is for you.
This library is designed for engineers, architects, and teams who want reliability without orchestration overload. Try it out, share feedback, and help shape the roadmap.