This project simulates a Kafka-like distributed streaming system with Producers, Partitions, and Consumers. It visualizes how data flows through the system and provides real-time metrics on performance. It's useful for understanding how records flow through the system and are distributed across Consumers.
🔥 Go play now on our online version 🔥
❤️ Created by Renato Mefi
Sponsored by Evoura - Data Streaming and EDA experts consultancy
This project is licensed under CC BY-NC 4.0 - feel free to use and modify it for non-commercial purposes, but credit must be given to the original author.
This project was created with Claude 3.7 as a way of learning it; it took hundreds of messages to get to this state. The code does not represent the author's normal style, as it is 99% AI generated with guidance on code style.
$ npm start
Visit http://localhost:1234
It's not an exact Kafka engine copy, for many reasons. The code is meant to provide the minimum set of Kafka-like functionalities needed for a meaningful visualization while preserving key properties like ordering, partitioning, and consumer assignment. It's not representative of real-world scenarios, but it can provide enough visibility into issues that also apply to real ones.
The simulation tracks and detects ordering issues:
- Records with the same key should be processed in order of their event time
- When records with the same key are processed out of order, a warning is logged
- This helps visualize the ordering guarantees (or lack thereof) in distributed stream processing
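A minimal sketch of how such detection can work, assuming a hypothetical `checkOrdering` helper that tracks the latest processed event time per key (the helper name and shape are illustrative, not the project's actual implementation):

```javascript
// Track the latest processed eventTime per record key.
// If an incoming record's eventTime precedes the stored one,
// records with the same key were processed out of order.
const lastEventTimeByKey = new Map();

function checkOrdering(record) {
  const last = lastEventTimeByKey.get(record.key);
  if (last !== undefined && record.eventTime < last) {
    console.warn(
      `Out-of-order processing for key ${record.key}: ` +
        `eventTime ${record.eventTime} < ${last}`
    );
    return false;
  }
  lastEventTimeByKey.set(record.key, record.eventTime);
  return true;
}
```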
Key configuration parameters:
- `partitionsAmount`: Number of partitions in the system
- `producersAmount`: Number of producers generating records
- `consumersAmount`: Number of consumers processing records
- `producerRate`: Records per second each producer generates
- `producerDelayRandomFactor`: Random factor for producer delays (0-1)
- `recordValueSizeMin`: Minimum record size in bytes
- `recordValueSizeMax`: Maximum record size in bytes
- `recordKeyRange`: Range of possible key values
- `partitionBandwidth`: Network speed in bytes per second
- `consumerThroughputMaxInBytes`: Maximum consumer processing capacity
- `consumerAssignmentStrategy`: How partitions are assigned to consumers
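For illustration, a configuration object using these parameters might look like the following. The values are made up for the example and are not the simulator's actual defaults:

```javascript
// Illustrative values only; not the simulator's defaults.
const Config = {
  partitionsAmount: 6,                  // partitions in the system
  producersAmount: 2,                   // producers generating records
  consumersAmount: 3,                   // consumers processing records
  producerRate: 5,                      // records per second per producer
  producerDelayRandomFactor: 0.3,       // random delay factor (0-1)
  recordValueSizeMin: 1024,             // minimum record size in bytes
  recordValueSizeMax: 10240,            // maximum record size in bytes
  recordKeyRange: 20,                   // keys drawn from [0, 20)
  partitionBandwidth: 51200,            // bytes per second through a partition
  consumerThroughputMaxInBytes: 20480,  // max bytes/second per consumer
  consumerAssignmentStrategy: 'round-robin',
};
```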
- `totalRecordsProduced`: Total number of records created by all producers
- `totalRecordsConsumed`: Total number of records processed by all consumers
- `totalBytesProduced`: Total bytes of data produced
- `totalBytesConsumed`: Total bytes of data consumed
- `avgProcessingTimeMs`: Average processing time across all records
- `processingTimeSamples`: Number of samples used for average calculation
- `recordsProduced`: Total records produced by this producer
- `bytesProduced`: Total bytes produced by this producer
- `produceRate`: Current produce rate in bytes per second
- `recordsRate`: Current produce rate in records per second
- `lastUpdateTime`: Last time metrics were updated
- `recordsConsumed`: Total records consumed by this consumer
- `bytesConsumed`: Total bytes consumed by this consumer
- `consumeRate`: Current consumption rate in bytes per second
- `recordsRate`: Current consumption rate in records per second
- `lastUpdateTime`: Last time metrics were updated
- `processingTimes`: Recent processing times (last 10 records)
- Produce records at a rate defined by `Config.producerRate` (records per second)
- Random delay can be applied using `Config.producerDelayRandomFactor` (0-1s range)
- Production scheduling is calculated using milliseconds-based timing
- Records are assigned to partitions based on their key (using modulo partitioning)
{
id: Number, // Unique identifier
lastProduceTime: Number // Timestamp of last record produced
}
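The modulo partitioning mentioned above can be sketched in one line; the function name here is illustrative, but the arithmetic is the standard technique:

```javascript
// Modulo partitioning: records with the same key always land in the
// same partition, which is what preserves per-key ordering.
function assignPartition(key, partitionsAmount) {
  return key % partitionsAmount;
}
```

Because the mapping is deterministic, two records with the same key can never end up in different partitions.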
- Receive records from producers
- Move records along at network speed defined by `Config.partitionBandwidth` (bytes/second)
- When a record reaches the end of the partition, it notifies the assigned consumer
- Each partition has an offset counter that increments for each record
- Records remain in the partition during processing and are removed when processing completes
{
id: Number, // Unique identifier
records: Array, // Array of record objects in this partition
currentOffset: Number // Current offset counter
}
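A record's time in transit through a partition follows from its size and the configured bandwidth. A sketch of that relationship (the helper name is an assumption; the formula mirrors the described behavior):

```javascript
// Time (in ms) for a record to traverse a partition:
// size in bytes divided by bandwidth in bytes per second.
function transitTimeMs(valueSizeBytes, partitionBandwidth) {
  return (valueSizeBytes / partitionBandwidth) * 1000;
}
```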
- Process records that have reached the end of their assigned partitions
- Assigned to partitions using strategies defined by `Config.consumerAssignmentStrategy`:
  - `round-robin`: Distributes partitions evenly across consumers
  - `range`: Divides partitions into continuous ranges per consumer
  - `sticky`: Attempts to maintain previous assignments when possible
  - `cooperative-sticky`: Uses round-robin but creates locality clustering
- Have a throughput limit defined by `Config.consumerThroughputMaxInBytes`
- Process records concurrently across all assigned partitions
- Distribute processing capacity evenly across active records
- Track processing state and progress for each record
- Queue records that arrive while at maximum capacity
{
id: Number, // Unique identifier
assignedPartitions: Array, // Array of partition IDs assigned to this consumer
activePartitions: Object, // Map of partitionId -> record being processed
processingTimes: Object, // Map of recordId -> {startTime, endTime}
throughputMax: Number, // Maximum bytes per second this consumer can process
processingQueues: Object, // Map of partitionId -> queue of records waiting
transitRecords: Array, // Records visually moving from partition to consumer
recordProcessingState: Object // Tracks bytes processed per record
}
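The first two assignment strategies can be sketched as follows. These are hypothetical helpers written for illustration (returning a map of consumer id to partition ids), not the project's actual functions:

```javascript
// round-robin: deal partitions out one at a time across consumers.
function roundRobinAssign(partitions, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  partitions.forEach((p, i) => {
    assignment.get(consumers[i % consumers.length]).push(p);
  });
  return assignment;
}

// range: split partitions into contiguous chunks, one chunk per consumer.
function rangeAssign(partitions, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  const chunk = Math.ceil(partitions.length / consumers.length);
  partitions.forEach((p, i) => {
    assignment.get(consumers[Math.floor(i / chunk)]).push(p);
  });
  return assignment;
}
```

With 6 partitions and 3 consumers, round-robin gives each consumer partitions that are 3 apart (e.g. 0 and 3), while range gives each a contiguous pair (e.g. 0 and 1).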
- Created by producers with randomized characteristics
- Flow through partitions at speed determined by record size and partition bandwidth
- Have a unique ID, key, and value (size in bytes)
- Size visually represented by radius (larger value = larger radius)
- When they reach the end of a partition, they wait for consumer processing
- Processing time depends on record size and consumer throughput
- Tracked for ordering issues by key (out-of-order processing detection)
{
id: Number, // Unique record identifier
key: Number, // Record key (determines partition)
value: Number, // Size in bytes
producerId: Number, // Producer that created this record
partitionId: Number, // Partition this record belongs to
speed: Number, // Speed in ms based on size and bandwidth
offset: Number, // Position in the partition sequence
eventTime: Number, // Timestamp when record was created
isBeingProcessed: Boolean, // Whether record is currently being processed
isWaiting: Boolean, // Whether record is waiting to be processed
isProcessed: Boolean, // Whether record has been processed
processingProgress: Number, // Processing progress (0-1)
processingTimeMs: Number // Estimated processing time in milliseconds
}
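Since processing time depends on record size and consumer throughput, and capacity is distributed evenly across active records, the estimate can be sketched as below. This is a minimal sketch under those stated assumptions, not the simulator's exact formula:

```javascript
// Estimated time (in ms) to process one record when the consumer's
// total throughput is shared evenly across its active records.
function processingTimeMs(valueSizeBytes, throughputMax, activeRecords) {
  const perRecordThroughput = throughputMax / activeRecords;
  return (valueSizeBytes / perRecordThroughput) * 1000;
}
```

Doubling the number of active records halves the per-record throughput and therefore doubles each record's estimated processing time.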