Message Broker Throughput

RabbitMQ 4.2 vs Kafka 4.2 (KRaft) vs NATS 2.12 (JetStream)

I started using NATS in one of my projects and was generally happy with it, but I wanted to verify the performance claims for myself. Is it really as fast as people say, or is that just marketing and cherry-picked benchmarks? The best way to find out was to write my own tests and compare NATS against the two most common alternatives: RabbitMQ and Kafka.

This post covers throughput testing of all three brokers on two messaging patterns: async producer-consumer queue, and request-reply. Request-reply is not the typical use case for message brokers, but NATS supports it natively, so it was worth measuring how the others perform when forced into that pattern.


Test Environment

All three brokers ran in Docker containers on the same host. No custom tuning was applied to any broker: default configurations only.

| Component | Value |
|---|---|
| CPU | Ryzen 7 8845HS |
| Cores | 8C / 16T |
| OS | Windows 11 |
| Runtime | .NET 10.0.4 |
| Framework | BenchmarkDotNet 0.15.8 |
| Metric | P95 (ms) |

Broker Versions

| Broker | Docker Image | Configuration |
|---|---|---|
| RabbitMQ | rabbitmq:4.2-management | Default settings, AMQP 0.9.1 |
| Kafka | apache/kafka:4.2.0 | KRaft mode, single node, 1 partition |
| NATS | nats:2.12-alpine | JetStream enabled (-js) |

Idle RAM (Cold Start)

Measured via docker stats on freshly started containers with no accumulated data or active connections.

| Broker | Idle RAM |
|---|---|
| NATS | 6 MiB |
| RabbitMQ | 122 MiB |
| Kafka | 327 MiB |

Kafka's JVM-based architecture is immediately visible: 54x the memory of NATS and 2.7x of RabbitMQ on cold start. NATS is the lightest at 6 MiB.

BenchmarkDotNet Configuration

Note on metric: All result tables use P95 (95th percentile) rather than Mean. P95 better represents worst-case performance a system will realistically encounter, filtering out warm-up noise while capturing tail latency.
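To make the metric concrete, here is a minimal nearest-rank P95 (sketched in Python for brevity; BenchmarkDotNet's own percentile estimator may differ in interpolation details):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile: the smallest sample that is
    greater than or equal to 95% of all samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 100 latency samples of 1..100 ms: P95 picks the 95th value
print(p95(range(1, 101)))  # 95
```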

Test Parameters

The message counts and payload sizes were chosen to cover two dimensions: the number of concurrent messages the broker must route, and the size of individual payloads. Counts are inversely proportional to payload size to keep total benchmark runtime within a few minutes per scenario while still loading the broker enough to reveal its throughput characteristics.

Async Queue (250 concurrent publishers, 1 consumer)

| Messages | Payload | Total Volume |
|---|---|---|
| 50,000 | 256 B | 12.8 MB |
| 25,000 | 1 KB | 25 MB |
| 10,000 | 4 KB | 40 MB |
| 5,000 | 64 KB | 327 MB |
| 2,500 | 128 KB | 327 MB |
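The volume column is just count × payload; a quick check (treating payloads as binary KB and volumes as decimal MB, which is roughly how the tables round) shows the two largest scenarios move the same number of bytes:

```python
# (message count, payload in bytes) for each async-queue scenario
scenarios = [(50_000, 256), (25_000, 1024), (10_000, 4 * 1024),
             (5_000, 64 * 1024), (2_500, 128 * 1024)]
for count, payload in scenarios:
    mb = count * payload / 1_000_000  # decimal megabytes
    print(f"{count:>6} msgs x {payload:>6} B = {mb:6.1f} MB")
```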

Request-Reply (150 concurrent publishers)

| Messages | Payload | Total Volume |
|---|---|---|
| 25,000 | 256 B | 6.4 MB |
| 10,000 | 1 KB | 10 MB |
| 5,000 | 4 KB | 20 MB |

The async pattern uses more publishers (250 vs 150) and reaches larger payloads because bulk throughput is the primary concern. Request-reply uses fewer messages and smaller payloads, reflecting the typical RPC use case where latency matters more than volume.

Implementation Details

Async Queue (Producer-Consumer)

All three implementations follow the same structure: N publishers concurrently push messages into a queue/topic/stream, one consumer reads everything. The benchmark measures wall-clock time from the first publish to the last received message.

RabbitMQ (RabbitMQ.Client v7.2.1): persistent messages, QoS prefetch = 100, manual ACK, separate connections for publisher and consumer.

Kafka (Confluent.Kafka v2.13.2): idempotent producer, 1 GB write buffer, manual offset commit, single partition, consumer group ID randomized per iteration.
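Confluent.Kafka wraps librdkafka, so the producer settings above correspond to librdkafka property names; a sketch of the equivalent configuration (property names are librdkafka's; the values are assumed from the description above):

```python
# librdkafka-style settings mirroring the benchmark description
producer_config = {
    "enable.idempotence": True,               # idempotent producer
    "queue.buffering.max.kbytes": 1_048_576,  # ~1 GB in-memory write buffer
}
consumer_config = {
    "enable.auto.commit": False,  # offsets committed manually after processing
}
```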

NATS JetStream (NATS.Net v2.7.3): file-backed stream, workqueue retention, async persistence, explicit ACK, 1 GB writer buffer.
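Stripped of broker specifics, all three benchmarks share the same shape. A minimal in-process sketch, with an asyncio.Queue standing in for the broker (so the number it prints is only an upper bound for the pattern, not a broker measurement):

```python
import asyncio
import time

async def run_benchmark(n_publishers: int = 250,
                        total_messages: int = 50_000,
                        payload: bytes = b"x" * 256) -> float:
    queue: asyncio.Queue = asyncio.Queue()
    per_publisher = total_messages // n_publishers

    async def publisher() -> None:
        for _ in range(per_publisher):
            await queue.put(payload)

    async def consumer() -> None:
        for _ in range(total_messages):
            await queue.get()  # a real consumer would ACK here

    # Wall-clock time from first publish to last received message
    start = time.perf_counter()
    await asyncio.gather(consumer(),
                         *(publisher() for _ in range(n_publishers)))
    return time.perf_counter() - start

elapsed = asyncio.run(run_benchmark())
print(f"{50_000 / elapsed:,.0f} msg/s")
```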

Request-Reply

NATS has native request-reply: RequestAsync sends a message and returns a response in a single call. The broker handles response routing internally.

RabbitMQ and Kafka lack this primitive. For both, request-reply was implemented via correlation IDs:

  1. Requester generates a UUID, attaches it to the message, stores a TaskCompletionSource in a ConcurrentDictionary
  2. Responder receives the message, echoes the correlation ID back on a dedicated reply queue/topic
  3. Requester's reply listener matches the ID and completes the corresponding task

Each "request" in RabbitMQ/Kafka involves 4 broker operations (publish request, consume request, publish reply, consume reply) vs 1 round-trip in NATS.
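The three steps above can be sketched broker-agnostically. Here is a minimal asyncio version with in-process queues standing in for the request and reply queues/topics; asyncio futures play the role of the TaskCompletionSource entries in the C# ConcurrentDictionary:

```python
import asyncio
import uuid

# In-flight requests keyed by correlation ID
pending: dict[str, asyncio.Future] = {}
request_q: asyncio.Queue = asyncio.Queue()  # stands in for the request queue/topic
reply_q: asyncio.Queue = asyncio.Queue()    # stands in for the dedicated reply queue/topic

async def responder() -> None:
    # Step 2: consume the request, echo the correlation ID on the reply channel
    while True:
        corr_id, body = await request_q.get()
        await reply_q.put((corr_id, body.upper()))

async def reply_listener() -> None:
    # Step 3: match the ID and complete the corresponding task
    while True:
        corr_id, body = await reply_q.get()
        fut = pending.pop(corr_id, None)
        if fut is not None:
            fut.set_result(body)

async def request(body: str) -> str:
    # Step 1: generate an ID, register a future, publish the request
    corr_id = str(uuid.uuid4())
    fut = asyncio.get_running_loop().create_future()
    pending[corr_id] = fut
    await request_q.put((corr_id, body))
    return await fut

async def main() -> str:
    workers = [asyncio.create_task(responder()), asyncio.create_task(reply_listener())]
    try:
        return await request("ping")
    finally:
        for w in workers:
            w.cancel()

result = asyncio.run(main())
print(result)  # PING
```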

Results: Async Queue

All values are P95 (95th percentile) completion time in milliseconds. Lower is better. Ratio columns show time relative to NATS JetStream (baseline).

[Per-scenario charts omitted; the P95 times they visualize appear in the tables below.]

P95 Completion Time (ms)

| Scenario | RabbitMQ | Kafka | NATS JS | Rabbit / NATS | Kafka / NATS |
|---|---|---|---|---|---|
| 50K x 256 B | 1,521 | 35,856 | 944 | 1.61 | 37.98 |
| 25K x 1 KB | 905 | 18,629 | 511 | 1.77 | 36.46 |
| 10K x 4 KB | 442 | 8,329 | 256 | 1.73 | 32.54 |
| 5K x 64 KB | 534 | 7,496 | 878 | 0.61 | 8.54 |
| 2.5K x 128 KB | 690 | 7,162 | 735 | 0.94 | 9.74 |

Messages per Second (at P95)

| Scenario | RabbitMQ | Kafka | NATS JS |
|---|---|---|---|
| 50K x 256 B | 32,873 | 1,394 | 52,966 |
| 25K x 1 KB | 27,624 | 1,342 | 48,924 |
| 10K x 4 KB | 22,624 | 1,201 | 39,063 |
| 5K x 64 KB | 9,363 | 667 | 5,695 |
| 2.5K x 128 KB | 3,623 | 349 | 3,401 |
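The throughput table is derived directly from the P95 times: msg/s = message count ÷ P95 in seconds. For example, reproducing three of the NATS figures:

```python
# (message count, P95 in seconds) taken from the NATS JS column
nats = {
    "50K x 256 B": (50_000, 0.944),
    "25K x 1 KB": (25_000, 0.511),
    "5K x 64 KB": (5_000, 0.878),
}
throughput = {name: count / p95 for name, (count, p95) in nats.items()}
for name, msgs in throughput.items():
    print(f"{name}: {msgs:,.0f} msg/s")
```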

On small to medium payloads (up to 4 KB), NATS JetStream processes messages 1.6-1.8x faster than RabbitMQ at P95. On large payloads (64 KB+), RabbitMQ takes the lead at 61-94% of NATS's time. RabbitMQ allocates 7-12 MB managed memory for large payloads, while NATS allocates 368-401 MB. AMQP framing is more efficient for large contiguous payloads.

Kafka is 9-38x slower than NATS at P95. This is expected: Kafka's commit log architecture adds overhead that only pays off with horizontal scaling across multiple partitions and nodes.

Managed Heap Allocation

| Scenario | RabbitMQ | Kafka | NATS JS |
|---|---|---|---|
| 50K x 256 B | 106 MB | 115 MB | 678 MB |
| 25K x 1 KB | 54 MB | 76 MB | 342 MB |
| 10K x 4 KB | 22 MB | 60 MB | 205 MB |
| 5K x 64 KB | 12 MB | 323 MB | 401 MB |
| 2.5K x 128 KB | 7 MB | 318 MB | 368 MB |

RabbitMQ consistently uses the least managed memory. NATS allocates significantly more due to the 1 GB writer buffer configuration. Kafka's allocations spike with large payloads (318-323 MB) due to its own producer buffer (QueueBufferingMaxKbytes = 1 GB).

Results: Request-Reply

All values are P95 completion time. Ratio columns show time relative to NATS (baseline).

[Per-scenario charts omitted; the P95 times they visualize appear in the table below.]

P95 Completion Time (ms)

| Scenario | RabbitMQ | Kafka | NATS | Rabbit / NATS | Kafka / NATS |
|---|---|---|---|---|---|
| 25K x 256 B | 41,450 | 36,572 | 397 | 104.41 | 92.12 |
| 10K x 1 KB | 21,434 | 15,113 | 226 | 94.84 | 66.87 |
| 5K x 4 KB | 12,231 | 7,339 | 159 | 76.92 | 46.16 |

Messages per Second (Request-Reply, at P95)

| Scenario | RabbitMQ | Kafka | NATS |
|---|---|---|---|
| 25K x 256 B | 603 | 684 | 62,972 |
| 10K x 1 KB | 467 | 662 | 44,248 |
| 5K x 4 KB | 409 | 681 | 31,447 |

NATS is 46-92x faster than Kafka and 77-104x faster than RabbitMQ at P95. This is the difference between a native protocol primitive (one network round-trip) and an application-level emulation (four broker operations per request).

RabbitMQ is the slowest in all request-reply scenarios, with P95 growing roughly linearly with message count: 12.2 s for 5K messages, 21.4 s for 10K, 41.4 s for 25K. The per-message cost stays in the 1.7-2.4 ms range (e.g. 41.45 s / 25,000 ≈ 1.7 ms), dominated by the ACK cycle on both the request and reply queues.

Kafka also shows high tail latency: P95 reaches 36.6 s on the 25K scenario (the Mean is 23.3 s), pointing to consumer-group coordination and offset-management overhead that is amplified in what is effectively a synchronous request pattern.

Broker Comparison

RabbitMQ 4.2

Strengths

- Fastest on large payloads (64 KB+), finishing in 61-94% of NATS's time
- Lowest managed heap allocation in every async scenario (7-106 MB)
- Mature AMQP 0.9.1 protocol with broad client and tooling support

Weaknesses

- 1.6-1.8x slower than NATS on payloads up to 4 KB
- Worst request-reply performance (77-104x slower than NATS), since the pattern must be emulated with correlation IDs
- 122 MiB idle RAM, roughly 20x the footprint of NATS

Apache Kafka 4.2

Strengths

- Architecture built for horizontal scaling across partitions and nodes, which this single-node, single-partition setup does not exercise
- Rich ecosystem (Connect, Streams) suited to large-scale event streaming and CDC pipelines

Weaknesses

- 9-38x slower than NATS on the async queue at P95
- Severe tail latency in request-reply: P95 up to 36.6 s on the 25K scenario
- Heaviest footprint: 327 MiB idle RAM on cold start

NATS 2.12 with JetStream

Strengths

- Fastest on small-to-medium payloads (up to 4 KB), 1.6-1.8x ahead of RabbitMQ
- Native request-reply: 46-104x faster than the correlation-ID emulations
- Minimal operational footprint: single binary, 6 MiB idle RAM, JetStream enabled with one flag

Weaknesses

- Falls behind RabbitMQ on 64 KB+ payloads
- Highest managed heap allocation (205-678 MB), driven by the 1 GB writer buffer

Conclusion

For new projects that need a general-purpose message broker, NATS is the most practical starting point.

It provides a feature set comparable to Kafka's: persistence with replay, exactly-once delivery, stream processing primitives, key-value and object stores. At the same time, its throughput on small-to-medium payloads matches or exceeds RabbitMQ's, and it handles request-reply 46-104x faster than either alternative at P95 thanks to native protocol support.

The operational cost is also lower. A single binary with one flag gives you a persistent, JetStream-enabled broker consuming 6 MiB of RAM on cold start. Compare that to Kafka's 327 MiB.
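For reference, the setup used in these benchmarks amounts to a single command (image and flag as listed in the versions table above):

```shell
# Run NATS with JetStream persistence enabled (-js), default client port 4222
docker run -d --name nats -p 4222:4222 nats:2.12-alpine -js
```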

RabbitMQ remains a strong choice when the workload is primarily large payloads (64 KB+) or when the team has deep AMQP expertise. Kafka is still the right tool for large-scale event streaming, CDC pipelines, and scenarios where partition-based parallelism and the Connect/Streams ecosystem matter.

But as a default choice for a new distributed system? NATS delivers Kafka-class features at RabbitMQ-class speed, with less operational overhead than either.