Message Broker Throughput

RabbitMQ 4.2 vs Kafka 4.2 (KRaft) vs NATS 2.12 (JetStream)

I started using NATS in one of my projects and was generally happy with it, but I wanted to verify the performance claims for myself. Is it really as fast as people say, or is that just marketing and cherry-picked benchmarks? The best way to find out was to write my own tests and compare NATS against the two most common alternatives: RabbitMQ and Kafka.

This post covers throughput testing of all three brokers on two messaging patterns: async producer-consumer queue, and request-reply. Request-reply is not the typical use case for message brokers, but NATS supports it natively, so it was worth measuring how the others perform when forced into that pattern.


Test Environment

All three brokers ran in Docker containers on the same host. No custom tuning was applied to any broker: default configurations only.

| Component | Value |
|---|---|
| CPU | Ryzen 7 8845HS |
| Cores | 8C / 16T |
| OS | Windows 11 |
| Runtime | .NET 10.0.4 |
| Framework | BenchmarkDotNet 0.15.8 |
| Metric | P95 (ms) |

Broker Versions

| Broker | Docker Image | Configuration |
|---|---|---|
| RabbitMQ | rabbitmq:4.2-management | Default settings, AMQP 0.9.1 |
| Kafka | apache/kafka:4.2.0 | KRaft mode, single node, 1 partition |
| NATS | nats:2.12-alpine | JetStream enabled (-js) |

Idle RAM (Cold Start)

Measured via docker stats on freshly started containers with no accumulated data or active connections.

| Broker | Idle RAM |
|---|---|
| NATS | 6 MiB |
| RabbitMQ | 122 MiB |
| Kafka | 327 MiB |

Kafka's JVM-based architecture is immediately visible: 54x the memory of NATS and 2.7x of RabbitMQ on cold start. NATS is the lightest at 6 MiB.

BenchmarkDotNet Configuration

Note on metric: All result tables use P95 (95th percentile) rather than Mean. P95 better represents worst-case performance a system will realistically encounter, filtering out warm-up noise while capturing tail latency.
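To make the metric concrete, here is a minimal nearest-rank P95 (sketched in Python for brevity; BenchmarkDotNet's own percentile estimator may differ in interpolation details):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile: the smallest sample that is
    greater than or equal to 95% of all samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 100 latency samples of 1..100 ms: P95 picks the 95th value
print(p95(range(1, 101)))  # 95
```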

Test Parameters

The message counts and payload sizes were chosen to cover two dimensions: the number of concurrent messages the broker must route, and the size of individual payloads. Counts are inversely proportional to payload size to keep total benchmark runtime within a few minutes per scenario while still loading the broker enough to reveal its throughput characteristics.

Async Queue (250 concurrent publishers, 1 consumer)

| Messages | Payload | Total Volume |
|---|---|---|
| 50,000 | 256 B | 12.8 MB |
| 25,000 | 1 KB | 25 MB |
| 10,000 | 4 KB | 40 MB |
| 5,000 | 64 KB | 327 MB |
| 2,500 | 128 KB | 327 MB |
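The volume column is just count × payload; a quick check (treating payloads as binary KB and volumes as decimal MB, which is roughly how the tables round) shows the two largest scenarios move the same number of bytes:

```python
# (message count, payload in bytes) for each async-queue scenario
scenarios = [(50_000, 256), (25_000, 1024), (10_000, 4 * 1024),
             (5_000, 64 * 1024), (2_500, 128 * 1024)]
for count, payload in scenarios:
    mb = count * payload / 1_000_000  # decimal megabytes
    print(f"{count:>6} msgs x {payload:>6} B = {mb:6.1f} MB")
```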

Request-Reply (150 concurrent publishers)

| Messages | Payload | Total Volume |
|---|---|---|
| 25,000 | 256 B | 6.4 MB |
| 10,000 | 1 KB | 10 MB |
| 5,000 | 4 KB | 20 MB |

The async pattern uses more publishers (250 vs 150) and reaches larger payloads because bulk throughput is the primary concern. Request-reply uses fewer messages and smaller payloads, reflecting the typical RPC use case where latency matters more than volume.

Implementation Details

Async Queue (Producer-Consumer)

All three implementations follow the same structure: N publishers concurrently push messages into a queue/topic/stream, one consumer reads everything. The benchmark measures wall-clock time from the first publish to the last received message.

RabbitMQ (RabbitMQ.Client v7.2.1): persistent messages, QoS prefetch = 100, manual ACK, separate connections for publisher and consumer.

Kafka (Confluent.Kafka v2.13.2): idempotent producer, 1 GB write buffer, manual offset commit, single partition, consumer group ID randomized per iteration.
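Confluent.Kafka wraps librdkafka, so the producer settings above correspond to librdkafka property names; a sketch of the equivalent configuration (property names are librdkafka's; the values are assumed from the description above):

```python
# librdkafka-style settings mirroring the benchmark description
producer_config = {
    "enable.idempotence": True,               # idempotent producer
    "queue.buffering.max.kbytes": 1_048_576,  # ~1 GB in-memory write buffer
}
consumer_config = {
    "enable.auto.commit": False,  # offsets committed manually after processing
}
```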

NATS JetStream (NATS.Net v2.7.3): file-backed stream, workqueue retention, async persistence, explicit ACK, 1 GB writer buffer.
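Stripped of broker specifics, all three benchmarks share the same shape. A minimal in-process sketch, with an asyncio.Queue standing in for the broker (so the number it prints is only an upper bound for the pattern, not a broker measurement):

```python
import asyncio
import time

async def run_benchmark(n_publishers: int = 250,
                        total_messages: int = 50_000,
                        payload: bytes = b"x" * 256) -> float:
    queue: asyncio.Queue = asyncio.Queue()
    per_publisher = total_messages // n_publishers

    async def publisher() -> None:
        for _ in range(per_publisher):
            await queue.put(payload)

    async def consumer() -> None:
        for _ in range(total_messages):
            await queue.get()  # a real consumer would ACK here

    # Wall-clock time from first publish to last received message
    start = time.perf_counter()
    await asyncio.gather(consumer(),
                         *(publisher() for _ in range(n_publishers)))
    return time.perf_counter() - start

elapsed = asyncio.run(run_benchmark())
print(f"{50_000 / elapsed:,.0f} msg/s")
```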

Request-Reply

NATS has native request-reply: RequestAsync sends a message and returns a response in a single call. The broker handles response routing internally.

RabbitMQ and Kafka lack this primitive. For both, request-reply was implemented via correlation IDs:

  1. Requester generates a UUID, attaches it to the message, stores a TaskCompletionSource in a ConcurrentDictionary
  2. Responder receives the message, echoes the correlation ID back on a dedicated reply queue/topic
  3. Requester's reply listener matches the ID and completes the corresponding task

Each "request" in RabbitMQ/Kafka involves 4 broker operations (publish request, consume request, publish reply, consume reply) vs 1 round-trip in NATS.
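The three steps above can be sketched broker-agnostically. Here is a minimal asyncio version with in-process queues standing in for the request and reply queues/topics; asyncio futures play the role of the TaskCompletionSource entries in the C# ConcurrentDictionary:

```python
import asyncio
import uuid

# In-flight requests keyed by correlation ID
pending: dict[str, asyncio.Future] = {}
request_q: asyncio.Queue = asyncio.Queue()  # stands in for the request queue/topic
reply_q: asyncio.Queue = asyncio.Queue()    # stands in for the dedicated reply queue/topic

async def responder() -> None:
    # Step 2: consume the request, echo the correlation ID on the reply channel
    while True:
        corr_id, body = await request_q.get()
        await reply_q.put((corr_id, body.upper()))

async def reply_listener() -> None:
    # Step 3: match the ID and complete the corresponding task
    while True:
        corr_id, body = await reply_q.get()
        fut = pending.pop(corr_id, None)
        if fut is not None:
            fut.set_result(body)

async def request(body: str) -> str:
    # Step 1: generate an ID, register a future, publish the request
    corr_id = str(uuid.uuid4())
    fut = asyncio.get_running_loop().create_future()
    pending[corr_id] = fut
    await request_q.put((corr_id, body))
    return await fut

async def main() -> str:
    workers = [asyncio.create_task(responder()), asyncio.create_task(reply_listener())]
    try:
        return await request("ping")
    finally:
        for w in workers:
            w.cancel()

result = asyncio.run(main())
print(result)  # PING
```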

Results: Async Queue

All values are P95 (95th percentile) completion time in milliseconds. Lower is better. Ratio columns show time relative to NATS JetStream (baseline).

[Per-scenario charts omitted; the P95 times they visualize appear in the tables below.]

P95 Completion Time (ms)

| Scenario | RabbitMQ | Kafka | NATS JS | Rabbit / NATS | Kafka / NATS |
|---|---|---|---|---|---|
| 50K x 256 B | 1,521 | 35,856 | 944 | 1.61 | 37.98 |
| 25K x 1 KB | 905 | 18,629 | 511 | 1.77 | 36.46 |
| 10K x 4 KB | 442 | 8,329 | 256 | 1.73 | 32.54 |
| 5K x 64 KB | 534 | 7,496 | 878 | 0.61 | 8.54 |
| 2.5K x 128 KB | 690 | 7,162 | 735 | 0.94 | 9.74 |

Messages per Second (at P95)

| Scenario | RabbitMQ | Kafka | NATS JS |
|---|---|---|---|
| 50K x 256 B | 32,873 | 1,394 | 52,966 |
| 25K x 1 KB | 27,624 | 1,342 | 48,924 |
| 10K x 4 KB | 22,624 | 1,201 | 39,063 |
| 5K x 64 KB | 9,363 | 667 | 5,695 |
| 2.5K x 128 KB | 3,623 | 349 | 3,401 |
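The throughput table is derived directly from the P95 times: msg/s = message count ÷ P95 in seconds. For example, reproducing three of the NATS figures:

```python
# (message count, P95 in seconds) taken from the NATS JS column
nats = {
    "50K x 256 B": (50_000, 0.944),
    "25K x 1 KB": (25_000, 0.511),
    "5K x 64 KB": (5_000, 0.878),
}
throughput = {name: count / p95 for name, (count, p95) in nats.items()}
for name, msgs in throughput.items():
    print(f"{name}: {msgs:,.0f} msg/s")
```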

On small to medium payloads (up to 4 KB), NATS JetStream processes messages 1.6-1.8x faster than RabbitMQ at P95. On large payloads (64 KB+), RabbitMQ takes the lead at 61-94% of NATS's time. RabbitMQ allocates 7-12 MB managed memory for large payloads, while NATS allocates 368-401 MB. AMQP framing is more efficient for large contiguous payloads.

Kafka is 9-38x slower than NATS at P95. This is expected: Kafka's commit log architecture adds overhead that only pays off with horizontal scaling across multiple partitions and nodes.

Managed Heap Allocation

| Scenario | RabbitMQ | Kafka | NATS JS |
|---|---|---|---|
| 50K x 256 B | 106 MB | 115 MB | 678 MB |
| 25K x 1 KB | 54 MB | 76 MB | 342 MB |
| 10K x 4 KB | 22 MB | 60 MB | 205 MB |
| 5K x 64 KB | 12 MB | 323 MB | 401 MB |
| 2.5K x 128 KB | 7 MB | 318 MB | 368 MB |

RabbitMQ consistently uses the least managed memory. NATS allocates significantly more due to the 1 GB writer buffer configuration. Kafka's allocations spike with large payloads (318-323 MB) due to its own producer buffer (QueueBufferingMaxKbytes = 1 GB).

Results: Request-Reply

All values are P95 completion time. Ratio columns show time relative to NATS (baseline).

[Per-scenario charts omitted; the P95 times they visualize appear in the table below.]

P95 Completion Time (ms)

| Scenario | RabbitMQ | Kafka | NATS | Rabbit / NATS | Kafka / NATS |
|---|---|---|---|---|---|
| 25K x 256 B | 41,450 | 36,572 | 397 | 104.41 | 92.12 |
| 10K x 1 KB | 21,434 | 15,113 | 226 | 94.84 | 66.87 |
| 5K x 4 KB | 12,231 | 7,339 | 159 | 76.92 | 46.16 |

Messages per Second (Request-Reply, at P95)

| Scenario | RabbitMQ | Kafka | NATS |
|---|---|---|---|
| 25K x 256 B | 603 | 684 | 62,972 |
| 10K x 1 KB | 467 | 662 | 44,248 |
| 5K x 4 KB | 409 | 681 | 31,447 |

NATS is 46-92x faster than Kafka and 77-104x faster than RabbitMQ at P95. This is the difference between a native protocol primitive (one network round-trip) and an application-level emulation (four broker operations per request).

RabbitMQ is the slowest in all request-reply scenarios, with P95 growing roughly linearly with message count: 12.2 s for 5K messages, 21.4 s for 10K, 41.4 s for 25K. The per-message cost stays in the 1.7-2.4 ms range (e.g. 41.45 s / 25,000 ≈ 1.7 ms), dominated by the ACK cycle on both the request and reply queues.

Kafka also shows high tail latency: P95 reaches 36.6 s on the 25K scenario (the Mean is 23.3 s), pointing to consumer-group coordination and offset-management overhead that is amplified in what is effectively a synchronous request pattern.

Broker Comparison

RabbitMQ 4.2

Strengths

- Fastest on large payloads (64 KB+), finishing in 61-94% of NATS's time
- Lowest managed heap allocation in every async scenario (7-106 MB)
- Mature AMQP 0.9.1 protocol with broad client and tooling support

Weaknesses

- 1.6-1.8x slower than NATS on payloads up to 4 KB
- Worst request-reply performance (77-104x slower than NATS), since the pattern must be emulated with correlation IDs
- 122 MiB idle RAM, roughly 20x the footprint of NATS

Apache Kafka 4.2

Strengths

- Architecture built for horizontal scaling across partitions and nodes, which this single-node, single-partition setup does not exercise
- Rich ecosystem (Connect, Streams) suited to large-scale event streaming and CDC pipelines

Weaknesses

- 9-38x slower than NATS on the async queue at P95
- Severe tail latency in request-reply: P95 up to 36.6 s on the 25K scenario
- Heaviest footprint: 327 MiB idle RAM on cold start

NATS 2.12 with JetStream

Strengths

- Fastest on small-to-medium payloads (up to 4 KB), 1.6-1.8x ahead of RabbitMQ
- Native request-reply: 46-104x faster than the correlation-ID emulations
- Minimal operational footprint: single binary, 6 MiB idle RAM, JetStream enabled with one flag

Weaknesses

- Falls behind RabbitMQ on 64 KB+ payloads
- Highest managed heap allocation (205-678 MB), driven by the 1 GB writer buffer

Conclusion

For new projects that need a general-purpose message broker, NATS is the most practical starting point.

It provides a feature set comparable to Kafka's: persistence with replay, exactly-once delivery, stream processing primitives, key-value and object stores. At the same time, its throughput on small-to-medium payloads matches or exceeds RabbitMQ's, and it handles request-reply 46-104x faster than either alternative at P95 thanks to native protocol support.

The operational cost is also lower. A single binary with one flag gives you a persistent, JetStream-enabled broker consuming 6 MiB of RAM on cold start. Compare that to Kafka's 327 MiB.
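For reference, the setup used in these benchmarks amounts to a single command (image and flag as listed in the versions table above):

```shell
# Run NATS with JetStream persistence enabled (-js), default client port 4222
docker run -d --name nats -p 4222:4222 nats:2.12-alpine -js
```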

RabbitMQ remains a strong choice when the workload is primarily large payloads (64 KB+) or when the team has deep AMQP expertise. Kafka is still the right tool for large-scale event streaming, CDC pipelines, and scenarios where partition-based parallelism and the Connect/Streams ecosystem matter.

But as a default choice for a new distributed system? NATS delivers Kafka-class features at RabbitMQ-class speed, with less operational overhead than either.