Queues vs Streams: Message Delivery, Ordering & Reliability

Core Distinction

The fundamental split is about message lifecycle.

| Concept | Queue (RabbitMQ, SQS) | Stream (Kafka, Kinesis) |
| --- | --- | --- |
| Message is | A command — do this task | An event — this fact happened |
| After processing | Deleted | Persisted (retention window) |
| Multiple consumers | Compete — each message goes to one | Independent — each reads at its own offset |
| Replay | Not possible | Possible (rewind offset) |
| Ordering | Per-queue, limited | Per-partition, strict |
| Throughput | Thousands/sec | Millions/sec |

Rule of thumb: tasks demand queues; immutable facts demand streams.


Message Delivery Semantics

Three guarantees exist, each with trade-offs:

| Guarantee | Description | Risk | Where used |
| --- | --- | --- | --- |
| At-most-once | Fire-and-forget, no retry | Data loss possible | Metrics, logs where loss is acceptable |
| At-least-once | Retry until ACK | Duplicates possible | Default for queues/streams |
| Exactly-once | Idempotent + transactions | Highest latency/complexity | Financial, inventory systems |

Exactly-once in Kafka: requires enable.idempotence=true on the producer, the transactional API, and consumers that both read with isolation.level=read_committed and handle events idempotently (check processed event IDs).

# Producer-side idempotency (Kafka)
producer_config = {
    "enable.idempotence": True,   # broker de-duplicates retried sends
    "acks": "all",                # required when idempotence is enabled
    "retries": 5,
    # With idempotence enabled, values up to 5 still preserve ordering;
    # 1 is simply the most conservative setting.
    "max.in.flight.requests.per.connection": 1,
}
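
The consumer half of that guarantee is deduplication against a store of processed event IDs. A minimal sketch, using an in-memory set as a stand-in for a durable store (in practice a database table or Redis set); `handle_once` and the `event_id` field are illustrative names, not a Kafka API:

```python
processed_ids: set[str] = set()  # stand-in for a durable store of seen event IDs

def handle_once(event: dict, apply_effect) -> bool:
    """Apply the event's side effect only if this event ID hasn't been seen."""
    event_id = event["event_id"]
    if event_id in processed_ids:
        return False              # duplicate delivery: skip the side effect
    apply_effect(event)           # the actual business logic
    processed_ids.add(event_id)   # record only after the effect succeeds
    return True
```

With at-least-once delivery underneath, this check turns redelivered duplicates into no-ops, which is what "idempotent consumer" means in practice.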

Ordering Guarantees

Kafka: strict ordering within a partition, not across partitions. Partition key determines which partition a message lands in.

# Same document_id → same partition → strict order guaranteed
producer.produce(
    topic="document-edits",
    key=document_id,   # partition key
    value=event_payload,
)

SQS Standard: no ordering guarantee; occasional duplicates.
SQS FIFO: strict ordering per message group and exactly-once processing (5-minute deduplication window), but throughput caps at 300 msg/s per queue (3,000 with batching) unless high-throughput mode is enabled.
RabbitMQ: per-queue order if single consumer; breaks with competing consumers.
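
Why the key matters can be shown with a simplified partitioner. Kafka's default partitioner actually hashes the key bytes with murmur2; the md5-based sketch below (`partition_for` is an illustrative name) only demonstrates the property that a fixed key always maps to the same partition:

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping (simplified stand-in for murmur2)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, every event for `doc-42` lands in one partition and stays in order; events for different keys may interleave freely. The flip side: changing the partition count remaps keys, which breaks ordering across the resize.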


Retries & Backoff

Never retry immediately — use exponential backoff with jitter.

import time
import random
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 5
BASE_DELAY = 1.0   # seconds

class TransientError(Exception):
    """Recoverable failure (network timeout, DB overload): worth retrying."""

class PermanentError(Exception):
    """Unrecoverable failure (schema mismatch, validation error): straight to DLQ."""

def process_with_retry(message: dict, handler) -> bool:
    for attempt in range(MAX_RETRIES):
        try:
            handler(message)
            return True
        except TransientError as exc:
            # Exponential backoff (1s, 2s, 4s, ...) plus jitter so consumers
            # don't retry in lockstep
            delay = BASE_DELAY * (2 ** attempt) + random.uniform(0, 0.5)
            logger.warning(
                "attempt=%d failed, retrying in %.2fs: %s",
                attempt + 1,
                delay,
                exc,
            )
            time.sleep(delay)
        except PermanentError as exc:
            logger.error("permanent failure, sending to DLQ: %s", exc)
            send_to_dlq(message, exc)
            return False
    send_to_dlq(message, RuntimeError("max retries exceeded"))
    return False

Transient errors (network timeout, DB overload) → retry.
Permanent errors (schema mismatch, validation failure) → skip retries, send to DLQ.


Dead-Letter Queues (DLQ)

A DLQ is a dedicated topic/queue that captures messages that cannot be processed. It isolates failures from the main pipeline, preserving throughput.

What to store in DLQ

import json
import traceback
from datetime import datetime, timezone

# Assumes `dlq_producer` is a pre-configured producer and `logger` is the
# module logger from the retry example above.
def send_to_dlq(original: dict, error: Exception) -> None:
    dlq_record = {
        "original_payload": original,
        "error_type": type(error).__name__,
        "error_message": str(error),
        "stack_trace": "".join(
            traceback.format_exception(type(error), error, error.__traceback__)
        ),
        "failed_at": datetime.now(timezone.utc).isoformat(),
        # Provenance fields make later replay possible
        "source_topic": original.get("_topic"),
        "source_partition": original.get("_partition"),
        "source_offset": original.get("_offset"),
    }
    logger.error("dlq send | error=%s | payload=%s", error, original)
    dlq_producer.produce("orders.dlq", value=json.dumps(dlq_record))

DLQ recovery patterns

| Pattern | When to use |
| --- | --- |
| Replay | Fix the bug, rewind offset, reprocess |
| Parking-lot | Move to retry topic, process later |
| Circuit breaker | Pause all retries during outage |
| Manual review | Schema / business-logic errors |
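
The circuit-breaker row above can be sketched as a small state machine: trip open after a run of failures, block retries during a cooldown, then let a probe attempt through. The class name, thresholds, and injectable clock below are all illustrative:

```python
import time

class CircuitBreaker:
    """Trips open after `threshold` consecutive failures; allows a probe after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock        # injectable time source, handy for testing
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def allow(self) -> bool:
        """May we attempt processing right now?"""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: reset and let one attempt probe the downstream system
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
```

Consumers check `allow()` before each retry; during a downstream outage the breaker opens and the retry loop stops hammering a struggling dependency instead of amplifying the outage.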

Tools Comparison

| | Kafka | RabbitMQ | SQS |
| --- | --- | --- | --- |
| Model | Stream (log) | Queue (AMQP) | Queue (HTTP) |
| Ordering | Per partition | Per queue | FIFO variant |
| Replay | Yes | No | No |
| DLQ | App-level topic | Built-in (x-dead-letter-exchange) | Built-in (RedrivePolicy) |
| Routing | Topic + consumer groups | Exchange + bindings | Filter policies |
| Ops overhead | High | Medium | Zero (managed) |
| Best for | Event sourcing, audit log, multi-consumer | Task queues, priority routing | Simple decoupling, AWS ecosystem |

Decision Guide

Need replay / multiple independent consumers?
  ├─ YES → Kafka / Kinesis (stream)
  └─ NO  → Need complex routing / priority?
             ├─ YES → RabbitMQ
             └─ NO  → In AWS and want zero ops?
                        ├─ YES → SQS
                        └─ NO  → RabbitMQ or Redis Streams

Most mature systems combine both: Kafka as the central event bus, individual services fan out to SQS/RabbitMQ for their own task workers.