Transactional Outbox Pattern Implementation
Why We Implemented the Outbox Pattern
The Problem: Dual Write Issue
In distributed systems, we often need to perform two operations atomically:
- Write to the database (e.g., save a user)
- Publish an event (e.g., notify other services)
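The dual write can be sketched in a few lines of Go. This is an illustration only: `User`, `saveUser`, `publishUserCreated`, and `RegisterUser` are hypothetical names, a map stands in for the database, and a boolean simulates broker availability.

```go
package main

import (
	"errors"
	"fmt"
)

// User is a hypothetical record used only for this illustration.
type User struct{ ID, Email string }

// saveUser stands in for the database write (for example, an insert into MongoDB).
func saveUser(db map[string]User, u User) error {
	db[u.ID] = u
	return nil
}

// publishUserCreated stands in for publishing to the broker.
// brokerUp simulates whether the broker is reachable.
func publishUserCreated(brokerUp bool, u User) error {
	if !brokerUp {
		return errors.New("broker unavailable")
	}
	return nil
}

// RegisterUser performs the two writes one after the other,
// with nothing tying them together.
func RegisterUser(db map[string]User, brokerUp bool, u User) error {
	if err := saveUser(db, u); err != nil {
		return err
	}
	// If this call fails, or the process dies before it runs,
	// the user is already saved and the event is never sent.
	return publishUserCreated(brokerUp, u)
}

func main() {
	db := map[string]User{}
	err := RegisterUser(db, false, User{ID: "u1", Email: "a@example.com"})
	_, saved := db["u1"]
	fmt.Println("saved:", saved, "publish error:", err)
	// prints: saved: true publish error: broker unavailable
}
```

The user record exists but no other service ever hears about it, which is the inconsistency the scenarios in this section describe.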
Real-World Scenarios Where This Fails
- NATS Server is Down
  - User registration succeeds in MongoDB
  - Event publishing fails
  - Read model never gets updated
  - User can’t log in (data inconsistency)
- Network Partition
  - Order is placed and saved
  - Network issue prevents event from reaching NATS
  - Inventory is never decremented
  - Payment is never processed
- Application Crash
  - User data is committed to MongoDB
  - Application crashes before publishing event
  - Event is lost forever
The Solution: Transactional Outbox Pattern
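The Outbox Pattern makes the database write and the recording of the event atomic: the event is saved to an outbox table in the same transaction as the business data, and a background relay publishes it to NATS afterwards.
How It Works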
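The write path can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the project's code: `Store`, `WithTransaction`, `RegisterUser`, and `ErrSimulated` are hypothetical, and the copy-then-commit trick only mimics the all-or-nothing behaviour a real database transaction provides.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSimulated is a hypothetical error used to force a rollback in the demo.
var ErrSimulated = errors.New("simulated failure")

// Store is an in-memory stand-in for a database with a users
// collection and an outbox collection.
type Store struct {
	Users  map[string]string // user ID -> email
	Outbox []string          // events waiting to be published
}

// WithTransaction runs fn against a copy of the store and commits the
// copy only if fn returns nil, mimicking all-or-nothing semantics.
func (s *Store) WithTransaction(fn func(tx *Store) error) error {
	tx := &Store{
		Users:  make(map[string]string, len(s.Users)),
		Outbox: append([]string(nil), s.Outbox...),
	}
	for id, email := range s.Users {
		tx.Users[id] = email
	}
	if err := fn(tx); err != nil {
		return err // the copy is discarded; s is untouched
	}
	s.Users, s.Outbox = tx.Users, tx.Outbox
	return nil
}

// RegisterUser stores the user and its event together.
// It never talks to the broker; publishing is left to the relay.
func RegisterUser(s *Store, id, email string) error {
	return s.WithTransaction(func(tx *Store) error {
		tx.Users[id] = email
		tx.Outbox = append(tx.Outbox, "user.created:"+id)
		return nil
	})
}

func main() {
	s := &Store{Users: map[string]string{}}
	_ = RegisterUser(s, "u1", "a@example.com")
	fmt.Println(len(s.Users), len(s.Outbox)) // both writes committed

	// A failure after the user write rolls the user back as well.
	err := s.WithTransaction(func(tx *Store) error {
		tx.Users["u2"] = "b@example.com"
		return ErrSimulated
	})
	fmt.Println(err, len(s.Users), len(s.Outbox))
}
```

Because the request path only touches the database, a broker outage cannot cause the request itself to fail or leave the two writes out of step.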
Architecture Overview
Components
1. OutboxEvent Domain Model
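One possible shape for such a model, sketched in Go. Every field name, the `bson` tags, and the `NewOutboxEvent` helper are assumptions made for illustration, not the project's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// OutboxEvent is a hypothetical outbox record.
type OutboxEvent struct {
	ID          string     `bson:"_id"`
	Subject     string     `bson:"subject"`    // broker subject, e.g. "user.created"
	Payload     []byte     `bson:"payload"`    // serialized event body
	Published   bool       `bson:"published"`  // set once the relay has published it
	Attempts    int        `bson:"attempts"`   // failed publish attempts so far
	CreatedAt   time.Time  `bson:"created_at"` // when the event was recorded
	PublishedAt *time.Time `bson:"published_at,omitempty"`
}

// NewOutboxEvent serializes payload as JSON and returns an unpublished event.
func NewOutboxEvent(id, subject string, payload any) (OutboxEvent, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return OutboxEvent{}, fmt.Errorf("marshal payload for %s: %w", subject, err)
	}
	return OutboxEvent{
		ID:        id,
		Subject:   subject,
		Payload:   data,
		CreatedAt: time.Now().UTC(),
	}, nil
}

func main() {
	ev, err := NewOutboxEvent("evt-1", "user.created", map[string]string{"id": "u1"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s published=%v attempts=%d\n", ev.Subject, ev.Payload, ev.Published, ev.Attempts)
	// prints: user.created {"id":"u1"} published=false attempts=0
}
```

Keeping `Published` and `Attempts` on the record is what lets the relay find pending work and lets operators query for events that have stopped retrying.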
2. OutboxRepository
- Save() - Store event (called within transaction)
- FindUnpublished() - Get pending events
- MarkPublished() - Update status after successful publish
- IncrementAttempts() - Track retries
- DeletePublished() - Cleanup old events
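The five repository operations can be expressed as a Go interface. Only the method names come from this document; the parameters, return types, the trimmed `OutboxEvent` struct, and the in-memory `MemoryOutbox` test double are assumptions. A MongoDB-backed version would additionally need `Save` to run inside the caller's transaction.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// OutboxEvent is a trimmed-down, hypothetical event record.
type OutboxEvent struct {
	ID          string
	Published   bool
	Attempts    int
	PublishedAt time.Time
}

// OutboxRepository names the five operations; the signatures are assumptions.
type OutboxRepository interface {
	Save(ctx context.Context, ev OutboxEvent) error
	FindUnpublished(ctx context.Context, limit int) ([]OutboxEvent, error)
	MarkPublished(ctx context.Context, id string) error
	IncrementAttempts(ctx context.Context, id string) error
	DeletePublished(ctx context.Context, olderThan time.Time) (int, error)
}

// MemoryOutbox is an in-memory test double. It is not safe for concurrent use.
type MemoryOutbox struct{ events []OutboxEvent }

func (m *MemoryOutbox) Save(_ context.Context, ev OutboxEvent) error {
	m.events = append(m.events, ev)
	return nil
}

func (m *MemoryOutbox) FindUnpublished(_ context.Context, limit int) ([]OutboxEvent, error) {
	var out []OutboxEvent
	for _, ev := range m.events {
		if !ev.Published && len(out) < limit {
			out = append(out, ev)
		}
	}
	return out, nil
}

// update applies fn to the event with the given ID, or reports that it is missing.
func (m *MemoryOutbox) update(id string, fn func(*OutboxEvent)) error {
	for i := range m.events {
		if m.events[i].ID == id {
			fn(&m.events[i])
			return nil
		}
	}
	return fmt.Errorf("outbox event %q not found", id)
}

func (m *MemoryOutbox) MarkPublished(_ context.Context, id string) error {
	return m.update(id, func(ev *OutboxEvent) { ev.Published, ev.PublishedAt = true, time.Now() })
}

func (m *MemoryOutbox) IncrementAttempts(_ context.Context, id string) error {
	return m.update(id, func(ev *OutboxEvent) { ev.Attempts++ })
}

// DeletePublished removes events that were published before olderThan.
func (m *MemoryOutbox) DeletePublished(_ context.Context, olderThan time.Time) (int, error) {
	kept, deleted := m.events[:0], 0
	for _, ev := range m.events {
		if ev.Published && ev.PublishedAt.Before(olderThan) {
			deleted++
			continue
		}
		kept = append(kept, ev)
	}
	m.events = kept
	return deleted, nil
}

// demo saves two events, publishes one, and reports pending and deleted counts.
func demo() (pending, deleted int) {
	ctx := context.Background()
	var repo OutboxRepository = &MemoryOutbox{}
	_ = repo.Save(ctx, OutboxEvent{ID: "e1"})
	_ = repo.Save(ctx, OutboxEvent{ID: "e2"})
	_ = repo.MarkPublished(ctx, "e1")
	unpublished, _ := repo.FindUnpublished(ctx, 10)
	// A cutoff in the future makes the just-published event eligible for deletion.
	deleted, _ = repo.DeletePublished(ctx, time.Now().Add(time.Hour))
	return len(unpublished), deleted
}

func main() {
	fmt.Println(demo()) // prints: 1 1
}
```

Programming the relay against an interface like this keeps it independent of the storage engine and makes it straightforward to test without a running database.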
3. Outbox Relay Worker
- Runs in background goroutine
- Polls every 1 second
- Publishes events to NATS
- Automatic retry (max 3 attempts)
- Dead letter logging for failed events
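The worker behaviour listed above can be sketched as follows. This is a minimal illustration under assumptions: `Event`, `Publisher`, `PublisherFunc`, `Relay`, and `ErrBrokerDown` are hypothetical names, the outbox is an in-memory slice, and the `Publisher` interface mirrors the shape of the NATS Go client's `Publish(subject string, data []byte) error` method so that a real connection could be plugged in.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"sync"
	"time"
)

const maxAttempts = 3 // retry limit from the list above

// Event is a minimal, hypothetical outbox record.
type Event struct {
	ID, Subject string
	Payload     []byte
	Attempts    int
	Published   bool
}

// Publisher is the only broker capability the relay needs.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// PublisherFunc adapts a plain function to Publisher (useful for fakes).
type PublisherFunc func(subject string, data []byte) error

func (f PublisherFunc) Publish(subject string, data []byte) error { return f(subject, data) }

// ErrBrokerDown is a hypothetical error used to simulate an outage.
var ErrBrokerDown = errors.New("broker down")

// Relay polls an in-memory outbox and publishes pending events.
type Relay struct {
	mu     sync.Mutex
	Events []*Event
	Pub    Publisher
}

// ProcessOnce makes one pass over events that are unpublished
// and still have attempts left.
func (r *Relay) ProcessOnce() {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, ev := range r.Events {
		if ev.Published || ev.Attempts >= maxAttempts {
			continue
		}
		if err := r.Pub.Publish(ev.Subject, ev.Payload); err != nil {
			ev.Attempts++
			if ev.Attempts >= maxAttempts {
				log.Printf("dead letter: event %s failed %d times: %v", ev.ID, ev.Attempts, err)
			}
			continue
		}
		ev.Published = true
	}
}

// Run calls ProcessOnce on every tick until ctx is cancelled.
func (r *Relay) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			r.ProcessOnce()
		}
	}
}

func main() {
	ev := &Event{ID: "e1", Subject: "user.created", Payload: []byte(`{"id":"u1"}`)}
	relay := &Relay{Events: []*Event{ev}, Pub: PublisherFunc(func(subject string, data []byte) error {
		fmt.Printf("published %s: %s\n", subject, data)
		return nil
	})}

	ctx, cancel := context.WithTimeout(context.Background(), 1500*time.Millisecond)
	defer cancel()
	done := make(chan struct{})
	go func() { // the relay runs in a background goroutine
		defer close(done)
		relay.Run(ctx, time.Second) // poll every second
	}()
	<-done
	fmt.Println("published:", ev.Published)
}
```

In this sketch an event is marked as published only after `Publish` returns without error, so a crash between the two steps would cause the event to be sent again on the next pass. Consumers of a relay built this way should therefore tolerate duplicates.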
Benefits
1. Guaranteed Delivery
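- Events are persisted in the database before they are published, so a broker outage does not lose them
- If NATS is down, events wait in the outbox and are published once it comes back up, provided they have not yet exhausted their retry attempts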
2. Atomicity
- The database write and the outbox event are saved in the same transaction
- Either both are committed or neither is, so there are no partial failures
3. Automatic Retry
- Failed events are automatically retried
- No manual intervention needed
4. Audit Trail
- Complete history of all events in the database
- Can query to see pending/failed events
5. Graceful Degradation
- System continues to work even if message broker is down
- Events are queued and published later
Trade-offs
Pros
- ✅ No lost events - Every event is persisted with its data change and stays in the outbox until it is published or dead-lettered
- ✅ Fault tolerance - Works even when NATS is down
- ✅ Automatic retry - No manual recovery needed
- ✅ Audit trail - Complete event history
- ✅ Simple - ~200 lines of code
Cons
- ⚠️ Eventual consistency - 1-2 second delay before events are published
- ⚠️ Storage overhead - Outbox table grows over time (needs cleanup)
- ⚠️ Complexity - One more component to manage
Implementation Details
Monitoring & Operations
Query Pending Events
Query Failed Events (Dead Letters)
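Both monitoring queries (pending events and dead letters) can be expressed as MongoDB-style filter documents. The sketch below builds them as plain Go maps so that it runs without a database. The field names `published` and `attempts` and the limit of 3 are assumptions based on the worker description, not the project's actual schema. With the official Go driver, a map of this shape could be passed as the filter argument to `Collection.Find`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PendingFilter matches events that are unpublished and still eligible for retry.
// The field names are assumptions about the outbox schema.
func PendingFilter(maxAttempts int) map[string]any {
	return map[string]any{
		"published": false,
		"attempts":  map[string]any{"$lt": maxAttempts},
	}
}

// DeadLetterFilter matches events that are unpublished and out of retries.
func DeadLetterFilter(maxAttempts int) map[string]any {
	return map[string]any{
		"published": false,
		"attempts":  map[string]any{"$gte": maxAttempts},
	}
}

func main() {
	for _, filter := range []map[string]any{PendingFilter(3), DeadLetterFilter(3)} {
		out, err := json.Marshal(filter)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
	// prints:
	// {"attempts":{"$lt":3},"published":false}
	// {"attempts":{"$gte":3},"published":false}
}
```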
Cleanup Old Events
The system automatically deletes published events older than 7 days.
Testing the Implementation
1. Normal Flow (NATS is Up)
2. Failure Scenario (NATS is Down)
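The outage-and-recovery flow can be rehearsed without a real NATS server by using a fake broker whose availability is toggled. This is a simplified simulation: `Broker`, `Outbox`, and `RelayOnce` are hypothetical names, and retry counting is left out, so it models an outage shorter than the retry limit.

```go
package main

import (
	"errors"
	"fmt"
)

// Broker is a fake message broker whose availability can be toggled.
type Broker struct {
	Up        bool
	Delivered []string
}

// Publish records the subject when the broker is up and fails otherwise.
func (b *Broker) Publish(subject string) error {
	if !b.Up {
		return errors.New("broker unavailable")
	}
	b.Delivered = append(b.Delivered, subject)
	return nil
}

// Outbox holds subjects that have been committed but not yet published.
type Outbox struct{ Pending []string }

// RelayOnce tries to publish every pending subject and keeps the ones that fail.
func RelayOnce(o *Outbox, b *Broker) {
	var remaining []string
	for _, subject := range o.Pending {
		if err := b.Publish(subject); err != nil {
			remaining = append(remaining, subject)
		}
	}
	o.Pending = remaining
}

func main() {
	broker := &Broker{Up: false}
	outbox := &Outbox{}

	// Registration commits while the broker is down: the event waits in the outbox.
	outbox.Pending = append(outbox.Pending, "user.created")
	RelayOnce(outbox, broker)
	fmt.Println("while down:", len(outbox.Pending), "pending,", len(broker.Delivered), "delivered")

	// The broker recovers: the next relay pass drains the outbox.
	broker.Up = true
	RelayOnce(outbox, broker)
	fmt.Println("after recovery:", len(outbox.Pending), "pending,", len(broker.Delivered), "delivered")
	// prints:
	// while down: 1 pending, 0 delivered
	// after recovery: 0 pending, 1 delivered
}
```

Against the real system, the equivalent manual check would be to stop NATS, perform a write, confirm the event is stored as pending, then restart NATS and confirm it is published.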
When to Use Outbox Pattern
✅ Use It When:
- Events are critical (user registration, payment, order)
- You need guaranteed delivery
- You can tolerate 1-2 second delay
- You’re using a database with transaction support (e.g., MongoDB running as a replica set)
❌ Skip It When:
- Events are non-critical (page view, analytics)
- You need real-time delivery (< 100ms)
- You’re okay with occasional data loss
- You don’t have transaction support
Future Enhancements
- Batch Publishing - Publish multiple events at once for better throughput
- Priority Queue - Critical events published first
- Exponential Backoff - Smarter retry strategy
- Metrics - Prometheus metrics for monitoring
- Admin UI - View and republish dead letters