Outbox Pattern

Transactional Outbox Pattern Implementation

Why We Implemented the Outbox Pattern

The Problem: Dual Write Issue

In distributed systems, we often need to perform two operations atomically:
  1. Write to the database (e.g., save a user)
  2. Publish an event (e.g., notify other services)
Without the Outbox Pattern, this creates a critical reliability problem:
// ❌ RISKY CODE (Before Outbox Pattern)
func CreateUser(user User) error {
    // Step 1: Save to database
    if err := db.Save(&user).Error; err != nil {
        return err
    }
    // ✅ SUCCESS: the user is now committed

    // Step 2: Publish event (event is the serialized UserCreated payload)
    if err := nats.Publish("UserCreated", event); err != nil {
        // ❌ FAILS (NATS is down)
        // Result: User exists in DB, but no one knows about it!
        // - Read model is NOT updated
        // - Welcome email is NOT sent
        // - Other services are NOT notified
        return err
    }
    return nil
}

Real-World Scenarios Where This Fails

  1. NATS Server is Down
    • User registration succeeds in MongoDB
    • Event publishing fails
    • Read model never gets updated
    • User cannot log in (data inconsistency)
  2. Network Partition
    • Order is placed and saved
    • Network issue prevents event from reaching NATS
    • Inventory is never decremented
    • Payment is never processed
  3. Application Crash
    • User data is committed to MongoDB
    • Application crashes before publishing event
    • Event is lost forever

The Solution: Transactional Outbox Pattern

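The Outbox Pattern makes the database write and the recording of the event atomic: both are committed in a single transaction, and a background worker publishes the recorded event afterwards. Delivery is therefore at-least-once rather than instantaneous.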
// ✅ SAFE CODE (With Outbox Pattern)
// Requires: encoding/json, gorm.io/gorm
func CreateUser(user User) error {
    // Start PostgreSQL Transaction (via GORM)
    return db.Transaction(func(tx *gorm.DB) error {
        // Step 1: Save user to database
        if err := tx.Create(&user).Error; err != nil {
            return err
        }

        // Step 2: Serialize the event data (OutboxEvent.Payload holds JSON bytes)
        payload, err := json.Marshal(user)
        if err != nil {
            return err
        }

        // Step 3: Save event to outbox table (SAME transaction)
        outboxEvent := OutboxEvent{
            EventType: "UserCreated",
            Payload:   payload,
        }
        // The background worker publishes the event after the commit
        return tx.Create(&outboxEvent).Error
    })
}

How It Works

Architecture Overview

┌─────────────────────────────────────────────────────┐
│  Application (Write Path)                           │
│                                                     │
│  1. UserService.Create()                            │
│     ├─> Save user to `users` table                  │
│     └─> Save event to `outbox_events` table         │
│                                                     │
│  ✅ Both succeed or both fail (Atomic)              │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│  Outbox Relay Worker (Background)                   │
│                                                     │
│  Every 1 second:                                    │
│  1. Poll `outbox_events` for unpublished events     │
│  2. Publish to NATS via JetStream                   │
│  3. Mark as published                               │
│  4. Retry on failure (max 3 attempts)               │
│  5. Log dead letters (failed after 3 attempts)      │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│  NATS JetStream                                     │
│  → Event Consumers                                  │
│  → Read Model Updates                               │
│  → Side Effects (emails, notifications)              │
└─────────────────────────────────────────────────────┘

Components

1. OutboxEvent Domain Model

type OutboxEvent struct {
    ID          string     // Unique event ID
    AggregateID string     // Entity ID (UserID, OrderID)
    EventType   string     // "UserCreated", "OrderPlaced"
    Payload     []byte     // JSON event data
    CreatedAt   time.Time  // When event was created
    Published   bool       // Has it been published?
    PublishedAt *time.Time // When was it published?
    Attempts    int        // Retry counter
    LastError   string     // Last error message
}
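
A minimal sketch of how an event could be built before it is saved, assuming the payload is marshaled to JSON and the ID is a random hex string. The `NewOutboxEvent` helper and the ID scheme are hypothetical and not taken from the codebase; the struct is trimmed to the fields set at creation time.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// OutboxEvent is trimmed to the fields populated when an event is created.
type OutboxEvent struct {
	ID          string
	AggregateID string
	EventType   string
	Payload     []byte
	CreatedAt   time.Time
}

// NewOutboxEvent is a hypothetical constructor: it serializes the payload to
// JSON and assigns a random 128-bit ID encoded as 32 hex characters.
func NewOutboxEvent(aggregateID, eventType string, payload any) (OutboxEvent, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return OutboxEvent{}, fmt.Errorf("marshal %s payload: %w", eventType, err)
	}

	idBytes := make([]byte, 16)
	if _, err := rand.Read(idBytes); err != nil {
		return OutboxEvent{}, fmt.Errorf("generate event id: %w", err)
	}

	return OutboxEvent{
		ID:          hex.EncodeToString(idBytes),
		AggregateID: aggregateID,
		EventType:   eventType,
		Payload:     data,
		CreatedAt:   time.Now().UTC(),
	}, nil
}
```

Building the event in one place keeps the serialization format consistent across services that write to the outbox.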

2. OutboxRepository

  • Save() - Store event (called within transaction)
  • FindUnpublished() - Get pending events
  • MarkPublished() - Update status after successful publish
  • IncrementAttempts() - Track retries
  • DeletePublished() - Cleanup old events

3. Outbox Relay Worker

  • Runs in background goroutine
  • Polls every 1 second
  • Publishes events to NATS
  • Automatic retry (max 3 attempts)
  • Dead letter logging for failed events

Benefits

1. Guaranteed Delivery

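  • Events are persisted in the same transaction as the business data, so a broker outage cannot silently drop them
  • If NATS is down, unpublished events stay in the outbox table and are published once it is reachable again; events that exhaust the 3-attempt limit are logged as dead letters and remain queryable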

2. Atomicity

  • Database write and event creation happen in the same transaction
  • No partial failures

3. Automatic Retry

  • Failed events are automatically retried (up to 3 attempts)
  • No manual intervention needed unless an event becomes a dead letter

4. Audit Trail

  • Complete history of all events in the database
  • Can query to see pending/failed events

5. Graceful Degradation

  • System continues to work even if message broker is down
  • Events are queued and published later

Trade-offs

Pros

  • No silently lost events - Every committed write has a matching outbox record that is delivered at least once
  • Fault tolerance - Writes keep succeeding even when NATS is down
  • Automatic retry - No manual recovery needed
  • Audit trail - Complete event history
  • Simple - ~200 lines of code

Cons

  • ⚠️ Eventual consistency - 1-2 second delay before events are published
  • ⚠️ Storage overhead - Outbox table grows over time (needs cleanup)
  • ⚠️ Complexity - One more component to manage

Implementation Details

Outbox Pattern Example

Monitoring & Operations

Query Pending Events

// MongoDB shell
db.outbox_events.find({ published: false })

Query Failed Events (Dead Letters)

db.outbox_events.find({
    published: false,
    attempts: { $gte: 3 }
})

Cleanup Old Events

The system automatically deletes published events older than 7 days:
// Runs daily
outboxRepo.DeletePublished(ctx, 7)

Testing the Implementation

1. Normal Flow (NATS is Up)

# Register a user
curl -X POST http://localhost:8080/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"pass123","name":"Test","roles":["buyer"]}'

# Check logs - you should see:
# INFO  user created and event saved to outbox
# INFO  event published successfully

2. Failure Scenario (NATS is Down)

# Stop NATS
docker stop go_nats

# Register a user (still works!)
curl -X POST http://localhost:8080/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"test2@example.com","password":"pass123","name":"Test2","roles":["buyer"]}'

# Check logs:
# INFO  user created and event saved to outbox  ✅
# ERROR failed to publish event (will retry)

# Start NATS
docker start go_nats

# Wait 1-2 seconds, check logs:
# INFO  event published successfully  ✅

When to Use Outbox Pattern

✅ Use It When:

  • Events are critical (user registration, payment, order)
  • You need guaranteed delivery
  • You can tolerate 1-2 second delay
  • You’re using a database with transaction support (e.g., PostgreSQL, or MongoDB running as a replica set)

❌ Skip It When:

  • Events are non-critical (page view, analytics)
  • You need real-time delivery (< 100ms)
  • You’re okay with occasional data loss
  • You don’t have transaction support

Future Enhancements

  1. Batch Publishing - Publish multiple events at once for better throughput
  2. Priority Queue - Critical events published first
  3. Exponential Backoff - Smarter retry strategy
  4. Metrics - Prometheus metrics for monitoring
  5. Admin UI - View and republish dead letters

Conclusion

The Transactional Outbox Pattern is a production-ready solution for guaranteed event delivery in distributed systems. It ensures that your B2B eCommerce platform maintains data consistency even during failures, providing a reliable foundation for event-driven architecture.