Troubleshooting

Authentication Issues

JWT Validation Fails

  • Status: 401 Unauthorized
  • Cause: Invalid signature, expired token, or incorrect secret.
  • Check:
    • Verify that BETTER_AUTH_SECRET in .env matches the secret used in the Bun monolith.
    • Check the token expiration (exp claim) using jwt.io.
    • Ensure the aud (audience) and iss (issuer) match the expected values in middleware/better_auth_validator.go.
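
As an offline alternative to pasting a token into jwt.io, the exp, aud, and iss claims can be read locally. The following is a minimal sketch using only the Go standard library. It decodes the payload without verifying the signature, so it is for debugging only. The names Claims, DecodeClaims, and sampleToken, and the sample claim values, are illustrative assumptions and are not taken from middleware/better_auth_validator.go.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// Claims holds the registered claims discussed above. aud is kept as raw
// JSON because RFC 7519 allows either a string or an array of strings.
type Claims struct {
	Iss string          `json:"iss"`
	Aud json.RawMessage `json:"aud"`
	Exp int64           `json:"exp"`
}

// DecodeClaims parses the payload segment of a compact JWT.
// It does NOT verify the signature; use it for inspection only.
func DecodeClaims(token string) (Claims, error) {
	var c Claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("token must have three dot-separated segments")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, fmt.Errorf("decode payload: %w", err)
	}
	if err := json.Unmarshal(payload, &c); err != nil {
		return c, fmt.Errorf("parse payload: %w", err)
	}
	return c, nil
}

// Expired reports whether now is at or after the exp claim.
func (c Claims) Expired(now time.Time) bool {
	return !now.Before(time.Unix(c.Exp, 0))
}

// sampleToken builds an unsigned token with made-up claims for the demo.
func sampleToken() string {
	payload := `{"iss":"example-issuer","aud":"example-audience","exp":1700000000}`
	return "e30." + base64.RawURLEncoding.EncodeToString([]byte(payload)) + ".unsigned"
}

func main() {
	c, err := DecodeClaims(sampleToken())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("iss=%s aud=%s exp=%s expired=%t\n",
		c.Iss, c.Aud, time.Unix(c.Exp, 0).UTC().Format(time.RFC3339), c.Expired(time.Now()))
}
```

If the decoded values look correct and the request still returns 401, the signature or the secret is the more likely culprit.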

Cache Inconsistency

  • Problem: User roles changed in the database but the change isn’t reflected in the API.
  • Cause: Redis user_status cache hasn’t expired (5-min TTL) or the NATS sync event was missed.
  • Resolution:
    • Manually flush the user’s cache: redis-cli DEL "user_status:<auth_id>"
    • Verify the NATS consumer go-user-sync is running: nats consumer info user_sync go-user-sync

Infrastructure Issues

NATS: Invalid Stream Name

  • Cause: Attempting to use a topic name with dots (e.g., user.sync) as a JetStream stream name.
  • Fix: Our SubjectCalculator maps the internal topic user_sync to the subject user.sync. Ensure subscriptions use user_sync.
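
The distinction between a stream-safe topic name and a dotted subject can be sketched as follows. This is only an illustration: ValidStreamName and SubjectFor are hypothetical helpers, and the underscore-to-dot rule is an assumption inferred from the single user_sync to user.sync example. The real SubjectCalculator may behave differently. The character check covers the restrictions documented for JetStream stream names (whitespace, ".", "*", ">", and path separators) but not non-printable characters.

```go
package main

import (
	"fmt"
	"strings"
)

// ValidStreamName reports whether name avoids the characters that JetStream
// rejects in stream names: whitespace, ".", "*", ">", "/" and "\".
// (Non-printable characters are also disallowed but not checked here.)
func ValidStreamName(name string) bool {
	return name != "" && !strings.ContainsAny(name, " \t\r\n.*>/\\")
}

// SubjectFor is a hypothetical stand-in for SubjectCalculator. It assumes
// the mapping simply replaces underscores with dots.
func SubjectFor(topic string) string {
	return strings.ReplaceAll(topic, "_", ".")
}

func main() {
	for _, name := range []string{"user_sync", "user.sync"} {
		fmt.Printf("%-10s valid stream name: %t\n", name, ValidStreamName(name))
	}
	fmt.Println("subject for user_sync:", SubjectFor("user_sync"))
}
```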

PostgreSQL: Missing Column Errors

  • Problem: ERROR: column "attempts" does not exist.
  • Cause: New fields were added to the model but AutoMigrate hasn’t been run.
  • Fix: Run go run cmd/migrate/main.go.

Redis: Connection Refused

  • Behavior: The backend will log warnings but continue to work by falling back to direct PostgreSQL queries.
  • Check: Ensure Redis is running and the REDIS_URL is correct.
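
The fallback behavior described above follows a common pattern: try the cache first, log a warning on failure, then query the database. The sketch below shows the pattern with in-memory stand-ins. StatusStore, FallbackStore, failingStore, and staticStore are hypothetical names, not the backend's actual types, and a real implementation would also distinguish a cache miss from a connection error.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// StatusStore is a hypothetical abstraction over anything that can return
// a user's status (a Redis cache, a PostgreSQL query, ...).
type StatusStore interface {
	UserStatus(authID string) (string, error)
}

// FallbackStore tries Cache first and falls back to DB when Cache fails.
type FallbackStore struct {
	Cache StatusStore
	DB    StatusStore
}

func (f FallbackStore) UserStatus(authID string) (string, error) {
	status, err := f.Cache.UserStatus(authID)
	if err == nil {
		return status, nil
	}
	// Log a warning and continue instead of failing the request.
	log.Printf("warning: cache lookup failed for %q, falling back to database: %v", authID, err)
	return f.DB.UserStatus(authID)
}

// failingStore simulates a Redis instance that refuses connections.
type failingStore struct{}

func (failingStore) UserStatus(string) (string, error) {
	return "", errors.New("connection refused")
}

// staticStore simulates a database that always returns the same status.
type staticStore struct{ status string }

func (s staticStore) UserStatus(string) (string, error) {
	return s.status, nil
}

func main() {
	store := FallbackStore{Cache: failingStore{}, DB: staticStore{status: "active"}}
	status, err := store.UserStatus("example-auth-id")
	fmt.Println(status, err)
}
```

Because requests keep succeeding in this mode, the warnings in the logs are the main signal that Redis is down. Expect higher PostgreSQL load until the connection is restored.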

NATS Sync Issues

Events Not Reaching Go Backend

  1. Check Bun Logs: Is the monolith successfully publishing to NATS?
  2. Check NATS Server: Run nats sub "user.sync" and watch for incoming messages.
  3. Check Go Logs: Look for “Received user sync event” or error messages from user_sync_handler.go.
  4. JetStream Status: If using durable consumers, verify they aren’t stuck due to too many failed retries.

Error Reporting

When reporting a new issue, please include:
  • The trace_id from the HTTP response header.
  • Relevant logs from the period around the error.
  • The output of the /health endpoint.