Buzen ESB Docs
SRE Playbook

Operations and Troubleshooting

This runbook focuses on practical day-2 operations: confirming service health, triaging routing failures, recovering from deployment issues, and restoring service safely.

1. Health and readiness checks

curl -ksS https://<esb-host>:8443/api/v1/system/health \
  -H "Authorization: Bearer ${TOKEN}" | jq .

curl -ksS https://<esb-host>:8443/api/v1/system/info \
  -H "Authorization: Bearer ${TOKEN}" | jq .

curl -ksS https://<esb-host>:8443/api/v1/routes \
  -H "Authorization: Bearer ${TOKEN}" | jq .

After running these calls:
  • Confirm all critical routes are Started.
  • Check app status for unexpected FAILED or STOPPED states.
  • Verify connector stats and health for dependencies (JMS/DB/etc.).
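
The first check in the list above can be scripted rather than read by eye. The sketch below is a minimal example and rests on an assumption that this runbook does not confirm: that /api/v1/routes returns a JSON array of objects with "id" and "status" fields. The function name non_started_routes and the ESB_HOST variable (standing in for <esb-host>) are hypothetical; adjust the jq filter to the real response shape.

```shell
# Print "<id> <status>" for every route whose status is not "Started".
# ASSUMPTION: input is a JSON array of objects with "id" and "status"
# fields; the real /api/v1/routes response may be shaped differently.
non_started_routes() {
  jq -r '.[] | select(.status != "Started") | "\(.id) \(.status)"'
}

# Usage (not executed here; ESB_HOST and TOKEN must be set):
#   curl -ksS "https://${ESB_HOST}:8443/api/v1/routes" \
#     -H "Authorization: Bearer ${TOKEN}" | non_started_routes
```

Empty output means every listed route reported Started under the assumed schema.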

2. Logs and traces workflow

# Recent logs
curl -ksS "https://<esb-host>:8443/api/v1/logs?level=ERROR&lines=200" \
  -H "Authorization: Bearer ${TOKEN}" | jq .

# Tail logs
curl -ksS "https://<esb-host>:8443/api/v1/logs/tail?lines=200" \
  -H "Authorization: Bearer ${TOKEN}" | jq .

# Route-specific traces
curl -ksS "https://<esb-host>:8443/api/v1/traces/route/order-processing" \
  -H "Authorization: Bearer ${TOKEN}" | jq .

Use trace correlation and exchange endpoints to follow a single request path through processors and sinks.

3. Metrics and alerts

curl -ksS https://<esb-host>:8443/api/v1/metrics \
  -H "Authorization: Bearer ${TOKEN}" | jq .

curl -ksS https://<esb-host>:8443/api/v1/alerts/summary \
  -H "Authorization: Bearer ${TOKEN}" | jq .

curl -ksS https://<esb-host>:8443/api/v1/alerts/active \
  -H "Authorization: Bearer ${TOKEN}" | jq .

Investigate sudden changes in throughput, error counts, and per-route stats before taking restart actions.
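
To make "sudden change" concrete, one option is to compare two samples of the same counter taken a few minutes apart. The helpers below are a sketch only: pct_change and is_sudden_change are hypothetical names, the values are assumed to be integers you have already extracted from the metrics response, and the threshold is yours to choose.

```shell
# pct_change <old> <new>: integer percent change from old to new.
# Prints "n/a" when the old sample is 0 (no baseline to compare against).
pct_change() {
  local old=$1 new=$2
  if [ "$old" -eq 0 ]; then
    echo "n/a"
    return 0
  fi
  echo $(( (new - old) * 100 / old ))
}

# is_sudden_change <old> <new> <threshold_pct>
# Succeeds when the absolute percent change is at least the threshold.
is_sudden_change() {
  local change
  change=$(pct_change "$1" "$2")
  [ "$change" = "n/a" ] && return 1
  # ${change#-} strips a leading minus sign, giving the absolute value.
  [ "${change#-}" -ge "$3" ]
}
```

For example, an error count that moves from 200 to 260 is a 30 percent rise, which a 25 percent threshold would flag.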

4. Backup and restore

System backup endpoints are exposed under /api/v1/system/backup*:

# Download backup
curl -ksS -o buzen-backup.zip \
  https://<esb-host>:8443/api/v1/system/backup \
  -H "Authorization: Bearer ${TOKEN}"

# Upload restore bundle
curl -ksS -X POST https://<esb-host>:8443/api/v1/system/backup/restore \
  -H "Authorization: Bearer ${TOKEN}" \
  -F "file=@buzen-backup.zip" | jq .

# Poll restore status
curl -ksS https://<esb-host>:8443/api/v1/system/backup/restore-status \
  -H "Authorization: Bearer ${TOKEN}" | jq .
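
Polling by hand is easy to abandon too early, so a bounded loop can help. The sketch below is generic: wait_for_status is a hypothetical helper, and the strings COMPLETED and FAILED are assumed placeholders, since this runbook does not document what restore-status actually returns. Replace them with the real values before relying on it.

```shell
# wait_for_status <max_attempts> <interval_seconds> <command...>
# Runs <command...> until its output contains COMPLETED (returns 0) or
# FAILED (returns 1); returns 2 when the attempts run out.
# ASSUMPTION: COMPLETED and FAILED are placeholder status strings.
wait_for_status() {
  local max=$1 interval=$2 attempt=1 out
  shift 2
  while :; do
    out=$("$@" 2>/dev/null) || out=""
    case "$out" in
      *COMPLETED*) echo "done after ${attempt} attempt(s)"; return 0 ;;
      *FAILED*)    echo "restore failed: ${out}" >&2; return 1 ;;
    esac
    if [ "$attempt" -ge "$max" ]; then
      break
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timed out after ${max} attempts" >&2
  return 2
}

# Usage (not executed here; ESB_HOST stands in for <esb-host>):
#   fetch_restore_status() {
#     curl -ksS "https://${ESB_HOST}:8443/api/v1/system/backup/restore-status" \
#       -H "Authorization: Bearer ${TOKEN}"
#   }
#   wait_for_status 30 10 fetch_restore_status
```

Passing the fetch command as arguments keeps the loop independent of the endpoint, so the same helper can be exercised against a stub before it is pointed at a live system.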

5. Incident runbook

  1. Capture symptom baseline: health, route list, recent alerts, error logs.
  2. Identify blast radius: app-specific vs platform-wide.
  3. If the failure is limited to one app, stop the app and redeploy it from a known-good archive.
  4. If a recent route update triggered the failure, roll back the app version.
  5. Validate recovery via route status, traces, and throughput metrics.
  6. Document root cause and add prevention controls (policy, validation, tests).
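
Step 1 is easier to do consistently when the baseline is captured in one pass and kept on disk for the later root-cause write-up. The sketch below saves the four responses named in that step using endpoints shown earlier in this runbook. The helper names fetch and capture_baseline, the ESB_HOST variable (standing in for <esb-host>), and the output file names are assumptions made for illustration.

```shell
# fetch <path>: GET one management API path.
# ASSUMPTION: ESB_HOST and TOKEN are set in the environment.
fetch() {
  curl -ksS "https://${ESB_HOST}:8443$1" -H "Authorization: Bearer ${TOKEN}"
}

# capture_baseline <output_dir>
# Saves health, route list, active alerts, and recent error logs as
# separate files, then prints the directory path.
capture_baseline() {
  local dir=$1 name path
  mkdir -p "$dir" || return 1
  while read -r name path; do
    # </dev/null keeps the fetch from consuming the loop's input list.
    fetch "$path" </dev/null > "${dir}/${name}.json" \
      || echo "warning: ${name} fetch failed" >&2
  done <<'EOF'
health /api/v1/system/health
routes /api/v1/routes
alerts /api/v1/alerts/active
errors /api/v1/logs?level=ERROR&lines=200
EOF
  echo "$dir"
}

# Usage (not executed here):
#   capture_baseline "incident-$(date +%Y%m%dT%H%M%S)"
```

A failed fetch only produces a warning, so one unreachable endpoint does not prevent the remaining evidence from being saved.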

6. Common failure patterns and fixes

Symptom | Likely cause | Action
--- | --- | ---
App deployment fails with conflict | Route ID or ingress path collision | Rename route ID/path, redeploy
401 on API calls | Missing/expired token | Re-authenticate and retry
403 on admin endpoint | Insufficient role | Use ADMIN account or adjust role
DB node fails at runtime | Invalid datasource reference/credentials | Test data source via API and update config
JMS consumer idle unexpectedly | App or route stopped, or broker unavailable | Check route state and connector health
High memory and trace growth | Aggressive trace/body capture settings | Tune retention and capture configuration

For deployment-specific failures, use Deployment and Packaging. For API errors, use Management API Guide.
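
For the 401 and 403 rows of the failure table, it can help to look at the status code alone before reading response bodies. The sketch below uses curl's -o and -w '%{http_code}' options for that and maps the code to the action the table gives. The helper names http_status and suggest_action are hypothetical, and the fallback messages are illustrative rather than documented behavior.

```shell
# http_status <url>: print only the HTTP status code of a GET request.
# ASSUMPTION: TOKEN is set in the environment.
http_status() {
  curl -ksS -o /dev/null -w '%{http_code}' "$1" \
    -H "Authorization: Bearer ${TOKEN}"
}

# suggest_action <status_code>: map a status code to the table's action.
suggest_action() {
  case "$1" in
    401) echo "Missing or expired token: re-authenticate and retry" ;;
    403) echo "Insufficient role: use an ADMIN account or adjust the role" ;;
    2??) echo "OK" ;;
    *)   echo "Not covered by the table: check logs and the Management API Guide" ;;
  esac
}

# Usage (not executed here; ESB_HOST stands in for <esb-host>):
#   suggest_action "$(http_status "https://${ESB_HOST}:8443/api/v1/system/health")"
```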