Operations and Troubleshooting
This runbook focuses on practical day-2 operations: confirming service health, triaging routing failures, recovering from deployment issues, and restoring service safely.
1. Health and readiness checks
# Overall health
curl -ksS "https://<esb-host>:8443/api/v1/system/health" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# System info
curl -ksS "https://<esb-host>:8443/api/v1/system/info" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Routes
curl -ksS "https://<esb-host>:8443/api/v1/routes" \
-H "Authorization: Bearer ${TOKEN}" | jq .
- Confirm all critical routes are Started.
- Check app status for unexpected FAILED or STOPPED states.
- Verify connector stats and health for dependencies (JMS/DB/etc.).
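The first check in the list can be scripted. The sketch below is a minimal filter that reads one `<route-id> <state>` pair per line and prints every route that is not Started. The function name `routes_not_started` is hypothetical, and the commented jq wiring assumes the routes response is a JSON array of objects with `id` and `state` fields, which this runbook does not confirm.

```bash
# Print the IDs of routes whose state is not "Started".
# Input: one "<route-id> <state>" pair per line on stdin.
# Exit status is 0 when every route is Started, 1 otherwise.
routes_not_started() {
  awk '$2 != "Started" { print $1; bad = 1 } END { exit bad + 0 }'
}

# Hypothetical wiring (response shape is an assumption, adjust the jq filter):
# curl -ksS "https://${ESB_HOST}:8443/api/v1/routes" \
#   -H "Authorization: Bearer ${TOKEN}" \
#   | jq -r '.[] | "\(.id) \(.state)"' | routes_not_started
```

A non-zero exit status makes the filter usable as a gate in a cron job or a deployment pipeline.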
2. Logs and traces workflow
# Recent logs
curl -ksS "https://<esb-host>:8443/api/v1/logs?level=ERROR&lines=200" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Tail logs
curl -ksS "https://<esb-host>:8443/api/v1/logs/tail?lines=200" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Route-specific traces
curl -ksS "https://<esb-host>:8443/api/v1/traces/route/order-processing" \
-H "Authorization: Bearer ${TOKEN}" | jq .
Use the trace correlation and exchange endpoints to follow a single request through each processor and sink it touches.
3. Metrics and alerts
# Metrics
curl -ksS "https://<esb-host>:8443/api/v1/metrics" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Alert summary
curl -ksS "https://<esb-host>:8443/api/v1/alerts/summary" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Active alerts
curl -ksS "https://<esb-host>:8443/api/v1/alerts/active" \
-H "Authorization: Bearer ${TOKEN}" | jq .
Investigate sudden changes in throughput, error counts, and per-route stats before restarting anything; a restart clears the evidence you need for root-cause analysis.
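One way to make "sudden change" concrete is to compare two samples of the same counter taken a few minutes apart. The helpers below are a sketch only: `pct_change` and `is_sudden_change` are hypothetical names, the 20 percent threshold in the usage line is an arbitrary example, and extracting the counter from the metrics response is left out because its field names are not given here.

```bash
# Percentage change between two samples of a counter (old, new).
# Prints one decimal place, or "n/a" when the old sample is zero.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old == 0) { print "n/a"; exit }
    printf "%.1f\n", (new - old) / old * 100
  }'
}

# Succeeds when the absolute change exceeds the given limit (in percent).
is_sudden_change() {
  awk -v old="$1" -v new="$2" -v limit="$3" 'BEGIN {
    if (old == 0) exit (new == 0)   # any traffic after zero counts as sudden
    d = (new - old) / old * 100
    if (d < 0) d = -d
    exit !(d > limit)
  }'
}
```

For example, `is_sudden_change 1200 900 20 && echo "throughput moved $(pct_change 1200 900)%"` flags a drop from 1200 to 900 messages per interval.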
4. Backup and restore
System backup endpoints are exposed under /api/v1/system/backup*:
# Download backup
curl -ksS -o buzen-backup.zip \
"https://<esb-host>:8443/api/v1/system/backup" \
-H "Authorization: Bearer ${TOKEN}"
# Upload restore bundle
curl -ksS -X POST "https://<esb-host>:8443/api/v1/system/backup/restore" \
-H "Authorization: Bearer ${TOKEN}" \
-F "file=@buzen-backup.zip" | jq .
# Poll restore status
curl -ksS "https://<esb-host>:8443/api/v1/system/backup/restore-status" \
-H "Authorization: Bearer ${TOKEN}" | jq .
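Polling by hand is error-prone during a restore, so a bounded loop helps. The sketch below polls until a terminal state or a maximum number of attempts. Everything named here is an assumption for illustration: the `poll_restore` and `restore_status` helpers, the `ESB_HOST` variable, the `.status` field, and the `COMPLETED` and `FAILED` state names are not taken from the API and should be matched to the real restore-status response.

```bash
# Poll a status command until it reports a terminal state.
# Usage: poll_restore <max-attempts> <sleep-seconds> <command...>
# The command must print a single status word on stdout.
# COMPLETED and FAILED are assumed state names.
poll_restore() {
  max=$1; delay=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    status=$("$@")
    case "$status" in
      COMPLETED) echo "restore completed after $attempt poll(s)"; return 0 ;;
      FAILED)    echo "restore failed" >&2; return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "restore still running after $max poll(s)" >&2
  return 2
}

# Hypothetical status command; the ".status" field name is an assumption.
restore_status() {
  curl -ksS "https://${ESB_HOST}:8443/api/v1/system/backup/restore-status" \
    -H "Authorization: Bearer ${TOKEN}" | jq -r '.status'
}
# Example: poll every 10 seconds, give up after 30 attempts.
# poll_restore 30 10 restore_status
```

The three exit codes (0 completed, 1 failed, 2 timed out) let a calling script decide whether to proceed, roll back, or escalate.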
5. Incident runbook
- Capture symptom baseline: health, route list, recent alerts, error logs.
- Identify blast radius: app-specific vs platform-wide.
- If the failure is limited to one app, stop it and redeploy it from a known-good archive.
- If a recent route update triggered the failure, roll back the app version.
- Validate recovery via route status, traces, and throughput metrics.
- Document root cause and add prevention controls (policy, validation, tests).
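The first step, capturing a symptom baseline, is worth scripting so the evidence is saved before any recovery action changes it. This is a sketch under stated assumptions: `fetch` and `capture_baseline` are hypothetical helper names, `ESB_HOST` is a hypothetical variable standing in for the host placeholder used elsewhere in this runbook, and the output file names are arbitrary. The endpoints are the ones listed in the earlier sections.

```bash
# Fetch one management API path and print the response body.
# ESB_HOST and TOKEN are expected in the environment.
fetch() {
  curl -ksS "https://${ESB_HOST}:8443$1" -H "Authorization: Bearer ${TOKEN}"
}

# Save the symptom baseline (health, routes, active alerts, error logs)
# into a directory and print the directory name.
# Defaults to a UTC-timestamped directory when no argument is given.
capture_baseline() {
  dir="${1:-incident-$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p "$dir" || return 1
  fetch /api/v1/system/health                > "$dir/health.json"
  fetch /api/v1/routes                       > "$dir/routes.json"
  fetch /api/v1/alerts/active                > "$dir/alerts-active.json"
  fetch "/api/v1/logs?level=ERROR&lines=200" > "$dir/logs-error.json"
  echo "$dir"
}
```

Running `capture_baseline` again after recovery into a second directory gives a before/after pair to attach to the root-cause write-up.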
6. Common failure patterns and fixes
| Symptom | Likely cause | Action |
|---|---|---|
| App deployment fails with conflict | Route ID or ingress path collision | Rename route ID/path, redeploy |
| 401 on API calls | Missing/expired token | Re-authenticate and retry |
| 403 on admin endpoint | Insufficient role | Use ADMIN account or adjust role |
| DB node fails at runtime | Invalid datasource reference or credentials | Test the datasource via the API and update its config |
| JMS consumer idle unexpectedly | App or route stopped, or broker unavailable | Check route state and connector health |
| High memory and trace growth | Aggressive trace/body capture settings | Tune retention and capture configuration |
For deployment-specific failures, see Deployment and Packaging. For API errors, see the Management API Guide.
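The two HTTP rows of the table above can be turned into a quick first-pass check in scripts. The sketch below maps a status code to the action the table suggests. `triage_status` is a hypothetical helper name, `ESB_HOST` is a hypothetical variable, and only 401 and 403 come from the table; every other code falls through to a generic hint.

```bash
# Map an HTTP status code from a management API call to a suggested action.
# Only the 401 and 403 cases are taken from the failure-pattern table.
triage_status() {
  case "$1" in
    2??) echo "ok" ;;
    401) echo "missing or expired token: re-authenticate and retry" ;;
    403) echo "insufficient role: use an ADMIN account or adjust the role" ;;
    *)   echo "not covered by the table: check the error logs" ;;
  esac
}

# Example wiring: curl's -w '%{http_code}' prints only the status code.
# code=$(curl -ksS -o /dev/null -w '%{http_code}' \
#   "https://${ESB_HOST}:8443/api/v1/routes" \
#   -H "Authorization: Bearer ${TOKEN}")
# triage_status "$code"
```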