Lean observability: logs, metrics and SLOs for small teams
Observability isn't «pipe everything to Elastic and hope». For a lean team, the test is whether you can answer «is the system degraded for users?» within minutes, using data you already collect.
Structured logs (JSON with a request ID, plus an anonymized user ID when relevant) cost little to add in code and save hours when debugging production. A few red-flag metrics, such as p95 latency, error rate and job queue depth, show service health without an endless set of dashboards.
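As a rough illustration of both ideas, here is a minimal sketch using only the Python standard library. The names `JsonFormatter`, `anonymize`, `p95` and `error_rate`, the field names, and the hard-coded salt are assumptions made for this example, not a prescribed schema. Counting only 5xx responses as errors is also an assumption.

```python
import hashlib
import json
import logging
import uuid


def anonymize(user_id: str, salt: str = "example-salt") -> str:
    """Return a short salted hash so logs never carry the raw user ID.

    The salt here is a placeholder; a real one would come from configuration.
    """
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:12]


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # These fields arrive through `extra=` and default to None.
            "request_id": getattr(record, "request_id", None),
            "user_hash": getattr(record, "user_hash", None),
            "duration_ms": getattr(record, "duration_ms", None),
        }
        return json.dumps(payload)


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not values:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def error_rate(statuses: list[int]) -> float:
    """Share of responses with a 5xx status (assumed definition of 'error')."""
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status >= 500) / len(statuses)


if __name__ == "__main__":
    logger = logging.getLogger("checkout")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "checkout completed",
        extra={
            "request_id": str(uuid.uuid4()),
            "user_hash": anonymize("user-42"),
            "duration_ms": 184,
        },
    )
```

Because every line is one JSON object with a request ID, a single request can be traced with a plain text search, and the same records can feed the p95 and error-rate helpers without a separate metrics pipeline.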
A simple SLO, e.g. «99% of checkout requests complete in under 2s over 30 days», aligns business and engineering. The remaining 1% is the error budget: once it is spent, prioritize reliability; while some is left, you can invest in features.
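The error-budget arithmetic behind such an SLO is small enough to sketch in a few lines of Python. The `Request` type and the function names are hypothetical, and treating failed requests as bad (not only slow ones) is an assumption of this example. The 99% target and 2s threshold come from the SLO above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    duration_s: float
    ok: bool  # assumption: failed requests also count against the SLO


def is_good(req: Request, threshold_s: float = 2.0) -> bool:
    """A request is good if it succeeded and finished under the threshold."""
    return req.ok and req.duration_s < threshold_s


def budget_remaining(requests: list[Request], target: float = 0.99) -> float:
    """Fraction of the error budget still unspent for the given window.

    1.0 means untouched, values near 0.0 mean spent, negative means overspent.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if not requests:
        return 1.0
    allowed_bad = (1.0 - target) * len(requests)
    bad = sum(1 for req in requests if not is_good(req))
    return 1.0 - bad / allowed_bad


def next_priority(remaining: float) -> str:
    """Translate the remaining budget into the planning rule from the text."""
    return "features" if remaining > 0 else "reliability"
```

For example, with 1,000 checkout requests in the window the budget is about 10 bad requests: 5 slow or failed requests leave roughly half the budget, while 20 overspend it and point the next sprint at reliability work.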
Alerts should wake someone only when there is a clear next step. If the runbook is «check tomorrow», the alert is noise. Start with a few well-tuned alerts, and add more as instrumentation debt is paid down.
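One way to enforce «no next step, no page» is to make the next step a required field of the alert definition. The sketch below is hypothetical: `AlertRule`, `should_page`, and the rule of requiring several consecutive breaching samples are illustrative choices, not the API of any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float   # fire when the metric exceeds this value
    for_samples: int   # ...for this many consecutive samples
    next_step: str     # first action for whoever gets paged

    def __post_init__(self) -> None:
        if not self.next_step.strip():
            raise ValueError(f"alert {self.name!r} has no next step; do not page on it")
        if self.for_samples < 1:
            raise ValueError("for_samples must be at least 1")


def should_page(rule: AlertRule, samples: list[float]) -> bool:
    """Page only if the most recent `for_samples` readings all breach the threshold."""
    if len(samples) < rule.for_samples:
        return False
    recent = samples[-rule.for_samples:]
    return all(value > rule.threshold for value in recent)


# Example rule for the checkout SLO; the next step is illustrative.
checkout_latency = AlertRule(
    name="checkout_p95_latency",
    threshold=2.0,
    for_samples=3,
    next_step="Check the payment provider status, then consider rolling back the last deploy.",
)
```

Requiring consecutive breaches is a crude form of tuning: a single slow sample does not wake anyone, while a sustained breach does.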