Building Observable Go Services

When a request slows down in a distributed system, "the API is slow" is not a diagnosis — it is the start of an investigation. Observability is what turns that investigation from guesswork into a few targeted queries. Over the years I have settled on a small set of patterns for making Go services observable without drowning them in instrumentation.

The three signals

Observability is usually framed as three signals: metrics, traces, and logs. They answer different questions:

Metrics answer "is something wrong, and how bad?" — they are cheap, aggregate, and great for alerting.
Traces answer "where is the time going?" — they follow a single request across service boundaries.
Logs answer "what exactly happened?" — they carry the detail you need once a trace points you at the right span.

The mistake I see most often is treating these as three separate systems. They are far more powerful when correlated by a shared trace ID.

Start with OpenTelemetry

OpenTelemetry gives you a vendor-neutral API for all three signals. Wiring up a tracer provider once, at startup, keeps the rest of the code clean:

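import (
    "context"
    "fmt"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0" // any recent semconv version should work
)

// initTracer installs a global tracer provider that batches spans to an OTLP/gRPC
// exporter. It returns the provider's Shutdown function so the caller can flush on exit.
func initTracer(ctx context.Context) (func(context.Context) error, error) {
    exp, err := otlptracegrpc.New(ctx)
    if err != nil {
        return nil, fmt.Errorf("create otlp exporter: %w", err)
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exp),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName("loan-screening"),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp.Shutdown, nil
}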

From there, instrument the boundaries that matter — inbound handlers, outbound calls, and any expensive work in between:

func (s *Service) Screen(ctx context.Context, app Application) (Decision, error) {
    ctx, span := otel.Tracer("screening").Start(ctx, "Screen")
    defer span.End()

    span.SetAttributes(attribute.String("applicant.segment", app.Segment))
    // ... real work ...
    return decision, nil
}

Make logs carry the trace ID

A trace is only useful if you can jump from a log line to it. With structured logging via slog, attach the trace ID to every log emitted inside a request:

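// Here trace is the API package go.opentelemetry.io/otel/trace, not the SDK.
// logger returns the default logger, tagged with the request's trace ID when one exists.
func logger(ctx context.Context) *slog.Logger {
    sc := trace.SpanContextFromContext(ctx)
    if !sc.HasTraceID() {
        return slog.Default()
    }
    return slog.Default().With("trace_id", sc.TraceID().String())
}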

With the same trace_id also attached to your Prometheus exemplars, a single ID ties together your metrics, your Jaeger trace, and your Loki logs. That correlation is the whole point.

Keep metrics low-cardinality

Metrics are cheap until you label them with something unbounded — a user ID, a request path with IDs in it, a raw error string. Cardinality explosions are the most common way teams accidentally take down their own monitoring stack.

Rule of thumb: a label is safe only if you could enumerate its possible values on a whiteboard.
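To put rough numbers on that rule, the worst-case series count for one metric is the product of the distinct values of each label. The figures below are invented for illustration, and seriesUpperBound is a hypothetical helper, not part of any metrics library.

```go
package main

import "fmt"

// seriesUpperBound multiplies the number of distinct values per label.
// The result is the worst-case number of time series for one metric.
func seriesUpperBound(valuesPerLabel ...int) int {
    total := 1
    for _, n := range valuesPerLabel {
        total *= n
    }
    return total
}

func main() {
    // Hypothetical figures: 5 methods, 20 route templates, 5 status classes.
    bounded := seriesUpperBound(5, 20, 5)
    // The same metric with a user_id label, assuming 100,000 active users.
    unbounded := seriesUpperBound(5, 20, 5, 100_000)

    fmt.Println(bounded)   // 500
    fmt.Println(unbounded) // 50000000
}
```

A histogram multiplies the count again by its number of buckets, so one unbounded label is enough to turn a few hundred series into tens of millions.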

Use a small, fixed set of labels (method, route template, status class) and push the high-cardinality detail into traces and logs instead.
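One way to hold that line is to normalize every label value before it reaches the metrics library. The sketch below uses only the standard library. The function names are hypothetical, and the route template is assumed to come from whatever pattern your router matched.

```go
package main

import (
    "fmt"
    "net/http"
    "strconv"
)

// statusClass collapses a status code into its class, e.g. 404 becomes "4xx".
func statusClass(code int) string {
    if code < 100 || code > 599 {
        return "unknown"
    }
    return strconv.Itoa(code/100) + "xx"
}

// methodLabel maps anything outside a fixed allow-list to "other", since
// clients can send arbitrary method strings.
func methodLabel(method string) string {
    switch method {
    case http.MethodGet, http.MethodHead, http.MethodPost, http.MethodPut,
        http.MethodPatch, http.MethodDelete, http.MethodOptions:
        return method
    default:
        return "other"
    }
}

// requestLabels returns the three label values for one request.
// routeTemplate is the matched pattern (for example "/applications/{id}"),
// never the raw URL path.
func requestLabels(method, routeTemplate string, status int) [3]string {
    return [3]string{methodLabel(method), routeTemplate, statusClass(status)}
}

func main() {
    fmt.Println(requestLabels("GET", "/applications/{id}", 200))
    fmt.Println(requestLabels("PROPFIND", "/applications/{id}", 404))
}
```

Every value these functions can return fits on a whiteboard: eight methods including "other", six status classes including "unknown", and as many routes as the router defines.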

What good looks like

A healthy setup lets you go from an alert to a root cause in three hops:

A Prometheus alert fires on elevated p99 latency.
An exemplar on that metric links to a slow trace in Jaeger.
The slow span's trace_id pulls the exact logs in Loki.
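The last hop can be scripted as well. The sketch below builds a LogQL query for one trace, assuming the service logs JSON with a trace_id field and that log streams carry a label named service. Both are assumptions about your setup, and lokiQuery and isTraceID are hypothetical helpers.

```go
package main

import "fmt"

// isTraceID reports whether s is 32 lowercase hex characters, the form
// produced by TraceID().String() in OpenTelemetry.
func isTraceID(s string) bool {
    if len(s) != 32 {
        return false
    }
    for _, c := range s {
        isDigit := c >= '0' && c <= '9'
        isHexLetter := c >= 'a' && c <= 'f'
        if !isDigit && !isHexLetter {
            return false
        }
    }
    return true
}

// lokiQuery builds a LogQL query for one trace. The "service" stream label
// is an assumption about how logs were shipped; adjust it to your own labels.
func lokiQuery(service, traceID string) (string, error) {
    if !isTraceID(traceID) {
        return "", fmt.Errorf("invalid trace id %q", traceID)
    }
    return fmt.Sprintf(`{service=%q} | json | trace_id=%q`, service, traceID), nil
}

func main() {
    q, err := lokiQuery("loan-screening", "4bf92f3577b34da6a3ce929d0e0e4736")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(q)
}
```

Validating the ID before formatting keeps arbitrary input out of the query string, which matters if the ID arrives from a URL or a chat command.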

No SSH-ing into boxes, no grep across hosts. That is the difference between observability as a buzzword and observability as a tool you actually reach for at 2 a.m.

Arham Abiyan — Backend / Software Engineer
