obswrapper

obswrapper is a dual-module observability API bridge for Grafana:

  • Loki-compatible logs module (Explore/Drilldown logs) backed by VictoriaLogs.
  • Tempo-compatible traces module (Explore/Drilldown traces) backed by VictoriaTraces.
  • Grafana Alerting History-compatible logs payload normalization.

Both modules are first-class parts of the same service and can run side by side on separate listeners (LISTEN_ADDR for logs, TRACE_LISTEN_ADDR for traces). The logs module reports itself as Loki version 3.5.0 so that Grafana enables its JSON filtering features.

Native DS Status (Loki + Tempo)

  • The current implementation is near-native for the prioritized Grafana Loki and Tempo datasource (DS) flows.
  • The authoritative protocol/query status is tracked in docs/COMPATIBILITY.md.
  • Native Tempo scope includes search, tags, tag-values, trace-by-id, and metrics (query + query_range) for v1/v2 paths where applicable.
  • Trace-by-id payload hardening includes:
    • quality-weighted JSON/protobuf negotiation (Accept with q=) for /api/v2/traces/<traceID>,
    • deterministic output ordering even when upstream payloads arrive in mixed order,
    • normalized status/reference/link/event conversion for Drilldown-facing stability.
  • Search/tag/tag-values paths include bounded parallel fanout plus short-lived/stale cache fallbacks, validated with workload-scale tests/benchmarks (TEMPO-316).
  • The sign-off script covers critical Tempo endpoints alongside the Loki and Alerting History checks:
    • bash ./scripts/e2e/grafana_signoff.sh
    • optional skip when traces are intentionally unavailable:
      • TRACE_CHECKS_REQUIRED=false bash ./scripts/e2e/grafana_signoff.sh
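The quality-weighted Accept negotiation for /api/v2/traces/<traceID> can be illustrated with a small parser. This is a sketch built on several assumptions:

- The two candidate media types are `application/json` and `application/protobuf`.
- The most specific matching media range decides each type's q-value.
- JSON wins ties and is used for an empty header.

The names `qualityOf` and `negotiateTraceFormat` are hypothetical, and the real handler may behave differently (for example, by returning 406 when nothing matches).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	mimeJSON  = "application/json"
	mimeProto = "application/protobuf" // assumed protobuf media type
)

// qualityOf returns the q-value an Accept header assigns to mime.
// The most specific matching range wins (exact > type/* > */*);
// a media type matched by no range gets quality 0.
func qualityOf(accept, mime string) float64 {
	mainType := strings.SplitN(mime, "/", 2)[0]
	quality, specificity := 0.0, -1
	for _, part := range strings.Split(accept, ",") {
		params := strings.Split(part, ";")
		mediaRange := strings.ToLower(strings.TrimSpace(params[0]))
		s := -1
		switch mediaRange {
		case mime:
			s = 2
		case mainType + "/*":
			s = 1
		case "*/*":
			s = 0
		}
		if s <= specificity {
			continue // no match, or less specific than a range already seen
		}
		q := 1.0 // default quality when no q= parameter is present
		for _, p := range params[1:] {
			p = strings.TrimSpace(p)
			if strings.HasPrefix(p, "q=") {
				if v, err := strconv.ParseFloat(p[2:], 64); err == nil && v >= 0 && v <= 1 {
					q = v
				} else {
					q = 0 // treat malformed weights as "not acceptable"
				}
			}
		}
		quality, specificity = q, s
	}
	return quality
}

// negotiateTraceFormat picks protobuf only when it is strictly preferred.
func negotiateTraceFormat(accept string) string {
	if strings.TrimSpace(accept) == "" {
		return mimeJSON
	}
	if qualityOf(accept, mimeProto) > qualityOf(accept, mimeJSON) {
		return mimeProto
	}
	return mimeJSON
}

func main() {
	for _, accept := range []string{"", "application/protobuf;q=0.9, application/json;q=0.5"} {
		fmt.Printf("%q -> %s\n", accept, negotiateTraceFormat(accept))
	}
}
```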

UI Preview

Logs Drilldown:

Grafana Logs Drilldown via obswrapper

Traces Drilldown:

Grafana Traces Drilldown via obswrapper

Tempo Service Graph (Explore):

Grafana Tempo Service Graph via obswrapper

Alerting History:

Grafana Alerting History via obswrapper

Compatibility

  • Current logs/traces compatibility status is maintained in docs/COMPATIBILITY.md.
  • This matrix is the reference for supported, partial, and not supported behavior.

Assumptions

  • VictoriaLogs is reachable via VICTORIA_BASE_URL (default http://localhost:9428).
  • VictoriaTraces is reachable via VICTORIA_TRACES_BASE_URL (default http://localhost:10428).
  • Mapping logic will be extended based on real Grafana requests.
  • Log queries (streams) and core metric queries (e.g., count_over_time) are supported.
  • LogQL mapping covers stream selector + line filter + field filter (best-effort).
  • Parser stages: | json -> | unpack_json, | logfmt -> | unpack_logfmt (LogsQL).
  • Parser fields: | json foo,bar and | logfmt foo,bar -> fields (foo, bar).
  • Parser stage: | decolorize -> | decolorize.
  • Parser stages: | regexp -> | extract_regexp, | pattern -> | extract.
  • label_format and line_format are evaluated for constant values and simple templates.
  • The drop and keep stages are mapped to delete and keep for simple field lists.
  • Unsupported LogQL stages return 400 bad_data with an explanatory message.
  • Metric-style LogQL wrappers (e.g., count_over_time) are normalized to extract selectors.
  • Optional ip() extension can map label filters to ipv4_range(...) for = and != when TRANSLATION_ENABLE_IP_EXTENSION=true.
  • Metric aggregations with unwrap are translated to VictoriaLogs stats queries.
  • Metric label group queries (e.g., sum(count_over_time(...)) by (job|host|level)) use stats endpoints first and fall back to hits/field-values when the stats result is empty.
  • Tempo/TraceQL responses are normalized for Grafana Traces Drilldown compatibility:
    • search span payload omits unsupported parser fields (traceId, parentSpanId, endTimeUnixNano).
    • structure-style searches emit exactly one spanSet per trace for merge stability.
    • selected attributes are normalized to Drilldown keys (service.name, exception.*).
  • detected_level metric queries fall back to stats queries over the stream selector when per-level fields are missing.
  • Grafana's auto matcher service_name=~".+" is stripped in index/label query paths to avoid hiding valid streams without service_name.
  • Grafana Alerting History payloads are normalized (labels/values and stream fields).
  • Loki push writes accept JSON and protobuf/snappy payloads.
  • Grafana health check vector(1)+vector(1) is answered as a Loki vector.
  • detected_level is mapped from detected_level, level, severity, log.level (fallback: unknown).
  • Metric responses mirror detected_level into level for Grafana label compatibility.
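The detected_level fallback chain and the level mirroring described above can be sketched as follows. The sketch makes three assumptions:

- The documented key order is also the precedence order.
- Empty values are skipped.
- An existing level label is not overwritten.

`detectedLevel` and `mirrorLevel` are hypothetical names.

```go
package main

import "fmt"

// levelKeys lists candidate fields in the documented order
// (assumed here to be the precedence order).
var levelKeys = []string{"detected_level", "level", "severity", "log.level"}

// detectedLevel returns the first non-empty level-like field, or "unknown".
func detectedLevel(fields map[string]string) string {
	for _, k := range levelKeys {
		if v := fields[k]; v != "" {
			return v
		}
	}
	return "unknown"
}

// mirrorLevel copies detected_level into level on a metric label set so
// Grafana panels keyed on either label see the same value.
// Assumption: an existing level label is left untouched.
func mirrorLevel(labels map[string]string) {
	if v, ok := labels["detected_level"]; ok {
		if _, exists := labels["level"]; !exists {
			labels["level"] = v
		}
	}
}

func main() {
	fmt.Println(detectedLevel(map[string]string{"severity": "warn", "log.level": "info"})) // warn
	labels := map[string]string{"detected_level": "error"}
	mirrorLevel(labels)
	fmt.Println(labels["level"]) // error
}
```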

Configuration (Env)

  • LISTEN_ADDR (default :3100)
  • TRACE_LISTEN_ADDR (default empty, e.g. :3200) enables Tempo-compatible trace wrapper listener
  • VICTORIA_BASE_URL (default http://localhost:9428)
  • VICTORIA_TRACES_BASE_URL (default http://localhost:10428) VictoriaTraces upstream base URL
  • VICTORIA_AUTH (optional, e.g., Bearer ...)
  • VICTORIA_TENANT (optional, sets X-Scope-OrgID)
  • REQUEST_TIMEOUT_MS (default 30000)
  • TRANSLATION_ENGINE (default legacy) active LogQL translator engine (legacy or candidate).
  • TRANSLATION_SHADOW_ENGINE (default empty) optional shadow engine (legacy or candidate) for diff logging.
  • TRANSLATION_REFERENCE_ENDPOINT (default empty) optional reference translation service base URL used by candidate.
  • TRANSLATION_REFERENCE_TIMEOUT_MS (default 3000) timeout for reference translation requests.
  • TRANSLATION_ENABLE_IP_EXTENSION (default false) enables local non-reference ip() matcher translation.
  • PASS_THROUGH (default false) forwards query params 1:1
  • LOG_REQUESTS (default true) enables or disables request logging
  • LOG_REQUESTS_STDOUT (default true) writes request logs to stdout
  • LOG_REQUESTS_FILE (default /opt/obswrapper/requests.log) request log file path
  • LOG_REQUESTS_BODY_BYTES (default 2048) body preview (bytes)
    • Includes translated queries (translated_query) and Victoria params.
    • Includes translator shadow diffs (translator_diff) when shadowing is enabled.
  • DETECTED_FIELDS_LIMIT (default 50) default limit for detected labels/fields and detected field values (when request limit is not set)
  • MIN_STEP_SECONDS (default 2) minimum step in seconds
  • LARGE_RANGE_SECONDS (default 21600) threshold for large time ranges
  • MIN_STEP_LARGE_SECONDS (default 5) minimum step for large time ranges
  • HITS_CACHE_TTL_MS (default 7000) cache TTL for /select/logsql/hits
  • LOKI_FIELD_PASSTHROUGH_MODE (default off) optional logs response field passthrough mode: off|selected|all
  • LOKI_FIELD_PASSTHROUGH_KEYS (default empty) comma-separated allowlist used when mode=selected
  • LOKI_FIELD_PASSTHROUGH_MAX_FIELDS (default 24) max metadata fields per returned log line
  • LOKI_FIELD_PASSTHROUGH_MAX_VALUE_BYTES (default 256) max bytes per metadata value (UTF-8 safe truncation)
  • LOKI_FIELD_PASSTHROUGH_MAX_TOTAL_BYTES (default 2048) max total metadata payload bytes per returned log line
  • LOKI_FIELD_PASSTHROUGH_REQUIRE_PARSER (default true) only enable passthrough when parser-like stage exists (for example | json/| logfmt)
  • LOKI_FIELD_PASSTHROUGH_VALUE_META (default false) emit passthrough fields as Loki categorized tuple metadata ([ts, line, {"parsed":{...}}]) with data.encodingFlags=["categorize-labels"] instead of merging fields into line JSON
  • TRACE_SPANS_ENABLED (default false) enable OpenTelemetry tracing for inbound HTTP requests and upstream calls
  • TRACE_SPANS_SAMPLE_RATIO (default 1.0) trace sampler ratio (0..1)
  • TEMPO_OBSERVED_METRICS_ENABLED (default true) enables trace-fanout-backed Tempo metrics compatibility responses; set false to force fast synthetic compatibility series for Drilldown/Explore stability
  • OTEL_SERVICE_NAME (default obswrapper) OpenTelemetry service name
  • OTEL_EXPORTER_OTLP_TRACES_ENDPOINT / OTEL_EXPORTER_OTLP_ENDPOINT (default empty) OTLP HTTP endpoint for trace export
  • OTEL_EXPORTER_OTLP_INSECURE (default true) use insecure OTLP HTTP transport

Run

cd <repo-root>
go run ./cmd/obswrapper

Docker

cd <repo-root>
docker compose up --build

Docker (external VictoriaLogs)

cd <repo-root>
docker compose -f docker-compose.external.yml up

Traces Module (Grafana Drilldown Traces)

Run obswrapper with the trace listener enabled:

TRACE_LISTEN_ADDR=:3200 \
VICTORIA_TRACES_BASE_URL=http://victoriatraces:10428 \
go run ./cmd/obswrapper

Point the Grafana Tempo datasource at the wrapper's trace port:

  • URL: http://<wrapper-host>:3200
  • Access mode: Server (default)

Tempo Service Graph (Explore)

To enable Explore -> Tempo -> Service Graph, both metric production and datasource wiring must be in place:

  1. Produce trace-derived metrics in your OTLP pipeline:
    • traces_service_graph_*
    • traces_spanmetrics_*
  2. Store these metrics in a Prometheus-compatible datasource (for example VictoriaMetrics).
  3. For each active Grafana Tempo datasource used in Explore, set:
    • jsonData.serviceMap.datasourceUid to the metrics datasource UID.

VictoriaFlow runtime reference (/daten/victoriaflow):

  • Reference Alloy config: /daten/victoriaflow/config/alloy/alloy.alloy.
  • Alloy generates service-graph/spanmetrics via otelcol.connector.servicegraph and otelcol.connector.spanmetrics, then remote-writes to VictoriaMetrics.
  • The Tempo-type datasources named Tempo and VictoriaTraces both set serviceMap.datasourceUid to the VictoriaMetrics datasource UID.
  • Multi-host requirement: every host that ingests traces must run the same trace-derived metrics pipeline. If a host lacks this Alloy pipeline, the Service structure and Root cause errors views can show "No data" for traces from that host.

Minimal Alloy reference (from /daten/victoriaflow/config/alloy/alloy.alloy):

otelcol.exporter.otlphttp "default" {
  client {
    endpoint = "http://victoriatraces:10428/insert/opentelemetry"
  }
}

prometheus.remote_write "victoriametrics" {
  endpoint {
    url = "http://victoriametrics:8428/api/v1/write"
  }
}

otelcol.exporter.prometheus "traces_metrics" {
  forward_to = [prometheus.remote_write.victoriametrics.receiver]
}

otelcol.connector.servicegraph "default" {
  output {
    metrics = [otelcol.exporter.prometheus.traces_metrics.input]
  }
}

otelcol.connector.spanmetrics "default" {
  namespace = "traces.spanmetrics"

  // Span-duration histogram using the default explicit buckets.
  histogram {
    explicit {}
  }

  output {
    metrics = [otelcol.exporter.prometheus.traces_metrics.input]
  }
}

otelcol.receiver.otlp "receiver" {
  // Accept OTLP over gRPC and HTTP on the default ports (4317/4318).
  grpc {}
  http {}

  output {
    traces = [
      otelcol.exporter.otlphttp.default.input,
      otelcol.connector.servicegraph.default.input,
      otelcol.connector.spanmetrics.default.input,
    ]
  }
}

Quick checks:

  • Prometheus query returns data: sum(rate(traces_service_graph_request_total[5m]))
  • Prometheus query returns data: sum(rate(traces_spanmetrics_calls_total[5m]))
  • Grafana datasource API shows Tempo datasource jsonData.serviceMap.datasourceUid populated.
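The last quick check can be scripted. The sketch only parses a datasource JSON document, such as the one returned by Grafana's datasource HTTP API (for example GET /api/datasources/uid/<uid>). It then reports whether jsonData.serviceMap.datasourceUid is set. Fetching and authentication are left out. `serviceMapUID` is a hypothetical helper, and the sample UID "victoriametrics" is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// datasource models only the fields this check needs.
type datasource struct {
	Name     string `json:"name"`
	Type     string `json:"type"`
	JSONData struct {
		ServiceMap struct {
			DatasourceUID string `json:"datasourceUid"`
		} `json:"serviceMap"`
	} `json:"jsonData"`
}

// serviceMapUID returns the service-map metrics datasource UID configured
// on a Tempo datasource, or an error when it is missing.
func serviceMapUID(raw []byte) (string, error) {
	var ds datasource
	if err := json.Unmarshal(raw, &ds); err != nil {
		return "", err
	}
	if ds.Type != "tempo" {
		return "", fmt.Errorf("datasource %q has type %q, not tempo", ds.Name, ds.Type)
	}
	if uid := ds.JSONData.ServiceMap.DatasourceUID; uid != "" {
		return uid, nil
	}
	return "", fmt.Errorf("datasource %q: jsonData.serviceMap.datasourceUid is empty", ds.Name)
}

func main() {
	sample := []byte(`{"name":"Tempo","type":"tempo","jsonData":{"serviceMap":{"datasourceUid":"victoriametrics"}}}`)
	uid, err := serviceMapUID(sample)
	fmt.Println(uid, err)
}
```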

Enable Grafana Alerting History

Example (/grafana/provisioning/grafana.ini):

[feature_toggles]
enable = grafanaManagedRecordingRules, alertingMigrationUI, grafanaManagedRecordingRulesDatasources, alertStateHistoryLokiPrimary, alertStateHistoryLokiOnly, alertingCentralAlertHistory

[unified_alerting.state_history]
enabled = true
backend = "loki"
loki_remote_read_url = http://obswrapper:3100
loki_remote_write_url = http://obswrapper:3100/insert

Differential Testing

Run local conformance tests:

cd <repo-root>
go test ./internal/translate -run TestTranslatorConformanceCorpus

Run differential tests against a running reference translator API:

cd <repo-root>
LOGQL_REFERENCE_ENDPOINT=http://127.0.0.1:8080 go test ./internal/translate -run TestTranslatorDifferentialAgainstReferenceEndpoint

Run only critical differential cases:

cd <repo-root>
LOGQL_REFERENCE_ENDPOINT=http://127.0.0.1:8080 \
LOGQL_REFERENCE_SCOPE=critical \
go test ./internal/translate -run TestTranslatorDifferentialAgainstReferenceEndpoint

Enable shadow mode at runtime (legacy executes, candidate only logs diffs):

TRANSLATION_ENGINE=legacy \
TRANSLATION_SHADOW_ENGINE=candidate \
TRANSLATION_REFERENCE_ENDPOINT=http://127.0.0.1:8080 \
go run ./cmd/obswrapper

Next Steps

  • Prioritized product roadmap: docs/PRODUCT_ROADMAP.md.
  • Run grafana-lokiexplore-app locally and capture real requests.
  • Map additional LogQL operators as needed (e.g., pipeline functions).
  • If VictoriaLogs needs different endpoints, adjust routes in internal/handlers/loki.go.

Implemented Loki Endpoints

  • /loki/api/v1/query
  • /loki/api/v1/query_range
  • /loki/api/v1/labels
  • /loki/api/v1/label/<name>/values
  • /loki/api/v1/series
  • /loki/api/v1/index/volume
  • /loki/api/v1/index/volume_range
  • /loki/api/v1/index/stats
  • /loki/api/v1/patterns (heuristic extraction path, not full Loki pattern-engine parity)
  • /loki/api/v1/detected_labels
  • /loki/api/v1/detected_fields
  • /loki/api/v1/detected_field/<name>/values
  • /loki/api/v1/drilldown-limits
  • /loki/api/v1/status/config
  • /loki/api/v1/status/limits
  • /insert/loki/api/v1/push

Implemented Trace Endpoints (Tempo-compatible shim)

  • /api/search
  • /api/v2/search
  • /api/search/tags
  • /api/v2/search/tags
  • /api/search/tag/<name>/values
  • /api/v2/search/tag/<name>/values
  • /api/traces/<traceID>
  • /api/v2/traces/<traceID>
  • /api/metrics/query_range
  • /api/v2/metrics/query_range
  • /api/metrics/query
  • /api/v2/metrics/query
  • fallback: unknown /api/* paths are passed through to the VictoriaTraces Jaeger APIs
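The trace endpoints come in v1/v2 pairs plus a passthrough fallback. One way to route them is to strip an optional /v2 segment and match on the remainder. The sketch assumes both versions share a handler and differ only in prefix. `classify` and its route names are hypothetical and do not describe obswrapper's actual router.

```go
package main

import (
	"fmt"
	"strings"
)

// route names the handler that should serve a request (hypothetical names).
type route struct {
	Name  string // "search", "tags", "tag-values", "trace-by-id", "metrics-query", ...
	V2    bool   // the request used the /api/v2 prefix
	Param string // tag name or trace ID when the path carries one
}

// classify maps a Tempo-style path to a route. It returns false for paths
// outside /api/ and routes unknown /api/* paths to "passthrough".
func classify(path string) (route, bool) {
	if !strings.HasPrefix(path, "/api/") {
		return route{}, false
	}
	rest := strings.TrimPrefix(path, "/api")
	v2 := strings.HasPrefix(rest, "/v2/")
	if v2 {
		rest = strings.TrimPrefix(rest, "/v2")
	}
	switch {
	case rest == "/search":
		return route{Name: "search", V2: v2}, true
	case rest == "/search/tags":
		return route{Name: "tags", V2: v2}, true
	case strings.HasPrefix(rest, "/search/tag/"):
		tail := strings.TrimPrefix(rest, "/search/tag/")
		if strings.HasSuffix(tail, "/values") {
			name := strings.TrimSuffix(tail, "/values")
			if name != "" && !strings.Contains(name, "/") {
				return route{Name: "tag-values", V2: v2, Param: name}, true
			}
		}
	case strings.HasPrefix(rest, "/traces/"):
		id := strings.TrimPrefix(rest, "/traces/")
		if id != "" && !strings.Contains(id, "/") {
			return route{Name: "trace-by-id", V2: v2, Param: id}, true
		}
	case rest == "/metrics/query_range":
		return route{Name: "metrics-query-range", V2: v2}, true
	case rest == "/metrics/query":
		return route{Name: "metrics-query", V2: v2}, true
	}
	return route{Name: "passthrough"}, true
}

func main() {
	for _, p := range []string{"/api/v2/traces/4bf92f3577b34da6", "/api/search/tag/service.name/values", "/api/services"} {
		r, _ := classify(p)
		fmt.Printf("%-40s -> %+v\n", p, r)
	}
}
```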

Release

  • CI gates run via .forgejo/workflows/ci.yml (full test suite + reference differential + Loki phase compatibility gates).
  • The reference differential CI checks run when LOGQL_REFERENCE_ENDPOINT (repo variable or secret) is configured and exposes POST /api/v1/logql-to-logsql; otherwise those checks are skipped.
  • Releases are created from git tags via .forgejo/workflows/release.yml.
  • Releases are CI-owned and idempotent (see docs/release/RELEASE_WORKFLOW.md).
  • The Forgejo/Codeberg release body is published from the matching CHANGELOG.md section for the tagged version.
  • Stable release tags use vX.Y.Z; release candidates use vX.Y.Z-rcN only when explicitly justified; date-based tags are historical only and must not be created going forward.
  • The release workflow keeps GoReleaser on a version compatible with the current config and installs it with GOTOOLCHAIN=auto, so the install still works when hosted runners lag behind the module's minimum Go version.
  • Container images are pushed as immutable refs: :<tag> and :sha-<commit12> (plus digest metadata in release assets).
  • :latest is updated only for direct tag-push releases that point at current default-branch HEAD; manual reruns do not move :latest.
  • Deployments should reference a release tag or the image digest (container-image-digest.txt), not the mutable :latest tag.
  • Artifacts include tar.gz, deb, rpm, checksums, and container image metadata.
  • Details: see CHANGELOG.md
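The tag policy can be checked mechanically before a release job runs. This minimal sketch also derives the :sha-<commit12> image tag suffix from a full commit hash. It assumes N in -rcN is a positive integer and that leading zeros are disallowed, as in SemVer. The regular expression, `checkReleaseTag`, and `shaImageTag` are illustrative and are not taken from .forgejo/workflows/release.yml.

```go
package main

import (
	"fmt"
	"regexp"
)

// Stable: vX.Y.Z. Release candidate: vX.Y.Z-rcN. No leading zeros (assumed).
var releaseTag = regexp.MustCompile(`^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-rc[1-9][0-9]*)?$`)

// checkReleaseTag reports whether tag is allowed and whether it is a release candidate.
func checkReleaseTag(tag string) (ok, rc bool) {
	m := releaseTag.FindStringSubmatch(tag)
	if m == nil {
		return false, false
	}
	return true, m[4] != ""
}

// shaImageTag builds the immutable "sha-<commit12>" image tag.
func shaImageTag(commit string) (string, error) {
	if len(commit) < 12 {
		return "", fmt.Errorf("commit %q is shorter than 12 characters", commit)
	}
	return "sha-" + commit[:12], nil
}

func main() {
	for _, tag := range []string{"v1.2.3", "v1.3.0-rc1", "2026.04.23"} {
		ok, rc := checkReleaseTag(tag)
		fmt.Printf("%-12s allowed=%v rc=%v\n", tag, ok, rc)
	}
	sha, err := shaImageTag("0123456789abcdef0123456789abcdef01234567") // placeholder commit
	fmt.Println(sha, err)
}
```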

License

Apache-2.0. See LICENSE and NOTICE.