syslog-ecs-analyzer

Dynamic syslog-to-ECS analyzer with explicit stage boundaries, best-effort parsing/decoding, modular enrichment, and conservative normalization.

Scope

Implemented now:

  • explicit internal stage models
    • RawEvent
    • EnvelopeEvent
    • ParsedPayloadEvent
    • ResolvedEvent
    • NormalizedEvent
  • modular package skeleton under internal/
  • pipeline interfaces and interface-driven orchestration
  • static inventory config loading
  • best-effort RFC3164 parsing
  • best-effort RFC5424 parsing
  • payload classification: JSON, key=value, unknown
  • generic JSON decoding
  • generic key=value decoding
  • generic resolver for observed identity, payload identity, and network fields
  • inventory enrichment for missing canonical identity fields
  • GeoIP/ASN enrichment for well-identified IPs
  • DNS reverse enrichment as additive metadata
  • IP reputation annotation as enrichment metadata
  • dynamic VictoriaLogs stream-field templates with nested field resolution
  • worker-based DNS enrichment with positive and negative cache control
  • cached IP reputation lookups with optional hot reload
  • conservative ECS-style normalization with provenance retention
  • table-driven Sophos XGS semantic ECS mapping with strict delete-on-success pruning of promoted vendor.payload fields
  • production-grade Kafka output mode for downstream anomaly-scorer / ML workflows
  • per-stage partial/failure status fields
  • focused unit tests for precedence, provenance, enrichment, normalization, and Kafka keying

Not implemented yet:

  • vendor-specific decoder packs or rule-engine logic

Architecture boundary

This project is the semantic/ECS intelligence layer, not the production detection engine.

In-project responsibilities:

  • parsing and decoding
  • generic semantic interpretation
  • ECS normalization and semantic lifting
  • explainability and confidence
  • family classification and generic alias handling
  • optional AI/ML/LLM-assisted semantic inference for field meaning
  • offline corpus analysis, feature export, baseline export, and scoring research
  • Kafka producer contract preparation for downstream consumers

Downstream responsibilities via Kafka:

  • anomaly scoring
  • temporal/contextual baselines in production
  • stateful behavior modeling
  • thresholding and detection logic
  • retraining
  • alerting

Decision rule:

  • if the logic answers "what does this field or event mean semantically?" it belongs here
  • if the logic answers "given a semantically understood stream, is this behavior anomalous over time/context?" it belongs in the external anomaly-scorer

The explicit architecture decision is documented in docs/architecture/adr_semantic_engine_vs_external_anomaly_scorer.md.

Repository layout

  • internal/input
  • internal/syslog
  • internal/classifier
  • internal/decoder
  • internal/registry
  • internal/resolver
  • internal/enrich
  • internal/ecs
  • internal/provenance
  • internal/output
  • internal/model
  • internal/pipeline
  • testdata/fixtures
  • docs/architecture

Third-Party References

This project uses publicly available log samples and reference data from the Elastic integrations repository for validation and testing purposes.

See docs/THIRD_PARTY.md for details.

Config

Example skeleton config: configs/syslog-ecs-analyzer.yaml

Full image config: configs/syslog-ecs-analyzer.full.yaml

Example static inventory: configs/inventory.example.yaml

Current config surface:

  • service.name
  • inventory.static_path
  • registry.inventory_path
  • input.udp.*
  • input.tcp.*
  • input.replay.*
  • enrichment.geoip.*
  • enrichment.dns.*
  • enrichment.reputation.*
  • outputs.include_syslog_analyzer
  • outputs.file.*
  • outputs.stdout.*
  • outputs.victorialogs.*
  • outputs.kafka.*
  • pipeline.*
  • observability.listen_addr

Output Filtering (syslog_analyzer)

syslog_analyzer.* is the analyzer-owned metadata layer. It carries internal semantic status, provenance, explainability, semantic-family hints, and non-ECS canonical extension data under syslog_analyzer.canonical_ext.*.

It exists because the analyzer keeps internal reasoning explicit instead of silently flattening it into ECS or vendor payload. That metadata is useful for:

  • deterministic normalization flow
  • provenance and explainability
  • canonical extension handling
  • audit and validation work
  • offline corpus, ML feature export, and suggestion workflows

Some environments still prefer a cleaner emitted JSON shape. For that case, final output emission can suppress the top-level syslog_analyzer object:

outputs:
  include_syslog_analyzer: false

Behavior:

  • true or omitted: emit syslog_analyzer.* exactly as before
  • false: omit only the top-level syslog_analyzer object from emitted JSON

Important boundary:

  • this is output filtering only
  • parsing is unchanged
  • normalization is unchanged
  • ECS mapping is unchanged
  • canonical extension decisions are unchanged
  • cleanup and full-equivalence pruning are unchanged
  • offline ML and suggestion logic are unchanged

Internal computation still happens even when the syslog_analyzer object is suppressed from emitted output. This matters because other final-output features can still resolve from internal analyzer fields before suppression. For example, a VictoriaLogs stream field such as network_scope={syslog_analyzer.network.scope} can still be populated while the emitted event body omits syslog_analyzer.
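
The filtering boundary can be pictured with a minimal Go sketch. It is an illustration under stated assumptions, not the analyzer's actual code: filterForOutput and the map-based event shape are hypothetical. The point it shows is that the internal event is left intact, so anything that reads syslog_analyzer.* before emission still works.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterForOutput is a hypothetical helper. It returns the event to emit
// and never mutates the internal event, so later consumers of
// syslog_analyzer.* (for example stream-field rendering) are unaffected.
func filterForOutput(event map[string]any, includeAnalyzer bool) map[string]any {
	if includeAnalyzer {
		return event
	}
	// Shallow copy without the top-level analyzer object.
	out := make(map[string]any, len(event))
	for k, v := range event {
		if k != "syslog_analyzer" {
			out[k] = v
		}
	}
	return out
}

func main() {
	event := map[string]any{
		"event": map[string]any{"action": "allow"},
		"syslog_analyzer": map[string]any{
			"network": map[string]any{"scope": "private"},
		},
	}
	body, err := json.Marshal(filterForOutput(event, false))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```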

Testing

The host environment does not have Go installed, so tests run in Docker:

make test

For local guardrails, enable the tracked hooks with git config core.hooksPath .githooks.

Runtime

The repository now includes a runnable analyzer binary at cmd/syslog-ecs-analyzer plus a container image Dockerfile. The same image contains all supported runtime capabilities; features are enabled or disabled by configuration. The minimal runtime surface is:

  • UDP syslog listener
  • TCP line-oriented syslog listener
  • one-shot replay input for seeded test evidence
  • file output
  • stdout output
  • VictoriaLogs JSONLine output
  • optional Kafka output for downstream anomaly-scoring / ML consumers
  • /healthz and /status observability endpoints

Safe output defaults:

  • outputs.victorialogs.enabled=true
  • outputs.kafka.enabled=false
  • outputs.stdout.enabled=false
  • outputs.file.enabled=false

No event stream is written to stdout or disk unless that output is explicitly enabled. If stdout or file output is enabled, the process logs a startup warning.

outputs.include_syslog_analyzer controls only final emitted JSON. When set to false, the top-level syslog_analyzer object is omitted from emitted output, but internal normalization, canonical extension handling, cleanup decisions, and offline tooling inputs remain unchanged.

This behavior is validated in the real test environment:

  • the running service can emit events without top-level syslog_analyzer
  • ECS fields still appear as before
  • full-equivalent raw cleanup still behaves the same
  • internal analyzer fields can still contribute to final stream-field rendering before the output object is filtered
  • no runtime ML, suggestion execution, or promotion execution is introduced by this filter

The isolated victoriaflow phase1 integration is documented in docs/testing/victoriaflow-test-integration.md. The recommended active victoriaflow runtime is documented in docs/testing/victoriaflow-full-image.md. Operator query examples for compact interface and trailing-context fields are in docs/queries/trailing_context_queries.md.

For /daten/victoriaflow, the explicit runtime transition strategy is:

  • keep syslog-ecs-analyzer-phase1 and syslog-ecs-analyzer-full as separate services
  • stop and remove phase1 before starting full
  • verify the active container name and image tag after cutover

This avoids ambiguous in-place replacement and keeps rollback straightforward.

Release Conventions

Public git release tags follow the v* scheme, for example v1.2.3.

Releases are CI-owned and idempotent (see docs/release/RELEASE_WORKFLOW.md). The Forgejo/Codeberg release body is published from the matching CHANGELOG.md section for the tagged version. Stable release tags use vX.Y.Z; release candidates use vX.Y.Z-rcN only when explicitly justified; date-based tags are historical only and must not be created going forward.

When publishing or deploying immutable release images, use the same version string for the container tag, for example:

  • git tag: v1.2.3
  • image tag: syslog-ecs-analyzer:v1.2.3

Phase or ad-hoc tags such as phase-* remain local or environment-specific rollout markers. They are not a substitute for public release tags.

Performance Improvements

The current production release keeps the same normalization behavior while reducing CPU and allocation pressure in the normalization hot path.

The main internal improvement is a shared payload leaf index with mutation invalidation:

  • repeated semantic leaf traversal is removed from the common lookup path
  • repeated per-lookup sorting of payload keys is removed
  • allocation and GC pressure are reduced substantially
  • throughput improved materially across the replay-audited datasets, with the largest gain on SonicWall-heavy compact-endpoint traffic
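
One way to read "leaf index with mutation invalidation" is sketched below in Go. The README does not describe the real data structure, so the Payload type, its methods, and the dotted-path key format are all assumptions made for illustration: nested leaves are flattened once, lookups hit the flat map, and any mutation marks the index stale.

```go
package main

import "fmt"

// Payload is a hypothetical wrapper around a decoded vendor payload.
type Payload struct {
	data     map[string]any
	index    map[string]any // dotted path -> leaf value; nil when stale
	rebuilds int            // counts index rebuilds, for demonstration only
}

func NewPayload(data map[string]any) *Payload {
	return &Payload{data: data}
}

// Leaf resolves a dotted path through the flat index, rebuilding it lazily.
func (p *Payload) Leaf(path string) (any, bool) {
	if p.index == nil {
		p.index = make(map[string]any)
		flatten("", p.data, p.index)
		p.rebuilds++
	}
	v, ok := p.index[path]
	return v, ok
}

// Set mutates a top-level key and invalidates the index.
func (p *Payload) Set(key string, value any) {
	p.data[key] = value
	p.index = nil
}

// flatten walks nested maps once and records every leaf under its path.
func flatten(prefix string, m map[string]any, out map[string]any) {
	for k, v := range m {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		if child, ok := v.(map[string]any); ok {
			flatten(path, child, out)
			continue
		}
		out[path] = v
	}
}

func main() {
	p := NewPayload(map[string]any{"fw": map[string]any{"rule": "7"}})
	fmt.Println(p.Leaf("fw.rule"))
	p.Set("proto", "udp") // invalidates the index
	fmt.Println(p.Leaf("proto"))
}
```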

The performance work is behavior-preserving:

  • parsing semantics are unchanged
  • semantic mapping outcomes are unchanged
  • cleanup behavior is unchanged
  • structured trailing-context retention is unchanged
  • the no-duplication rule (no raw field is kept when an ECS equivalent exists) remains enforced

Compact Endpoint Handling

The generic semantic layer now supports compact endpoint values in these shapes when the structure is safe:

  • IP:PORT
  • IP:PORT:TOKEN
  • IP:PORT:TOKEN:CONTEXT

Current behavior:

  • IP promotes to source.ip or destination.ip
  • PORT promotes to source.port or destination.port
  • the first trailing token promotes conservatively to:
    • source -> observer.ingress.interface.name
    • destination -> observer.egress.interface.name
  • one additional unmatched trailing context token is preserved explicitly under:
    • vendor.payload.src_compact.trailing_context
    • vendor.payload.dst_compact.trailing_context

This trailing context retention is non-ECS and intentionally semantically neutral. It preserves the unmatched compact suffix without claiming that it is a zone, segment, or other higher-level semantic meaning.

For operator-facing query examples against these fields, see docs/queries/trailing_context_queries.md.

Example:

src=192.168.0.19:60951:X0:STG-IT-PBU-01

Normalizes to:

  • source.ip=192.168.0.19
  • source.port=60951
  • observer.ingress.interface.name=X0
  • vendor.payload.src_compact.trailing_context=STG-IT-PBU-01

If additional trailing detail still remains unmatched, the original raw compact field is retained conservatively.
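
The splitting behavior described above can be sketched in Go as follows. This is a simplified illustration, not the analyzer's parser: CompactEndpoint and parseCompactEndpoint are hypothetical names, the sketch handles IPv4-style values only (a colon split would break IPv6), and the real "structure is safe" checks are likely stricter.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
	"strings"
)

// CompactEndpoint is a hypothetical holder for the parts of a compact value.
type CompactEndpoint struct {
	IP              netip.Addr
	Port            uint16
	Interface       string // first trailing token
	TrailingContext string // one additional unmatched token
	FullyMapped     bool   // false when further detail remains unrepresented
}

// parseCompactEndpoint splits "IP:PORT[:TOKEN[:CONTEXT]]".
// It reports ok=false when the shape cannot be interpreted safely.
func parseCompactEndpoint(raw string) (CompactEndpoint, bool) {
	parts := strings.Split(raw, ":")
	if len(parts) < 2 {
		return CompactEndpoint{}, false
	}
	for _, p := range parts {
		if p == "" { // empty segments are treated as unsafe
			return CompactEndpoint{}, false
		}
	}
	ip, err := netip.ParseAddr(parts[0])
	if err != nil {
		return CompactEndpoint{}, false
	}
	port, err := strconv.ParseUint(parts[1], 10, 16)
	if err != nil {
		return CompactEndpoint{}, false
	}
	ep := CompactEndpoint{IP: ip, Port: uint16(port), FullyMapped: true}
	if len(parts) > 2 {
		ep.Interface = parts[2]
	}
	if len(parts) > 3 {
		ep.TrailingContext = parts[3]
	}
	if len(parts) > 4 {
		// Extra detail is not represented, so the raw field must be kept.
		ep.FullyMapped = false
	}
	return ep, true
}

func main() {
	ep, ok := parseCompactEndpoint("192.168.0.19:60951:X0:STG-IT-PBU-01")
	fmt.Println(ok, ep.IP, ep.Port, ep.Interface, ep.TrailingContext, ep.FullyMapped)
}
```

In this sketch, FullyMapped is the signal a cleanup step would consult before removing the raw compact field.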

No Duplication Rule

The analyzer enforces a strict cleanup rule:

  • full equivalence -> raw/vendor field removed
  • partial equivalence -> raw/vendor field retained

This applies to both ECS mappings and structured non-ECS compact-context retention.

Examples:

  • src=203.125.116.98:47758:X1
    • removed once the value is fully represented by endpoint ECS fields plus interface name
  • src=192.168.0.19:60951:X0:STG-IT-PBU-01
    • removed once the value is fully represented by endpoint ECS fields, interface name, and structured trailing context retention
  • src=1.2.3.4:12345:X0:LABEL:EXTRA
    • retained because not all original trailing detail is represented yet

The same rule applies to structured proto values:

  • proto=udp/dns and proto=tcp/https
    • raw removed once fully represented by network.transport plus network.application
  • proto=6 or proto=udp/389
    • raw retained when canonical representation is still only partial
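
A minimal Go sketch of the full-versus-partial decision for proto values follows. The ProtoMapping type and mapProto function are hypothetical, and the rule that a numeric suffix such as 389 does not count as an application name is an assumption drawn from the examples above. The IANA protocol numbers used (1, 6, 17) are standard.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ProtoMapping is a hypothetical result of interpreting a proto value.
type ProtoMapping struct {
	Transport   string // candidate for network.transport
	Application string // candidate for network.application
	RemoveRaw   bool   // true only on full equivalence
}

// IANA protocol numbers for the transports used in this sketch.
var protoNumbers = map[string]string{"1": "icmp", "6": "tcp", "17": "udp"}

func mapProto(raw string) ProtoMapping {
	left, right, hasApp := strings.Cut(strings.ToLower(strings.TrimSpace(raw)), "/")
	var m ProtoMapping
	switch left {
	case "tcp", "udp", "icmp":
		m.Transport = left
	default:
		m.Transport = protoNumbers[left] // empty when unknown
	}
	if hasApp && right != "" {
		// A purely numeric suffix is a port, not an application name.
		if _, err := strconv.Atoi(right); err != nil {
			m.Application = right
		}
	}
	// The raw field may be dropped only when both parts are represented.
	m.RemoveRaw = m.Transport != "" && m.Application != ""
	return m
}

func main() {
	for _, raw := range []string{"udp/dns", "tcp/https", "6", "udp/389"} {
		fmt.Printf("%-10s %+v\n", raw, mapProto(raw))
	}
}
```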

Sophos XGS Mapping

The normalizer now uses a generic semantic mapper as the primary path. That generic layer handles reusable mappings such as timestamp, severity, action aliases, IPs, ports, MACs, transport, URL and HTTP fields, ICMP fields, and device-context-based observer.* promotion.

Sophos XGS is now a thin extension layer on top of that generic path. It keeps only the product-specific deltas:

  • fw_rule_*
  • interface and zone keys
  • NAT alias keys
  • family categorization defaults
  • Invalid Traffic -> error.message

The generic strategy is documented in docs/architecture/generic-semantic-mapping.md. The Sophos-specific delta, mapping matrix, delete-on-success behavior, and live victoriaflow verification queries are in docs/architecture/sophos-xgs-ecs-mapping.md.

The currently validated narrow canonical extension scope includes Sophos XG srczonetype and dstzonetype, promoted only into:

  • syslog_analyzer.canonical_ext.observer.ingress.zone.type
  • syslog_analyzer.canonical_ext.observer.egress.zone.type

This remains vendor-scoped, deterministic, and cleanup-safe:

  • promotion only when raw evidence is present and non-empty
  • raw removed only on full equivalence
  • raw retained on partial, empty, or conflicting cases

Reference corpus

The repo now includes a project-owned reference fixture corpus in testdata/fixtures/reference/elastic_filebeat_corpus.yaml. It is informed by publicly available Elastic Filebeat module examples used as validation/reference material only, especially:

  • Sophos XG firewall samples and expected mappings
  • system/syslog sample lines and expected mappings

This corpus is used for semantic intent and golden coverage only. It does not turn this project into a Filebeat clone and it does not introduce static module-specific parser logic.

The offline corpus layer now also supports real Elastic integrations package fixtures as review-only evidence for:

  • semantic family learning
  • canonical extension candidate discovery
  • ECS candidate suggestion generation
  • delta analysis against local proxy-only corpus runs

Elastic expected JSON is treated as supporting evidence only. It is not copied mechanically into runtime behavior, does not imply source-code reuse, and is never sufficient on its own to justify destructive cleanup.

Enrichment precedence

  • Observed values remain the highest-trust evidence by default.
  • Static inventory may fill canonical identity fields when sender identity is missing or incomplete.
  • DNS, GeoIP/ASN, and reputation are additive enrichment layers.
  • Enrichment never silently replaces observed canonical identity.
  • Final ECS fields may expose a selected value, but internal provenance keeps whether it was observed or enriched.
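
The precedence rules can be illustrated with a small Go sketch. Sourced and fillIfMissing are hypothetical names, and the provenance labels "observed" and "inventory" are assumed strings, not documented values.

```go
package main

import "fmt"

// Sourced pairs a selected value with where it came from.
type Sourced struct {
	Value  string
	Source string // for example "observed" or "inventory" (assumed labels)
}

// fillIfMissing applies an enrichment candidate only when no value exists,
// so observed identity is never silently replaced.
func fillIfMissing(current Sourced, candidate, source string) Sourced {
	if current.Value != "" || candidate == "" {
		return current
	}
	return Sourced{Value: candidate, Source: source}
}

func main() {
	observed := Sourced{Value: "fw-edge-01", Source: "observed"}
	fmt.Println(fillIfMissing(observed, "fw-inventory-name", "inventory"))
	fmt.Println(fillIfMissing(Sourced{}, "fw-inventory-name", "inventory"))
}
```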

VictoriaLogs stream fields

VictoriaLogs stream labels are now config-driven and template-based under outputs.victorialogs.stream_fields. The field list accepts either the legacy CSV string or a YAML list.

Supported syntax:

  • direct: stream_host
  • alias: device={observer.serial_number}
  • static: job=integrations/syslog-ecs-analyzer
  • templated static: stream=sophos-{vendor.payload.log_type}

Missing field behavior is controlled by:

  • outputs.victorialogs.missing_field_action: skip or fallback
  • outputs.victorialogs.missing_field_fallback: fallback value used when missing_field_action=fallback

Sophos XGS-oriented example:

outputs:
  victorialogs:
    enabled: true
    url: http://127.0.0.1:9428/insert/jsonline
    stream_fields:
      - job=integrations/syslog-ecs-analyzer
      - stream_host
      - device={observer.serial_number}
      - product={observer.product}
      - log_type={vendor.payload.log_type}
      - network_scope={syslog_analyzer.network.scope}
    time_field: _time
    missing_field_action: skip

Notes:

  • Nested source paths are resolved safely at runtime.
  • Missing fields never panic the sink.
  • host={host.name} is handled safely by emitting host.name as the effective stream field.
  • Other aliases or static names that would collide with structured ECS object roots are rejected at sink startup. Prefer non-conflicting aliases such as stream_host.
  • Stream template expansion is isolated to the VictoriaLogs sink and does not alter ECS normalization or enrichment decisions.
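
To make the four syntax forms concrete, here is a minimal Go sketch of template expansion over a nested event. It is not the sink's implementation: renderField and lookup are hypothetical helpers, and the choice to replace the whole value with the fallback when any placeholder is missing is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var placeholder = regexp.MustCompile(`\{([^{}]+)\}`)

// lookup walks a dotted path through nested maps without panicking.
func lookup(event map[string]any, path string) (string, bool) {
	var cur any = event
	for _, seg := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		cur, ok = m[seg]
		if !ok {
			return "", false
		}
	}
	switch v := cur.(type) {
	case nil, map[string]any:
		return "", false // not a usable leaf
	case string:
		return v, v != ""
	default:
		return fmt.Sprint(v), true
	}
}

// renderField expands one stream_fields entry.
// emit=false means the field is skipped for this event.
func renderField(spec string, event map[string]any, missingAction, fallback string) (name, value string, emit bool) {
	name, tmpl, hasValue := strings.Cut(spec, "=")
	if !hasValue {
		tmpl = "{" + name + "}" // direct form: field name equals source path
	}
	missing := false
	value = placeholder.ReplaceAllStringFunc(tmpl, func(tok string) string {
		v, ok := lookup(event, tok[1:len(tok)-1])
		if !ok {
			missing = true
		}
		return v
	})
	if missing {
		if missingAction == "fallback" {
			return name, fallback, true
		}
		return name, "", false
	}
	return name, value, true
}

func main() {
	event := map[string]any{
		"stream_host": "fw1",
		"observer":    map[string]any{"serial_number": "X123"},
		"vendor":      map[string]any{"payload": map[string]any{"log_type": "Firewall"}},
	}
	for _, spec := range []string{
		"stream_host",
		"device={observer.serial_number}",
		"job=integrations/syslog-ecs-analyzer",
		"stream=sophos-{vendor.payload.log_type}",
		"product={observer.product}",
	} {
		fmt.Println(renderField(spec, event, "skip", ""))
	}
}
```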

DNS runtime

DNS enrichment now uses a bounded worker pool with background reverse lookups. Operational behavior:

  • cache hits enrich immediately
  • cache misses enqueue a lookup and do not block the pipeline
  • positive and negative results are cached separately
  • observed or inventory-backed host identity is never overwritten by PTR results
  • source/destination scope classification is used before deciding lookup eligibility
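
The non-blocking lookup path and the separate positive and negative TTLs can be sketched in Go as below. PTRCache and its methods are hypothetical; cache eviction (cache_size), de-duplication of in-flight lookups, and the real resolver are left out. The constructor takes milliseconds to mirror cache_ttl_ms and negative_ttl_ms.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type ptrEntry struct {
	name    string // empty for a negative result
	expires time.Time
}

// PTRCache is a hypothetical cache fronting a bounded lookup queue.
type PTRCache struct {
	mu          sync.Mutex
	entries     map[string]ptrEntry
	queue       chan string
	positiveTTL time.Duration
	negativeTTL time.Duration
}

func NewPTRCache(queueSize, cacheTTLMs, negativeTTLMs int) *PTRCache {
	return &PTRCache{
		entries:     make(map[string]ptrEntry),
		queue:       make(chan string, queueSize),
		positiveTTL: time.Duration(cacheTTLMs) * time.Millisecond,
		negativeTTL: time.Duration(negativeTTLMs) * time.Millisecond,
	}
}

// Lookup never blocks: a hit returns immediately, a miss enqueues a
// background lookup, and a full queue simply skips the enqueue.
func (c *PTRCache) Lookup(ip string) (name string, cached bool) {
	c.mu.Lock()
	e, ok := c.entries[ip]
	c.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.name, true
	}
	select {
	case c.queue <- ip:
	default:
	}
	return "", false
}

// Store records a result; an empty name is cached with the negative TTL.
func (c *PTRCache) Store(ip, name string) {
	ttl := c.positiveTTL
	if name == "" {
		ttl = c.negativeTTL
	}
	c.mu.Lock()
	c.entries[ip] = ptrEntry{name: name, expires: time.Now().Add(ttl)}
	c.mu.Unlock()
}

// RunWorker drains the queue; run several of these for a worker pool.
func (c *PTRCache) RunWorker(resolve func(ip string) string) {
	for ip := range c.queue {
		c.Store(ip, resolve(ip))
	}
}

func main() {
	c := NewPTRCache(16, 60000, 5000)
	c.Lookup("192.0.2.1") // miss: enqueued, pipeline continues
	close(c.queue)        // demo only, so RunWorker returns after draining
	c.RunWorker(func(ip string) string { return "host.example." })
	fmt.Println(c.Lookup("192.0.2.1"))
}
```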

Config surface:

  • enrichment.dns.hosts_path
  • enrichment.dns.timeout_ms
  • enrichment.dns.workers
  • enrichment.dns.queue_size
  • enrichment.dns.cache_ttl_ms
  • enrichment.dns.negative_ttl_ms
  • enrichment.dns.cache_size
  • enrichment.dns.resolve_private
  • enrichment.dns.resolve_public
  • enrichment.dns.servers

Emitted fields:

  • source.dns.ptr_name
  • destination.dns.ptr_name
  • compatibility retention in source.domain / destination.domain
  • internal host PTR evidence in syslog_analyzer.enrich.host.ptr_name

Tuning guidance:

  • keep resolve_private=false unless private PTR data is operationally useful
  • increase workers only when DNS lookup latency is high enough that the lookup queue backs up at the current event rate
  • increase negative_ttl_ms to suppress repeated NXDOMAIN/timeouts under noisy traffic
  • use hosts_path for deterministic local overrides before enabling live reverse lookups

Reputation runtime

Reputation enrichment remains additive only. It never rewrites event.action, event.outcome, severity, or core ECS identity fields.

Config surface:

  • enrichment.reputation.ip_path
  • enrichment.reputation.cache_ttl_ms
  • enrichment.reputation.negative_ttl_ms
  • enrichment.reputation.cache_size
  • enrichment.reputation.reload_interval_ms
  • enrichment.reputation.provider_name

Behavior:

  • exact IP matches are cached for repeated lookups
  • misses are negative-cached to avoid repeated table scans
  • provider/feed context is preserved in threat.enrichments
  • optional reload checks can refresh the table without process restart
  • public/private scope classification is used before deciding lookup eligibility

Sophos XGS example:

  • src_ip or dst_ip that matches the reputation table produces additive threat.enrichments[*].indicator.* metadata while leaving the original firewall action unchanged

IP scope classification

The analyzer now derives a shared IP scope classification for:

  • source.ip
  • destination.ip
  • source.nat.ip
  • destination.nat.ip

Single-IP scope values:

  • private
  • public
  • unknown

Overall event scope values:

  • private
  • public
  • mixed
  • unknown

Classification rules:

  • invalid or missing IP: unknown
  • RFC1918 private IPv4, IPv6 ULA, loopback, and link-local addresses: private
  • valid non-private, non-unspecified, non-multicast addresses: public
  • unspecified and multicast addresses: unknown
  • event scope uses source.ip and destination.ip
    • both private: private
    • both public: public
    • one private and one public: mixed
    • if either side is missing or unknown: unknown
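
These rules map closely onto Go's standard net/netip predicates. The sketch below is illustrative only: classifyIP and eventScope are hypothetical names, and checking multicast before link-local (so link-local multicast becomes unknown) is an assumption about an ordering the rules leave open.

```go
package main

import (
	"fmt"
	"net/netip"
)

// classifyIP returns "private", "public", or "unknown" for one address.
func classifyIP(raw string) string {
	addr, err := netip.ParseAddr(raw)
	if err != nil {
		return "unknown" // invalid or missing
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 as IPv4
	switch {
	case addr.IsUnspecified(), addr.IsMulticast():
		return "unknown"
	case addr.IsPrivate(), addr.IsLoopback(), addr.IsLinkLocalUnicast():
		// IsPrivate covers RFC 1918 IPv4 and IPv6 unique local addresses.
		return "private"
	default:
		return "public"
	}
}

// eventScope combines the source and destination classifications.
func eventScope(srcIP, dstIP string) string {
	src, dst := classifyIP(srcIP), classifyIP(dstIP)
	switch {
	case src == "unknown" || dst == "unknown":
		return "unknown"
	case src == dst:
		return src
	default:
		return "mixed"
	}
}

func main() {
	fmt.Println(classifyIP("10.1.2.3"), classifyIP("8.8.8.8"), classifyIP("224.0.0.1"))
	fmt.Println(eventScope("10.1.2.3", "8.8.8.8"))
}
```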

Internal fields exposed in normalized output:

  • syslog_analyzer.source.scope
  • syslog_analyzer.destination.scope
  • syslog_analyzer.source.nat.scope
  • syslog_analyzer.destination.nat.scope
  • syslog_analyzer.network.scope

Eligibility use:

  • GeoIP/ASN: public only
  • DNS: controlled by enrichment.dns.resolve_private and enrichment.dns.resolve_public
  • Reputation: public only

The classification is additive metadata only. It does not overwrite ECS network.* or any observed IP field. Further scope details are in docs/architecture/phase6-ip-scope.md.

Kafka downstream scoring output

The Kafka sink follows a queue-backed, production-oriented pattern aligned with flowcollector-go, but adapted to syslog events. Kafka is the intended production handoff boundary to the external anomaly-scorer. This repository prepares a stable normalized contract; it does not perform production anomaly detection itself.

Key strategy

Kafka keys use a deterministic syslog grouping shape:

  • identity|family|discriminator
  • identity: observer.serial_number, then host.name, then observer.name, then log.syslog.hostname, then source.ip
  • family: vendor.payload.log_type + vendor.payload.log_component when present, then event.category, then process.name, log.syslog.appname, observer.product, observer.vendor
  • discriminator: event.code, then vendor.payload.log_subtype or event.action, then log.syslog.msgid, then generic

This keeps partitioning stable per device and log family while avoiding the high-cardinality fallback of hashing every message body. Same key means same Kafka partition because the sink uses Kafka hash partitioning.
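
The fallback chains above can be sketched in Go as follows. This is an illustration, not the sink's code: buildKey works on a flat map of dotted field names for brevity, the "/" used to join log_type and log_component is an assumption, and so is the "unknown" placeholder for an empty identity or family (only the generic discriminator fallback is documented).

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first non-blank value among the given fields.
func firstNonEmpty(fields map[string]string, paths ...string) string {
	for _, p := range paths {
		if v := strings.TrimSpace(fields[p]); v != "" {
			return v
		}
	}
	return ""
}

func orDefault(v, def string) string {
	if v == "" {
		return def
	}
	return v
}

// buildKey assembles identity|family|discriminator.
func buildKey(fields map[string]string) string {
	identity := firstNonEmpty(fields,
		"observer.serial_number", "host.name", "observer.name",
		"log.syslog.hostname", "source.ip")

	// Vendor log type and component win when either is present.
	var parts []string
	for _, p := range []string{"vendor.payload.log_type", "vendor.payload.log_component"} {
		if v := firstNonEmpty(fields, p); v != "" {
			parts = append(parts, v)
		}
	}
	family := strings.Join(parts, "/")
	if family == "" {
		family = firstNonEmpty(fields,
			"event.category", "process.name", "log.syslog.appname",
			"observer.product", "observer.vendor")
	}

	discriminator := firstNonEmpty(fields,
		"event.code", "vendor.payload.log_subtype", "event.action",
		"log.syslog.msgid")

	return strings.Join([]string{
		orDefault(identity, "unknown"),
		orDefault(family, "unknown"),
		orDefault(discriminator, "generic"),
	}, "|")
}

func main() {
	fmt.Println(buildKey(map[string]string{
		"observer.serial_number":       "X123",
		"vendor.payload.log_type":      "Firewall",
		"vendor.payload.log_component": "Firewall Rule",
		"event.code":                   "010101600001",
	}))
}
```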

Downstream contract intent

The stable downstream contract is centered on normalized semantic fields and compact quality signals, especially:

  • syslog_analyzer.semantic.family
  • event.category
  • event.type
  • event.action
  • event.outcome
  • event.code
  • event.severity
  • network.transport
  • source/destination presence and scope
  • observer.name and host.name when available
  • explainability summaries such as explain counts or top skip reason
  • confidence summaries such as family_confidence and mapping_confidence

outputs.kafka.ml_mode remains a compatibility switch for downstream consumers that want explicit producer-side metadata such as ml.mode and ml.key. It does not turn this process into the primary anomaly engine. When the normalized event already contains enough evidence, ml_mode also emits the scorer-facing canonical ML fields expected by flowcollector-ml, including:

  • flow.client.ip.addr
  • flow.server.ip.addr
  • flow.client.l4.port.id
  • flow.server.l4.port.id
  • l4.proto.name

flow.bytes and flow.packets remain intentionally absent for syslog-derived events unless the analyzer owns a safe total-counter semantic.

Producer and routing config

Kafka config surface:

  • outputs.kafka.brokers
  • outputs.kafka.topic
  • outputs.kafka.client_id
  • outputs.kafka.acks
  • outputs.kafka.compression
  • outputs.kafka.linger_ms
  • outputs.kafka.batch_size
  • outputs.kafka.retries
  • outputs.kafka.retry_backoff_ms
  • outputs.kafka.queue_max_messages
  • outputs.kafka.queue_block_ms
  • outputs.kafka.key_mode
  • outputs.kafka.ml_mode

Minimal routing controls:

  • outputs.kafka.route.enabled_path
  • outputs.kafka.route.log_types
  • outputs.kafka.route.event_categories
  • outputs.kafka.route.event_codes

Example:

outputs:
  kafka:
    enabled: true
    brokers:
      - kafka:9092
    topic: normalized.ml.events
    client_id: syslog-ecs-analyzer
    acks: all
    compression: zstd
    linger_ms: 250
    batch_size: 262144
    retries: 5
    queue_max_messages: 8192
    queue_block_ms: 0
    key_mode: syslog_identity
    ml_mode: true
    route:
      log_types: [Firewall, Content Filtering]
      event_codes: ["010101600001", "050901616001"]

Runtime behavior

  • Kafka delivery is isolated behind an internal queue and background worker.
  • Write enqueues events quickly so Kafka backpressure does not stall the rest of the output path.
  • Queue-full conditions are counted as Kafka drops instead of breaking VictoriaLogs delivery.
  • Delivery retries use bounded exponential backoff.
  • /status now includes Kafka queue depth and delivery/error counters when Kafka output is enabled.
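
The queue isolation and bounded backoff can be sketched in Go without any Kafka client. QueueSink, Write, and backoffMs are hypothetical names, and the doubling schedule is one common form of exponential backoff rather than the documented one.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// QueueSink is a hypothetical front for a background Kafka producer.
type QueueSink struct {
	queue   chan []byte
	dropped atomic.Uint64
}

func NewQueueSink(maxMessages int) *QueueSink {
	return &QueueSink{queue: make(chan []byte, maxMessages)}
}

// Write enqueues without blocking. A full queue is counted as a drop so
// that other outputs are never stalled by Kafka backpressure.
func (s *QueueSink) Write(msg []byte) bool {
	select {
	case s.queue <- msg:
		return true
	default:
		s.dropped.Add(1)
		return false
	}
}

// Dropped reports how many messages were rejected because the queue was full.
func (s *QueueSink) Dropped() uint64 { return s.dropped.Load() }

// backoffMs doubles the delay per attempt and caps it at maxMs.
func backoffMs(attempt, baseMs, maxMs int) int {
	d := baseMs
	for i := 0; i < attempt && d < maxMs; i++ {
		d *= 2
	}
	if d > maxMs {
		d = maxMs
	}
	return d
}

func main() {
	s := NewQueueSink(2)
	for i := 0; i < 3; i++ {
		fmt.Println("enqueued:", s.Write([]byte("event")))
	}
	fmt.Println("dropped:", s.Dropped())
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println("retry delay ms:", backoffMs(attempt, 100, 1000))
	}
}
```

A background worker would read from the queue, send to Kafka, and sleep for backoffMs between failed attempts up to the configured retries.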

Observability

Kafka status counters include:

  • sent events
  • failed events
  • dropped events
  • skipped events
  • retry attempts
  • encode errors
  • queue depth
  • queue capacity

Limitations

  • Topic selection is currently one configured topic per sink instance.
  • Per-family or per-device topics are intentionally not implemented yet.
  • Background delivery failures are surfaced through Kafka sink metrics rather than by failing the primary VictoriaLogs path.
  • Stateful anomaly scoring, temporal baselines, and alerting remain downstream concerns.

Offline corpus evaluation

The offline corpus runner and its feature/baseline/scoring modes are evaluation tooling. They are kept in-repo to:

  • validate normalization quality
  • test whether exported semantic features are useful downstream
  • provide regression evidence across corpora
  • prototype score-feature usefulness before handing ideas to the external scorer

These offline tools do not redefine the production architecture. Production anomaly scoring remains downstream via Kafka.

Victoriaflow

The analyzer is integrated as an isolated test component via:

  • /daten/victoriaflow/docker/docker-compose.syslog-ecs-analyzer-phase1.yml
  • /daten/victoriaflow/config/syslog-ecs-analyzer/config.yaml
  • /daten/victoriaflow/config/syslog-ecs-analyzer/inventory.yaml
  • /daten/victoriaflow/config/syslog-ecs-analyzer/replay.log

Primary proof boundary:

  • VictoriaLogs through the existing obswrapper query path on http://127.0.0.1:3100

Secondary structured proof boundary:

  • /daten/victoriaflow/volumes/syslog-ecs-analyzer/events.ndjson

The exact build, start, verify, stop, and rollback commands are in docs/testing/victoriaflow-test-integration.md. The full-image variant is documented in docs/testing/victoriaflow-full-image.md.

Full image defaults

One full-featured image is built from Dockerfile. The features compiled into that image are:

  • syslog input
  • VictoriaLogs output
  • Kafka output
  • static inventory enrichment
  • IP scope classification
  • GeoIP/ASN enrichment
  • DNS enrichment
  • IP reputation enrichment
  • dynamic VictoriaLogs stream fields
  • health and status endpoints

Recommended default posture:

  • enable VictoriaLogs and health/status
  • keep stdout and file output disabled unless a test explicitly needs them
  • keep Kafka, DNS, and reputation disabled until their backing resources and validation path are ready
  • enable GeoIP/ASN only when MaxMind databases are mounted

Safe output defaults:

  • outputs.victorialogs.enabled=true
  • outputs.kafka.enabled=false
  • outputs.stdout.enabled=false
  • outputs.file.enabled=false

If stdout or file output is enabled explicitly, the analyzer logs a startup warning because those sinks can create high-volume Docker log or disk usage.

The image/config alignment with flowcollector-go includes:

  • CSV-or-list parsing for Kafka brokers and DNS servers
  • compatibility aliases such as bootstrap_servers, required_acks, cache_ttl_seconds, cache_max_entries, and reload_seconds
  • bounded worker/caching patterns for DNS and reputation

Deliberate deviations are documented in docs/architecture/phase6-full-image.md.

Design intent

  • keep stage boundaries explicit
  • keep future parser/decoder/resolver/normalizer implementations replaceable behind interfaces
  • keep enrichment modular and provenance-aware
  • treat flowcollector-go as a reference for enrichment and output patterns only where it fits syslog semantics

Further design details for the runtime extensions are in docs/architecture/phase6.md. The full-image packaging notes are in docs/architecture/phase6-full-image.md.