Publication Boundary
Design notes describe direction unless they explicitly say a behavior is implemented; limitations remain the implementation boundary.
Reference public document
Target architecture and rationale for the Lightmetrics system.
Cross-check this page against the current limitations page before relying on a behavior operationally.
Build a small owned telemetry path for hosts where a full collector stack is too heavy or too generic.
A ClickHouse INSERT ... FORMAT CapnProto path would use an unframed stream/schema rather than the lightmetrics framed wire body.
The agent must send framed Cap'n Proto batches over HTTPS with per-host bearer tokens and explicit sequence ranges.
The server must accept public HTTPS ingest from known hosts, validate identity, deduplicate at-least-once batches, make incoming data queryable immediately, and persist immutable objects to object storage.
flowchart LR
agent[agent] --> http[HTTPS ingest]
http --> spool[disk ingest spool]
http --> landing[disk landing buffer]
spool --> object[object storage writer]
object --> raw[raw objects]
raw --> rollup[rollup worker]
rollup --> rolled[rollup objects]
landing --> query[query/UI API]
rolled --> query
Agent:
/proc and filesystem reads

Server:
Use Cap’n Proto messages as the framed payload contents. The schema mirrors OTLP concepts without copying OTLP’s nested structure.
Important design choices:
le buckets inside each sample plus explicit temporality for the series

Agent to collector:
POST /ingest/v1/batch
Content-Type: application/x-capnp
Authorization: Bearer <per-host-token>
X-Lightmetrics-Agent: <agent-id>
X-Lightmetrics-Seq-Start: <seq>
X-Lightmetrics-Seq-End: <seq>
HTTP Content-Encoding is not used in v1. Compression is represented by frame flags
so the same bytes can be reused in the agent queue, HTTP body, server spool, and raw
object storage.
HTTP request body is a framed payload, not a bare Cap’n Proto message:
magic: 8 bytes
frame_version: u16 little-endian
flags: u16 little-endian
payload_len: u64 little-endian
payload_crc32: u32 little-endian
payload: packed Cap'n Proto batch, optionally compressed
Initial flags:
0x0001: payload is packed Cap'n Proto
0x0002: payload is zstd-compressed

Unknown frame flags are rejected until the protocol defines an explicit optional vs. required flag split.
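A minimal Rust sketch of this framing and the flags above. The magic value is an illustrative assumption; only the field order, widths, flag bits, and little-endian byte order come from this section.

// Sketch only: MAGIC is hypothetical, the layout is from the design note.
const MAGIC: [u8; 8] = *b"LMFRAME1"; // hypothetical 8-byte magic
const FLAG_PACKED: u16 = 0x0001; // payload is packed Cap'n Proto
const FLAG_ZSTD: u16 = 0x0002; // payload is zstd-compressed

// CRC-32 (IEEE, reflected polynomial) over the payload, bit-by-bit for clarity.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

// Header fields in declaration order, then the payload bytes.
fn encode_frame(flags: u16, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(24 + payload.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&1u16.to_le_bytes()); // frame_version
    out.extend_from_slice(&flags.to_le_bytes());
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

fn main() {
    let frame = encode_frame(FLAG_PACKED, b"example payload");
    assert_eq!(&frame[..8], &MAGIC[..]);
    let _ = FLAG_ZSTD; // unused in this sketch
    println!("frame is {} bytes", frame.len());
}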
The framed body is the lightmetrics wire format, not ClickHouse input. The schema
should still allow a ClickHouse-specific sender/exporter to write the same logical
metrics/logs as an unframed Cap’n Proto stream through ClickHouse HTTP
INSERT ... FORMAT CapnProto. That requires a ClickHouse-facing schema/table layout
and may require a self-managed ClickHouse target if a given ClickHouse Cloud service
does not support Cap’n Proto input. A ClickHouse-backed server mode can be added
after MVP if the custom object-store path is not competitive enough for real
ingest/query workloads.
Successful ingest response:
{"status":"ok","agent_id":"vpn","boot_id":"...","seq_start":10,"seq_end":20,"duplicate":false}
Duplicate batches return the same successful shape with duplicate=true, so agents
can safely delete already-accepted queue entries. Invalid frames return 400.
Authentication failures return 401 or 403. Full spool or intentional backpressure
returns 503 and must not acknowledge the batch.
Transport must stay HTTP/3-ready even if the MVP listener starts with HTTP/1.1 or
HTTP/2. Keep ingest semantics independent of connection ordering, TCP behavior, and
client source ports so the same /ingest/v1/batch contract can run over QUIC.
The first runtime listener is TLS 1.3 over HTTP/1.1 with rustls; HTTP/3/QUIC
should be added as a second listener adapter over the same ingest service.
TLS 1.3 0-RTT may be enabled only for the public ingest endpoint, and only because batch upload is replay-safe by design:
dedupe identity is (agent_id, boot_id, seq_start, seq_end)
a replayed batch returns duplicate=true without creating another ingest side effect

If a deployment cannot maintain that dedupe horizon across restarts, disable 0-RTT for that deployment while still allowing HTTP/3 without early data.
The server must enforce public ingest limits before and during decode:
Requests beyond these limits return 413 or 400 before allocation-heavy decoding
or live-state updates.
Use TOML as the runtime configuration format.
Rationale:
Runtime files:
/etc/lightmetrics/agent.toml
/etc/lightmetrics/server.toml
Secrets stay outside config. The config points at token/certificate files by path.
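A hypothetical agent.toml sketch to make the secrets-by-path rule concrete; every key name here is an illustrative assumption, not the shipped schema.

# Illustrative only; key names are not the real schema.
[upload]
endpoint = "https://collector.example.net/ingest/v1/batch"
token_file = "/etc/lightmetrics/agent-token"  # secret referenced by path, never inlined

[queue]
dir = "/var/lib/lightmetrics/queue"
max_bytes = 268435456  # 256 MiB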
Parser choice:
CUE can still be used later for fleet-level generation or validation, but it should not be required on hosts and should not be a runtime dependency.
Server config must define separate listeners:
/ingest/v1/batch
/api/v1/*

The ingest listener must not route query, UI, or admin paths. The query listener binds to localhost or a private/Tailscale address.
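A hypothetical server.toml listener sketch; key names are assumptions, while the path split and the private query binding come from this section.

# Illustrative only; key names are not the real schema.
[listener.ingest]
bind = "0.0.0.0:443"          # public HTTPS ingest
routes = ["/ingest/v1/batch"]

[listener.query]
bind = "127.0.0.1:8443"       # localhost or a private/Tailscale address
routes = ["/api/v1/*"]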
Object storage is the system of record. The server should write immutable compressed objects and small manifests instead of mutating large files.
Initial layout:
raw/v1/date=YYYY-MM-DD/hour=HH/agent=<agent-id>/<boot-id>-<seq-start>-<seq-end>.lmbatch.zst
rollup/v1/window=1m/date=YYYY-MM-DD/hour=HH/metric=<metric-name>/<chunk-id>.lmrollup.zst
rollup/v1/window=5m/date=YYYY-MM-DD/metric=<metric-name>/<chunk-id>.lmrollup.zst
index/v1/date=YYYY-MM-DD/hour=HH/<chunk-id>.lmindex.zst
manifests/v1/date=YYYY-MM-DD/hour=HH/<chunk-id>.json
The raw object body can remain the same Cap’n Proto batch frames used on the wire. Rollup and index objects can use dedicated Cap’n Proto schemas once query patterns are clearer.
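As a sketch, the raw object key follows directly from the batch identity. The helper below is hypothetical; date/hour formatting is assumed precomputed, and the key template comes from the layout above.

fn raw_object_key(date: &str, hour: &str, agent: &str, boot_id: &str,
                  seq_start: u64, seq_end: u64) -> String {
    // Mirrors raw/v1/date=.../hour=.../agent=.../<boot-id>-<seq-start>-<seq-end>.lmbatch.zst
    format!("raw/v1/date={date}/hour={hour}/agent={agent}/{boot_id}-{seq_start}-{seq_end}.lmbatch.zst")
}

fn main() {
    assert_eq!(
        raw_object_key("2024-06-01", "13", "vpn", "b1", 10, 20),
        "raw/v1/date=2024-06-01/hour=13/agent=vpn/b1-10-20.lmbatch.zst"
    );
}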
Incoming data is available to private realtime views after durable local acceptance:
The landing buffer is the pre-object-store durability boundary. It may be the ingest spool itself or a separate disk buffer if that better separates raw acceptance from query/read optimization. Memory is a cache and subscription fanout aid, not the authoritative live query source.
Realtime metric views should update charts when new accepted metric batches arrive, without relying only on periodic dashboard refresh. The server should expose this as bounded private UI event streams over Server-Sent Events. Event payloads identify changed hosts, metric series, rollup windows, and query ranges; clients then fetch bounded deltas from the same query APIs used for normal views. Do not make an unbounded metrics streaming protocol part of MVP query correctness.
Realtime log tail uses the same model: the SSE stream announces newly accepted log records or tail cursors after durable local acceptance, while bounded log fetches read from the landing buffer and object storage.
Query correctness comes from merging object storage with the local spool/landing-buffer tail, deduplicated by batch identity and series timestamp.
If object storage is unavailable, queries may succeed only when the requested range
is wholly covered by the local landing-buffer horizon. Otherwise return a bounded
partial/error response with explicit partial=true or Prometheus-style error
metadata, depending on the API.
Bounded behavior:
the agent increments agent.queue_dropped when its queue drops data, and emits an alert record when it can
a full server spool returns 503 without acknowledgement
intentional backpressure returns 503

Rollups are a server responsibility, not an agent responsibility.
First metric rollups:
Metric temporality:
instant
cumulative with startUnixNs for reset detection
delta when possible

Histogram aggregation requires identical bucket boundaries for a given series after
temporality conversion. The server should reject or quarantine histogram samples that
change bucket layout within the same series identity. For cumulative histograms, the
server detects resets when startUnixNs changes or bucket/count values decrease, and
does not merge across reset boundaries.
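A small Rust sketch of that reset rule; the field names are illustrative rather than the wire schema.

struct HistSample {
    start_unix_ns: u64,
    count: u64,
    bucket_counts: Vec<u64>, // cumulative counts per le bucket, layout fixed per series
}

// A reset boundary: rollups must not merge prev and next across it.
fn is_reset(prev: &HistSample, next: &HistSample) -> bool {
    next.start_unix_ns != prev.start_unix_ns
        || next.count < prev.count
        || next.bucket_counts.iter().zip(&prev.bucket_counts).any(|(n, p)| n < p)
}

fn main() {
    let a = HistSample { start_unix_ns: 1, count: 10, bucket_counts: vec![4, 10] };
    let b = HistSample { start_unix_ns: 1, count: 7, bucket_counts: vec![2, 7] };
    assert!(is_reset(&a, &b)); // count decreased: treat as reset
}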
Approximate percentiles:
Rollup windows:
Logs and alerts should get compact time indexes first. Do not build full-text search until the access pattern proves it is needed.
Dashboard tabs and dashboard panels should be defined by configuration, not hard-coded as separate dashboard-specific UI surfaces. The private UI’s top-level navigation is the dashboard surface: built-in dashboard definitions provide default tabs, and site-specific dashboard definitions can add, hide, reorder, or override those tabs. Built-in operations tabs should also be represented as dashboard definitions that compose fixed private-UI block types. The implementation of those block types, backend APIs, query execution, validation, and route handlers remains ordinary code.
Built-in dashboards and site-specific dashboards use the same definition schema.
Definition loading:
/etc/lightmetrics/dashboards.d/*.toml

Definition language:
collector_strip, segmented_mode, kpi_strip, table, timeseries, log_table, alert_table, host_identity, recent_uploads, and query_debug_shell

Definition fields should cover:
Expressions inside dashboard definitions should be a small typed subset for field
references, arithmetic, comparisons, boolean operations, duration/unit literals, and
named aggregations such as count, count_where, sum, mean, median, max,
p80, p95, and p99. Anything more complex belongs in server-side code or a
new registered transform.
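A hypothetical dashboards.d entry as a sketch: the block type (timeseries) comes from the list above, while every key name and the example query are illustrative assumptions rather than the real definition schema.

# Illustrative only; key names are not the real definition schema.
[[tab]]
id = "network"
title = "Network"

[[tab.block]]
type = "timeseries"
title = "Receive throughput"
query = "rate(node_network_receive_bytes_total[5m])"  # illustrative metric name
window = "1m"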
Top-level built-in tabs should map to the definition model:
Dashboard definitions are the source of truth for the private UI. Grafana dashboard export/provisioning is not a correctness dependency for the MVP; the required Grafana integration is PromQL-compatible query serving from the Lightmetrics server. A low-priority read-only export surface may generate Grafana JSON for metric panels that map to stock Grafana Prometheus panels, but logs, alerts, and custom console-only blocks must be omitted or marked unsupported. Dashboards edited directly in Grafana are outside Lightmetrics configuration in MVP; there is no Grafana-to-Lightmetrics import path in the first version.
MVP dashboard management is file-based. A UI editor for dashboards is post-MVP. Do not accept arbitrary Grafana JSON as a first-class input format. Any Grafana-to-Lightmetrics converter or reuse of Grafana dashboard/frontend components is post-MVP and lower priority than PromQL compatibility.
Expose a Prometheus HTTP API compatibility target for metrics. Grafana’s built-in Prometheus data source works with systems that implement the Prometheus query API, so this is the required Grafana integration boundary for MVP.
Initial endpoints:
GET /api/v1/query
GET /api/v1/query_range
GET /api/v1/labels
GET /api/v1/label/<name>/values
GET /api/v1/series
GET /api/v1/metadata
Query language should start as a constrained PromQL/MetricsQL subset:
metric_name and metric_name{label="value",label!="value"}
metric_name[5m]
rate() for counters over range selectors
sum, avg, min, max, and count over one expression
by (...) grouping for aggregations
_bucket, _sum, and _count virtual series
histogram_quantile() over bucket rollups
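For example, queries like these fall inside the subset (metric names are illustrative):

rate(node_network_receive_bytes_total{host="vpn"}[5m])
sum by (host) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))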
Reject unsupported PromQL explicitly with Prometheus-style bad_data responses.

Do not silently reinterpret:
offset
and, or, unless

For query_range, object storage is the canonical source and the local
spool/landing buffer overlays the not-yet-landed tail. The query engine must merge
by batch identity and series timestamp and dedupe repeated at-least-once uploads. If
object storage is unavailable, return a Prometheus-style error unless the requested
range is wholly inside the local landing-buffer horizon.
The P0 direct-selector implementation evaluates query_range at
start + n * step timestamps with a fixed 5 minute lookback, matching the
Prometheus graph-panel shape before broader PromQL execution is available. When
only the local accepted spool is configured, successful range responses include a
Prometheus warnings entry that marks the result as local-spool-only partial
coverage. Full object-store horizon checks remain part of the later object-store
query merge work.
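A sketch of that evaluation grid in Rust, assuming unix-second timestamps; the helper name is illustrative.

const LOOKBACK_SECS: u64 = 300; // fixed 5 minute lookback

// One evaluation timestamp per step; each timestamp t is answered by the most
// recent sample in [t - LOOKBACK_SECS, t], or no point if that window is empty.
fn eval_timestamps(start: u64, end: u64, step: u64) -> Vec<u64> {
    (start..=end).step_by(step as usize).collect()
}

fn main() {
    assert_eq!(eval_timestamps(0, 60, 30), vec![0, 30, 60]);
}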
Compatibility target:
status, data, errorType, error, and warnings envelope fields
[unix_seconds_float, "value"] sample pairs for query_range and query
stock Grafana panels driven by query, query_range, and histogram_quantile()

Do not build a custom Grafana data source plugin for v1. A plugin creates a new maintenance surface, while Prometheus-compatible endpoints can be used by stock Grafana. Grafana dashboard export/provisioning is post-MVP and must not distract from PromQL compatibility.
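For reference, a query_range success in the compatibility shape above looks like the following (series and values illustrative):

{"status":"ok","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_load1","host":"vpn"},"values":[[1717245600.0,"0.42"],[1717245630.0,"0.38"]]}]}}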
Logs and alerts do not need to be Grafana-native in v1. Keep the server’s own JSON/Arrow API for those. If Grafana log browsing becomes important, add a Loki-compatible query surface later rather than inventing a plugin first.
Arrow/Perspective UI is post-MVP. Keep it behind a build feature such as
ui-perspective so the default server binary does not include Arrow IPC or frontend
assets. The MVP private UI path is the built-in console plus config-defined
dashboard definitions, backed by the Prometheus-compatible metrics API and bounded
JSON endpoints. Grafana support in MVP means stock Grafana can query Lightmetrics
through the Prometheus-compatible API.
MVP logs and alerts use bounded JSON APIs.
GET /api/v1/logs
GET /api/v1/logs/tail
GET /api/v1/alerts
Log query filters:
from, to, host, severity, target, contains, limit, cursor, order
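An illustrative bounded log query using the filter names above; the timestamp encoding and values are assumptions.

GET /api/v1/logs?host=vpn&severity=error&from=1717245600&to=1717249200&limit=100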
Alert query filters:
from, to, host, name, state, limit, cursor

Alerts are ingested alert state records in MVP. The server does not evaluate alert rules, send notifications, silence alerts, or expose an Alertmanager-compatible API in the first version.
Use Server-Sent Events for live tailing in MVP. WebSocket and Arrow IPC are optional post-MVP paths.
The current log-tail SSE endpoint is GET /api/v1/logs/tail on the private query
listener. It requires the query bearer token and emits bounded JSON
log_tail_snapshot, log_tail_update, and log_tail_gap events. Event records
are loaded from the same accepted-log storage visibility source as
GET /api/v1/logs; clients use the returned opaque after cursor to repair a
reconnect or lag gap with another bounded tail request scoped to the matching
agent_id and boot_id. The endpoint is not a durable event journal and does
not provide text search.
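A sketch of one tail event on the wire: the event name and the after cursor come from the description above, while the record fields are illustrative assumptions.

event: log_tail_update
data: {"after":"opaque-cursor","agent_id":"vpn","boot_id":"...","records":[{"ts":1717245600,"severity":"error","message":"example"}]}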
With ui-perspective enabled, add bounded Arrow IPC endpoints for table-oriented UI
loads:
GET /api/v1/metrics.arrow
GET /api/v1/logs.arrow
GET /api/v1/alerts.arrow
Do not use Arrow for unbounded live tails. Keep live streams as SSE JSON notices and fetch bounded Arrow deltas when the Perspective UI needs table updates.
Delivery is at-least-once.
The agent writes encoded batches to a local append-only queue before upload. A batch is removed only after the collector returns success. If the agent retries after an ambiguous failure, the collector deduplicates by:
(agent_id, boot_id, seq_start, seq_end)
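A minimal dedupe sketch in Rust, assuming an in-memory set; a real server persists this horizon so restarts keep 0-RTT replay-safe.

use std::collections::HashSet;

#[derive(Hash, PartialEq, Eq, Clone)]
struct BatchId {
    agent_id: String,
    boot_id: String,
    seq_start: u64,
    seq_end: u64,
}

// Returns true when the batch was already accepted: respond with
// duplicate=true and perform no further ingest side effects.
fn check_and_record(seen: &mut HashSet<BatchId>, id: BatchId) -> bool {
    !seen.insert(id)
}

fn main() {
    let mut seen = HashSet::new();
    let id = BatchId { agent_id: "vpn".into(), boot_id: "b1".into(), seq_start: 10, seq_end: 20 };
    assert!(!check_and_record(&mut seen, id.clone())); // first accept
    assert!(check_and_record(&mut seen, id)); // retried upload is a duplicate
}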
The public ingest endpoint must not expose query or admin APIs.
First version: per-host bearer tokens over TLS, as in the ingest headers above.
mTLS can replace bearer tokens later if certificate distribution is worth the added operational surface.
Agent dependencies should stay narrow:
Server can spend more footprint on:
The server still stays one process for the first version.