Observability

BlazeRules is an embedded library, so observability is in-process: every evaluate_* call returns a Result carrying timing and ingest counters, and the agent can write per-record decision logs and dead-letter logs to disk. You aggregate or scrape these from your own process — the core does not run a metrics server.

📘
Metrics live on the Result, not on a server
There is no daemon or /metrics endpoint in the core engine. The numbers you need per batch — timing, how many records were processed, how many were skipped, and why — come back on the Result object. Roll them up however your process already exports metrics.

Counters on every Result

Every Result returned by evaluate_ndjson, evaluate_batch, and the other evaluate_* methods carries these fields.

Field	Meaning
`n_records`	Total number of records the engine saw in the batch.
`n_matched`	Records where at least one rule matched.
`timing_ms`	Wall-clock time the engine spent evaluating the batch, in milliseconds.
`messages_processed`	Count of records successfully ingested and evaluated.
`messages_skipped`	Count of records skipped during ingest (driven by `ingest_error_mode`).
`error_counts`	Per-category counts of ingest/type errors encountered.
`error_samples`	A bounded set of example error records/messages for diagnosis.

result = engine.evaluate_ndjson(payload)

print("records:", result.n_records, "matched:", result.n_matched)
print("took ms:", result.timing_ms)
print("processed:", result.messages_processed, "skipped:", result.messages_skipped)

if result.messages_skipped:
    print("error counts:", result.error_counts)
    print("error samples:", result.error_samples)

error_counts and error_samples are the first place to look when messages_skipped is non-zero — they tell you whether records are being dropped for malformed JSON, type mismatches, or another reason. See Error Reference for what each category means and how ingest modes change the behavior.
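Because these counters arrive per batch, a long-running process usually wants a running total. The following is a minimal sketch of one way to roll them up. `BatchTotals` and `skip_ratio` are hypothetical helpers, not part of BlazeRules. The sketch reads only the Result fields listed in the table above and assumes `error_counts` behaves like a `{category: count}` mapping. The stand-in objects and the `example_category` name are invented so the snippet runs without an engine.

```python
from collections import Counter
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class BatchTotals:
    """Running totals over many Result objects (hypothetical helper)."""

    batches: int = 0
    n_records: int = 0
    n_matched: int = 0
    timing_ms: float = 0.0
    messages_processed: int = 0
    messages_skipped: int = 0
    error_counts: Counter = field(default_factory=Counter)

    def add(self, result) -> None:
        # Reads only the fields documented in the table above.
        self.batches += 1
        self.n_records += result.n_records
        self.n_matched += result.n_matched
        self.timing_ms += result.timing_ms
        self.messages_processed += result.messages_processed
        self.messages_skipped += result.messages_skipped
        # Assumption: error_counts is a {category: count} mapping.
        self.error_counts.update(dict(result.error_counts))

    @property
    def skip_ratio(self) -> float:
        # Share of ingested messages that were skipped; 0.0 before any data.
        seen = self.messages_processed + self.messages_skipped
        return self.messages_skipped / seen if seen else 0.0


# Stand-in objects so the sketch runs on its own; in real use, pass the
# Result returned by each evaluate_* call instead.
fake_results = [
    SimpleNamespace(n_records=10, n_matched=4, timing_ms=1.5,
                    messages_processed=9, messages_skipped=1,
                    error_counts={"example_category": 1}),  # placeholder name
    SimpleNamespace(n_records=5, n_matched=0, timing_ms=0.5,
                    messages_processed=5, messages_skipped=0,
                    error_counts={}),
]

totals = BatchTotals()
for result in fake_results:
    totals.add(result)

print(totals.n_records, totals.messages_skipped, round(totals.skip_ratio, 3))
```

Export the totals through whatever metrics client your process already uses, and alert on `skip_ratio` rather than on the raw skip count if batch sizes vary.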

Decision logs and dead-letter logs

When you run the agent, it writes per-record decisions to the output you configure. In rules.yaml, each instance has an output: block:

instances:
  - name: payments-http
    rules: rules.yaml
    output:
      type: ndjson
      path: decisions-payments.ndjson
  - name: checkout-log-tail
    rules: rules.yaml
    output:
      type: stdout

An ndjson output writes one decision record per line — a durable decision log you can tail, archive, or feed to your own pipeline. The dashboard reads such logs back to render a local read-only view; it is launched with --decision-log and --dead-letter-log paths pointing at these files (see Deployment).
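Since an ndjson decision log is one JSON document per line, it can be consumed with the standard library alone. The sketch below is illustrative only. It assumes each line is a JSON object. The helper names `iter_decisions` and `count_by` are hypothetical, and the `action` field used in the sample is a guess at the record schema, so substitute whatever fields your decision records actually contain.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_decisions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        yield json.loads(line)


def count_by(lines: Iterable[str], key: str) -> Counter:
    """Count records by the value of one top-level field."""
    # Records that lack `key` are grouped under None.
    return Counter(record.get(key) for record in iter_decisions(lines))


# In-memory sample; the "action" field is an assumed schema, not a documented one.
sample_lines = [
    '{"action": "allow"}\n',
    "\n",
    '{"action": "deny"}\n',
    '{"action": "allow"}\n',
    '{"other": 1}\n',
]
by_action = count_by(sample_lines, "action")
print(by_action)

# Against a real log, pass an open file handle instead of the list:
#   with open("decisions-payments.ndjson", encoding="utf-8") as fh:
#       print(count_by(fh, "action"))
```

Iterating line by line keeps memory use flat, which matters once the log has grown over days of traffic.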

A dead-letter log captures records that could not be ingested when ingest_error_mode = IngestErrorMode.SKIP_TO_DEAD_LETTER: instead of being counted and dropped, they are set aside for inspection. Pair the dead-letter log with the messages_skipped counter and the error_samples field above to see both the count and the offending payloads.

In-process metrics

Enable the built-in metrics sink when you want counters, gauges, and histograms that accumulate across many batches.

engine.enable_metrics()

for payload in payloads:
    engine.evaluate_ndjson(payload)

snapshot = engine.metrics_snapshot()
print(snapshot["counters"])
print(snapshot["gauges"])
print(snapshot["histograms"])

engine.reset_metrics()

metrics_snapshot() returns a dict with three keys:

Key	Shape
`counters`	`{metric_name_or_labeled_key: int}`
`gauges`	`{metric_name_or_labeled_key: float}`
`histograms`	`{metric_name_or_labeled_key: {count, sum, min, max, mean}}`

Built-in metric names include:

blazerules.records_evaluated_total
blazerules.batches_evaluated_total
blazerules.records_skipped_total
blazerules.records_matched_total
blazerules.batch_total_latency_us
blazerules.batch_evaluation_latency_us
blazerules.batch_transpose_latency_us
blazerules.rule_fired_total{rule_id=...}
blazerules.rule_fire_rate{rule_id=...}
blazerules.decisions_total{action=...}
blazerules.hot_reload_success_total
blazerules.hot_reload_failed_total

The core does not run a Prometheus HTTP server. If you need Prometheus, OpenTelemetry, or another external system, read metrics_snapshot() in your own process and export it through your existing instrumentation.

Where to go next

Deployment: Run the agent and dashboard, and wire decision/dead-letter logs.

Error Reference: What the error counters mean and how to act on them.