Observability

What you can see during and after evaluation: in-process counters on the Result, plus decision and dead-letter logs from the agent.

BlazeRules is an embedded library, so observability is in-process: every evaluate_* call returns a Result carrying timing and ingest counters, and the agent can write per-record decision logs and dead-letter logs to disk. You aggregate or scrape these from your own process — the core does not run a metrics server.

Metrics live on the Result, not on a server

There is no daemon or /metrics endpoint in the core engine. The numbers you need per batch — timing, how many records were processed, how many were skipped, and why — come back on the Result object. Roll them up however your process already exports metrics.

Counters on every Result

Every evaluate method, including evaluate_ndjson and evaluate_batch, returns a Result with these fields.

Field               Meaning
n_records           Records in the batch the engine saw.
n_matched           Records where at least one rule matched.
timing_ms           Wall time the engine spent evaluating the batch, in milliseconds.
messages_processed  Count of records successfully ingested and evaluated.
messages_skipped    Count of records skipped during ingest (driven by ingest_error_mode).
error_counts        Per-category counts of ingest/type errors encountered.
error_samples       A bounded set of example error records/messages for diagnosis.

result = engine.evaluate_ndjson(payload)

print("records:", result.n_records, "matched:", result.n_matched)
print("took ms:", result.timing_ms)
print("processed:", result.messages_processed, "skipped:", result.messages_skipped)

if result.messages_skipped:
    print("error counts:", result.error_counts)
    print("error samples:", result.error_samples)

error_counts and error_samples are the first place to look when messages_skipped is non-zero — they tell you whether records are being dropped for malformed JSON, type mismatches, or another reason. See Error Reference for what each category means and how ingest modes change the behavior.
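Because the counters arrive per batch, rolling them up is left to your process. The following is a minimal sketch of one way to do that, not part of BlazeRules: BatchTotals is a hypothetical helper, the stand-in results use SimpleNamespace so the example runs without the engine, and it assumes error_counts behaves like a {category: count} mapping. The category name malformed_json is made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class BatchTotals:
    """Running totals over many Result objects (hypothetical helper)."""
    n_records: int = 0
    n_matched: int = 0
    timing_ms: float = 0.0
    messages_processed: int = 0
    messages_skipped: int = 0
    error_counts: Counter = field(default_factory=Counter)

    def add(self, result):
        """Fold one Result's counters into the running totals."""
        self.n_records += result.n_records
        self.n_matched += result.n_matched
        self.timing_ms += result.timing_ms
        self.messages_processed += result.messages_processed
        self.messages_skipped += result.messages_skipped
        # Assumption: error_counts is a {category: count} mapping.
        self.error_counts.update(dict(result.error_counts))

    def skip_ratio(self):
        """Fraction of ingested messages that were skipped."""
        seen = self.messages_processed + self.messages_skipped
        return self.messages_skipped / seen if seen else 0.0


# Stand-in results so the sketch runs without the engine. In real use,
# pass the object returned by engine.evaluate_ndjson(payload) instead.
fake_results = [
    SimpleNamespace(n_records=100, n_matched=7, timing_ms=1.5,
                    messages_processed=98, messages_skipped=2,
                    error_counts={"malformed_json": 2}),
    SimpleNamespace(n_records=50, n_matched=3, timing_ms=0.5,
                    messages_processed=50, messages_skipped=0,
                    error_counts={}),
]

totals = BatchTotals()
for r in fake_results:
    totals.add(r)

print("records:", totals.n_records, "skip ratio:", totals.skip_ratio())
```

A running skip ratio like this is a convenient value to alert on, since a sudden rise usually points at an upstream format change.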

Decision logs and dead-letter logs

When you run the agent, it writes per-record decisions to the output you configure. In rules.yaml, each instance has an output: block:

instances:
  - name: payments-http
    rules: rules.yaml
    output:
      type: ndjson
      path: decisions-payments.ndjson
  - name: checkout-log-tail
    rules: rules.yaml
    output:
      type: stdout

An ndjson output writes one decision record per line — a durable decision log you can tail, archive, or feed to your own pipeline. The dashboard reads such logs back to render a local read-only view; it is launched with --decision-log and --dead-letter-log paths pointing at these files (see Deployment).
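Since an ndjson decision log is one JSON object per line, it can be summarized with the standard library alone. The sketch below is illustrative only: summarize_decision_log is a hypothetical helper, and the "action" field is an assumption about the decision record schema, which this page does not specify. Adjust the key to whatever your records actually contain.

```python
import json
from collections import Counter


def summarize_decision_log(path, key="action"):
    """Count decision records in an NDJSON file by the value of `key`.

    Returns (counts, bad_lines). Lines that are not valid JSON objects,
    such as a partially written final line, are counted in bad_lines.
    """
    counts = Counter()
    bad_lines = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad_lines += 1
                continue
            if not isinstance(record, dict):
                bad_lines += 1
                continue
            # Assumption: each decision record carries the field named by `key`.
            counts[str(record.get(key, "<missing>"))] += 1
    return counts, bad_lines
```

Calling summarize_decision_log("decisions-payments.ndjson") would return the per-value counts for the log configured above, plus the number of lines that could not be parsed.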

A dead-letter log captures records that could not be ingested when ingest_error_mode = IngestErrorMode.SKIP_TO_DEAD_LETTER: instead of being counted and dropped, they are set aside for inspection. Pair the dead-letter log with the messages_skipped counter and the error_counts and error_samples fields above to see both the counts and the offending payloads.

In-process metrics

Enable the built-in collecting metrics sink when you want cumulative counters, gauges, and histograms across many batches.

engine.enable_metrics()

for payload in payloads:
    engine.evaluate_ndjson(payload)

snapshot = engine.metrics_snapshot()
print(snapshot["counters"])
print(snapshot["gauges"])
print(snapshot["histograms"])

engine.reset_metrics()

metrics_snapshot() returns:

Key         Shape
counters    {metric_name_or_labeled_key: int}
gauges      {metric_name_or_labeled_key: float}
histograms  {metric_name_or_labeled_key: {count, sum, min, max, mean}}

Built-in metric names include:

blazerules.records_evaluated_total
blazerules.batches_evaluated_total
blazerules.records_skipped_total
blazerules.records_matched_total
blazerules.batch_total_latency_us
blazerules.batch_evaluation_latency_us
blazerules.batch_transpose_latency_us
blazerules.rule_fired_total{rule_id=...}
blazerules.rule_fire_rate{rule_id=...}
blazerules.decisions_total{action=...}
blazerules.hot_reload_success_total
blazerules.hot_reload_failed_total

The core does not run a Prometheus HTTP server. If you need Prometheus, OpenTelemetry, or another external system, read metrics_snapshot() in your own process and export it through your existing instrumentation.
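As a rough illustration of that export step, the sketch below renders a snapshot-shaped dict as Prometheus text exposition lines. It is not a BlazeRules API: to_prometheus_text is a hypothetical helper, and it assumes labeled keys look like name{label=value}, which is a guess based on the metric names listed above. Dots are replaced with underscores because the classic Prometheus text format does not allow them in metric names, and histograms are flattened to _count and _sum samples.

```python
import re


def split_key(key):
    """Split 'name{label=value,...}' into a sanitized name and a label string."""
    # Assumption: labeled keys use the form name{label=value,label2=value2}.
    name, _, rest = key.partition("{")
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    labels = ""
    if rest:
        pairs = [p.split("=", 1) for p in rest.rstrip("}").split(",") if "=" in p]
        rendered = ",".join(
            '{}="{}"'.format(k.strip(), v.strip().strip('"')) for k, v in pairs
        )
        labels = "{" + rendered + "}"
    return name, labels


def to_prometheus_text(snapshot):
    """Render a metrics_snapshot()-shaped dict as untyped Prometheus samples."""
    lines = []
    for section in ("counters", "gauges"):
        for key, value in sorted(snapshot.get(section, {}).items()):
            name, labels = split_key(key)
            lines.append(f"{name}{labels} {value}")
    for key, hist in sorted(snapshot.get("histograms", {}).items()):
        name, labels = split_key(key)
        lines.append(f"{name}_count{labels} {hist['count']}")
        lines.append(f"{name}_sum{labels} {hist['sum']}")
    return "\n".join(lines) + "\n"


# Hand-written example data in the documented shape, not real engine output.
example_snapshot = {
    "counters": {
        "blazerules.records_evaluated_total": 10,
        "blazerules.rule_fired_total{rule_id=r1}": 3,
    },
    "gauges": {"blazerules.rule_fire_rate{rule_id=r1}": 0.3},
    "histograms": {
        "blazerules.batch_total_latency_us": {
            "count": 2, "sum": 300.0, "min": 100.0, "max": 200.0, "mean": 150.0,
        },
    },
}

text = to_prometheus_text(example_snapshot)
print(text)
```

In real use you would pass engine.metrics_snapshot() instead of the example dict and serve the resulting text from whatever HTTP handler your process already exposes.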

Where to go next