Architecture

This page explains how BlazeRules evaluates a batch of records: the stages the batch passes through, the components that do the work, and how rules are compiled once and swapped safely at runtime.

The central idea is that BlazeRules borrows the decision semantics of a rule engine but runs them with the execution physics of a columnar engine: rules compile once into immutable plans, and each batch is evaluated as a vectorized columnar scan rather than one record at a time.

Components

The repository layout maps directly onto the runtime components:

Core (src/core/) — the engine, SIMD kernels, the JSON-to-columnar transposer, dictionaries, and the window store.
Compiler (src/compiler/) — the YAML/SQL parser, the plan compiler, validation, and conflict checks. This is the strict, off-hot-path component that turns rule text into an immutable plan.
Bindings (src/bindings/) — the pybind11 layer exposing the blazerules Python module.
IO (src/io/) — blazerules_io: Kafka, CDC, Arrow IPC, Avro, Protobuf, and s3:// file reads.
Dashboard / Agent (src/dashboard/, src/agent/) — a local read-only UI and a multi-input ingest process.

The evaluation pipeline

Every batch flows through the same eight stages. Stages that do nothing for a given workload cost nothing — for example, an Arrow input skips the JSON transpose, and the derived-column stage is a no-op when no rule references a window, a model, or a vector.

flowchart TD
  A["Input: JSON / NDJSON / Arrow"] --> B["Transpose to columnar (JSON to Arrow; Arrow passes through)"]
  B --> C["Dictionary-encode categorical and entity columns to int32"]
  C --> D["Window read: load prior committed state"]
  D --> E["Inject derived columns (windows, model_score, vector_distance)"]
  E --> F["Kernel bind: select SIMD kernels for the plan"]
  F --> G["Evaluate: morsel-parallel SIMD, shared-predicate reuse, per-rule bitmasks, fused decision reduction"]
  G --> H["Assemble result: decisions, scores, risk bands, winning rules"]
  H --> I["Window write: commit this batch for future batches"]

Input. Records arrive as JSON/NDJSON bytes or as an Apache Arrow record batch.
Transpose. JSON is transposed into a columnar layout. Arrow input is already columnar and passes through; rule-referenced columns are projected by name, so extra columns and different column orders are fine.
Dictionary encode. Categorical and entity-key columns are dictionary-encoded into compact int32 IDs so set and equality operators run as integer comparisons.
Window read. For window rules, prior committed state is read so this batch can see history.
Inject derived columns. Window aggregates, model_score (ONNX) outputs, and vector_distance results are computed once per batch and injected as derived columns. This is a single mechanism with zero cost when no rule uses it.
Kernel bind. The plan binds to the SIMD kernels selected for the running CPU.
Evaluate. The core scans columns with morsel-parallel SIMD kernels, reuses shared predicates across rules (a common scan runs once), builds per-rule bitmasks, and fuses the decision reduction.
Assemble + window write. Results are assembled (decisions, scores, risk bands, winning rule per record, match counts), then this batch's state is committed for future window reads.
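
The dictionary-encode and evaluate stages above can be sketched in plain Python. This is a toy illustration of the ideas only, not the BlazeRules implementation: the helper names (`dictionary_encode`, `bitmask`), the sample columns, and the rule names are all assumptions made for the example, and Python integers stand in for the SIMD bitmasks.

```python
from typing import Callable, Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct string to a small integer ID, in first-seen order."""
    ids: Dict[str, int] = {}
    encoded = [ids.setdefault(value, len(ids)) for value in values]
    return encoded, ids


def bitmask(column: List, predicate: Callable) -> int:
    """Scan one column and set bit `row` for every row the predicate matches."""
    mask = 0
    for row, value in enumerate(column):
        if predicate(value):
            mask |= 1 << row
    return mask


# A four-row batch in columnar form (hypothetical sample data).
country = ["US", "DE", "US", "FR"]
amount = [120.0, 15.0, 50.0, 400.0]

# Categorical column becomes integer IDs, so equality is an integer compare.
country_ids, country_dict = dictionary_encode(country)
us_id = country_dict["US"]

# Each shared predicate is scanned once per batch, however many rules use it.
shared = {
    "amount_gt_100": bitmask(amount, lambda v: v > 100),
    "country_is_us": bitmask(country_ids, lambda v: v == us_id),
}

# Per-rule bitmasks are built by combining the shared predicate masks.
rule_masks = {
    "high_value": shared["amount_gt_100"],
    "high_value_us": shared["amount_gt_100"] & shared["country_is_us"],
}
```

Here both rules reuse the single `amount_gt_100` scan, which is the shared-predicate reuse the Evaluate stage describes; a real engine would do the same work over vectorized morsels rather than a Python loop.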

📘 What "derived column" means here
Windows, ONNX model scores, and vector distances are not special evaluation paths. They are computed up front and injected as ordinary columns, after which the normal operators read them like any other field. One registry, computed once per batch, free when unused.
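
A minimal sketch of the "one registry, computed once, free when unused" idea follows. Everything in it is hypothetical: `DERIVED_REGISTRY`, `inject_derived`, and the `toy_model_score` stand-in (a simple clipped ratio, not an ONNX call) are names invented for illustration and are not part of the BlazeRules API.

```python
from typing import Callable, Dict, List, Set

Batch = Dict[str, List[float]]

# Records which producers actually ran, to show that unused ones cost nothing.
CALLS: List[str] = []


def toy_model_score(batch: Batch) -> List[float]:
    """Stand-in for a model call: amount / 1000, clipped to at most 1.0."""
    CALLS.append("model_score")
    return [min(1.0, amount / 1000.0) for amount in batch["amount"]]


# One registry maps a derived column name to the function that produces it.
DERIVED_REGISTRY: Dict[str, Callable[[Batch], List[float]]] = {
    "model_score": toy_model_score,
}


def inject_derived(batch: Batch, referenced: Set[str]) -> Batch:
    """Add only the derived columns that some rule references."""
    out = dict(batch)
    for name in sorted(referenced):
        producer = DERIVED_REGISTRY.get(name)
        if producer is not None:
            out[name] = producer(batch)  # computed once for the whole batch
    return out


batch = {"amount": [250.0, 4000.0]}

# No rule references model_score: the producer never runs.
plain = inject_derived(batch, {"amount"})

# A rule references model_score: it is injected as an ordinary column.
scored = inject_derived(batch, {"amount", "model_score"})

# An ordinary comparison operator then reads it like any other field.
flagged = [score > 0.5 for score in scored["model_score"]]
```

After injection, nothing downstream needs to know whether `model_score` came from the input or from the registry.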

SIMD dispatch

BlazeRules selects its kernel family at runtime based on the CPU: ARM64 NEON, x86_64 AVX2/FMA, optional AVX-512, or a scalar fallback. On the Apple M1 reference machine the backend is neon. The vectorized operator families are numeric, range, set, null/empty, bitfield, and closed-enum array bitset; the remaining families (cross-field, string, regex, IP/CIDR, temporal, geo, lookup) are correct but scalar. See the Performance Model for the implications.
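
The selection logic can be pictured as a small decision function. The sketch below is an assumption-laden illustration: only the `neon` backend name comes from this page, while `select_backend`, the other backend names, the feature-flag strings, and the choice to prefer AVX-512 when it is present are hypothetical.

```python
from typing import Iterable


def select_backend(machine: str, features: Iterable[str] = ()) -> str:
    """Pick a kernel family name from the CPU architecture and feature flags."""
    machine = machine.lower()
    flags = {flag.lower() for flag in features}

    if machine in ("arm64", "aarch64"):
        return "neon"
    if machine in ("x86_64", "amd64"):
        # Assumption: the widest available vector extension wins.
        if "avx512f" in flags:
            return "avx512"
        if "avx2" in flags and "fma" in flags:
            return "avx2"
    # Anything unrecognized falls back to the scalar kernels.
    return "scalar"


m1 = select_backend("arm64")
laptop = select_backend("x86_64", ["sse4_2", "avx2", "fma"])
server = select_backend("x86_64", ["avx2", "fma", "avx512f"])
other = select_backend("riscv64")
```

The scalar branch is what keeps every operator family correct on any CPU; the vectorized families simply replace it where a faster kernel exists.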

Window semantics

Window rules read state committed by earlier batches, inject derived window columns for the current batch, evaluate, and then write the current batch's state for future batches. As a result, batch N sees only state committed by batches before it; by default, repeated rows for the same entity within one batch do not see earlier rows from that same batch. For window-heavy streaming workloads, keep partition/entity affinity so each entity's history lands in the same stream.
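
The read-then-write ordering is easiest to see with a toy per-entity counter. `ToyWindowStore` and `evaluate_batch` below are hypothetical names for a sketch of the semantics described above; they are not the BlazeRules window store.

```python
from typing import Dict, List


class ToyWindowStore:
    """Keeps a committed event count per entity."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def read(self, entity: str) -> int:
        # Only state committed by earlier batches is visible here.
        return self._counts.get(entity, 0)

    def commit(self, entities: List[str]) -> None:
        for entity in entities:
            self._counts[entity] = self._counts.get(entity, 0) + 1


def evaluate_batch(store: ToyWindowStore, entities: List[str]) -> List[int]:
    # 1. Window read: every row sees the state from before this batch.
    prior_counts = [store.read(entity) for entity in entities]
    # 2. (Rules would be evaluated against prior_counts here.)
    # 3. Window write: commit this batch only after evaluation.
    store.commit(entities)
    return prior_counts


store = ToyWindowStore()

# Entity "a" appears twice in batch 1, but neither row sees the other.
batch_1 = evaluate_batch(store, ["a", "a", "b"])

# Batch 2 sees everything batch 1 committed.
batch_2 = evaluate_batch(store, ["a", "b", "c"])
```

In this sketch both `a` rows in the first batch read a count of 0, and the second batch reads 2 for `a`, which matches the default behavior described above.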

Hot reload

Rule changes never block the evaluation path. When hot reload is enabled, a new rule file is compiled and validated off the hot path, and the active plan is swapped atomically only if compilation succeeds. A failed reload leaves the previous rule set running. A batch that is already running keeps the rule set it observed when it started.

sequenceDiagram
  participant FS as Rule file
  participant C as Compiler (off hot path)
  participant E as Engine
  FS->>C: poll detects change
  C->>C: parse + validate + load lookups
  alt compile succeeds
    C->>E: atomic plan swap
  else compile fails
    C-->>E: keep previous plan active
  end
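
The compile-then-swap pattern in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: `PlanHolder`, `try_reload`, and the one-line `field > number` rule syntax accepted by `compile_rules` are all invented for the example.

```python
import threading
from typing import Tuple

Plan = Tuple[Tuple[str, float], ...]  # immutable once built


def compile_rules(text: str) -> Plan:
    """Toy compiler: each line must look like 'field > number'."""
    rules = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[1] != ">":
            raise ValueError(f"cannot parse rule: {line!r}")
        rules.append((parts[0], float(parts[2])))  # float() may also raise
    return tuple(rules)


class PlanHolder:
    """Holds the active plan; reloads replace the reference, never mutate it."""

    def __init__(self, plan: Plan) -> None:
        self._plan = plan
        self._lock = threading.Lock()

    def current(self) -> Plan:
        # A batch calls this once at start and keeps the returned plan.
        return self._plan

    def try_reload(self, text: str) -> bool:
        try:
            new_plan = compile_rules(text)  # done off the evaluation path
        except ValueError:
            return False  # previous plan stays active
        with self._lock:  # serialize concurrent reloads
            self._plan = new_plan
        return True


holder = PlanHolder(compile_rules("amount > 100"))
in_flight = holder.current()  # a batch starts and captures its plan

bad_reload = holder.try_reload("amount >> 5")   # fails validation
after_bad = holder.current()

good_reload = holder.try_reload("amount > 500")  # compiles, so it swaps
after_good = holder.current()
```

Because plans are immutable tuples, the batch that captured `in_flight` keeps evaluating against its original thresholds even after the swap.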

Where to go next

Core Concepts. The vocabulary: rules, conditions, decisions, scores, and risk bands.
Performance Model. Why the columnar pipeline is fast, and where the honest ceilings are.
Data Model and Schema. Column types, schema inference, and dictionary encoding.