C++ API

The Python module blazerules is a thin binding over a C++20 core. When you don't want Python anywhere near the hot path — you ship a single native binary, link into an existing C++ service, or need the lowest possible per-batch overhead — you can embed that core directly. This page is the map: which headers to include, the engine surface that matters, and how the C++ names line up with the Python ones.

📘
Same engine, two front doors
The C++ core and the Python module run the identical compiler, kernels, schema inference, and decision logic. Anything you read about evaluation semantics in Core Concepts or the Python API holds here — only the call syntax and a few enum names change.

Headers

All core headers live under include/blazerules/:

Header	What it gives you
`engine.h`	`RuleEngine`, `EngineConfig`, `BacktestConfig`, `RuleFileFormat`
`schema.h`	`ColumnType`, `FieldSpec`, `BlazeRulesSchema`
`batch_result.h`	`BatchResult`, `BacktestReport`, timing and per-rule structures
`conflict.h`	`ConflictReport` — returned by `load_rules`
`rule_spec.h`	Rule, action, and operator types shared across the API

engine.h includes <arrow/api.h>, so an Apache Arrow development install must be on your include path.

The shape of the engine

A RuleEngine compiles a YAML rule set once into immutable execution plans, then evaluates batches of records against it. Each call returns a BatchResult carrying decisions, scores, risk bands, the winning rule per record, and — when you ask for them — per-rule bitmasks. Schema inference works exactly as in Python: pass a schema up front, or let the first evaluated batch bind it.

#include "blazerules/engine.h"

EngineConfig cfg;
cfg.output_detail = EngineConfig::OUTPUT_DECISIONS;   // see enum note below

RuleEngine engine(cfg);                               // non-copyable; own it / pass by reference
ConflictReport report = engine.load_rules("rules.yaml");
// inspect `report` before serving — load is strict (see below)

std::string ndjson;                                   // fill with NDJSON: one JSON object per line
BatchResult r = engine.evaluate_ndjson(ndjson);

for (size_t i = 0; i < r.decisions.size(); ++i) {
    // r.decisions[i], r.scores[i], r.winning_rule_ids[i] ...
}

Constructors

explicit RuleEngine(EngineConfig config = {}) — default or configured engine; schema inferred on first batch.
RuleEngine(std::vector<FieldSpec> fields, EngineConfig config = {}) — bind a schema up front.
static BlazeRulesResult<std::unique_ptr<RuleEngine>> RuleEngine::create(BlazeRulesSchema schema, EngineConfig config) — factory returning a result wrapper instead of throwing.

RuleEngine is non-copyable (the copy constructor and copy assignment are deleted). Move it or pass it by reference; never copy it into a container by value.
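The ownership rule above can be sketched with a stand-in type. `EngineSketch` below is hypothetical and is not `RuleEngine`; it only mirrors the deleted-copy, allowed-move shape described here, and shows two safe patterns: borrowing by reference, and storing engines behind `std::unique_ptr` so a container never has to copy one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a non-copyable, movable engine type.
// It is NOT RuleEngine; it only mirrors the copy/move rules described above.
class EngineSketch {
public:
    explicit EngineSketch(int id) : id_(id) {}
    EngineSketch(const EngineSketch&) = delete;
    EngineSketch& operator=(const EngineSketch&) = delete;
    EngineSketch(EngineSketch&&) noexcept = default;
    EngineSketch& operator=(EngineSketch&&) noexcept = default;
    int id() const { return id_; }

private:
    int id_;
};

// Copying is a compile error; moving is allowed.
static_assert(!std::is_copy_constructible_v<EngineSketch>);
static_assert(std::is_move_constructible_v<EngineSketch>);

// Pass by reference: the callee borrows the engine, the caller keeps ownership.
int use_engine(const EngineSketch& engine) { return engine.id(); }

// Store engines behind unique_ptr so the container only moves pointers.
std::vector<std::unique_ptr<EngineSketch>> make_pool(int n) {
    assert(n >= 0);
    std::vector<std::unique_ptr<EngineSketch>> pool;
    pool.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        pool.push_back(std::make_unique<EngineSketch>(i));
    }
    return pool;
}
```

Holding the engine through a pointer also keeps its address stable if the container reallocates, which matters when other components keep references to it.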

Loading rules

ConflictReport load_rules(const std::string& rules_path);
ConflictReport load_rules_from_string(const std::string& rules_yaml_or_json,
                                      RuleFileFormat format = RuleFileFormat::YAML); // enum class RuleFileFormat { YAML, JSON }
ConflictReport reload_rules_now(const std::string& rules_path);
ConflictReport analyze_conflicts(const std::string& rules_path);
std::string    active_rule_set_version() const;

🚧
load_rules returns a report, not void
Activation is strict: malformed YAML, duplicate rule IDs, or references to missing lookups fail before the new rule set goes live. load_rules hands back a ConflictReport describing overlaps and conflicts. Check it before you start serving decisions rather than discovering a bad rule set at evaluation time.
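One way to act on the report before serving is a small startup gate. This page does not show the layout of `ConflictReport`, so the sketch below uses a hypothetical `ReportSketch` with assumed `overlaps` and `conflicts` members, and an assumed policy (refuse on conflicts, warn on overlaps). Treat it as an illustration of the check, not as the real type.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in: the real ConflictReport layout is not shown on this page.
struct ReportSketch {
    std::vector<std::string> overlaps;   // assumed: descriptions of overlapping rules
    std::vector<std::string> conflicts;  // assumed: descriptions of conflicting rules
};

enum class StartupVerdict { SERVE, SERVE_WITH_WARNINGS, REFUSE };

// Assumed policy: any conflict blocks startup, overlaps only warrant a warning.
StartupVerdict review_report(const ReportSketch& report) {
    if (!report.conflicts.empty()) {
        return StartupVerdict::REFUSE;
    }
    if (!report.overlaps.empty()) {
        return StartupVerdict::SERVE_WITH_WARNINGS;
    }
    return StartupVerdict::SERVE;
}
```

Running the gate once, right after loading, keeps the decision about a questionable rule set out of the evaluation path.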

Evaluating batches

BatchResult evaluate_ndjson(std::string_view ndjson_bytes);
BatchResult evaluate_ndjson_padded(std::string_view ndjson_bytes);   // input already simdjson-padded
BatchResult evaluate_ndjson_file(const std::string& path);           // mmap, zero-copy replay
BatchResult evaluate_record_batch(const std::shared_ptr<arrow::RecordBatch>& batch);
BatchResult evaluate_batch(const std::shared_ptr<arrow::RecordBatch>& batch); // inline alias for evaluate_record_batch
BatchResult evaluate_messages(const std::vector<std::string>& messages);
BatchResult evaluate_message_views(const std::vector<std::string_view>& messages);

Prefer evaluate_batch when upstream data is already typed Arrow, and evaluate_ndjson for JSON streams. Each evaluator has an _into(..., BatchResult& out) variant that reuses an existing result object to avoid per-batch allocation in tight loops.
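The allocation benefit of an `_into`-style call comes from reusing the result's storage across batches. The sketch below illustrates that pattern with stand-ins only: `ResultSketch` and `evaluate_into_sketch` are hypothetical, and the "evaluation" is a trivial substring check rather than anything the engine does.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a result object; not the real BatchResult.
struct ResultSketch {
    std::vector<int> decisions;
};

// Stand-in evaluator with an out-parameter: emits one decision per non-empty
// line, 1 if the line contains "flag":true and 0 otherwise.
// clear() drops the old elements but keeps the vector's capacity, so
// steady-state batches reuse the same storage instead of reallocating.
void evaluate_into_sketch(std::string_view ndjson, ResultSketch& out) {
    out.decisions.clear();
    std::size_t start = 0;
    while (start < ndjson.size()) {
        std::size_t end = ndjson.find('\n', start);
        if (end == std::string_view::npos) {
            end = ndjson.size();
        }
        std::string_view line = ndjson.substr(start, end - start);
        if (!line.empty()) {
            bool flagged = line.find("\"flag\":true") != std::string_view::npos;
            out.decisions.push_back(flagged ? 1 : 0);
        }
        start = end + 1;
    }
}
```

In a tight loop the caller declares one result object outside the loop and passes it to every call; after the first few batches the vectors stop growing.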

Advanced: sharding and partition affinity

For window-heavy streaming workloads you can keep entity affinity across shards: create_shards(int) returns owned per-shard engines, and the evaluate_partition(int partition_id, ...) overloads evaluate a single partition's records. Use these only when a profiler tells you a single engine's window state is the bottleneck.
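Entity affinity means every record for a given entity reaches the same partition, so that partition's window state sees the full history. A minimal routing sketch, assuming the partition id is derived by hashing an entity key (the helpers `partition_for` and `route` are hypothetical and not part of the engine API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Map an entity key to a partition id in [0, num_partitions).
// std::hash is only stable within one process; a deployment that routes
// across processes would need a fixed hash function instead.
int partition_for(std::string_view entity_key, int num_partitions) {
    assert(num_partitions > 0);
    std::size_t h = std::hash<std::string_view>{}(entity_key);
    return static_cast<int>(h % static_cast<std::size_t>(num_partitions));
}

// Group record indices by partition so each shard receives all records
// belonging to the entities it owns.
std::vector<std::vector<std::size_t>> route(
        const std::vector<std::string>& entity_keys, int num_partitions) {
    std::vector<std::vector<std::size_t>> buckets(
        static_cast<std::size_t>(num_partitions));
    for (std::size_t i = 0; i < entity_keys.size(); ++i) {
        int p = partition_for(entity_keys[i], num_partitions);
        buckets[static_cast<std::size_t>(p)].push_back(i);
    }
    return buckets;
}
```

Each bucket would then be handed to the matching partition's evaluation call; how records are packaged for that call depends on the overload you use.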

Models, hot reload, schema, backtest

void register_model(const std::string& name, const std::string& path);  // ONNX
int  num_models() const;

void enable_hot_reload(const std::string& rules_file_path,
                       std::chrono::seconds poll_interval = std::chrono::seconds(5));
void stop_hot_reload();
HotReloadStatus hot_reload_status() const;

const BlazeRulesSchema& schema() const;
bool schema_bound() const;
SchemaState schema_state() const;   // enum class SchemaState { UNBOUND, INFERRED_BOUND, USER_BOUND }

BacktestReport backtest(const BacktestConfig& config);
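For logging or health checks it can help to turn the schema state into text. The sketch below redeclares the `SchemaState` enum from the comment above so that it compiles on its own; the helper names and the wording of each description are assumptions inferred from the enumerator names.

```cpp
#include <string_view>

// Redeclared here only to keep the sketch self-contained; in real code this
// comes from the blazerules headers.
enum class SchemaState { UNBOUND, INFERRED_BOUND, USER_BOUND };

// Human-readable description of each state (wording is an assumption).
constexpr std::string_view describe(SchemaState state) {
    switch (state) {
        case SchemaState::UNBOUND:
            return "no schema yet; the first evaluated batch will bind it";
        case SchemaState::INFERRED_BOUND:
            return "schema inferred from the first evaluated batch";
        case SchemaState::USER_BOUND:
            return "schema supplied by the caller up front";
    }
    return "unknown schema state";
}

// True once any schema is bound, regardless of where it came from.
constexpr bool is_bound(SchemaState state) {
    return state != SchemaState::UNBOUND;
}
```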

🚧
ONNX is build-gated
model_score rules and register_model(...) require an ONNX-enabled build (BLAZERULES_ENABLE_ONNX, default ON). In a build with ONNX off, model_score rules are rejected when the rule set is compiled, and register_model throws. See Backtesting a Candidate for the backtest(...) workflow.

The methods above are the load → evaluate → read surface most embedders need. engine.h also exposes reset_window_state(), num_window_channels(), in-process metrics (enable_metrics(), reset_metrics(), metrics_counters(), metrics_gauges(), metrics_histograms()), stats(), and partition-affine evaluation helpers.

C++ vs Python, side by side

The Python equivalent of the C++ snippet in "The shape of the engine":

import blazerules

config = blazerules.EngineConfig()
config.output_detail = blazerules.OutputDetail.DECISIONS

engine = blazerules.RuleEngine(config)
engine.load_rules("rules.yaml")

result = engine.evaluate_ndjson(ndjson_bytes)
for decision in result.decisions:
    ...

Enum names differ between the two front doors

The EngineConfig modes are nested unscoped enums inside the struct, so you reference them as EngineConfig::<NAME>. The constant names are not identical to the Python ones:

Setting	Python	C++
Output detail	`OutputDetail.DECISIONS`	`EngineConfig::OUTPUT_DECISIONS`
Output detail	`OutputDetail.BITMASKS`	`EngineConfig::OUTPUT_BITMASKS`
Ingest errors	`IngestErrorMode.SKIP_AND_COUNT`	`EngineConfig::SKIP_AND_COUNT`
Type mismatch	`TypeMismatchMode.NULL_ON_TYPE_ERROR`	`EngineConfig::NULL_ON_TYPE_ERROR`

📘
The C++ default is OUTPUT_BITMASKS
In C++, EngineConfig::output_detail defaults to OUTPUT_BITMASKS. If you only need decisions and scores, set it to EngineConfig::OUTPUT_DECISIONS explicitly — that is the cheaper path and matches the Python guidance.
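The naming pattern is easier to see in a reduced form. `EngineConfigSketch` below is a hypothetical stand-in, not the real struct: it assumes a nested unscoped enum (the type name `OutputDetail` is a guess) and mirrors the documented default, to show why the constants are written `Struct::NAME` with no enum type in between.

```cpp
// Hypothetical stand-in for the real struct; the enum type name is an assumption.
struct EngineConfigSketch {
    // Unscoped and nested: enumerators are visible in the struct's scope.
    enum OutputDetail { OUTPUT_DECISIONS, OUTPUT_BITMASKS };

    // Mirrors the documented C++ default.
    OutputDetail output_detail = OUTPUT_BITMASKS;
};

// Build a config for callers that only need decisions and scores.
EngineConfigSketch decisions_only_config() {
    EngineConfigSketch cfg;
    cfg.output_detail = EngineConfigSketch::OUTPUT_DECISIONS;  // struct-qualified name
    return cfg;
}

// True when the selected mode asks for per-rule bitmasks.
bool wants_bitmasks(const EngineConfigSketch& cfg) {
    return cfg.output_detail == EngineConfigSketch::OUTPUT_BITMASKS;
}
```

Because the enum is unscoped, its enumerators also convert implicitly to integers, so comparing them against raw numbers compiles silently; comparing against the named constants avoids that trap.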

Reading a `BatchResult` in C++

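The data is the same as in the Python result, but some names and access styles differ: grouped_decision_indices is a method in Python and a plain field in C++, two fields are renamed, and timing_ms is an attribute in Python but a method in C++: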

Data	Python	C++
Records / matches	`n_records`, `n_matched`	`n_records`, `n_matched` (fields)
Decisions	`decisions`, `scores`, `winning_rule_ids`	same field names
Grouped indices	`grouped_decision_indices()` (method)	`grouped_decision_indices` (member field)
Per-rule counts	`match_counts`	`rule_match_counts` (member field)
Matched rows	`matched_indices`	`matched_record_indices` (member field)
Timing	`timing_ms`	`timing_ms()` (method → `unordered_map<string,double>`)

BatchResult r = engine.evaluate_ndjson(ndjson);

int matched = r.n_matched;
const auto& approve_rows = r.grouped_decision_indices["approve"];   // member access
auto timings = r.timing_ms();                                       // method call in C++, attribute in Python
double total_ms = timings["total"];
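One general C++ caveat applies to the map lookups above: `operator[]` on a standard map inserts a default-constructed entry when the key is missing, and it is unavailable on a `const` map. A find-based helper avoids both. The sketch assumes the grouped indices behave like an `std::unordered_map` from decision name to a vector of row indices; that container type, the index width, and both helper names are assumptions, not the documented layout.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed shape of the grouped indices; the real member type may differ.
using GroupedIndices =
    std::unordered_map<std::string, std::vector<std::uint32_t>>;

// Return the rows for a decision, or an empty list when the decision never
// occurred in the batch. Works on const maps and never inserts.
const std::vector<std::uint32_t>& rows_for(const GroupedIndices& grouped,
                                           const std::string& decision) {
    static const std::vector<std::uint32_t> kEmpty;
    auto it = grouped.find(decision);
    return it == grouped.end() ? kEmpty : it->second;
}

// Read one timing entry, falling back when the key is absent.
double timing_or(const std::unordered_map<std::string, double>& timings,
                 const std::string& key, double fallback) {
    auto it = timings.find(key);
    return it == timings.end() ? fallback : it->second;
}
```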

Zero-copy buffer helpers (rule_bitmask_buffer(id), matched_indices_buffer(), decision_codes_buffer(), grouped_indices_buffer(decision)) expose the underlying memory as arrow::Buffer when you need to hand results to another Arrow consumer without copying.

Linking against the core

The core build/link target is blazerules_core. In your CMakeLists.txt:

# either find an installed package...
# find_package(blazerules CONFIG REQUIRED)
# ...or add the repo as a subdirectory
add_subdirectory(third_party/blazerules)

add_executable(app main.cpp)
target_link_libraries(app PRIVATE blazerules_core)

The install export namespace is blazerules::, so installed consumers can link blazerules::blazerules_core when using the generated CMake package. In-tree builds link the target as blazerules_core.

Next steps

Embed in C++

A runnable end-to-end C++ program: include, load, evaluate, read.

Streaming IO

The blazerules_io module — Kafka, CDC, decoders, file readers.

Python API

The Python binding over this same core.
