Tutorial: Backtest a Candidate Ruleset

Goal: before deploying a rule change, replay historical records through both the current and a candidate ruleset and compare the outcomes — how many decisions change, what becomes newly flagged, and what stops firing.

📘
Why backtest
Rule changes are easy to ship and hard to reason about at scale. A backtest answers "what would this change have done to last month's traffic?" before it touches live decisions.

Prerequisites

Historical records in Parquet, Arrow, or NDJSON format.
Two rule files: the current rules_current.yaml and a rules_candidate.yaml.

The workflow

Take a representative slice of historical data, in chronological order (window state depends on event order).
Evaluate it under the current ruleset.
Evaluate the same data under the candidate ruleset.
Compare: decisions that changed, newly-matched records (new positives), records that stopped matching (lost positives), and overall agreement.
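The comparison step can be sketched in plain Python. This is only an illustration of the quantities named above, not the library's implementation: it assumes each ruleset yields one boolean "matched" flag per record, and the names compare_decisions and Comparison are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    total_records: int
    changed: int
    new_positives: int
    lost_positives: int
    agreement_rate: float


def compare_decisions(fired_a, fired_b):
    """Compare per-record match flags from two rulesets over the same records.

    fired_a: flags from the current ruleset; fired_b: flags from the candidate.
    Assumes binary decisions, so a changed decision is either a new positive
    or a lost positive.
    """
    if len(fired_a) != len(fired_b):
        raise ValueError("both runs must cover the same records")
    # New positive: the candidate matches where the current ruleset did not.
    new_pos = sum(1 for a, b in zip(fired_a, fired_b) if not a and b)
    # Lost positive: the current ruleset matched but the candidate does not.
    lost_pos = sum(1 for a, b in zip(fired_a, fired_b) if a and not b)
    total = len(fired_a)
    changed = new_pos + lost_pos
    agreement = (total - changed) / total if total else 1.0
    return Comparison(total, changed, new_pos, lost_pos, agreement)


current = [True, False, False, True, False]
candidate = [True, True, False, False, False]
result = compare_decisions(current, candidate)
print(result)
```

In this toy input, one record is newly matched and one stops matching, so three of five decisions agree.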

C++ backtest API

The C++ core exposes backtesting as a first-class operation: RuleEngine::backtest(...) takes a BacktestConfig and returns a BacktestReport.

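#include <blazerules/engine.h>

RuleEngine engine;

BacktestConfig config;
// Historical Parquet files, listed in chronological order.
config.parquet_paths = {"history/day-1.parquet", "history/day-2.parquet"};
// Rulesets to compare: A is the current ruleset, B is the candidate.
config.rules_file_a = "rules_current.yaml";
config.rules_file_b = "rules_candidate.yaml";
// Ground-truth label column in the historical data.
config.label_column = "fraud_label";
config.batch_size = 500000;

BacktestReport report = engine.backtest(config);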

From Python

Python exposes the same comparison as RuleEngine.backtest(...).

import blazerules

engine = blazerules.RuleEngine()

# rules_a is the current ruleset; rules_b is the candidate.
report = engine.backtest(
    parquet_path=["history/day-1.parquet", "history/day-2.parquet"],
    rules_a="rules_current.yaml",
    rules_b="rules_candidate.yaml",
    label_column="fraud_label",
)

# Summary of how the two rulesets compare over the same records.
print(report.total_records)
print(report.fire_rate_a, report.fire_rate_b)
print(report.new_positives, report.lost_positives)
print(report.agreement_rate)

The parquet_path argument accepts a single path or a list of paths. The label_column argument is optional; when supplied, the report's precision and recall fields are populated for both rulesets.
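For readers unfamiliar with the label-based metrics, the sketch below shows the conventional definitions of precision and recall for one ruleset. It is an assumption that the report uses these standard definitions; the function name precision_recall and the sample data are hypothetical and not part of the library.

```python
def precision_recall(fired, labels):
    """Standard precision and recall for boolean match flags against labels.

    fired:  True where the ruleset matched the record.
    labels: True where the ground-truth label marks the record as positive.
    Returns 0.0 for a metric whose denominator is zero.
    """
    tp = sum(1 for f, y in zip(fired, labels) if f and y)
    fp = sum(1 for f, y in zip(fired, labels) if f and not y)
    fn = sum(1 for f, y in zip(fired, labels) if not f and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [True, True, False, False, True]
fired_a = [True, False, False, True, False]  # current ruleset (toy data)
fired_b = [True, True, False, True, False]   # candidate ruleset (toy data)

precision_a, recall_a = precision_recall(fired_a, labels)
precision_b, recall_b = precision_recall(fired_b, labels)
print(f"A: precision={precision_a:.2f} recall={recall_a:.2f}")
print(f"B: precision={precision_b:.2f} recall={recall_b:.2f}")
```

Comparing the two pairs side by side shows whether the candidate catches more labeled positives, and at what cost in false matches.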

Validation

Feed identical, ordered batches to both engines — window operators are stateful and order-sensitive.
Sanity check: when rules_candidate.yaml is identical to rules_current.yaml, agreement should be 100%, with zero changed decisions (no new or lost positives).
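To see why ordering matters, here is a small, self-contained sketch of a windowed rule. It is a toy stand-in, not the engine's window operator: the evaluate function, the 60-second window, and the threshold are all hypothetical. The rule fires when a key has more than two events inside the trailing window.

```python
from collections import defaultdict, deque


def evaluate(events, window_s=60, threshold=2):
    """Fire when a key has more than `threshold` events in the trailing window.

    events: iterable of (event_id, key, timestamp_seconds), processed in the
    order given. State is kept per key, so the result depends on that order.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    fired = {}
    for event_id, key, ts in events:
        q = recent[key]
        # Evict timestamps that have fallen out of the trailing window.
        while q and ts - q[0] > window_s:
            q.popleft()
        q.append(ts)
        fired[event_id] = len(q) > threshold
    return fired


ordered = [
    ("e1", "card-1", 0),
    ("e2", "card-1", 10),
    ("e3", "card-1", 20),
    ("e4", "card-1", 200),
]
# Same events, but the latest one is replayed first.
shuffled = [ordered[3], ordered[0], ordered[1], ordered[2]]

in_order = evaluate(ordered)
out_of_order = evaluate(shuffled)
print(in_order)
print(out_of_order)
print("same outcome:", in_order == out_of_order)
```

Replaying the same events out of order leaves a stale timestamp in the window, so the rule fires on a record it would not match in chronological order. Evaluating the identical ordered input twice gives identical results, which is the behavior the sanity check relies on.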

Where to go next

Backtesting (Operations)

Operational guidance for running backtests over large history.

Hot Reload

Once validated, roll the candidate out safely with a guarded swap.