Tutorial: Backtest a Candidate Ruleset
Compare a current ruleset against a candidate over historical data to see what would change before you ship it.
Goal: before deploying a rule change, replay historical records through both the current and a candidate ruleset and compare the outcomes — how many decisions change, what becomes newly flagged, and what stops firing.
Why backtest
Rule changes are easy to ship but hard to reason about at scale. A backtest answers "what would this change have done to last month's traffic?" before it touches live decisions.
Prerequisites
- Historical records in Parquet or Arrow (or NDJSON).
- Two rule files: the current rules_current.yaml and a candidate rules_candidate.yaml.
The workflow
- Take a representative slice of historical data, in chronological order (window state depends on event order).
- Evaluate it under the current ruleset.
- Evaluate the same data under the candidate ruleset.
- Compare: decisions that changed, newly-matched records (new positives), records that stopped matching (lost positives), and overall agreement.
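The comparison step can be made concrete with a minimal sketch that computes these quantities from two aligned lists of per-record decisions. It assumes each decision is a boolean "rule fired" flag and that both lists cover the same records in the same order. Comparison and compare_decisions are hypothetical names used for illustration; they are not part of the blazerules API, and the library's own report may compute these figures differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Comparison:
    total: int
    fire_rate_a: float     # fraction of records fired under ruleset A (current)
    fire_rate_b: float     # fraction of records fired under ruleset B (candidate)
    changed: int           # records whose decision differs between A and B
    new_positives: int     # fired under B but not under A
    lost_positives: int    # fired under A but no longer under B
    agreement_rate: float  # fraction of records with the same decision under A and B


def compare_decisions(fired_a: Sequence[bool], fired_b: Sequence[bool]) -> Comparison:
    """Compare per-record fire decisions from two rulesets evaluated on the same records."""
    if len(fired_a) != len(fired_b):
        raise ValueError("both rulesets must be evaluated on the same records")
    total = len(fired_a)
    new_pos = sum(1 for a, b in zip(fired_a, fired_b) if b and not a)
    lost_pos = sum(1 for a, b in zip(fired_a, fired_b) if a and not b)
    # With boolean decisions, every changed decision is either a gain or a loss.
    changed = new_pos + lost_pos
    if total == 0:
        # Convention for this sketch: an empty slice counts as full agreement.
        return Comparison(0, 0.0, 0.0, 0, 0, 0, 1.0)
    return Comparison(
        total=total,
        fire_rate_a=sum(map(bool, fired_a)) / total,
        fire_rate_b=sum(map(bool, fired_b)) / total,
        changed=changed,
        new_positives=new_pos,
        lost_positives=lost_pos,
        agreement_rate=(total - changed) / total,
    )


c = compare_decisions([True, False, True, False], [True, True, False, False])
print(c.new_positives, c.lost_positives, c.agreement_rate)  # 1 1 0.5
```

Both rulesets fire on half the records here, yet they disagree on half of them. This is why agreement and new/lost positives are worth reading alongside the fire rates.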
C++ backtest API
The C++ core exposes backtesting as a first-class API: RuleEngine::backtest(...) takes a BacktestConfig and returns a BacktestReport.
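```cpp
#include <blazerules/engine.h>

int main() {
    RuleEngine engine;

    BacktestConfig config;
    // Historical input; list files in chronological order, since window state depends on event order.
    config.parquet_paths = {"history/day-1.parquet", "history/day-2.parquet"};
    config.rules_file_a = "rules_current.yaml";    // ruleset A: current (baseline)
    config.rules_file_b = "rules_candidate.yaml";  // ruleset B: candidate
    config.label_column = "fraud_label";           // ground-truth label column
    config.batch_size = 500000;                    // records per evaluation batch

    BacktestReport report = engine.backtest(config);
    return 0;
}
```

From Python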
Python exposes the same comparison as RuleEngine.backtest(...).
```python
import blazerules

engine = blazerules.RuleEngine()
report = engine.backtest(
    parquet_path=["history/day-1.parquet", "history/day-2.parquet"],
    rules_a="rules_current.yaml",
    rules_b="rules_candidate.yaml",
    label_column="fraud_label",
)
print(report.total_records)
print(report.fire_rate_a, report.fire_rate_b)
print(report.new_positives, report.lost_positives)
print(report.agreement_rate)
```

parquet_path may be one path or a list of paths. label_column is optional; when supplied, precision/recall fields are populated for both rule sets.
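Precision and recall follow the standard definitions. Precision is the share of fired records that are labelled positive, and recall is the share of labelled positives that fired. The minimal sketch below computes both for one ruleset from boolean decisions and labels. precision_recall is a hypothetical helper, and reporting undefined ratios as 0.0 is a convention chosen here; blazerules may handle those cases differently.

```python
from typing import Sequence, Tuple


def precision_recall(fired: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of one ruleset's fire decisions against ground-truth labels.

    Undefined ratios (no fires, or no labelled positives) are reported as 0.0,
    a convention chosen for this sketch.
    """
    if len(fired) != len(labels):
        raise ValueError("decisions and labels must cover the same records")
    tp = sum(1 for f, y in zip(fired, labels) if f and y)      # fired on a labelled positive
    fp = sum(1 for f, y in zip(fired, labels) if f and not y)  # fired on a labelled negative
    fn = sum(1 for f, y in zip(fired, labels) if not f and y)  # missed a labelled positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = [True, True, False, False]  # e.g. values from the fraud_label column
print(precision_recall([True, False, True, False], labels))  # current:   (0.5, 0.5)
print(precision_recall([True, True, False, False], labels))  # candidate: (1.0, 1.0)
```

Compare the two rulesets' precision and recall together. A candidate that adds many new positives can raise recall while lowering precision.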
Validation
- Feed identical, ordered batches to both engines — window operators are stateful and order-sensitive.
- Sanity check: if rules_candidate.yaml is identical to rules_current.yaml, agreement should be 100%, with 0 changed decisions and no new or lost positives.
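Both validation points can be illustrated with a toy stateful rule. The sketch below uses a hypothetical per-user velocity rule whose window state carries over from one batch to the next. VelocityRule, run, and the (timestamp, user) record layout are illustrative assumptions, not blazerules APIs. Replaying the same two days with the batches swapped flips a decision. Replaying the same ordered batches through a fresh instance of the same rule reproduces every decision, which is the 100% agreement the sanity check expects.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Record = Tuple[int, str]  # (timestamp in seconds, user id)


class VelocityRule:
    """Hypothetical windowed rule: fire when a user has more than max_events
    events in the trailing window_s seconds, counting the current event.
    Per-user window state carries over from one batch to the next."""

    def __init__(self, window_s: int, max_events: int) -> None:
        self.window_s = window_s
        self.max_events = max_events
        self.recent: Dict[str, Deque[int]] = defaultdict(deque)

    def evaluate_batch(self, batch: List[Record]) -> List[bool]:
        fired = []
        for ts, user in batch:
            window = self.recent[user]
            # Evict events outside the trailing window (ts - window_s, ts].
            # Only the front is checked, so this assumes non-decreasing timestamps.
            while window and window[0] <= ts - self.window_s:
                window.popleft()
            window.append(ts)
            fired.append(len(window) > self.max_events)
        return fired


def run(rule: VelocityRule, batches: List[List[Record]]) -> Dict[Record, bool]:
    """Feed batches to one rule instance in the given order; map each record to its decision.
    Records are unique in this example, so they can serve as dictionary keys."""
    decisions: Dict[Record, bool] = {}
    for batch in batches:
        decisions.update(zip(batch, rule.evaluate_batch(batch)))
    return decisions


day1 = [(0, "u1")]
day2 = [(100, "u1"), (110, "u1")]

in_order = run(VelocityRule(window_s=60, max_events=1), [day1, day2])
swapped = run(VelocityRule(window_s=60, max_events=1), [day2, day1])
print([r for r in in_order if in_order[r] != swapped[r]])  # [(0, 'u1')]

# Sanity check: same rule, same ordered batches, so every decision agrees.
repeat = run(VelocityRule(window_s=60, max_events=1), [day1, day2])
print(repeat == in_order)  # True
```

Each replay gets a fresh rule instance so that window state from one run cannot leak into the other. The same precaution applies when evaluating the current and candidate rulesets side by side.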