Backtesting
Replay historical Parquet or Arrow batches through a candidate ruleset and compare it against the one you run today.
Because BlazeRules is the same engine live and offline, the rules you run in production can be replayed over historical data without a second implementation. Backtesting means pointing a candidate ruleset at historical records, then comparing its decisions and scores against the current ruleset before you promote the change.
One engine for stream and backtest
There is no separate "backtest mode". You load a ruleset and call the same evaluate_batch/evaluate_ndjson methods you use live. The only difference is the source of records: historical Parquet or Arrow instead of a live queue.
The idea
A typical change-safety workflow looks like this:
- Read historical records as Arrow RecordBatch objects.
- Evaluate each batch through two engines: one loaded with the current production rules, one with your candidate rules.
- Compare the per-record decisions, scores, and match counts. Differences are exactly the records your change would have routed differently.
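The comparison step can be sketched in plain Python. This is a minimal illustration, not part of the BlazeRules API: `diff_decisions` is a hypothetical helper, the decision values are made up, and it assumes the per-record decisions from both engines are available as aligned Python sequences (if they are Arrow arrays, converting them with `to_pylist()` first is one option).

```python
from collections import Counter


def diff_decisions(current, candidate):
    """Return (changed_indices, transition_counts) for two aligned decision lists.

    Hypothetical helper for illustration; not a BlazeRules function.
    """
    if len(current) != len(candidate):
        raise ValueError("decision lists must cover the same records")
    # Row positions where the two rulesets disagree.
    changed = [i for i, (a, b) in enumerate(zip(current, candidate)) if a != b]
    # How many records moved from each old decision to each new decision.
    transitions = Counter((current[i], candidate[i]) for i in changed)
    return changed, transitions


# Made-up decisions for five records.
cur_decisions = ["allow", "allow", "review", "block", "allow"]
cand_decisions = ["allow", "review", "review", "block", "block"]

changed, transitions = diff_decisions(cur_decisions, cand_decisions)
print(changed)            # [1, 4]
print(dict(transitions))  # {('allow', 'review'): 1, ('allow', 'block'): 1}
```

The transition counts are often more useful than the raw index list, because they show at a glance whether a candidate mostly tightens or mostly loosens routing.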
The result fields you compare are the standard ones documented in Observability: decisions, decision_codes, scores, risk_bands, winning_rule_ids, and match_counts. The grouped-index helpers make a diff cheap:
import blazerules
current = blazerules.RuleEngine()
current.load_rules("rules.yaml")
candidate = blazerules.RuleEngine()
candidate.load_rules("rules-candidate.yaml")
# For each historical batch:
cur = current.evaluate_batch(batch)
cand = candidate.evaluate_batch(batch)
# Records the candidate would route differently:
cur_groups = cur.grouped_decision_indices()
cand_groups = cand.grouped_decision_indices()
# Compare cur_groups vs cand_groups, or diff cur.decisions vs cand.decisions.
Reading historical data
Use whatever already produces typed Arrow batches. In full builds, blazerules_io.read_record_batches(path, batch_size=...) reads files into batches; in custom lean builds, read Parquet with pyarrow and pass the batches in.
import blazerules
import blazerules_io
engine = blazerules.RuleEngine()
engine.load_rules("rules-candidate.yaml")
for batch in blazerules_io.read_record_batches("history.parquet", batch_size=16384):
    result = engine.evaluate_batch(batch)
    # accumulate result.decisions / result.scores for comparison
Native backtest API
For Parquet history, use the built-in A/B comparison API.
report = engine.backtest(
    parquet_path=["history/day-1.parquet", "history/day-2.parquet"],
    rules_a="rules.yaml",
    rules_b="rules-candidate.yaml",
    label_column="fraud_label",
)
print(report.total_records)
print(report.fire_rate_a, report.fire_rate_b)
print(report.new_positives, report.lost_positives)
print(report.agreement_rate)
print(report.precision_a, report.recall_a)
print(report.precision_b, report.recall_b)
The C++ overloads are backtest(const BacktestConfig&) and backtest(parquet_paths, rules_a, rules_b, label_column). BacktestConfig contains parquet_paths, rules_file_a, rules_file_b, label_column, and batch_size.
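To build intuition for the report fields, here is a plain-Python sketch of how A/B metrics of this kind are commonly computed. The definitions are assumptions made for illustration, not the documented semantics of the backtest report: "fired" is taken to mean the ruleset flagged the record, new positives are records flagged by B but not A, lost positives are records flagged by A but not B, and precision and recall are measured against a boolean label. `compare_flags` is a hypothetical helper.

```python
def compare_flags(flags_a, flags_b, labels):
    """Compute A/B comparison metrics from aligned boolean lists.

    Hypothetical helper; the metric definitions are assumptions, not the
    BlazeRules report specification.
    """
    n = len(labels)
    if n == 0 or not (len(flags_a) == len(flags_b) == n):
        raise ValueError("inputs must be non-empty and the same length")

    def precision_recall(flags):
        true_pos = sum(1 for f, y in zip(flags, labels) if f and y)
        flagged = sum(1 for f in flags if f)
        positives = sum(1 for y in labels if y)
        precision = true_pos / flagged if flagged else 0.0
        recall = true_pos / positives if positives else 0.0
        return precision, recall

    precision_a, recall_a = precision_recall(flags_a)
    precision_b, recall_b = precision_recall(flags_b)
    return {
        "total_records": n,
        "fire_rate_a": sum(1 for f in flags_a if f) / n,
        "fire_rate_b": sum(1 for f in flags_b if f) / n,
        # Flagged only by the candidate (B) or only by the current ruleset (A).
        "new_positives": sum(1 for a, b in zip(flags_a, flags_b) if b and not a),
        "lost_positives": sum(1 for a, b in zip(flags_a, flags_b) if a and not b),
        "agreement_rate": sum(1 for a, b in zip(flags_a, flags_b) if a == b) / n,
        "precision_a": precision_a,
        "recall_a": recall_a,
        "precision_b": precision_b,
        "recall_b": recall_b,
    }


# Made-up example: four records, two of them labeled as fraud.
metrics = compare_flags(
    flags_a=[True, False, False, True],
    flags_b=[True, True, False, False],
    labels=[True, True, False, False],
)
print(metrics)
```

In this made-up example both rulesets fire on half the records, yet the candidate catches both labeled records while the current ruleset catches one, which is why fire rate alone is a weak basis for promoting a change.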
Shadow rules
A rule can carry a shadow field, which lets you evaluate a rule's effect without letting it drive the final decision. This is a natural fit for backtesting a single new rule inside an otherwise-unchanged ruleset. Use match_counts, winning_rule_ids, and grouped decision indices to see how often the shadow rule fired and which rows would have changed under a promoted candidate.
Replay window rules in chronological order
Window rules read prior-batch history, inject derived window columns, evaluate the current batch, then commit that batch for future batches. A backtest of any window-based rule is only correct if you feed batches in chronological order and keep entity affinity, exactly as the data arrived live. Same-batch repeated entity rows do not see earlier rows from that same batch by default. See the engine's window semantics for the full ordering contract.
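The read-then-commit behavior described above can be illustrated with a toy model. This is not BlazeRules code: `ToyWindowCounter` is a hypothetical stand-in whose only "window column" is a per-entity count of rows seen in earlier batches, and the entity names are made up.

```python
from collections import defaultdict


class ToyWindowCounter:
    """Toy model of read-then-commit window semantics (not BlazeRules code)."""

    def __init__(self):
        # entity -> number of rows committed by earlier batches
        self._seen = defaultdict(int)

    def evaluate_batch(self, entities):
        # Read phase: every row sees only history committed by prior batches,
        # so repeated entities within this batch do not see each other.
        prior_counts = [self._seen[e] for e in entities]
        # Commit phase: make this batch visible to later batches.
        for e in entities:
            self._seen[e] += 1
        return prior_counts


day1 = ["alice", "bob", "alice"]
day2 = ["alice", "carol"]

# Replay in the order the data arrived.
in_order = ToyWindowCounter()
in_order_results = [in_order.evaluate_batch(day1), in_order.evaluate_batch(day2)]

# Replay with the two days swapped.
shuffled = ToyWindowCounter()
shuffled_results = [shuffled.evaluate_batch(day2), shuffled.evaluate_batch(day1)]

print(in_order_results)   # [[0, 0, 0], [2, 0]]
print(shuffled_results)   # [[0, 0], [1, 0, 1]]
```

Replayed in order, the day-2 row for alice sees two prior rows; with the days swapped it sees none, and the day-1 rows see history that did not exist when they arrived. Any rule thresholding on such a count would decide differently, so an out-of-order replay does not reproduce live behavior.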