Tutorial: Backtest a Candidate Ruleset

Compare a current ruleset against a candidate over historical data to see what would change before you ship it.

Goal: before deploying a rule change, replay historical records through both the current and a candidate ruleset and compare the outcomes — how many decisions change, what becomes newly flagged, and what stops firing.

Why backtest

Rule changes are easy to ship and hard to reason about at scale. A backtest answers "what would this change have done to last month's traffic?" before it touches live decisions.

Prerequisites

  • Historical records in Parquet, Arrow, or NDJSON. The backtest examples in this tutorial read Parquet files.
  • Two rule files: the current rules_current.yaml and a rules_candidate.yaml.

The workflow

  1. Take a representative slice of historical data and keep it in chronological order, because window state depends on event order.
  2. Evaluate it under the current ruleset.
  3. Evaluate the same data under the candidate ruleset.
  4. Compare: decisions that changed, newly-matched records (new positives), records that stopped matching (lost positives), and overall agreement.
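
The comparison in step 4 can be sketched in plain Python, independent of the engine. This is a minimal illustration, not the library's implementation: it assumes each ruleset has already produced one boolean decision per record (True = flagged), and the names Comparison and compare_decisions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    total_records: int
    changed: int
    new_positives: int
    lost_positives: int
    agreement_rate: float


def compare_decisions(current, candidate):
    """Compare two equal-length lists of boolean decisions (True = flagged)."""
    if len(current) != len(candidate):
        raise ValueError("both rulesets must be evaluated on the same records")
    pairs = list(zip(current, candidate))
    # New positives: flagged only by the candidate ruleset.
    new_pos = sum(1 for a, b in pairs if not a and b)
    # Lost positives: flagged only by the current ruleset.
    lost_pos = sum(1 for a, b in pairs if a and not b)
    total = len(pairs)
    changed = new_pos + lost_pos
    # Agreement is the share of records with the same decision under both.
    agreement = (total - changed) / total if total else 1.0
    return Comparison(total, changed, new_pos, lost_pos, agreement)


current = [True, False, False, True, False]
candidate = [True, True, False, False, False]
result = compare_decisions(current, candidate)
print(result)
```

Here one record is newly flagged and one stops matching, so two of five decisions change and the two rulesets agree on the remaining three.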

C++ backtest API

The C++ core exposes backtesting as a first-class operation: RuleEngine::backtest(...) takes a BacktestConfig and returns a BacktestReport.

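#include <blazerules/engine.h>

RuleEngine engine;

BacktestConfig config;
// Historical data to replay, listed in chronological order.
config.parquet_paths = {"history/day-1.parquet", "history/day-2.parquet"};
// Ruleset A is the current baseline; ruleset B is the candidate.
config.rules_file_a = "rules_current.yaml";
config.rules_file_b = "rules_candidate.yaml";
// Ground-truth column used for precision/recall.
config.label_column = "fraud_label";
config.batch_size = 500000;

BacktestReport report = engine.backtest(config);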

From Python

Python exposes the same comparison as RuleEngine.backtest(...).

import blazerules

engine = blazerules.RuleEngine()

report = engine.backtest(
    parquet_path=["history/day-1.parquet", "history/day-2.parquet"],
    rules_a="rules_current.yaml",
    rules_b="rules_candidate.yaml",
    label_column="fraud_label",
)

print(report.total_records)
print(report.fire_rate_a, report.fire_rate_b)
print(report.new_positives, report.lost_positives)
print(report.agreement_rate)

The parquet_path argument accepts a single path or a list of paths. The label_column argument is optional; when supplied, the report's precision and recall fields are populated for both rulesets.
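
To make the label-based metrics concrete, here is a small sketch of how precision and recall are conventionally computed for each ruleset against a ground-truth column. It uses the standard textbook definitions; the helper name precision_recall is hypothetical, and the engine's exact field names and edge-case handling are not specified in this tutorial.

```python
def precision_recall(decisions, labels):
    """Return (precision, recall) for boolean decisions against boolean labels."""
    pairs = list(zip(decisions, labels))
    true_pos = sum(1 for d, y in pairs if d and y)
    false_pos = sum(1 for d, y in pairs if d and not y)
    false_neg = sum(1 for d, y in pairs if not d and y)
    # Precision: of the records the ruleset flagged, how many were truly positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the truly positive records, how many the ruleset flagged.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


labels = [True, True, False, False, True]        # e.g. values of fraud_label
decisions_a = [True, False, False, True, False]  # current ruleset
decisions_b = [True, True, False, False, False]  # candidate ruleset

p_a, r_a = precision_recall(decisions_a, labels)
p_b, r_b = precision_recall(decisions_b, labels)
print(f"current:   precision={p_a:.2f} recall={r_a:.2f}")
print(f"candidate: precision={p_b:.2f} recall={r_b:.2f}")
```

In this toy data the candidate improves both metrics: it drops the one false positive and catches one more labeled record.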

Validation

  • Feed identical, ordered batches to both engines — window operators are stateful and order-sensitive.
  • Sanity check: run the backtest with the same file as both the current and the candidate ruleset. Agreement should be 100%, with zero changed decisions.
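
The order sensitivity is easy to reproduce with a toy stateful rule. The sketch below is not the engine's window operator; it is a hypothetical velocity rule (flag an event when the account already has two events in the trailing 60 seconds) that shows how the same records yield different decisions when replayed out of order.

```python
from collections import defaultdict, deque


def evaluate(events, max_events=2, window_seconds=60):
    """Flag an event when its account already has max_events in the trailing window.

    events: iterable of (timestamp_seconds, account_id), processed in the order given.
    """
    recent = defaultdict(deque)
    decisions = []
    for ts, account in events:
        window = recent[account]
        # Evict events that have fallen out of the trailing window.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        decisions.append(len(window) >= max_events)
        window.append(ts)
    return decisions


ordered = [(0, "a"), (10, "a"), (20, "a"), (100, "a")]
shuffled = [(100, "a"), (0, "a"), (10, "a"), (20, "a")]

ordered_flags = evaluate(ordered)
shuffled_flags = evaluate(shuffled)

# Key the decisions by timestamp so the two runs can be compared per record.
print(dict(zip((ts for ts, _ in ordered), ordered_flags)))
print(dict(zip((ts for ts, _ in shuffled), shuffled_flags)))
```

Replayed in order, only the event at t=20 is flagged. With the t=100 event replayed first, it lingers in the window and the events at t=10 and t=20 are both flagged, which is why both rulesets need the same ordered input.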

Where to go next