Production YAML Guide

This guide is the practical version of the rule reference. It shows how to write a YAML file that a production process can load directly.

The file layout

A complete YAML file can contain both rule semantics and local agent wiring:

Section	Used by	Purpose
`schema_version`	engine	Rule file compatibility version. Use `"2.1"`.
`fields`	engine	Optional type hints. Omit most fields and let BlazeRules infer them from the first batch.
`lookups`	engine	Named CSV lookup sets for `in_lookup` / `not_in_lookup`.
`decisions`	engine	Default decision and precedence.
`ruleset`	engine	Rule metadata and rules.
`instances`	`blazerules_agent` only	Local input/output instances for HTTP, file tail, or stdin.

The instances section is ignored by the in-process Python and C++ rule engines. The agent reads it to start multiple inputs, each of which loads rules and writes decisions.

Field hints

Field hints are optional. Use them when a field is important enough that you want stable typing before the first batch arrives.

fields:
  event_id: {type: string, nullable: false}
  card_token: {type: entity_key, nullable: false}
  amount: {type: float32, nullable: false}
  country_code:
    type: categorical
    values: [US, GB, IN, DE]
  event_ts_ms: {type: timestamp_ms, nullable: false}

Allowed field types:

float32, float64, int32, int64, categorical, entity_key, timestamp_ms, boolean, string.

Use values: only when you intentionally want a closed categorical set. Most fields should be inferred.

Lookups

Lookups are named files resolved relative to the YAML file. For S3-hosted YAML, use exact s3://bucket/key paths or relative keys under the same prefix.

lookups:
  blocked_merchants:
    type: string_set
    path: lookups/blocked_merchants.csv
  risky_bins:
    type: int_set
    path: lookups/risky_bins.csv
  vpn_ranges:
    type: ipv4_cidr_set
    path: lookups/vpn_ranges.csv

Supported lookup types:

Type	CSV column	Used with
`string_set`	`value`	`STRING`, `CATEGORICAL`, `ENTITY_KEY`
`int_set`	`value`	integer numeric fields
`ipv4_cidr_set`	`cidr`	IPv4 string fields
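
As a rough illustration of the CSV shapes in the table above, the sketch below writes a one-column lookup file with the Python standard library. The header names `value` and `cidr` come from the table. The single-column layout, the sample rows, and the helpers `write_lookup` and `valid_cidrs` are assumptions made for illustration and are not part of BlazeRules.

```python
import csv
import ipaddress
from pathlib import Path


def write_lookup(path, column, rows):
    """Write a one-column CSV with a header row (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])  # header: "value" or "cidr"
        for row in rows:
            writer.writerow([row])


def valid_cidrs(cidrs):
    """Keep only entries that parse as strict IPv4 networks."""
    good = []
    for cidr in cidrs:
        try:
            # strict=True rejects entries with host bits set, e.g. 10.0.0.1/8
            ipaddress.IPv4Network(cidr, strict=True)
        except ValueError:
            continue
        good.append(cidr)
    return good
```

Filtering CIDR entries before writing the file is one way to catch malformed rows at build time rather than at rule activation.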

Decisions

The decision policy turns rule matches into a single final decision per row.

decisions:
  default: APPROVE
  precedence: [BLOCK, REVIEW, FLAG, APPROVE]
  risk_bands:
    LOW: [0, 29]
    MEDIUM: [30, 69]
    HIGH: [70, 1000]

Rules can set action, weight, priority, severity, and reason_code. Production routing should normally consume result.grouped_decision_indices() or compact decision logs instead of scanning Python strings one row at a time.
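
To make the policy concrete, here is a minimal sketch of how precedence and risk bands could combine for one row. It is not BlazeRules code. It assumes that the final action is the matched action listed earliest in `precedence`, that the risk score is the sum of the matched rules' `weight` values, and that band ranges are inclusive. The guide does not state any of these, so check them against the rule reference.

```python
PRECEDENCE = ["BLOCK", "REVIEW", "FLAG", "APPROVE"]
DEFAULT = "APPROVE"
RISK_BANDS = {"LOW": (0, 29), "MEDIUM": (30, 69), "HIGH": (70, 1000)}


def resolve(matches):
    """Resolve one row. `matches` is a list of (action, weight) pairs
    for the rules that fired (hypothetical input shape)."""
    if matches:
        # The action that appears earliest in PRECEDENCE wins.
        action = min((a for a, _ in matches), key=PRECEDENCE.index)
    else:
        action = DEFAULT
    # ASSUMPTION: score is the plain sum of matched weights.
    score = sum(w for _, w in matches)
    # ASSUMPTION: band bounds are inclusive on both ends.
    band = next(
        (name for name, (lo, hi) in RISK_BANDS.items() if lo <= score <= hi),
        None,
    )
    return action, score, band
```

Under these assumptions, a row that matches both a REVIEW rule (weight 40) and a BLOCK rule (weight 60) resolves to BLOCK with a score of 100 in the HIGH band.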

Condition examples by family

Numeric and range

- {field: amount, op: gt, value: 1000}
- {field: amount, op: between_including, value: [100, 5000]}
- {field: amount, op: gt_field, other_field: historical_avg_amount}

Categorical and null handling

- {field: country_code, op: in, values: [US, GB, IN]}
- {field: optional_note, op: is_empty}
- {field: description, op: is_not_empty}

Strings and regex

- {field: user_agent, op: contains, value: Mobile}
- {field: transaction_description, op: regex, value: "payment|checkout"}
- {field: user_agent, op: not_regex, value: "bot|crawler"}

Arrays, flags, and nested arrays of objects

- {field: tags, op: contains_any, values: [vip, trusted]}
- {field: signal_flags, op: flags_any, mask: 4}
- array_any:
    path: items
    where:
      and:
        - {field: price, op: gt, value: 100}
        - {field: category, op: eq, value: electronics}

Inside array_any, field names are scoped to a single array element. The example matches only when one and the same item has both price > 100 and category = electronics.
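
The scoping rule can be sketched in plain Python. This is only a model of the semantics described above, not the engine's implementation, and the helper names `array_any` and `expensive_electronics` are hypothetical.

```python
def array_any(row, path, predicate):
    """True when at least one element under `path` satisfies `predicate`."""
    return any(predicate(item) for item in row.get(path, []))


def expensive_electronics(item):
    # Both conditions are evaluated against the same array element.
    return item.get("price", 0) > 100 and item.get("category") == "electronics"


same_item = {"items": [{"price": 250, "category": "electronics"}]}
split_items = {
    "items": [
        {"price": 250, "category": "books"},       # price matches, category does not
        {"price": 20, "category": "electronics"},  # category matches, price does not
    ]
}

same_item_matches = array_any(same_item, "items", expensive_electronics)
split_items_match = array_any(split_items, "items", expensive_electronics)
```

The second row does not match even though each condition is satisfied by some element, because no single element satisfies both.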

Network, temporal, and geo

- {field: ip_address, op: ip_in_subnet, value: "10.0.0.0/8"}
- {field: event_ts_ms, op: within_last, value: 86400}
- {field: event_ts_ms, op: day_of_week_in, values: [1, 2, 3, 4, 5]}
- op: distance_gt
  lat_field: billing.lat
  lon_field: billing.lon
  other_lat_field: shipping.lat
  other_lon_field: shipping.lon
  value: 50
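
For intuition about what a distance condition compares, the sketch below computes a great-circle distance with the haversine formula. The guide does not say which formula or unit distance_gt uses, so the haversine method, the kilometre unit, and the helper names are all assumptions here. Confirm the real unit before choosing a threshold.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; ASSUMPTION: kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_gt(row, threshold_km):
    """Hypothetical stand-in for the billing/shipping comparison."""
    billing, shipping = row["billing"], row["shipping"]
    distance = haversine_km(
        billing["lat"], billing["lon"], shipping["lat"], shipping["lon"]
    )
    return distance > threshold_km
```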

Lookups, windows, SQL, ML, and vectors

- {field: merchant_id, op: in_lookup, lookup: blocked_merchants}
- window:
    entity_field: card_token
    function: count
    duration: 10m
    op: gt
    value: 3
- sql: "amount > 1000 AND country_code IN ('US', 'GB')"
- model_score:
    model: fraud_logreg
    features: [amount, account_age_days, merchant_risk_score]
    op: gt
    value: 0.8
- vector_distance:
    fields: [embedding_0, embedding_1, embedding_2, embedding_3]
    metric: cosine
    reference: [0.1, 0.2, 0.3, 0.4]
    op: gt
    value: 0.7
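
The window condition above can be pictured as a per-entity trailing counter. The sketch below is a simplified model, not the engine's implementation. It assumes that events arrive in timestamp order, that the current event counts toward the window, and that `10m` means 600,000 ms. The `WindowCounter` class is hypothetical.

```python
from collections import defaultdict, deque


class WindowCounter:
    """Count events per entity inside a trailing time window."""

    def __init__(self, duration_ms):
        self.duration_ms = duration_ms
        self._events = defaultdict(deque)  # entity -> timestamps in order

    def observe(self, entity, ts_ms):
        """Record an event and return the count inside the window."""
        timestamps = self._events[entity]
        timestamps.append(ts_ms)
        # Evict timestamps that fell out of the trailing window.
        while timestamps and timestamps[0] <= ts_ms - self.duration_ms:
            timestamps.popleft()
        return len(timestamps)


counter = WindowCounter(duration_ms=10 * 60 * 1000)
counts = [counter.observe("c1", ts) for ts in (0, 60_000, 120_000, 180_000)]
fires = counts[-1] > 3  # mirrors op: gt, value: 3
```

With one event per minute, the fourth event on the same card is the first to push the count above 3.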

Complete minimal production YAML

This single file can be loaded by Python/C++ as rules, and by blazerules_agent as a multi-instance local runtime.

schema_version: "2.1"

fields:
  event_id: {type: string, nullable: false}
  card_token: {type: entity_key, nullable: false}
  amount: {type: float32, nullable: false}
  country_code:
    type: categorical
    values: [US, GB, IN, DE]
  device_type:
    type: categorical
    values: [ios, android, web, emulator]
  event_ts_ms: {type: timestamp_ms, nullable: false}
  ip_address: {type: string}

lookups:
  blocked_merchants:
    type: string_set
    path: lookups/blocked_merchants.csv
  risky_bins:
    type: int_set
    path: lookups/risky_bins.csv
  vpn_ranges:
    type: ipv4_cidr_set
    path: lookups/vpn_ranges.csv

decisions:
  default: APPROVE
  precedence: [BLOCK, REVIEW, FLAG, APPROVE]
  risk_bands:
    LOW: [0, 29]
    MEDIUM: [30, 69]
    HIGH: [70, 1000]

ruleset:
  name: Payments Production Rules
  version: "2026.06.23"
  rules:
    - id: high_amount_emulator
      action: BLOCK
      severity: HIGH
      priority: 100
      weight: 60
      reason_code: HIGH_AMOUNT_EMULATOR
      conditions:
        and:
          - {field: amount, op: gt, value: 1000}
          - {field: device_type, op: eq, value: emulator}

    - id: blocked_merchant_or_vpn
      action: REVIEW
      severity: MEDIUM
      priority: 80
      weight: 40
      reason_code: MERCHANT_OR_NETWORK_RISK
      conditions:
        or:
          - {field: merchant.id, op: in_lookup, lookup: blocked_merchants}
          - {field: ip_address, op: in_lookup, lookup: vpn_ranges}

    - id: expensive_electronics_item
      action: FLAG
      severity: MEDIUM
      weight: 25
      conditions:
        array_any:
          path: items
          where:
            and:
              - {field: price, op: gt, value: 100}
              - {field: category, op: eq, value: electronics}

    - id: card_velocity_10m
      action: REVIEW
      severity: HIGH
      weight: 50
      conditions:
        window:
          entity_field: card_token
          function: count
          duration: 10m
          op: gt
          value: 3

    - id: impossible_shipping_distance
      action: REVIEW
      severity: HIGH
      weight: 45
      conditions:
        op: distance_gt
        lat_field: billing.lat
        lon_field: billing.lon
        other_lat_field: shipping.lat
        other_lon_field: shipping.lon
        value: 500

instances:
  - name: payments-http
    rules: rules.yaml
    batch_size: 4096
    flush_ms: 50
    service: payments-api
    source: http-json
    input:
      type: http
      host: 127.0.0.1
      port: 9480
    output:
      type: ndjson
      path: decisions-payments.ndjson
    dedupe:
      enabled: true
      key_fields: [event_id]
      ttl_seconds: 86400

  - name: checkout-pod-tail
    rules: rules.yaml
    batch_size: 2048
    flush_ms: 250
    service: checkout
    source: pod-stdout
    input:
      type: file_tail
      path: /var/log/containers/checkout.log
    output:
      type: ndjson
      path: decisions-checkout.ndjson

  - name: replay-stdin
    rules: rules.yaml
    batch_size: 8192
    flush_ms: 1000
    service: replay
    source: stdin
    input:
      type: stdin
    output:
      type: stdout

Run the multi-instance file:

blazerules_agent --config rules.yaml

Python equivalent: load the rule semantics from the same file

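import blazerules

# Load the rule semantics from the shared file; the in-process engine ignores instances.
engine = blazerules.RuleEngine()
engine.load_rules("rules.yaml")

# One NDJSON record per line. This event satisfies both conditions of high_amount_emulator.
payload = b'{"event_id":"e1","card_token":"c1","amount":1200,"device_type":"emulator","country_code":"US","event_ts_ms":1782150000000}\n'
result = engine.evaluate_ndjson(payload)
print(result.decisions)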
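
When events start out as Python dictionaries, a payload in the same shape as the one above can be assembled with the standard library. The `to_ndjson` helper is hypothetical and only builds the bytes. It does not call BlazeRules.

```python
import json


def to_ndjson(events):
    """Serialize dicts to newline-delimited JSON bytes, one record per line."""
    lines = (json.dumps(event, separators=(",", ":")) + "\n" for event in events)
    return "".join(lines).encode("utf-8")


payload = to_ndjson(
    [
        {"event_id": "e1", "card_token": "c1", "amount": 1200, "device_type": "emulator"},
        {"event_id": "e2", "card_token": "c2", "amount": 15, "device_type": "ios"},
    ]
)
```

The resulting bytes can then be handed to engine.evaluate_ndjson(payload), as in the example above.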

Common production mistakes

Do not put credentials in YAML. Use environment variables, profiles, or platform secrets.
Do not use instances: as a replacement for Kafka partitioning. It is a local agent convenience.
Do not enable OutputDetail.BITMASKS for routing unless you need per-rule masks.
Do not rely on inferred types for critical fields that may drift between producers.
Keep lookup CSV paths stable. A missing lookup file causes rule activation to fail.
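
One way to catch a missing lookup file before deployment is a small preflight check. The sketch below assumes that the lookups section has already been parsed into a dictionary and that the paths are local files resolved relative to the YAML file, as described in the Lookups section. The `missing_lookups` helper is hypothetical, and it skips s3:// paths, which would need a separate check.

```python
from pathlib import Path


def missing_lookups(lookups, yaml_path):
    """Return the names of lookups whose local CSV file does not exist."""
    base_dir = Path(yaml_path).resolve().parent
    missing = []
    for name, spec in lookups.items():
        path = spec["path"]
        if path.startswith("s3://"):
            continue  # remote objects are out of scope for this sketch
        if not (base_dir / path).is_file():
            missing.append(name)
    return sorted(missing)
```

Running such a check in CI, and failing the build when the returned list is non-empty, surfaces the problem earlier than rule activation would.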