Data Model and Schema
BlazeRules column types, how schema is inferred from the first batch, optional field hints, nested and dotted fields, and dictionary encoding.
This page describes how BlazeRules sees your data: the column types it works with, how it infers a schema when you do not provide one, the optional fields: hints, and how nested and categorical data are handled.
BlazeRules is columnar: a batch is a set of typed columns, and every operator runs against a column. Whether the input is JSON or Arrow, the engine evaluates over the same typed columnar model.
Column types
BlazeRules works with the following column types. In a rule file's fields: block they are written in lowercase (float32, entity_key, and so on).
| Type | Use for |
|---|---|
float32 | Single-precision numeric values (amounts, rates, coordinates). |
float64 | Double-precision numeric values (e.g. latitude/longitude). |
int32 | 32-bit integers; also the encoding target for categoricals and entities. |
int64 | 64-bit integers (large IDs, BINs, bitfields). |
categorical | A closed or open set of string labels, dictionary-encoded internally. |
entity_key | An identity column (e.g. card token) used for windows and grouping. |
timestamp_ms | Epoch milliseconds for temporal operators and windows. |
string | Free text for string and regex operators. |
boolean | True/false values. |
In Python the enum members are uppercase: ColumnType.FLOAT32, ColumnType.ENTITY_KEY, ColumnType.TIMESTAMP_MS, ColumnType.BOOLEAN, and so on. The YAML spelling is lowercase.
Schema inference
You do not have to declare a schema. The lifecycle is:
- Unbound.
load_rules(...)parses the rules; no schema is bound yet. - Sample. The first evaluated batch samples the rule-referenced fields only.
- Infer + bind. Supported types are inferred from those samples and the schema is bound.
- Compile + activate. The plan compiles against the bound schema and the rules go live.
If you want explicit control instead of inference, construct the engine with a schema built from field definitions:
import blazerules
schema = [
blazerules.Field("card_token", blazerules.ColumnType.ENTITY_KEY, nullable=False),
blazerules.Field("amount", blazerules.ColumnType.FLOAT32, nullable=False),
blazerules.Field("device_type", blazerules.ColumnType.CATEGORICAL),
]
config = blazerules.EngineConfig()
engine = blazerules.RuleEngine(schema, config)Field(name, type, nullable=True) returns a FieldSpec. ENTITY_KEY fields automatically set is_entity_field.
Optional field hints
The top-level fields: block in a rule file is a set of hints, not a mandatory user schema. Hints are most useful for entity keys, timestamps, nullability, and closed categorical value sets. This is the fields: block from the canonical sample rule file:
fields:
card_token: {type: entity_key, nullable: false}
amount: {type: float32, nullable: false}
account_age_days: {type: int32}
hour_of_day: {type: float32}
country_code:
type: categorical
values: [US, GB, IN, DE, BR, CN, RU]
device_type:
type: categorical
values: [ios, android, web, emulator]
event_ts_ms: {type: timestamp_ms}
optional_note: {type: string, nullable: true}
merchant.risk.score: {type: float32}Note the patterns: an entity_key for grouping/windows, categorical fields with an explicit closed values: list, a nullable: true field, a timestamp_ms column, and a dotted name (merchant.risk.score) for a nested field.
Nested and dotted fields
Nested records are addressed with dotted field names. The same dotted path works for nested JSON objects and for Arrow struct fields:
{"merchant":{"risk":{"score":91}}}conditions:
field: merchant.risk.score
op: gt
value: 50For arrays of objects, array_any applies a sub-condition with same-element semantics — a record matches only when a single array element satisfies all of the inner conditions at once.
Dictionary encoding
categorical and entity_key columns are dictionary-encoded into int32 IDs. This is why set and equality operators on these columns run as fast integer comparisons rather than string comparisons, and it is part of why the categorical and set operator families are vectorized.
Gotcha: ambiguous fields infer as INT32
Pin ambiguous fields with hintsWhen a field is unbound and its sampled values are ambiguous, inference can land on
INT32— which will reject or mis-handle values you intended as another type. If a field is an identity, a category, a timestamp, or nullable, declare it in thefields:block. The hints above exist precisely to remove this ambiguity, and they make rule loading deterministic across environments.