Vector Similarity

Compare a record's embedding against a reference vector with cosine, L2, or dot-product distance.

The vector_distance operator builds a per-row vector from named component fields, measures its distance to a constant reference vector, and compares that distance against value using op. Use it to flag records whose embedding is close to (or far from) a known point.
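The three steps described above (build the vector, measure the distance, compare) can be sketched in a few lines of Python. This is an illustration only, not the engine's implementation: the matches function, the OPS table, and every op name other than gt are assumptions, and the distance metric is passed in as a plain callable.

```python
import math
import operator

# Hypothetical op table: the documentation only shows `gt`; the other names are assumed.
OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}

def matches(row, dims, reference, distance, op, value):
    """Return True when the row's distance to `reference` satisfies `op value`."""
    # 1. Build the per-row vector from the named component fields, in order.
    vector = [float(row[name]) for name in dims]
    # 2. Measure the distance to the constant reference vector.
    d = distance(vector, reference)
    # 3. Compare the distance with the threshold.
    return OPS[op](d, value)

row = {"id": 7, "embedding_0": 0.1, "embedding_1": 0.2, "embedding_2": 1.3}
dims = ["embedding_0", "embedding_1", "embedding_2"]
reference = [0.1, 0.2, 0.3]

# math.dist is the standard-library Euclidean distance; it stands in for l2 here.
far = matches(row, dims, reference, math.dist, "gt", 0.8)
near = matches(row, dims, reference, math.dist, "lt", 0.8)
print(far, near)
```

The row differs from the reference only in its last component, by about 1.0, so it counts as far under a gt 0.8 test and not as near under lt 0.8. Fields that are not listed in dims (id here) are ignored.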

The vector_distance leaf

- vector_distance:
    dims: [embedding_0, embedding_1, embedding_2]
    metric: cosine
    reference: [0.1, 0.2, 0.3]
    op: gt
    value: 0.8

Field reference

Field      Purpose
dims       The list of per-component field names that form the vector: one field per dimension, not a single array field.
metric     The distance metric: cosine, l2, or dot.
reference  The constant reference vector. Its length must match the number of dims.
op         Comparison operator applied to the computed distance.
value      The threshold to compare against.
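The constraints in the field reference can be checked before a leaf is used. The following is a minimal sketch that assumes the leaf has already been parsed into a Python dict. validate_leaf is a hypothetical helper, not part of the product, and the rule that dims must be non-empty is an assumption rather than something stated above.

```python
ALLOWED_METRICS = {"cosine", "l2", "dot"}

def validate_leaf(leaf):
    """Return a list of problems found in a vector_distance leaf (empty if none)."""
    problems = []
    dims = leaf.get("dims", [])
    reference = leaf.get("reference", [])
    # Assumed rule: a vector needs at least one component field.
    if not dims:
        problems.append("dims must name at least one field")
    # Documented rule: the reference length must match the number of dims.
    if len(reference) != len(dims):
        problems.append(
            f"reference has {len(reference)} values but dims has {len(dims)} fields"
        )
    # Documented rule: metric is one of cosine, l2, or dot.
    if leaf.get("metric") not in ALLOWED_METRICS:
        problems.append(f"unknown metric: {leaf.get('metric')!r}")
    for key in ("op", "value"):
        if key not in leaf:
            problems.append(f"missing {key}")
    return problems

good = {
    "dims": ["embedding_0", "embedding_1", "embedding_2"],
    "metric": "cosine",
    "reference": [0.1, 0.2, 0.3],
    "op": "gt",
    "value": 0.8,
}
# Same leaf with a too-short reference and an unsupported metric name.
bad = dict(good, reference=[0.1, 0.2], metric="manhattan")

print(validate_leaf(good))
print(validate_leaf(bad))
```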

Metrics

  • cosine — cosine distance between the row vector and the reference.
  • l2 — Euclidean (straight-line) distance.
  • dot — dot product of the two vectors.
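For reference, the three metrics can be written out in plain Python. This sketch assumes the common convention that cosine distance is 1 minus cosine similarity. The page does not spell out the exact formula the engine uses, so treat that definition, and the function names, as assumptions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    # Undefined for an all-zero vector; this sketch raises ZeroDivisionError there.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

METRICS = {"cosine": cosine, "l2": l2, "dot": dot}

reference = [1.0, 0.0]
same_direction = [2.0, 0.0]   # parallel to the reference, twice as long
orthogonal = [0.0, 3.0]       # at a right angle to the reference

for name, fn in METRICS.items():
    print(name, fn(same_direction, reference), fn(orthogonal, reference))
```

Cosine ignores vector length, so same_direction sits at distance 0 from the reference even though l2 puts it 1.0 away. The dot product behaves the other way round from the two distances: it grows as vectors align, so a "close to the reference" test on dot would typically use a greater-than comparison where cosine or l2 would use less-than.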

The kernels are vectorized with NEON (the ARM SIMD instruction set; SIMD = single instruction, multiple data) on Apple Silicon. Unlike a Window, vector_distance is stateless — it depends only on the current row, with no cross-batch history.

🚧

This is not a nearest-neighbor index

vector_distance measures each row's distance to one constant reference vector. It is not an ANN (approximate nearest neighbor) index and does not search a corpus of vectors. For nearest-neighbor search over a large embedding collection, use a dedicated vector database.
