How Infino compares to the alternatives.
Today, full-text and vector workloads run in cluster-managed search engines and vector databases — proprietary formats, separate copies of your data, separate compute, separate ops. Infino embeds BM25 and vector indexes inside a valid Apache Parquet file. One copy, one format, on object storage.
Side by side.
Infino is built for retrieval, not transactions. Schema evolution and time travel are supported, but the focus is lower compute, data, and token costs at scale: hybrid search and vectors native to Parquet, instead of ACID.
| | Iceberg / Hudi / Delta | Elasticsearch / OpenSearch | Vector databases | Infino |
|---|---|---|---|---|
| On-disk format | Apache Parquet on object storage. Standard format every analytics engine can read. | Proprietary inverted-index segment files in a managed cluster. | Proprietary vector index files (HNSW graphs, IVF shards) in a managed cluster. | One file = a valid Apache Parquet file plus embedded BM25 and vector indexes. Same Parquet bytes any engine already reads. |
| Full-text search | Not in scope. The format has no inverted index, no BM25, no posting lists. | First-class. BM25 is the engine. | Not in scope, or a bolted-on keyword filter alongside the vector index. | First-class. PFOR-delta posting lists, FST term dictionary, BlockMaxWAND + MaxScore-BMM execution — all embedded in the Parquet file. |
| Vector / ANN | Not in scope. Stored as f32 columns at best; no index, no ANN engine. | Add-on (k-NN plugin, dense_vector field). Separate index alongside text. | First-class. The whole product. | First-class. IVF + 1-bit RaBitQ + full-precision rerank, cluster-contiguous on disk so a probe is one range GET. |
| SQL / analytics | Native. DataFusion, DuckDB, Spark, Trino all read it. | Translation layer over query DSL. Limited optimizer. | Limited. A vector-search API, not a SQL engine. Joins and analytics happen elsewhere. | Native. The bytes are still Parquet — DataFusion / DuckDB / pyarrow read it as a normal columnar table. |
| Storage model | Object storage. Storage and compute decoupled. | Cluster-managed disks. Data lives inside the engine. | Cluster-managed disks (or a managed service). Data lives inside the engine. | Object storage. Storage and compute decoupled. One single-RTT cold path: footer + at most one FTS region + at most one vector cluster region. |
| Format compatibility | Standard Parquet. Read by any analytics engine. | Engine-specific segment files. Only the engine reads them. | Engine-specific index files. Only the engine reads them. | Standard Apache Parquet bytes. Other engines read it as a normal columnar table. |
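Why any engine can still open the file: Infino's indexes ride inside the standard Parquet container, whose trailer framing is fixed by the format. A minimal, stdlib-only sketch that locates the footer from the last eight bytes — the container framing only, not Infino's index layout (the function name is ours):

```python
import struct

def parquet_trailer(buf: bytes):
    """Locate the footer metadata in a Parquet byte buffer.

    Parquet framing: the file starts with b"PAR1" and ends with a
    4-byte little-endian footer-metadata length followed by b"PAR1".
    Returns (metadata_length, (start, end)) for the footer bytes.
    """
    if buf[:4] != b"PAR1" or buf[-4:] != b"PAR1":
        raise ValueError("not a Parquet file")
    (meta_len,) = struct.unpack("<I", buf[-8:-4])
    end = len(buf) - 8  # footer metadata ends where the length field begins
    return meta_len, (end - meta_len, end)
```

A reader that understands only this framing still parses the file; an Infino-aware reader follows pointers inside that same footer to the embedded indexes.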
Same Parquet, but optimized for agents.
Agents have fundamentally different query patterns from humans, with implications for storage architectures. On the left below, a typical stack relies on separate search and vector systems alongside a columnar table format, forcing agent pipelines to join results across multiple stores. On the right, Infino embeds both indexes directly into the Parquet file, so vector, full-text, and SQL retrieval execute against the same bytes. The Infino reader is built on DataFusion, enabling hybrid queries from agents without maintaining separate search or vector infrastructure.
The table format and the search system are two different things. A hybrid query has to talk to both, then merge results client-side.
The table and its indexes are the same object. Any Parquet engine still reads it as a columnar table; an Infino-aware reader uses the embedded indexes to answer hybrid search against the same bytes — no second store, no merge.
What you get when search lives with the data.
Fewer object-store calls.
A hybrid query becomes a footer read plus a handful of byte ranges against one object. No manifest tree to walk, no sidecar files to chase, no second cluster to call across the network.
Skip before you read.
The footer carries column statistics and pointers to the indexes. Whole files, row groups, posting blocks, and vector clusters are eliminated from the plan before any data bytes are fetched.
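The pruning step can be sketched with nothing but per-row-group min/max statistics, the kind a Parquet footer carries for each column. A toy illustration — real readers pull these from the footer metadata, and the function name is ours:

```python
def prune_row_groups(stats, lo, hi):
    """Keep only row groups whose [min, max] range can overlap [lo, hi].

    stats: list of (min, max) pairs, one per row group, as footer
    column statistics would provide them. Groups that cannot contain
    a matching value are dropped before any data bytes are fetched.
    """
    return [i for i, (mn, mx) in enumerate(stats) if mx >= lo and mn <= hi]
```

The same skip-before-read logic applies one level down to posting blocks (via per-block score ceilings) and vector clusters (via centroid distances).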
No mirror to keep in sync.
The Parquet file is the table, the inverted index, and the vector store. Nothing to ETL into Elasticsearch, nothing to double-write to a vector database, no reconciliation lag.
BM25, with the brakes off.
Lucene-style BM25 (k1=1.2, b=0.75) over an FST term dictionary and PFOR-delta posting lists in 128-doc blocks.
Per-block BM25 ceilings drive BlockMaxWAND and MaxScore-BMM dispatch — multi-term OR queries skip whole blocks that cannot beat the current top-k threshold, so most postings are never decoded.
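As a reference point for the scoring math, here is a toy BM25 with the stated defaults (k1=1.2, b=0.75) and Lucene's log(1 + ...) idf. A sketch over in-memory term lists under our own naming, not Infino's posting-list implementation:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Lucene-style BM25 for one document against a query.

    corpus: list of documents (each a list of terms), used here to
    derive document frequency and average document length on the fly;
    a real engine reads both from the index.
    """
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    dl = len(doc_terms)
    score = 0.0
    for t in query_terms:
        tf = doc_terms.count(t)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if t in d)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # Lucene BM25 idf
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score
```

The block-max machinery never changes this formula; it only precomputes a per-block upper bound on it, so blocks whose ceiling is below the running top-k threshold are skipped unscored.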
ANN, range-read native.
IVF clustering with 1-bit RaBitQ codes for 32× compression, plus full f32 vectors for a precise rerank pass.
Codes, vectors, and doc ids are laid out cluster-contiguous on disk, so probing K clusters costs K contiguous range reads — no graph walk, no per-doc pointer chasing. Default tuning recovers recall@10 ≥ 0.90.
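The probe-then-rerank flow can be sketched in a few lines. This toy stands in for RaBitQ's rotated 1-bit quantization with plain sign-of-residual codes and Hamming distance, and models each cluster as one contiguous list, mirroring the on-disk layout (all names are ours):

```python
import math

def sign_code(v, centroid):
    """1-bit-per-dimension code: sign of the residual against the
    cluster centroid — a crude stand-in for RaBitQ."""
    return tuple(1 if x - c >= 0 else 0 for x, c in zip(v, centroid))

def ivf_search(query, centroids, clusters, nprobe=1, rerank=2):
    """clusters[i] = list of (doc_id, code, full_vector), stored
    contiguously per cluster so each probe is one range read.

    1) pick the nprobe nearest centroids,
    2) rank their candidates by Hamming distance on 1-bit codes,
    3) rerank the shortlist with full-precision distances.
    """
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    cands = []
    for ci in order[:nprobe]:
        q_code = sign_code(query, centroids[ci])
        for doc_id, code, vec in clusters[ci]:
            ham = sum(a != b for a, b in zip(q_code, code))
            cands.append((ham, doc_id, vec))
    cands.sort(key=lambda t: t[0])
    shortlist = cands[:rerank]
    shortlist.sort(key=lambda t: math.dist(query, t[2]))
    return [doc_id for _, doc_id, _ in shortlist]
```

The cheap 1-bit pass narrows candidates without touching full vectors; only the short rerank list pays for full-precision distance math.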
