RAG & Retrieval

Vector DBs, chunking, hybrid search, re-rankers.

James Henderson · 2 months ago

Hash-per-chunk, diff on ingest, upsert only changed chunks. Saves a lot of embedding spend.
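
A minimal sketch of the hash-and-diff step, assuming each chunk has a stable ID and the hashes from the previous ingest are kept somewhere. The names `chunk_hash` and `diff_chunks` and the dict-based store are hypothetical, not the API of any particular vector DB.

```python
from __future__ import annotations

import hashlib


def chunk_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_chunks(
    stored_hashes: dict[str, str], new_chunks: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare the current ingest against the previous one.

    stored_hashes: chunk_id -> hash from the previous ingest (hypothetical store)
    new_chunks:    chunk_id -> chunk text from the current ingest
    Returns (to_upsert, to_delete): IDs that need re-embedding, and IDs
    that no longer exist in the document.
    """
    # New or changed chunks: hash is missing or differs from the stored one.
    to_upsert = sorted(
        cid for cid, text in new_chunks.items()
        if stored_hashes.get(cid) != chunk_hash(text)
    )
    # Chunks that were stored before but are gone from the new version.
    to_delete = sorted(cid for cid in stored_hashes if cid not in new_chunks)
    return to_upsert, to_delete
```

Only the IDs in `to_upsert` need a new embedding call. One caveat with this sketch: if chunk IDs are positional, inserting text near the top of a document shifts every later chunk and they all look changed, so IDs derived from content or section headings may diff more cleanly.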

Computer Virtual Services · 2 months ago

How are folks handling updates? In-place upsert vs re-embed full document on any change?

James Henderson · 2 months ago

Agreed. Ingestion quality beats retrieval cleverness every time.

Computer Virtual Services · 2 months ago

Hybrid BM25 + dense reranker is still the boring winner in our benchmarks. Pure vector search loses on anything with acronyms or product codes.
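
One common way to combine a BM25 ranking with a dense ranking before handing candidates to a reranker is reciprocal rank fusion (RRF). The post does not say which fusion method was benchmarked, so this is only an illustrative sketch; `reciprocal_rank_fusion` and the document IDs are hypothetical, and `k=60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents ranked well by either retriever
    float to the top without needing comparable raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: BM25 catches the exact product code, dense does not rank it first.
bm25_hits = ["sku-42", "doc-a", "doc-b"]
dense_hits = ["doc-a", "doc-c", "sku-42"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

The fused list would then be truncated and passed to the reranker. Because RRF uses ranks only, BM25 scores and cosine similarities never have to be put on a shared scale.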

James Henderson · 2 months ago

Similar. 400-800 tokens depending on document type. PDFs need more overlap; clean markdown can get away with less.

Computer Virtual Services · 2 months ago

Chunk size debate, day 3,000 - our default is 512 tokens with 64-token overlap, semantically split on section breaks. What do you run?
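
A minimal sketch of the 512-token window with 64-token overlap, assuming the text is already tokenized into a list. It covers only the fixed-size windowing; the section-break splitting mentioned in the post would presumably run first, with this applied per section. `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into windows of `size` tokens that share
    `overlap` tokens with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no trailing chunk
        # consists purely of already-covered overlap.
        if start + size >= len(tokens):
            break
    return chunks
```

With the defaults, each window starts 448 tokens after the previous one, so a 1,000-token section yields three chunks, the last one shorter than the rest.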

James Henderson · 2 months ago

Retrieval room - vector DBs, chunking strategies, hybrid search, re-rankers. Post what is working for you.

About this room

Building and tuning retrieval pipelines for grounded generation.