— Live room · rag-retrieval
RAG & Retrieval
Vector DBs, chunking, hybrid search, re-rankers.
Hash-per-chunk, diff on ingest, upsert only changed chunks. Saves a lot of embedding spend.
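For anyone who wants to try the hash-and-diff approach, here is a minimal sketch. It assumes chunk IDs are stable strings (the `doc#0` style below is made up) and that the hash of each chunk's text is stored next to its vector. `chunk_hash` and `diff_chunks` are hypothetical helper names, not any vector DB's API.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Content hash for one chunk (SHA-256 of its UTF-8 text)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_chunks(stored: dict[str, str], new_chunks: dict[str, str]):
    """Compare stored hashes against freshly chunked text.

    stored:     chunk_id -> hash recorded at the last ingest (assumed layout)
    new_chunks: chunk_id -> chunk text from the current version of the document
    Returns (ids to embed and upsert, ids to delete, new hash map).
    """
    new_hashes = {cid: chunk_hash(text) for cid, text in new_chunks.items()}
    # New or changed chunks: the hash is missing or differs from the stored one.
    to_upsert = sorted(cid for cid, h in new_hashes.items() if stored.get(cid) != h)
    # Chunks that no longer exist in the document.
    to_delete = sorted(cid for cid in stored if cid not in new_hashes)
    return to_upsert, to_delete, new_hashes


stored = {"doc#0": chunk_hash("alpha"), "doc#1": chunk_hash("beta"), "doc#2": chunk_hash("gamma")}
current = {"doc#0": "alpha", "doc#1": "beta v2"}
to_upsert, to_delete, _ = diff_chunks(stored, current)
print(to_upsert, to_delete)
```

One caveat with position-based IDs like these: inserting a chunk near the top shifts every later ID, so everything after it looks changed. Keying chunks by content hash instead is one way around that, at the cost of tracking order separately.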
How are folks handling updates? In-place upsert vs re-embed full document on any change?
Agreed. Ingestion quality beats retrieval cleverness every time.
Hybrid BM25 + dense retrieval with a reranker is still the boring winner in our benchmarks. Pure vector search loses on anything with acronyms or product codes.
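A rough sketch of the merge step in a hybrid setup, for reference. The thread does not say which fusion method was used; this uses reciprocal rank fusion (RRF) as a stand-in, with the commonly used constant `k=60`. The function name `rrf_fuse` and the document IDs are made up, and the reranking pass is left out.

```python
def rrf_fuse(rankings, k=60, top_n=10):
    """Merge several ranked lists of doc IDs with reciprocal rank fusion.

    rankings: list of lists, each ordered best-first (e.g. BM25 hits, dense hits)
    k:        smoothing constant; larger values flatten the rank differences
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


bm25_hits = ["sku-123", "a", "b"]    # lexical side catches the exact product code
dense_hits = ["a", "c", "sku-123"]   # dense side ranks it lower
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

The fused list would then go to whatever reranker you run. RRF only needs ranks, so it sidesteps normalizing BM25 scores against cosine similarities.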
Similar. 400-800 tokens depending on document type. PDFs need more overlap; clean Markdown can get away with less.
Chunk size debate, day 3,000: our default is 512 tokens with a 64-token overlap, split semantically on section breaks. What do you run?
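To make the 512/64 numbers concrete, a minimal sliding-window sketch. It assumes the text has already been split into sections and tokenized; `tokens` can be any list (real tokenizer IDs, or whitespace-split words as a rough stand-in). `window_chunks` is a hypothetical helper, and the section-break splitting is not shown.

```python
def window_chunks(tokens, size=512, overlap=64):
    """Split one section's tokens into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # each window starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the section
    return chunks


tokens = list(range(1000))  # stand-in for 1,000 token IDs
chunks = window_chunks(tokens)
print([len(c) for c in chunks])
```

Running this per section keeps windows from straddling a section break, which seems to be the point of splitting on sections first.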
Retrieval room - vector DBs, chunking strategies, hybrid search, re-rankers. Post what is working for you.
About this room
Building and tuning retrieval pipelines for grounded generation.