- text chunks - retrieve the specific part instead of the whole document
- chunks with metadata
- source document, section title, author, page number, …
- neural search (information retrieval)
- “text into color”
- map the query onto the same color spectrum and take the closest chunks of text
- re-ranking - only now can we compare the query one-to-one with each retrieved chunk (previously not feasible due to the amount of information)
- challenges
- retrieval is critical
- wrong chunks / too many chunks / more noise -> hallucinations
- issues:
- not enough / too many documents retrieved
- weak query formulation
- poor chunking
- sensitive & hard to evaluate
- multi-hop questions
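
The pipeline in the notes above (chunks with metadata, closest-chunk search, then one-to-one re-ranking) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `embed` is a toy bag-of-words stand-in for a neural embedding model, `rerank` uses simple term coverage as a stand-in for a real one-to-one re-ranking model, and all names (`Chunk`, `chunk_document`, `retrieve`, `rerank`, the sample documents) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str   # metadata: source document
    section: str  # metadata: section title


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(text, source, section, max_words=12):
    """Split text into fixed-size word windows, each carrying its metadata."""
    words = text.split()
    return [
        Chunk(" ".join(words[i:i + max_words]), source, section)
        for i in range(0, len(words), max_words)
    ]


def embed(text):
    """ASSUMPTION: toy embedding (token counts) in place of a neural model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """First stage: take the k chunks closest to the query in embedding space."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def rerank(query, candidates):
    """Second stage: score each (query, chunk) pair one-to-one.

    ASSUMPTION: fraction of query terms covered by the chunk stands in
    for a real re-ranking model.
    """
    q_terms = set(tokenize(query))

    def score(chunk):
        if not q_terms:
            return 0.0
        return len(q_terms & set(tokenize(chunk.text))) / len(q_terms)

    return sorted(candidates, key=score, reverse=True)


chunks = (
    chunk_document("Invoices are paid within thirty days of receipt.",
                   "policy.txt", "Payments")
    + chunk_document("Employees may work remotely two days per week.",
                     "handbook.txt", "Remote work")
)
query = "when are invoices paid"
top = rerank(query, retrieve(query, chunks, k=2))
print(top[0].source, "/", top[0].section)
```

Re-ranking only runs on the `k` candidates returned by `retrieve`, which is why the more expensive pairwise comparison becomes feasible at that point. The choice of `k` and `max_words` relates directly to the issues listed above: too small and relevant chunks are missed, too large and noise reaches the model.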