Published: 2025/10/23 9:06:07

Multi-Document RAG: A Gal Explains the Challenges 💖

Ultra-quick summary: This paper studies how accuracy drops in RAG as the number of documents grows! And apparently Qwen2.5 handles it really well ✨

Gal-Style Sparkle Points ✨

● RAG (Retrieval-Augmented Generation) is a technique that works like magic 🪄 to make LLMs smarter by letting them look things up! (minimal sketch after this list)
● The impressive part: controlled experiments pin down exactly why accuracy drops as the number of documents increases! 🧐
● And it turns out the LLM Qwen2.5 is notably strong at handling many documents! 😳
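To make the RAG idea concrete, here is a minimal sketch (not from the paper): retrieval is done with simple TF-IDF cosine similarity, and `generate` is a placeholder for whatever LLM call you actually use. Both choices are illustrative assumptions, not the authors' setup.

```python
# Minimal RAG sketch: retrieve the documents most similar to the query,
# then pass them to an LLM as extra context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(query, documents, top_k=3):
    """Return the top_k documents ranked by TF-IDF cosine similarity to the query."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top_ids = scores.argsort()[::-1][:top_k]
    return [documents[i] for i in top_ids]

def rag_answer(query, documents, generate):
    """`generate` is a placeholder for any LLM completion call."""
    context = "\n\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```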

Detailed Explanation

Read the rest in the 「らくらく論文」 app

More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

Shahar Levy / Nir Mazor / Lihi Shalmon / Michael Hassid / Gabriel Stanovsky

Retrieval-Augmented Generation (RAG) enhances the accuracy of Large Language Model (LLM) responses by leveraging relevant external documents during generation. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various language models on custom datasets derived from a multi-hop QA task. We keep the context length and position of relevant information constant while varying the number of documents, and find that increasing the document count in RAG settings poses significant challenges for most LLMs, reducing performance by up to 20%. However, Qwen2.5 maintained consistent results across increasing document counts, indicating better multi-document handling capability. Finally, our results indicate that processing multiple documents is a separate challenge from handling long contexts. We also make the datasets and code available: https://github.com/shaharl6000/MoreDocsSameLen.
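The core experimental control described in the abstract is easy to picture in code. Below is a rough sketch, not the authors' implementation (their actual dataset-construction code is in the linked repository): hold the total context budget fixed, split it evenly across however many documents are included, and keep the relevant documents in the same position. The word-based budget and the function name are illustrative assumptions.

```python
def build_fixed_length_context(gold_docs, distractors, num_docs, budget_words=4000):
    """Build a context containing `num_docs` documents whose combined length
    stays roughly at `budget_words`, so only the document count varies.
    Gold (relevant) documents are always placed first, keeping their
    position constant across conditions."""
    docs = (gold_docs + distractors)[:num_docs]
    per_doc = budget_words // len(docs)  # equal share of the budget per document
    trimmed = [" ".join(doc.split()[:per_doc]) for doc in docs]
    return "\n\n".join(trimmed)

# Example: 4 documents vs. 16 documents, same overall context length.
# few_docs_ctx  = build_fixed_length_context(gold, distractors, num_docs=4)
# many_docs_ctx = build_fixed_length_context(gold, distractors, num_docs=16)
```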

cs / cs.CL