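Search at turbo speed 🚀! By searching smartly according to how skewed queries (search terms) are, this tech makes services way easier to use 💖
● Factors in how "popular" each search term is, so search gets faster! 🌟
● Search results fit so well it's like they already know your taste! 💖
● Copes with brand-new search terms right away, so it's unbeatable 👍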
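Background
With real-world data, it's totally normal for how often things get searched (query frequency) to be skewed, right? 🙄 Like, everybody searches for newly released cosmetics, but niche info, not so much. Conventional search engines didn't really take this skew into account, so they ended up slow or gave meh results 😭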
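Method
This work adopts a "dual-index structure" that manages frequently queried data and rarely queried data separately! 😲 On top of that, it analyzes search patterns so that high-frequency queries are answered from the "hot index" and low-frequency queries from the "full index"! ✨ Switching the search method to match the search pattern is what makes the blazing-fast search possible! 🚀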
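To make the hot-first, full-second idea concrete, here's a minimal Python sketch. It is not the paper's implementation: the brute-force `nearest` helper stands in for a real graph index, the fixed `hot_threshold` distance check stands in for the learned decision rule, and every name and the toy data are made up for illustration.

```python
import math

def nearest(index, query, k):
    """Brute-force k-nearest search; a stand-in for a graph index such as HNSW."""
    scored = sorted(index.items(), key=lambda item: math.dist(item[1], query))
    return [(node_id, math.dist(vec, query)) for node_id, vec in scored[:k]]

def dual_index_search(hot_index, full_index, query, k=1, hot_threshold=0.5):
    """Search the small hot index first; use the full index only if needed.

    hot_threshold is a hypothetical stand-in for the learned decision rule.
    """
    hot_hits = nearest(hot_index, query, k)
    # Early termination: all hot hits are close enough, so skip the full index.
    if len(hot_hits) == k and all(dist <= hot_threshold for _, dist in hot_hits):
        return [node_id for node_id, _ in hot_hits], "hot"
    full_hits = nearest(full_index, query, k)
    return [node_id for node_id, _ in full_hits], "full"

# Toy data: the full index covers everything, the hot index only popular items.
full_index = {
    "lipstick": (0.0, 0.0),
    "mascara": (1.0, 0.0),
    "rare_manual": (10.0, 10.0),
}
hot_index = {key: full_index[key] for key in ("lipstick", "mascara")}
```

A query near a popular item, such as `dual_index_search(hot_index, full_index, (0.1, 0.0))`, is answered from the hot index alone, while a query near `rare_manual` falls through to the full index.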
Approximate Nearest Neighbor Search (ANNS) is a crucial operation in databases and artificial intelligence. While graph-based ANNS methods like HNSW and NSG excel in performance, they assume uniform query distribution. However, in real-world scenarios, user preferences and temporal dynamics often result in certain data points being queried more frequently than others, and these query patterns can change over time. To better leverage such characteristics, we propose DQF, a novel Dual-Index Query Framework. This framework features a dual-layer index structure and a dynamic search strategy based on a decision tree. The dual-layer index includes a hot index for high-frequency nodes and a full index covering the entire dataset, allowing for the separate management of hot and cold queries. Furthermore, we propose a dynamic search strategy that employs a decision tree to determine whether a query is of the high-frequency type, avoiding unnecessary searches in the full index through early termination. Additionally, to address fluctuations in query frequency, we design an update mechanism to manage the hot index. New high-frequency nodes will be inserted into the hot index, which is periodically rebuilt when its size exceeds a predefined threshold, removing outdated low-frequency nodes. Experiments on four real-world datasets demonstrate that the Dual-Index Query Framework achieves a significant speedup of 2.0-5.7x over state-of-the-art algorithms while maintaining a 95% recall rate. Importantly, it avoids full index reconstruction even as query distributions change, underscoring its efficiency and practicality in dynamic query distribution scenarios.
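The hot-index update mechanism described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' code: the class name `HotIndex`, the promotion rule (`hot_min_hits`), the size limit (`max_size`), and the choice to keep the most-queried half and reset counters on rebuild are all hypothetical.

```python
from collections import Counter

class HotIndex:
    """Toy hot index that tracks node IDs only (no vectors, no graph)."""

    def __init__(self, hot_min_hits=3, max_size=4):
        self.hits = Counter()             # query hits per node since last rebuild
        self.nodes = set()                # node IDs currently in the hot index
        self.hot_min_hits = hot_min_hits  # assumed promotion threshold
        self.max_size = max_size          # assumed size limit before a rebuild
        self.rebuilds = 0

    def record_hit(self, node_id):
        """Count a query hit and promote the node once it is frequent enough."""
        self.hits[node_id] += 1
        if self.hits[node_id] >= self.hot_min_hits:
            self.nodes.add(node_id)
        if len(self.nodes) > self.max_size:
            self.rebuild()

    def rebuild(self):
        """Rebuild only the hot index, dropping the least-queried nodes."""
        keep = max(1, self.max_size // 2)
        # Sort by descending hit count; break ties by ID for determinism.
        ranked = sorted(self.nodes, key=lambda n: (-self.hits[n], n))
        self.nodes = set(ranked[:keep])
        self.hits = Counter()  # assumption: reset so stale popularity fades
        self.rebuilds += 1
```

In this sketch only the small hot set is ever rebuilt; the full index is never touched, which mirrors the abstract's claim that full index reconstruction is avoided as query distributions shift.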