Published: 2025/11/7 18:58:25

Title & Super-Short Summary: TimeSearch-R supercharges video understanding! 🚀

  1. TimeSearch-R might just be the strongest!
  ● It's a new method for smartly understanding long-form (super-long!) videos ✨
  ● Reinforcement learning teaches it where in the video to look 💖
  ● It's amazing because it works for all kinds of things, like video search and Q&A (question answering)! 😎

  2. Detailed breakdown, here we go!
  • Background: Videos keep getting longer, and watching the whole thing is a pain, right? 😱 We want to find just the parts we need!
  • Method: TimeSearch-R uses an AI technique called reinforcement learning to learn where in the video to look. Just like a human, it hunts down exactly the parts it needs! 🤔
  • Results: Accuracy apparently went way up! It seems to understand video content better than other AIs 💕
  • Significance: It can be used for video search and lots more, so the IT industry is about to get a lot more interesting! 🤩
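For the technically curious: the "reinforcement learning" mentioned above builds on GRPO, which scores a group of sampled rollouts for the same query and normalizes each reward against the group. This is only a toy sketch of that group-relative advantage; the rewards here are made up, and real training scores whole search-and-answer rollouts.

```python
# Toy sketch of the group-relative advantage at the heart of GRPO
# (Group Relative Policy Optimization). The reward values below are
# illustrative, not from the paper.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward by its group's mean and std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # Rollouts better than the group average get positive advantage,
    # worse ones get negative advantage.
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled rollouts for one query; higher reward = better answer.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advs)  # best rollout positive, worst negative, average ones ~0
```

The point of the normalization is that the policy is pushed toward rollouts that beat their own group's average, without needing a separate learned value model.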

  3. TimeSearch-R is seriously impressive!
  • Video search truly levels up: You'll find the info you want right away, so hunting through videos becomes a breeze! 😊
  • Video Q&A goes god-tier: Ask a question about a video's content and you might get an accurate answer back. So smart! 💖
  • The IT industry gets excited: New services could be born, making the world even more fun! 🌈

  4. Ideas for real-world use!
  • Video editing apps: An app where you just ask a question and the AI finds the scene you need — wouldn't that be the best? ✨
  • Education apps: An app where the AI explains whatever you didn't get in a lecture video would make studying fun too! 💡


TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning

Junwen Pan / Qizhe Zhang / Rui Zhang / Ming Lu / Xin Wan / Yuan Zhang / Chang Liu / Qi She

Temporal search aims to identify a minimal set of relevant frames from tens of thousands based on a given query, serving as a foundation for accurate long-form video understanding. Existing works attempt to progressively narrow the search space. However, these approaches typically rely on a hand-crafted search process, lacking end-to-end optimization for learning optimal search strategies. In this paper, we propose TimeSearch-R, which reformulates temporal search as interleaved text-video thinking, seamlessly integrating searching video clips into the reasoning process through reinforcement learning (RL). However, applying RL training methods, such as Group Relative Policy Optimization (GRPO), to video reasoning can result in unsupervised intermediate search decisions. This leads to insufficient exploration of the video content and inconsistent logical reasoning. To address these issues, we introduce GRPO with Completeness Self-Verification (GRPO-CSV), which gathers searched video frames from the interleaved reasoning process and utilizes the same policy model to verify the adequacy of searched frames, thereby improving the completeness of video reasoning. Additionally, we construct datasets specifically designed for the SFT cold-start and RL training of GRPO-CSV, filtering out samples with weak temporal dependencies to enhance task difficulty and improve temporal search capabilities. Extensive experiments demonstrate that TimeSearch-R achieves significant improvements on temporal search benchmarks such as Haystack-LVBench and Haystack-Ego4D, as well as long-form video understanding benchmarks like VideoMME and MLVU. Notably, TimeSearch-R establishes a new state-of-the-art on LongVideoBench with 4.1% improvement over the base model Qwen2.5-VL and 2.0% over the advanced video reasoning model Video-R1. Our code is available at https://github.com/Time-Search/TimeSearch-R.
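The abstract's core loop — iteratively search for relevant clips, then have the same model self-verify whether the gathered frames are adequate before answering — can be sketched in miniature. Everything below is a stand-in: the per-frame relevance scores, the `adequacy` threshold, and the greedy frame picker are illustrative toys, not the paper's policy model.

```python
# Minimal toy sketch of adaptive temporal search with a completeness
# self-verification check, in the spirit of GRPO-CSV. All scores and
# thresholds are made up for illustration.

def temporal_search(relevance, top_k=3, max_rounds=3, adequacy=0.8):
    """Iteratively gather the most query-relevant frames until a
    self-verification step deems the collected evidence adequate.

    relevance: per-frame scores (a stand-in for the policy model's
               judgment of each frame against the query).
    """
    searched = set()
    total = sum(relevance)
    for _ in range(max_rounds):
        # "Search" step: pick the top-k not-yet-searched frames.
        candidates = sorted(
            (i for i in range(len(relevance)) if i not in searched),
            key=lambda i: relevance[i],
            reverse=True,
        )[:top_k]
        searched.update(candidates)
        # "Self-verification" step: check whether the searched frames
        # cover enough of the total relevant evidence to answer.
        covered = sum(relevance[i] for i in searched)
        if covered / total >= adequacy:
            break  # evidence judged complete; stop searching
    return sorted(searched)

# Toy video: 12 frames, relevance concentrated on frames 3, 7, and 8.
scores = [0.01, 0.02, 0.01, 0.9, 0.02, 0.01,
          0.03, 0.8, 0.7, 0.02, 0.01, 0.02]
frames = temporal_search(scores)
print(frames)  # → [3, 7, 8]
```

The design point this mirrors is that the stopping decision is itself part of the learned behavior: rather than a hand-crafted narrowing schedule, the model is rewarded (via GRPO-CSV) for searching until its own verification says the evidence is complete.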

cs / cs.CV / cs.AI