Published: 2025/10/23 6:52:53

The Ultimate Gal AI Explains! GMFVAD Sends Video Anomaly Detection Through the Roof 💖

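Title & super-short summary
GMFVAD spots the weird parts in videos! It teams video up with text to catch anomalies way more smartly ✨
Gal-style sparkle points
- The multimodal approach of combining video with its description (text) is seriously hot 🔥
- It doesn't just mash them together: it tunes the granularity to cut out the redundancy, which is smart!
- You can totally picture it helping out in security and lots of other fields, right? 🥹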
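Detailed breakdown
- Background: The tech for spotting "wait, something's off!" in surveillance camera footage is called VAD (video anomaly detection), but there were loads of anomalies the old approaches struggled to catch 😭 This research uses text info on top of the video to find anomalies more smartly!
- Method: It combines the "Glance-Focus Network," which analyzes the video, with "SwinBERT," a video captioning model that supplies the text side 😎 It finds which parts of the video matter, pairs them with the text, and builds features with the redundancy stripped out. It also accounts for changes over time to boost accuracy ⤴︎
- Results: GMFVAD was better at finding anomalies than other approaches, hitting state-of-the-art performance on four datasets 👏 It could be used in all sorts of areas like data analysis and security, which is pretty awesome!
- Why it matters (the seriously amazing ♡ part): It could help ease labor shortages and contribute to a safer society, how great is that? ✨ False positives (flagging something as an anomaly by mistake) should go down too, so people can use it with more peace of mind!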
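To make the Method bullet a bit more concrete, here's a tiny toy sketch of the idea: pick the frames in a snippet that matter most, pool them, and attach the text feature. Heads up, this is an assumption-heavy illustration and not the authors' code. `cosine`, `grained_fusion`, the `keep` parameter, and the made-up 3-dimensional vectors are all hypothetical. Cosine similarity to the caption stands in for whatever the Glance-Focus Network actually learns, and the frame and caption vectors are assumed to already live in the same space.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def grained_fusion(frame_feats, text_feat, keep=2):
    """Toy stand-in (NOT the paper's method): keep the `keep` frames most
    similar to the caption, mean-pool them, and append the text feature."""
    # Score every frame against the caption embedding.
    scores = [cosine(f, text_feat) for f in frame_feats]
    # Indices of the highest-scoring ("highlighted") frames, back in time order.
    ranked = sorted(range(len(frame_feats)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:keep])
    # Mean-pool the kept frames into one visual vector; the rest is dropped.
    dim = len(frame_feats[0])
    visual = [sum(frame_feats[i][d] for i in top) / len(top) for d in range(dim)]
    # Concatenate the visual and text parts into one multi-modal feature.
    return top, visual + list(text_feat)


# Made-up per-frame features for one snippet and a made-up caption embedding.
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
caption = [0.0, 1.0, 0.0]

kept, fused = grained_fusion(frames, caption, keep=2)
print(kept)
print(fused)
```

With these toy inputs, the two frames closest to the caption are kept and the other two are dropped as redundant. A real system would learn this selection rather than hard-code it.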
Real-world use ideas
- Automatically spot shoplifters at a convenience store and alert the staff!
- Catch defective products on a factory line right away and boost quality!

GMFVAD: Using Grained Multi-modal Feature to Improve Video Anomaly Detection

Guangyu Dai / Dong Chen / Siliang Tang / Yueting Zhuang

Video anomaly detection (VAD) is a challenging task that detects anomalous frames in continuous surveillance videos. Most previous work uses the spatio-temporal correlation of visual features to determine whether video snippets contain abnormalities. Recently, some works have attempted to introduce multi-modal information, such as text features, to improve video anomaly detection results. However, these works merely incorporate text features into video snippets in a coarse manner, overlooking the significant amount of redundant information that may exist within the video snippets. We therefore propose Grained Multi-modal Feature for Video Anomaly Detection (GMFVAD), which leverages the diversity among multi-modal information to further refine the extracted features and reduce the redundancy in visual features. Specifically, we generate a finer-grained multi-modal feature from each video snippet that summarizes its main content, and we introduce text features based on the captions of the original video to further enhance the visual features of the highlighted portions. Experiments show that the proposed GMFVAD achieves state-of-the-art performance on four main datasets. Ablation experiments also validate that the improvement of GMFVAD is due to the reduction of redundant information.
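
The abstract speaks of detecting anomalous frames, while features are computed per snippet. One common way to bridge the two, shown here only as a hedged sketch, is to copy each snippet's anomaly score to all of its frames and then threshold. The function name `expand_to_frames`, the snippet length, and the 0.5 threshold are assumptions for illustration; the abstract does not say how GMFVAD handles this step.

```python
def expand_to_frames(snippet_scores, frames_per_snippet=16, threshold=0.5):
    """Repeat each snippet-level score over its frames, then flag frames
    whose score exceeds the threshold. Illustrative only."""
    frame_scores = []
    for score in snippet_scores:
        # Every frame in a snippet inherits that snippet's score.
        frame_scores.extend([score] * frames_per_snippet)
    # A frame is flagged as anomalous when its score is above the threshold.
    flags = [score > threshold for score in frame_scores]
    return frame_scores, flags


# Three hypothetical snippet scores, four frames per snippet for brevity.
scores, flags = expand_to_frames([0.1, 0.8, 0.3], frames_per_snippet=4)
print(scores)
print(flags)
```

Here only the four frames of the middle snippet are flagged, since it is the only snippet whose score exceeds the assumed threshold.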

cs / cs.CV / cs.MM
