Solving the Viewpoint Problem with T-MASK! Top-Tier Driver Activity Recognition 💖

Published: 2025/8/25 5:44:22


Super-short summary: a driver activity recognition technique that holds up when the camera viewpoint changes! ✨

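Gal-style sparkle points ✨
● Isn't it amazing that it still recognizes what the driver is doing even when the camera angle changes?
● It makes full use of the latest image models (AI that understands images by looking at them)! Way too smart!
● Big potential to shine in lots of fields, like autonomous driving~ 😍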

Detailed Explanation

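Background

Recognizing driver behavior with AI is super important, right? 🚗 But here's the catch: when the camera angle changes, the AI goes "huh?" and gets confused! With conventional methods, accuracy drops as soon as the viewpoint changes 💦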
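Method

Enter pretrained image foundation models! 😎 These AIs have already learned from tons of different images, so they're expected to generalize better when the viewpoint changes. The team adapts them using just one camera view for training, then tests them directly on unseen views with no extra tuning. Their new technique, T-MASK, is an image-to-video probing method: it masks temporal tokens so the model focuses on the more dynamic parts of the video, and it adds zero extra parameters! That's how they get driver activity recognition that stays strong across camera views!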
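To make "temporal token masking" a bit more concrete, here's a toy sketch in plain Python. Big caveat: this is NOT the authors' T-MASK implementation (their code is announced for the GitHub repo mentioned in the abstract). It assumes per-frame patch-token features from a frozen image model, scores each token position by how much it changes between consecutive frames (a hypothetical "dynamics" rule chosen for illustration), keeps only the most dynamic positions, and averages them into one clip embedding. The names `token_dynamics` and `temporal_mask_pool` and the `keep` parameter are made up.

```python
import math
from typing import List

# Hypothetical layout: video[t][n][d] = feature d of patch token n in frame t,
# e.g. produced by running a frozen image foundation model frame by frame.
Video = List[List[List[float]]]


def token_dynamics(video: Video) -> List[float]:
    """Score each token position by its mean L2 change between consecutive frames.

    This scoring rule is an illustrative assumption, not the paper's criterion.
    """
    num_frames = len(video)
    num_tokens = len(video[0])
    scores = [0.0] * num_tokens
    for t in range(1, num_frames):
        for n in range(num_tokens):
            prev, cur = video[t - 1][n], video[t][n]
            scores[n] += math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev)))
    # Average over the number of frame-to-frame transitions.
    return [s / max(num_frames - 1, 1) for s in scores]


def temporal_mask_pool(video: Video, keep: int) -> List[float]:
    """Keep the `keep` most dynamic token positions, mask out the rest,
    and average the kept tokens over space and time into one clip embedding."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    scores = token_dynamics(video)
    # Indices of the highest-scoring (most dynamic) token positions.
    kept = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)[:keep]
    dim = len(video[0][0])
    pooled = [0.0] * dim
    for frame in video:
        for n in kept:
            for i, x in enumerate(frame[n]):
                pooled[i] += x
    count = len(video) * len(kept)
    return [x / count for x in pooled]
```

Because the token selection here is a fixed rule rather than a learned module, this style of masking adds no trainable parameters; the pooled embedding would then be handed to a lightweight probe such as a linear classifier.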


T-MASK: Temporal Masking for Probing Foundation Models across Camera Views in Driver Monitoring

Thinesh Thiyakesan Ponbagavathi / Kunyu Peng / Alina Roitberg

Changes of camera perspective are a common obstacle in driver monitoring. While deep learning and pretrained foundation models show strong potential for improved generalization via lightweight adaptation of the final layers ('probing'), their robustness to unseen viewpoints remains underexplored. We study this challenge by adapting image foundation models to driver monitoring using a single training view, and evaluating them directly on unseen perspectives without further adaptation. We benchmark simple linear probes, advanced probing strategies, and compare two foundation models (DINOv2 and CLIP) against parameter-efficient fine-tuning (PEFT) and full fine-tuning. Building on these insights, we introduce T-MASK -- a new image-to-video probing method that leverages temporal token masking and emphasizes more dynamic video regions. Benchmarked on the public Drive&Act dataset, T-MASK improves cross-view top-1 accuracy by $+1.23\%$ over strong probing baselines and $+8.0\%$ over PEFT methods, without adding any parameters. It proves particularly effective for underrepresented secondary activities, boosting recognition by $+5.42\%$ under the trained view and $+1.36\%$ under cross-view settings. This work provides encouraging evidence that adapting foundation models with lightweight probing methods like T-MASK has strong potential in fine-grained driver observation, especially in cross-view and low-data settings. These results highlight the importance of temporal token selection when leveraging foundation models to build robust driver monitoring systems. Code and models will be made available at https://github.com/th-nesh/T-MASK to support ongoing research.
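The evaluation setup described in the abstract (train a lightweight probe on a single camera view, then test it directly on unseen views with no further adaptation) can be sketched like this. It's a minimal pure-Python softmax linear probe trained on already-extracted, frozen features; the toy features, the view names, and the helpers `train_linear_probe`, `predict`, `top1_accuracy`, and `cross_view_eval` are hypothetical. The paper's advanced probing strategies and T-MASK itself are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (frozen clip embedding, activity label)


def train_linear_probe(train: Sequence[Sample], num_classes: int,
                       epochs: int = 200, lr: float = 0.5) -> List[List[float]]:
    """Fit a softmax (multinomial logistic) classifier with full-batch gradient
    descent. Only these weights are trained; the backbone is never touched."""
    dim = len(train[0][0])
    # One weight row per class; the last entry of each row is the bias.
    w = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (dim + 1) for _ in range(num_classes)]
        for x, y in train:
            xb = list(x) + [1.0]
            logits = [sum(wi * xi for wi, xi in zip(row, xb)) for row in w]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. logit c: softmax prob minus one-hot target.
                err = exps[c] / z - (1.0 if c == y else 0.0)
                for i, xi in enumerate(xb):
                    grad[c][i] += err * xi
        for c in range(num_classes):
            for i in range(dim + 1):
                w[c][i] -= lr * grad[c][i] / len(train)
    return w


def predict(w: List[List[float]], x: Sequence[float]) -> int:
    """Return the class with the highest logit (top-1 prediction)."""
    xb = list(x) + [1.0]
    logits = [sum(wi * xi for wi, xi in zip(row, xb)) for row in w]
    return max(range(len(w)), key=lambda c: logits[c])


def top1_accuracy(w: List[List[float]], data: Sequence[Sample]) -> float:
    return sum(predict(w, x) == y for x, y in data) / len(data)


def cross_view_eval(features_by_view: Dict[str, List[Sample]],
                    train_view: str, num_classes: int) -> Dict[str, float]:
    """Train on one view only, then report top-1 accuracy on every view
    (the training view plus the unseen ones) without any re-fitting."""
    w = train_linear_probe(features_by_view[train_view], num_classes)
    return {view: top1_accuracy(w, data) for view, data in features_by_view.items()}
```

In a real setup, each sample would be a clip embedding from a frozen backbone such as DINOv2 or CLIP, and the gap between the training-view accuracy and the unseen-view accuracies is the cross-view drop the paper is studying.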

cs / cs.CV

View on arXiv