Published: 2025/12/16 3:55:55

Pinning down camera pose with sound and video! Research that gets the future of VR and more totally hyped ☆ (ultra-short summary)

Gyaru-style sparkle points ✨

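● Combining sound (ambient scene audio) with video to estimate camera pose is a fresh idea!
● Accuracy goes up even in dark or blurry scenes where video alone can't cope!
● Feels like it could shine in AR/VR, robots, self-driving cars, and lots of other fields 💖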

Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video

Daniel Adebi / Sagnik Majumder / Kristen Grauman

Understanding camera motion is a fundamental problem in embodied perception and 3D scene understanding. While visual methods have advanced rapidly, they often struggle under visually degraded conditions such as motion blur or occlusions. In this work, we show that passive scene sounds provide complementary cues for relative camera pose estimation for in-the-wild videos. We introduce a simple but effective audio-visual framework that integrates direction-of-arrival (DOA) spectra and binauralized embeddings into a state-of-the-art vision-only pose estimation model. Our results on two large datasets show consistent gains over strong visual baselines, plus robustness when the visual information is corrupted. To our knowledge, this represents the first work to successfully leverage audio for relative camera pose estimation in real-world videos, and it establishes incidental, everyday audio as an unexpected but promising signal for a classic spatial challenge. Project: http://vision.cs.utexas.edu/projects/av_camera_pose.
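
To give a rough feel for what a direction-of-arrival (DOA) spectrum is, here is a minimal sketch. The abstract does not say how the authors compute their DOA spectra, so this is not their method: it is a generic two-microphone cross-correlation scan. The function name `doa_spectrum`, the microphone spacing, the sample rate, and the synthetic white-noise signal are all assumptions made for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature (assumed)


def doa_spectrum(left, right, fs, mic_distance, angles_deg):
    """Score each candidate angle by the cross-correlation of the two
    channels at the time lag that angle would produce (hypothetical helper)."""
    n = len(left)
    spectrum = []
    for angle in angles_deg:
        # Expected delay, in samples, of the right channel relative to the left
        delay_s = mic_distance * math.sin(math.radians(angle)) / SPEED_OF_SOUND
        lag = round(delay_s * fs)
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        spectrum.append(score)
    return spectrum


# Synthetic example: white noise that reaches the right microphone
# 14 samples later than the left one (about 30 degrees for this geometry).
random.seed(0)
left = [random.uniform(-1.0, 1.0) for _ in range(2000)]
true_lag = 14
right = [0.0] * true_lag + left[:-true_lag]

angles = list(range(-90, 91, 5))
spectrum = doa_spectrum(left, right, fs=48000, mic_distance=0.2, angles_deg=angles)
best_angle = angles[spectrum.index(max(spectrum))]
print(best_angle)
```

The peak of the spectrum marks the most likely direction of the sound source relative to the microphone pair. If the camera rotates between two frames, that peak shifts, which is the kind of cue that could complement visual features when the image is blurred or occluded.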

cs / cs.CV