SceneComplete: Robot Manipulation via Open-World 3D Scene Completion

Published: 2025/11/7 20:09:50

Title & TL;DR: SceneComplete! The magic paving the way for the future of robots 🤖✨

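● Gal alert! Robots that can actually make sense of the open world (messy real scenes full of all kinds of objects)? How amazing is that?! 💖
● They pin down where objects are using 3D models, so they can grab and move things safely! 🎉
● They could even shine in places like warehouses and hospitals. So hyped for the future~ 😍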

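Detailed Explanation

Background: For robots to be useful in all kinds of places, they really have to understand what's around them! 🥺 But the real world is a cluttered mess where objects hide behind each other, so getting the full picture, especially from just one camera view, has been super hard for robots 💦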

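Method: Enter SceneComplete! From a single RGB-D image (a color image plus per-pixel depth, i.e. how far away each point is), it builds a complete, segmented 3D model of the whole scene, even the parts the camera can't see! 👀 It does this by chaining together general-purpose pretrained deep learning models (AI brains 🧠) for vision-language understanding, segmentation, image inpainting, image-to-3D, visual descriptors, and pose estimation, so the robot knows each object's full shape and can grab it or steer around it! ✨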
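The very first step here, turning an RGB-D image into 3D points, can be sketched with the standard pinhole camera model. This is a minimal sketch, not SceneComplete's code: the function name `rgbd_to_colored_points` and the intrinsics `fx, fy, cx, cy` are hypothetical, and it assumes depth is stored in meters with 0 meaning "no reading". It only lifts the *visible* surface into 3D; filling in the hidden parts is the job of the pretrained modules described above.

```python
import numpy as np


def rgbd_to_colored_points(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGB-D frame into a camera-frame colored point cloud.

    rgb:   (H, W, 3) color image
    depth: (H, W) depth in meters (assumed); values <= 0 are treated as missing
    fx, fy, cx, cy: pinhole intrinsics (hypothetical values in the demo)
    Returns (points (N, 3), colors (N, 3)) for the valid pixels only.
    """
    h, w = depth.shape
    # u = column index, v = row index, both shaped (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, rgb.shape[-1])
    # Drop pixels with no depth reading
    valid = points[:, 2] > 0
    return points[valid], colors[valid]


if __name__ == "__main__":
    # Toy 2x2 frame with one missing depth pixel
    depth = np.array([[1.0, 1.0],
                      [2.0, 0.0]])
    rgb = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
    pts, cols = rgbd_to_colored_points(rgb, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    print(pts)
    print(cols)
```

The resulting cloud only covers surfaces facing the camera, which is exactly why a single view leaves holes that a scene-completion system has to fill.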

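Results: Checked against ground-truth models on a large benchmark dataset, SceneComplete's reconstructions came out really accurate! 😳 And because it rebuilds whole objects accurately, it can generate robust grasp proposals, even for a dexterous (multi-fingered) hand! 👏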


SceneComplete: Open-World 3D Scene Completion in Cluttered Real World Environments for Robot Manipulation

Aditya Agarwal / Gaurav Singh / Bipasha Sen / Tomás Lozano-Pérez / Leslie Pack Kaelbling

Careful robot manipulation in every-day cluttered environments requires an accurate understanding of the 3D scene, in order to grasp and place objects stably and reliably and to avoid colliding with other objects. In general, we must construct such a 3D interpretation of a complex scene based on limited input, such as a single RGB-D image. We describe SceneComplete, a system for constructing a complete, segmented, 3D model of a scene from a single view. SceneComplete is a novel pipeline for composing general-purpose pretrained perception modules (vision-language, segmentation, image-inpainting, image-to-3D, visual-descriptors and pose-estimation) to obtain highly accurate results. We demonstrate its accuracy and effectiveness with respect to ground-truth models in a large benchmark dataset and show that its accurate whole-object reconstruction enables robust grasp proposal generation, including for a dexterous hand. We release the code and additional results on our website.
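The abstract lists visual descriptors and pose estimation among the composed modules. One classical way to picture the "put each reconstructed object back where it belongs in the scene" step is a closed-form similarity alignment (the Umeyama method): given matched 3D points on a generated object model and on the observed scene, solve for scale, rotation, and translation. This is a swapped-in textbook technique, not the pose estimator SceneComplete actually uses. The function name `umeyama_similarity` is hypothetical, and the sketch assumes the correspondences are already known and outlier-free. Scale is included because a mesh generated from a single image generally has no reliable absolute size.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Estimate s, R, t minimizing sum ||dst_i - (s * R @ src_i + t)||^2.

    src, dst: (N, 3) arrays of corresponding points (assumed outlier-free).
    Returns scale s (float), rotation R (3x3, det +1), translation t (3,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets on their centroids
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance (dst vs. src) and variance of the source points
    cov = dst_c.T @ src_c / n
    var_src = (src_c ** 2).sum() / n

    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation, not a reflection
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_pts = rng.normal(size=(100, 3))  # stand-in for points on a generated mesh
    theta = np.deg2rad(30.0)
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0,            0.0,           1.0]])
    # Synthetic "observed" points: shrink, rotate, and shift the model
    scene_pts = 0.25 * model_pts @ R_true.T + np.array([0.4, -0.1, 0.8])
    s, R, t = umeyama_similarity(model_pts, scene_pts)
    print(round(s, 6), np.round(t, 6))
```

With real descriptor matches the correspondences would be noisy, so a closed form like this would typically run inside a robust estimator such as RANSAC rather than on its own.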

cs / cs.RO
