Ultra-short summary: the best of both point clouds and photos! Magic that lets robots move smartly 🧙♀️
💎 Sparkly highlight points ✨
● Point clouds (3D data) transform into 2D maps, so existing techniques can be reused 💖
● Fused with RGB images (photos), the robot's perception skyrockets ⤴
● Proven in real experiments to outperform other methods!
Now for the detailed explanation!
Background: It's hard for a robot to understand its surroundings. Photos alone lack depth, and 3D data (point clouds) is heavy to process 💦 That's where PMP comes in, taking the best of both!
Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, while RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations and can be transformed between reference frames. Moreover, because the points are arranged on a regular grid, established computer vision techniques can be applied directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real-robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. An overview and demos are available on our project page: https://point-map.github.io/Point-Map/
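To make the "structured grid of points" idea concrete, here is a minimal sketch (not the authors' code) of how a depth image can be back-projected into a point map: an H×W×3 grid where each pixel stores the 3D point it observes, staying aligned with the RGB image. The function name, the intrinsics values, and the optional frame transform are illustrative assumptions.

```python
import numpy as np

def depth_to_point_map(depth, fx, fy, cx, cy, cam_to_world=None):
    """Convert an (H, W) depth image to an (H, W, 3) point map.

    Unlike an unordered point cloud, every output pixel holds the 3D point
    seen at that pixel, so no downsampling is needed and 2D CNN/ViT-style
    operators can run on it directly.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates

    # Pinhole back-projection: pixel (u, v) with depth z -> camera-frame xyz.
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)  # (H, W, 3), camera frame

    if cam_to_world is not None:
        # Optional rigid transform into another reference frame (e.g. the
        # robot base); the regular grid layout is preserved per pixel.
        r, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
        points = points @ r.T + t
    return points

# Usage with a synthetic 480x640 depth image and plausible intrinsics.
depth = np.full((480, 640), 1.5, dtype=np.float32)
pmap = depth_to_point_map(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(pmap.shape)  # (480, 640, 3) -- same layout as the RGB image
```

Because the point map shares the RGB image's pixel grid, per-pixel fusion of geometry and appearance (as the abstract describes with the xLSTM backbone) reduces to stacking or jointly encoding two aligned image-like tensors.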