Testing Spatial Awareness with Text Alone!

Published: 2026/1/7 5:13:52

Title & super-short summary: Testing Spatial Awareness with Text Alone!

🌟 Sparkly gyaru-style highlights ✨
● This study checks whether LLMs can understand space from text alone!
● A new test called SiT-Bench evaluates that in serious detail!
● Robots, VR, and other future tech might level up even more, so we're excited 💖

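Detailed explanation
● Background: Recent LLMs are amazing, but they aren't actually seeing anything, so how much do they really understand about space? 🤔 That question is what kicked off this research: the goal is to find out how far they can get with no visual information at all.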

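● Method: The study uses a test called SiT-Bench to measure how well LLMs understand space! The model gets text-only information like "there's an apple on the right and a book on the left," and the test checks whether it really gets it. There are tons of super fine-grained test items!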

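● Results: This research should show in detail how much spatial awareness LLMs can pull off without relying on visual information! The strengths and weak spots of LLMs should come into focus ✨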
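
To make the "apple on the right, book on the left" idea from the method a bit more concrete, here is a minimal Python sketch of how a text-only spatial question could be built and checked. Everything in it is an assumption for illustration: the `SceneObject` class, the coordinate convention, the wording of the prompt, and the `left_of` check are hypothetical and are not taken from SiT-Bench itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    x: float  # metres to the viewer's right, negative means left (assumed convention)
    y: float  # metres in front of the viewer (assumed convention)


def side_phrase(obj: SceneObject) -> str:
    """Turn the x coordinate into a coarse verbal cue."""
    if obj.x > 0:
        return "on the viewer's right"
    if obj.x < 0:
        return "on the viewer's left"
    return "directly ahead of the viewer"


def describe_scene(objects: list[SceneObject]) -> str:
    """Render the scene as plain sentences that also carry explicit coordinates."""
    return " ".join(
        f"The {obj.name} is at (x={obj.x:.1f}, y={obj.y:.1f}), {side_phrase(obj)}."
        for obj in objects
    )


def left_of(a: SceneObject, b: SceneObject) -> bool:
    """Ground-truth relation used to check a model's yes/no answer."""
    return a.x < b.x


apple = SceneObject("apple", 1.0, 2.0)
book = SceneObject("book", -1.5, 2.0)

# The prompt is all the model would see: no image, only text.
prompt = (
    describe_scene([apple, book])
    + " Question: Is the book to the left of the apple? Answer yes or no."
)
expected = "yes" if left_of(book, apple) else "no"
```

A model's reply would then be compared against `expected`. A real benchmark item would likely be richer than this single yes/no relation, but the overall shape (text scene in, checkable answer out) is the same.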


Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions

Zhongbin Guo / Zhen Yang / Yushan Li / Xinyue Zhang / Wenyu Gao / Jiacheng Wang / Chengzhi Li / Xiangrui Liu / Ping Jian

Recent advancements in Spatial Intelligence (SI) have predominantly relied on Vision-Language Models (VLMs), yet a critical question remains: does spatial understanding originate from visual encoders or the fundamental reasoning backbone? Inspired by this question, we introduce SiT-Bench, a novel benchmark designed to evaluate the SI performance of Large Language Models (LLMs) without pixel-level input. It comprises over 3,800 expert-annotated items across five primary categories and 17 subtasks, ranging from egocentric navigation and perspective transformation to fine-grained robotic manipulation. By converting single/multi-view scenes into high-fidelity, coordinate-aware textual descriptions, we challenge LLMs to perform symbolic textual reasoning rather than visual pattern matching. Evaluation results of state-of-the-art (SOTA) LLMs reveal that while models achieve proficiency in localized semantic tasks, a significant "spatial gap" remains in global consistency. Notably, we find that explicit spatial reasoning significantly boosts performance, suggesting that LLMs possess latent world-modeling potential. Our proposed dataset SiT-Bench serves as a foundational resource to foster the development of spatially-grounded LLM backbones for future VLMs and embodied agents. Our code and benchmark will be released at https://github.com/binisalegend/SiT-Bench .
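
The abstract mentions perspective transformation as one of the task types. As a rough illustration of the kind of symbolic reasoning that involves, the sketch below classifies where a target lies relative to an agent, given world-frame coordinates and the agent's heading. The function name `egocentric_direction`, the heading convention, and the four 90-degree sectors are assumptions chosen for this example, not details from the paper.

```python
import math


def egocentric_direction(agent_xy, heading_deg, target_xy):
    """Classify a target as front/left/right/behind relative to the agent.

    heading_deg is measured counter-clockwise from the world +x axis
    (an assumed convention for this sketch).
    """
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    # Bearing of the target in the world frame, in degrees.
    bearing = math.degrees(math.atan2(dy, dx))
    # Angle relative to where the agent is facing, wrapped into [-180, 180).
    relative = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    if -45.0 <= relative <= 45.0:
        return "front"
    if 45.0 < relative < 135.0:
        return "left"
    if -135.0 < relative < -45.0:
        return "right"
    return "behind"


# Agent at the origin facing +y: a target on +x is to its right.
facing_north = egocentric_direction((0.0, 0.0), 90.0, (1.0, 0.0))
# Same target after the agent turns to face +x: now it is in front.
facing_east = egocentric_direction((0.0, 0.0), 0.0, (1.0, 0.0))
```

The same world coordinates give different answers once the viewpoint changes, which is the part a text-only model has to work out without any image to lean on.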

cs / cs.CV / cs.AI
