Published: 2026/1/1 21:40:37

Title & ultra-short summary: Do dynamic elements liven up the city? Analyzing perception with an MLLM ✨

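Gyaru-style sparkle points ✨
● This study looks at how much the moving things in a city's scenery, like people and cars, affect the way we feel about it!
● They used AI to make street images with and without the people and cars, then checked how everyone felt about them! Amazing!
● The results could lead to nicer city planning and new services, so there's lots to look forward to! Exciting, right? ♪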

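Detailed explanation
● Background: How city scenery makes us feel is super important, right? 💖 But until now, studies have treated street scenes like still images 🥺 In reality, people and cars are moving around all the time, and the researchers realized that this matters a lot!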

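● Method: They used a cutting-edge AI called an "MLLM" (multimodal large language model), an awesome thing that can combine different kinds of information, to edit street images! 🚶‍♀️🚗 For example, they erased the people and cars from a scene and ran an experiment to see how everyone felt about the original and the edited version!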

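● Results: It turns out that having moving elements changes the impression of a street a lot! ✨ For example, lots of people makes a place feel lively, and lots of cars makes it feel a bit noisy, and it seems they were able to put numbers on that kind of thing!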


From Static to Dynamic: Evaluating the Perceptual Impact of Dynamic Elements in Urban Scenes via MLLM-Guided Generative Inpainting

Zhiwei Wei / Mengzi Zhang / Boyan Lu / Zhitao Deng / Nai Yang / Hua Liao

Understanding urban perception from street view imagery has become a central topic in urban analytics and human-centered urban design. However, most existing studies treat urban scenes as static and largely ignore the role of dynamic elements such as pedestrians and vehicles, raising concerns about potential bias in perception-based urban analysis. To address this issue, we propose a controlled framework that isolates the perceptual effects of dynamic elements by constructing paired street view images with and without pedestrians and vehicles using semantic segmentation and MLLM-guided generative inpainting. Based on 720 paired images from Dongguan, China, a perception experiment was conducted in which participants evaluated original and edited scenes across six perceptual dimensions. The results indicate that removing dynamic elements leads to a consistent 30.97% decrease in perceived vibrancy, whereas changes in other dimensions are more moderate and heterogeneous. To further explore the underlying mechanisms, we trained 11 machine learning models using multimodal visual features and found that lighting conditions, human presence, and depth variation were key factors driving perceptual change. At the individual level, 65% of participants exhibited significant vibrancy changes, compared with 35-50% for other dimensions; gender further showed a marginal moderating effect on safety perception. Beyond controlled experiments, the trained model was extended to a city-scale dataset to predict vibrancy changes after the removal of dynamic elements. The city-level results reveal that such perceptual changes are widespread and spatially structured, affecting 73.7% of locations and 32.1% of images, suggesting that urban perception assessments based solely on static imagery may substantially underestimate urban liveliness.
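The abstract reports a percentage decrease in perceived vibrancy between paired original and edited images, but does not say how that figure was computed. As a minimal sketch, assuming it is the relative change in the mean rating across image pairs, the calculation could look like the following. The function name `percent_change` and the ratings are hypothetical and purely illustrative; they are not data from the paper.

```python
from statistics import mean


def percent_change(original, edited):
    """Relative change (%) in the mean rating from original to edited scenes.

    Assumption: ratings are paired by image, so both lists have equal length.
    """
    if len(original) != len(edited):
        raise ValueError("ratings must be paired, one edited score per original")
    base = mean(original)
    return (mean(edited) - base) / base * 100.0


# Made-up vibrancy ratings for four image pairs (NOT data from the paper).
original_vibrancy = [4.0, 5.0, 3.0, 4.0]  # scenes with pedestrians and vehicles
edited_vibrancy = [3.0, 3.0, 2.0, 4.0]    # same scenes with them inpainted away

change = percent_change(original_vibrancy, edited_vibrancy)
print(f"{change:.2f}%")  # prints -25.00%
```

A negative value means the edited scenes were rated lower on average. Other definitions, such as averaging per-pair percentage changes, would give different numbers, so this should be read as one plausible interpretation only.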

cs / cs.CY