PedX-LLM Explained! Let's Predict Pedestrian Crossings with AI ☆

Published: 2026/1/2 14:13:28

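Super summary: Predicting pedestrian crossings with AI! It fuses visual info and knowledge 💕
Gyaru-style sparkle points ✨
- To cut down on accidents, the AI predicts how pedestrians will behave!
- It combines visual info (images) with text info to make smart predictions 💖
- The AI also carries knowledge like traffic rules, so it's seriously accurate (82.0% balanced accuracy)!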
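Detailed breakdown
- Background: There are a lot of pedestrian accidents, right? 😱 To bring that number down, this research has AI predict when and where pedestrians will cross the road! Earlier models (statistical ones and supervised learning) had a hard time predicting once the location changed...
- Method: The AI gets street photos (images, with visual features pulled out by LLaVA) and pedestrian info (like age), plus knowledge like traffic rules! 🤖 A LLaMA-2-7B model is fine-tuned with LoRA on all of that combined, so it can predict smartly!
- Results: It predicts with high accuracy across lots of places (82.0% balanced accuracy)! 😳 The really cool part is that it still holds up at brand-new sites: 66.9% zero-shot on five unseen sites, and 72.2% once it sees just five examples!
- Significance (the seriously amazing ♡ point): This could help build safer cities! 💕 It can inform where to put crosswalks and how to tune signal timing, which is pretty great, right?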
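Wanna see roughly what "combining all the info" could look like? Here's a tiny sketch, just to get the feel. It assumes the fusion happens by putting everything into one text prompt, which the summary doesn't actually spell out. `CrossingCase`, `build_prompt`, the field names, and the example strings are all made up for illustration and are not from the paper.

```python
from dataclasses import dataclass


@dataclass
class CrossingCase:
    """One pedestrian observation. All field names are hypothetical."""
    scene_description: str  # text a vision model might produce from a street image
    pedestrian_info: str    # textual attributes, e.g. an age group
    domain_knowledge: str   # a piece of transportation knowledge (placeholder text)


def build_prompt(case: CrossingCase) -> str:
    """Merge the three information sources into a single text prompt."""
    lines = [
        f"Scene: {case.scene_description}",
        f"Pedestrian: {case.pedestrian_info}",
        f"Knowledge: {case.domain_knowledge}",
        "Question: Will this pedestrian cross here? Answer yes or no.",
    ]
    return "\n".join(lines)


# Placeholder values, invented only to show the shape of the input.
example = CrossingCase(
    scene_description="four-lane road, no marked crosswalk nearby",
    pedestrian_info="adult, walking alone",
    domain_knowledge="placeholder rule about pedestrian crossing behavior",
)
prompt = build_prompt(example)
print(prompt)
```

In a real setup, a prompt like this would go to the fine-tuned language model, and the scene text would come from the vision model instead of being typed in by hand.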
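Real-world use ideas 💡
- 1. Traffic safety app: Wouldn't it be awesome to have an app that tells you "Hey, this road is dangerous!"? 😍
- 2. Self-driving cars: They could predict how pedestrians will move and drive more safely! 🚕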

A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference

Qingwen Pu / Kun Xie / Hong Yang / Guocong Zhai

Existing paradigms for inferring pedestrian crossing behavior, ranging from statistical models to supervised learning methods, demonstrate limited generalizability and perform inadequately on new sites. Recent advances in Large Language Models (LLMs) offer a shift from numerical pattern fitting to semantic, context-aware behavioral reasoning, yet existing LLM applications lack domain-specific adaptation and visual context. This study introduces Pedestrian Crossing LLM (PedX-LLM), a vision-and-knowledge enhanced framework designed to transform pedestrian crossing inference from site-specific pattern recognition to generalizable behavioral reasoning. By integrating LLaVA-extracted visual features with textual data and transportation domain knowledge, PedX-LLM fine-tunes a LLaMA-2-7B foundation model via Low-Rank Adaptation (LoRA) to infer crossing decisions. PedX-LLM achieves 82.0% balanced accuracy, outperforming the best statistical and supervised learning methods. Results demonstrate that the vision-augmented module contributes a 2.9% performance gain by capturing the built environment, and that integrating domain knowledge yields an additional 4.1% improvement. To evaluate generalizability across unseen environments, cross-site validation was conducted using site-based partitioning. The zero-shot PedX-LLM configuration achieves 66.9% balanced accuracy on five unseen test sites, outperforming the baseline data-driven methods by at least 18 percentage points. Incorporating just five validation examples into PedX-LLM via few-shot learning further elevates the balanced accuracy to 72.2%. PedX-LLM demonstrates strong generalizability to unseen scenarios, confirming that vision-and-knowledge-enhanced reasoning enables the model to mimic human-like decision logic and overcome the limitations of purely data-driven methods.
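The abstract reports balanced accuracy and evaluates generalizability with site-based partitioning. The sketch below illustrates both ideas in plain Python, taking balanced accuracy in its usual sense (the mean of per-class recall) and a site-based split to mean that every record from a held-out site goes to the test set. The function names, the record fields `site` and `crossed`, and all toy values are assumptions made for illustration, not the authors' code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def split_by_site(records, test_sites):
    """Site-based partitioning: every record from a test site is held out."""
    train = [r for r in records if r["site"] not in test_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test


# Toy records; the values are invented for illustration only.
records = [
    {"site": "A", "crossed": 1},
    {"site": "A", "crossed": 0},
    {"site": "B", "crossed": 1},
    {"site": "C", "crossed": 0},
]
train, test = split_by_site(records, test_sites={"C"})

# Toy labels: class 1 recall is 2/3, class 0 recall is 1/1.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
score = balanced_accuracy(y_true, y_pred)
print(round(score, 4))
```

Splitting by site rather than by individual record keeps observations from a test location out of training entirely, which is what makes the reported cross-site numbers a measure of performance on unseen environments.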

cs / cs.AI