Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Published: 2026/1/4 14:02:29

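Few-shot action recognition has arrived! 🎉 (Ultra-short summary: level up action understanding with only a little data!)

1. Ultra-short summary: HR2G-shot is an AI that recognizes all kinds of actions from just a few examples, and it's seriously impressive ✨

2. Gyaru-style sparkle points ✨

● It models three kinds of relations! Inter-frame, inter-video, and inter-task relations all get learned together 💖
● It uses a knowledge bank! It learns smartly from past tasks, so it can handle brand-new actions too 👏
● The compute cost is modest too! Smarter than other AIs and easy on the wallet, what's not to love? 💰

3. Detailed explanation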


Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Hongyu Qu / Ling Xing / Jiachao Zhang / Rui Yan / Yazhou Yao / Xiangbo Shu

Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars. Existing methods typically learn frame-level representations for each video by designing inter-frame temporal modeling strategies or inter-video interaction at the coarse video-level granularity. However, they treat each episode task in isolation and neglect fine-grained temporal relation modeling between videos, thus failing to capture shared fine-grained temporal patterns across videos and to reuse temporal knowledge from historical tasks. In light of this, we propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR, which unifies three types of relation modeling (inter-frame, inter-video, and inter-task) to learn task-specific temporal patterns from a holistic view. Beyond inter-frame temporal interactions, we further devise two components to explore inter-video and inter-task relationships, respectively: i) Inter-video Semantic Correlation (ISC) performs cross-video frame-level interactions in a fine-grained manner, thereby capturing task-specific query features and enhancing both intra-class consistency and inter-class separability; ii) Inter-task Knowledge Transfer (IKT) retrieves and aggregates relevant temporal knowledge from a knowledge bank that stores diverse temporal patterns from historical episode tasks. Extensive experiments on five benchmarks show that HR2G-shot outperforms current leading FSAR methods.
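
To make the "retrieve and aggregate from a bank" idea in IKT more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how retrieval or aggregation is done, so the cosine similarity, top-k selection, softmax weighting, and residual addition are all assumptions, as are the names `retrieve_and_aggregate`, `top_k`, and `temperature`.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_and_aggregate(feature, bank, top_k=2, temperature=0.1):
    """Hypothetical IKT-style step (an assumption, not the paper's method).

    Picks the top_k bank entries most similar to `feature`, mixes them with
    softmax weights, and adds the mixture back to the feature (residual).
    Returns the enhanced feature and the indices of the retrieved entries.
    """
    sims = [cosine(feature, entry) for entry in bank]
    top = sorted(range(len(bank)), key=lambda i: sims[i], reverse=True)[:top_k]
    weights = softmax([sims[i] / temperature for i in top])
    retrieved = [
        sum(w * bank[i][d] for w, i in zip(weights, top))
        for d in range(len(feature))
    ]
    enhanced = [f + r for f, r in zip(feature, retrieved)]
    return enhanced, top


# Toy bank of three stored "temporal patterns" (made-up 2-D vectors).
bank = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced, picked = retrieve_and_aggregate([1.0, 0.0], bank)
print(picked, enhanced)
```

In a real system the bank entries would be learned features written during earlier episode tasks; the toy vectors above only show the data flow.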

cs / cs.CV
