The ultra-high-spec sign language video generation AI "Stable Signer" looks set to shake up the IT world! It's a real chance to make services for deaf and hard-of-hearing people way more accessible 💖
✨ Sparkly Highlights ✨
● The complicated pipeline got drastically simplified! Processing is super smooth now 😎
● Text-to-sign conversion accuracy went way up ⤴️ and the video quality is next-level!
● The AI can even help with studying sign language, so learning efficiency might get a boost too 🙌
🌟 Detailed Breakdown 🌟
● Background
Making sign language videos used to be a huge pain! Translate the text, generate the poses... there were just way too many stages 💦 And that's exactly why the video quality often ended up disappointing 😥
● Method
Stable Signer squeezes that whole pipeline down! It can generate high-quality sign language videos directly from text 💖 Its original techniques also boost conversion accuracy big-time ⤴️
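To picture what "squeezing the pipeline down" means, here is a minimal Python sketch contrasting the traditional staged SLP pipeline with an end-to-end design in the spirit of Stable Signer. Every function below is an illustrative stub invented for this sketch, not the actual model.

```python
# Traditional SLP: three separate stages, so each stage's errors feed the next.
def text2gloss(text):      # stage 1: translate text into a gloss sequence (stub)
    return text.upper().split()

def gloss2pose(glosses):   # stage 2: map each gloss to a body pose (stub)
    return [f"pose({g})" for g in glosses]

def pose2vid(poses):       # stage 3: render poses into video frames (stub)
    return [f"frame[{p}]" for p in poses]

def staged_slp(text):
    # Errors from text2gloss and gloss2pose accumulate into the final video.
    return pose2vid(gloss2pose(text2gloss(text)))

def end_to_end_slp(text):
    # Stable Signer-style idea: fold text understanding and rendering into
    # one model, so intermediate conversion errors are not compounded.
    # (Conceptual stub only — the real model is a neural network.)
    return [f"frame[{g}]" for g in text.upper().split()]

video = staged_slp("hello world")
```

The point of the contrast is structural: fewer hand-offs between stages means fewer places for errors to accumulate.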
Sign Language Production (SLP) is the task of converting complex input text into realistic sign language video. Most previous work has focused on the Text2Gloss, Gloss2Pose, and Pose2Vid stages, with some concentrating on the Prompt2Gloss and Text2Avatar stages. However, progress in this field has been slow because inaccuracies in text conversion, pose generation, and the rendering of poses into realistic human video accumulate across these stages. In this paper, we therefore streamline the traditionally redundant structure, simplify and optimize the task objective, and design a new sign language generative model called Stable Signer. It redefines SLP as a hierarchical, end-to-end generation task consisting only of text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Text understanding is performed by our proposed Sign Language Understanding Linker (SLUL), and hand gestures are generated by the SLP-MoE hand-gesture rendering expert block, producing high-quality, multi-style sign language videos end to end. SLUL is trained with the newly developed Semantic-Aware Gloss Masking Loss (SAGM Loss). Performance improves by 48.6% over current SOTA generation methods.
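The abstract does not spell out the SAGM Loss, but one plausible reading of "semantic-aware gloss masking" is a gloss-level negative log-likelihood where semantically important (content) glosses are weighted more heavily than function glosses. The sketch below is an assumption for illustration only: the weights, function names, and data layout are invented, not taken from the paper.

```python
import math

# Assumed weights — illustrative values, not the paper's actual hyperparameters.
CONTENT_WEIGHT = 2.0   # semantically important glosses count more
FUNCTION_WEIGHT = 1.0  # function glosses count less

def sagm_loss(pred_probs, target_ids, is_content):
    """Weighted negative log-likelihood over a gloss sequence.

    pred_probs : list of per-token probability distributions (dicts gloss->prob)
    target_ids : list of gold gloss labels
    is_content : list of bools marking semantically important glosses
    """
    total, norm = 0.0, 0.0
    for probs, gold, content in zip(pred_probs, target_ids, is_content):
        w = CONTENT_WEIGHT if content else FUNCTION_WEIGHT
        # Clamp probability to avoid log(0) on out-of-vocabulary glosses.
        total += -w * math.log(max(probs.get(gold, 1e-9), 1e-9))
        norm += w
    return total / norm

# Toy example: two predicted gloss distributions, one content, one function gloss.
pred = [{"HELLO": 0.7, "BYE": 0.3}, {"PLEASE": 0.6, "THANKS": 0.4}]
loss = sagm_loss(pred, ["HELLO", "PLEASE"], [True, False])  # ≈ 0.408
```

Weighting by semantic importance steers the text-understanding module (SLUL, in the paper's terms) toward getting the meaning-carrying glosses right first.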