● Complex text shapes? No problem! With low-rank approximation (LRA), it can handle any font!
● Huge speed boost! The triple assignment detection head detects and recognizes text at blazing speed! 🚀
● The AI got smarter! Using a Transformer, it understands the whole image and reads text accurately! 📖
● Background: Reading text from images (scene text spotting) comes in handy in all kinds of situations, right? Think signboards and documents. But getting both high accuracy and high speed at the same time has been hard.
● Method: A technique called low-rank approximation (LRA) lets the model represent text shapes really efficiently! On top of that, a triple assignment detection head speeds up processing, and since a smart Transformer is used too, accuracy stays on point!
● Results: LRANet++ apparently raises both text detection and recognition accuracy by a lot! And since processing got faster too, it's at a level you can use in real time! ✨
End-to-end text spotting aims to jointly optimize text detection and recognition within a unified framework. Despite significant progress, designing an accurate and efficient end-to-end text spotter for arbitrary-shaped text remains largely unsolved. We identify the primary bottleneck as the lack of a reliable and efficient text detection method. To address this, we propose a novel parameterized text shape method based on low-rank approximation for precise detection and a triple assignment detection head to enable fast inference. Specifically, unlike other shape representation methods that employ data-irrelevant parameterization, our data-driven approach derives a low-rank subspace directly from labeled text boundaries. To ensure this process is robust against the inherent annotation noise in this data, we utilize a specialized recovery method based on an $\ell_1$-norm formulation, which accurately reconstructs the text shape with only a few key orthogonal vectors. By exploiting the inherent shape correlation among different text contours, our method achieves consistency and compactness in shape representation. Next, the triple assignment scheme introduces a novel architecture where a deep sparse branch (for stabilized training) is used to guide the learning of an ultra-lightweight sparse branch (for accelerated inference), while a dense branch provides rich parallel supervision. Building upon these advancements, we integrate the enhanced detection module with a lightweight recognition branch to form an end-to-end text spotting framework, termed LRANet++, capable of accurately and efficiently spotting arbitrary-shaped text. Extensive experiments on several challenging benchmarks demonstrate the superiority of LRANet++ compared to state-of-the-art methods. Code will be available at: https://github.com/ychensu/LRANet-PP.git
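To make the two detection ideas in the abstract concrete, here is a minimal Python sketch of a data-driven low-rank shape representation. It uses a plain SVD (an $\ell_2$ fit) purely for illustration; the paper's actual recovery is the $\ell_1$-norm formulation that is robust to annotation noise, which this sketch does not implement. All function names, shapes, and the toy data are hypothetical, not the authors' code.

```python
# Minimal sketch: derive a low-rank subspace from labeled text boundaries
# and reconstruct a contour from a few coefficients. Plain SVD stands in
# for the paper's robust l1-norm recovery (NOT implemented here).
import numpy as np

def fit_shape_subspace(contours: np.ndarray, rank: int):
    """Fit an orthogonal low-rank basis to boundary data.

    contours: (N, 2P) array, each row a boundary of P points flattened
              as [x1, y1, ..., xP, yP].
    rank:     number of key orthogonal vectors to keep.
    Returns (mean, basis) with basis of shape (rank, 2P), orthonormal rows.
    """
    mean = contours.mean(axis=0)
    # Right singular vectors of the centered data span the shape subspace.
    _, _, vt = np.linalg.svd(contours - mean, full_matrices=False)
    return mean, vt[:rank]

def encode(contour, mean, basis):
    """Project one boundary onto the subspace -> a few coefficients."""
    return basis @ (contour - mean)

def decode(coeffs, mean, basis):
    """Reconstruct the text shape from its low-rank coefficients."""
    return mean + coeffs @ basis

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy dataset: 500 boundaries of 14 points each (28-D vectors).
    data = rng.normal(size=(500, 28))
    mean, basis = fit_shape_subspace(data, rank=6)
    c = encode(data[0], mean, basis)    # 6 coefficients per text instance
    recon = decode(c, mean, basis)      # approximate boundary
    print(c.shape, np.linalg.norm(recon - data[0]))
```

Each text instance is then described by only `rank` coefficients plus one shared basis, which is what makes the representation compact and consistent across different contours. Likewise, a hedged PyTorch-style sketch of how a triple assignment head might be wired: a deep sparse branch guides an ultra-lightweight sparse branch via a distillation term, while a dense branch adds per-pixel parallel supervision. The module sizes, the detached-target MSE distillation, and the name `TripleAssignmentHead` are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a triple assignment detection head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleAssignmentHead(nn.Module):
    def __init__(self, in_ch: int = 256, num_coeffs: int = 6):
        super().__init__()
        # Deep sparse branch: heavier head, used only to stabilize training.
        self.deep_sparse = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, num_coeffs, 1),
        )
        # Ultra-lightweight sparse branch: the fast inference path.
        self.light_sparse = nn.Conv2d(in_ch, num_coeffs, 1)
        # Dense branch: rich per-pixel auxiliary supervision during training.
        self.dense = nn.Conv2d(in_ch, num_coeffs, 1)

    def forward(self, feat: torch.Tensor):
        if not self.training:
            # Inference uses only the lightweight path for speed.
            return self.light_sparse(feat)
        deep = self.deep_sparse(feat)
        light = self.light_sparse(feat)
        dense = self.dense(feat)
        # Deep branch guides the light branch (simple distillation term).
        distill = F.mse_loss(light, deep.detach())
        return deep, light, dense, distill
```

At inference time only the lightweight convolution runs, which is one plausible way to reconcile stabilized training with the accelerated inference the abstract describes.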