Published: 2026/1/11 1:38:33

FinForge: A Financial LLM Evaluation Framework ✨

Ultra-short summary: A new framework for evaluating financial LLMs is here! Pretty great, right?

🌟 Gal-style sparkle points
● It takes financial LLM evaluation up a level!
● Combining expert knowledge with LLMs is a fresh idea!
● It might just spark a revolution in the finance industry!

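Detailed explanation
● Background: Financial LLMs are seriously capable, but evaluating them has been hard 💦 Existing tests can't properly assess financial knowledge, right? That's exactly why FinForge was needed 💖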

● Method: FinForge builds questions for evaluating LLMs from finance-related information! Picture it as using expert knowledge to create tests that check whether an LLM really understands finance 🧐


FinForge: Semi-Synthetic Financial Benchmark Generation

Glenn Matlin / Akhil Theerthala / Anant Gupta / Anirudh JM / Rayan Castilla / Yi Mei Ng / Sudheer Chava

Evaluating Language Models (LMs) in specialized, high-stakes domains such as finance remains a significant challenge due to the scarcity of open, high-quality, and domain-specific datasets. Existing general-purpose benchmarks provide broad coverage but lack the depth and domain fidelity needed to assess LMs' capabilities for real-world financial reasoning, which requires both conceptual understanding and quantitative rigor. To address this gap, we introduce FinForge, a scalable, semi-synthetic pipeline for constructing finance-specific evaluation benchmarks through a hybrid of expert-guided data curation and controlled LM-based synthesis. FinForge combines manual and programmatic corpus construction from authoritative financial sources with structured question generation and validation using Gemini 2.5 Flash. To demonstrate the pipeline's efficacy, we produce FinForge-5k, a snapshot benchmark comprising over 5,000 human-validated question-answer pairs across 11 finance subdomains, derived from a curated corpus of 100,000 verified documents totaling 143M tokens. Evaluation of state-of-the-art open-source and closed-source models on FinForge-5k reveals significant differences in financial reasoning, with leading models achieving accuracy levels near 80%. These findings underscore the framework's utility for diagnosing current model limitations and guiding future improvements in financial domain competence. All code and data are available at https://github.com/gtfintechlab/FinForge.
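To make the evaluation step concrete, here is a minimal sketch of how accuracy could be broken down by finance subdomain. It is only an illustration under stated assumptions: the `QAItem` fields, the `accuracy_by_subdomain` helper, and the case-insensitive exact-match scoring rule are all hypothetical and are not taken from the FinForge code or data schema. The toy items are invented for the example and are not FinForge-5k questions or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    """One benchmark item. Field names are hypothetical, not FinForge's schema."""
    subdomain: str
    question: str
    answer: str


def accuracy_by_subdomain(items, predictions):
    """Return {subdomain: fraction correct} using case-insensitive exact match.

    Exact match is an assumed scoring rule chosen for simplicity.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.subdomain] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.subdomain] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy data for illustration only; not FinForge-5k items or results.
items = [
    QAItem("derivatives", "Which option grants the right to buy?", "call"),
    QAItem("derivatives", "Which option grants the right to sell?", "put"),
    QAItem("accounting", "Assets equal liabilities plus what?", "equity"),
]
predictions = ["Call", "call", "equity"]  # hypothetical model outputs

scores = accuracy_by_subdomain(items, predictions)
print(scores)
```

Reporting one score per subdomain, rather than a single aggregate, is what lets a benchmark of this kind point to where a model is weak instead of only how it ranks overall.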

cs / cs.AI