Published:2026/1/8 9:26:05

Are LLMs leveling up patent examination?! The new benchmark "PILOT-Bench" is here! 🚀

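Super-short summary: There's a new test (benchmark) that measures how well LLMs (large language models) handle patent examination! It checks whether they can actually do legal reasoning ✨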

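🌟 Gyaru-style sparkle points ✨
● Being able to test whether LLMs can help with patent examination feels so futuristic~!
● The evaluation follows IRAC, a framework for legal reasoning! Sounds super smart 😳
● Patent examination could get easier and new services might pop up, which is so exciting!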

Detailed explanation

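Background: Patent filings keep going up, right? That makes things rough for the people who examine them 😭 The hope is that with help from LLMs (AI), examination could get faster and more accurate. But until now there was no proper way to measure whether LLMs can really do legal reasoning 💦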

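Method: That's where "PILOT-Bench" comes in! The authors collected decisions from the PTAB (the US Patent Trial and Appeal Board) and patent data from the USPTO (the United States Patent and Trademark Office), matched them up case by case, and built a test of whether LLMs can reason legally! It checks LLM ability along the lines of IRAC (Issue, Rule, Application, Conclusion), using three classification tasks: Issue Type, Board Authorities, and Subdecision 🎵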


PILOT-Bench: A Benchmark for Legal Reasoning in the Patent Domain with IRAC-Aligned Classification Tasks

Yehoon Jang / Chaewon Lee / Hyun-seok Min / Sungchul Choi

The Patent Trial and Appeal Board (PTAB) of the USPTO adjudicates thousands of ex parte appeals each year, requiring the integration of technical understanding and legal reasoning. While large language models (LLMs) are increasingly applied in patent and legal practice, their use has remained limited to lightweight tasks, with no established means of systematically evaluating their capacity for structured legal reasoning in the patent domain. In this work, we introduce PILOT-Bench, the first PTAB-centric benchmark that aligns PTAB decisions with USPTO patent data at the case-level and formalizes three IRAC-aligned classification tasks: Issue Type, Board Authorities, and Subdecision. We evaluate a diverse set of closed-source (commercial) and open-source LLMs and conduct analyses across multiple perspectives, including input-variation settings, model families, and error tendencies. Notably, on the Issue Type task, closed-source models consistently exceed 0.75 in Micro-F1 score, whereas the strongest open-source model (Qwen-8B) achieves performance around 0.56, highlighting a substantial gap in reasoning capabilities. PILOT-Bench establishes a foundation for the systematic evaluation of patent-domain legal reasoning and points toward future directions for improving LLMs through dataset design and model alignment. All data, code, and benchmark resources are available at https://github.com/TeamLab/pilot-bench.
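
The Micro-F1 scores quoted above pool true positives, false positives, and false negatives over all cases before computing F1. Below is a minimal sketch of that computation, assuming each case may carry several labels; the function name `micro_f1` and the label strings are hypothetical illustrations and are not taken from the PILOT-Bench code or data.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all cases, then compute F1 once."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in the gold set
        fp += len(p - g)   # labels predicted but absent from the gold set
        fn += len(g - p)   # gold labels the model missed
    denom = 2 * tp + fp + fn
    # With nothing to score, return 0.0 instead of dividing by zero.
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold and predicted label sets for three cases.
gold = [{"103"}, {"102", "103"}, {"112"}]
pred = [{"103"}, {"103"}, {"101"}]
score = micro_f1(gold, pred)
print(round(score, 3))  # 0.571 (TP=2, FP=1, FN=2, so F1 = 4/7)
```

Because the counts are pooled, frequent labels weigh more heavily than rare ones, which is worth keeping in mind when comparing the closed-source and open-source scores.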

cs / cs.CL / cs.AI