Published:2026/1/7 6:38:34

LLM code generation and the pitfalls of data transcription 💦 (Ultra-short summary: exposing LLMs that are bad at copy-paste!)

1. Ultra-short summary: LLMs (large language models) are good at writing code, but they're bad at copying data over exactly! This study exposes that weakness ☆

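2. Gyaru-style sparkle points ✨

  • A weakness of code-generation AI, exposed! It's bad at data transcription. Surprising, right?
  • In security-related code (cryptography and the like), an LLM's mistake can be fatal!
  • Research that tackles LLM weaknesses to boost reliability is seriously awesome!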

3. Detailed explanation


Verbatim Data Transcription Failures in LLM Code Generation: A State-Tracking Stress Test

Mohd Ariful Haque / Kishor Datta Gupta / Mohammad Ashiqur Rahman / Roy George

Many real-world software tasks require exact transcription of provided data into code, such as cryptographic constants, protocol test vectors, allowlists, and calibration tables. These tasks are operationally sensitive because small omissions or alterations can remain silent while producing syntactically valid programs. This paper introduces a deliberately minimal transcription-to-code benchmark to isolate this reliability concern in LLM-based code generation. Given a list of high-precision decimal constants, a model must generate Python code that embeds the constants verbatim and performs a simple aggregate computation. We describe the prompting variants, evaluation protocol based on exact-string inclusion, and analysis framework used to characterize state-tracking and long-horizon generation failures. The benchmark is intended as a compact stress test that complements existing code-generation evaluations by focusing on data integrity rather than algorithmic reasoning.
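To make the "exact-string inclusion" idea concrete, here is a minimal sketch of what such a check could look like. It is not the authors' evaluation harness: the function name `missing_constants`, the sample constants, and the two sample model outputs are all hypothetical, chosen only to show how a single altered digit can hide inside a syntactically valid program.

```python
from typing import List


def missing_constants(constants: List[str], generated_code: str) -> List[str]:
    """Return the constants that do NOT appear verbatim in the generated code.

    Hypothetical helper: a plain substring test stands in for the paper's
    exact-string inclusion criterion.
    """
    return [c for c in constants if c not in generated_code]


# Hypothetical high-precision decimal constants given to the model.
CONSTANTS = [
    "3.141592653589793238",
    "2.718281828459045235",
    "1.414213562373095048",
]

# Hypothetical model output that copies every constant correctly.
GOOD_OUTPUT = '''
from decimal import Decimal
values = ["3.141592653589793238", "2.718281828459045235", "1.414213562373095048"]
total = sum(Decimal(v) for v in values)
'''

# Hypothetical model output with the last two digits of one constant swapped.
# The program is still valid Python, so the error would stay silent at runtime.
BAD_OUTPUT = '''
from decimal import Decimal
values = ["3.141592653589793238", "2.718281828459045253", "1.414213562373095048"]
total = sum(Decimal(v) for v in values)
'''

print(missing_constants(CONSTANTS, GOOD_OUTPUT))
print(missing_constants(CONSTANTS, BAD_OUTPUT))
```

A plain substring test is deliberately strict about altered or dropped digits, but it can be fooled if a constant happens to be a prefix of a longer number in the output, so a real harness would likely need a stricter match at token boundaries.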

cs / cs.SE / cs.CR