I. Research Overview: Investigating the quality of code generated by LLMs (Large Language Models)! Visualizing technical debt (the stuff that comes back to bite you later) ✨
II. Research Details
Sparkly highlight ✨ #1: They detect "architecture violations" in LLM-generated code 🔎 and define an original metric, "Hallucinated Coupling"!
Sparkly highlight ✨ #2: They had models like GPT-5.1 and Llama 3 try something called Hexagonal Architecture, and compared the architectural quality of what each produced!
Sparkly highlight ✨ #3: They found that the open-weights LLM (Llama 3) produced lower architectural quality than commercial LLMs (like GPT-5.1)! Wild, right? 😱
Detailed Explanation
As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting them to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Utilizing Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 demonstrated an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," where open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that without automated architectural linting, utilizing smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.
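The abstract mentions using Abstract Syntax Tree (AST) parsing to flag illegal dependencies between the Domain and Infrastructure layers, and counting Logical Lines of Code (LLOC) to quantify "Implementation Laziness." The paper does not give its implementation; the sketch below is a minimal illustration of both ideas using Python's standard `ast` module. The layer rule (`domain` must never import `infrastructure`) and the example `BookRepositoryImpl`/`LendBook` names are our assumptions for a Book Lending service, not the authors' code.

```python
import ast
import textwrap

# Assumed Hexagonal Architecture rule (our simplification, not the paper's
# exact ruleset): the Domain layer must never import from Infrastructure;
# all access should go through ports/adapters.
FORBIDDEN = {"domain": {"infrastructure"}}

def architectural_violations(source: str, layer: str) -> list[str]:
    """Return imported module names in `source` that break the layer rules."""
    banned = FORBIDDEN.get(layer, set())
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in banned:  # top-level package decides the layer
                hits.append(name)
    return hits

def logical_lines(source: str) -> int:
    """Rough LLOC proxy: count statement nodes instead of physical lines."""
    return sum(isinstance(n, ast.stmt) for n in ast.walk(ast.parse(source)))

# Hypothetical Domain-layer file that bypasses the port interface.
domain_code = textwrap.dedent("""
    from infrastructure.db import BookRepositoryImpl  # illegal: skips the port

    class LendBook:
        def execute(self, book_id):
            return BookRepositoryImpl().mark_lent(book_id)
""")

print(architectural_violations(domain_code, "domain"))  # ['infrastructure.db']
print(logical_lines(domain_code))  # 4
```

A real pipeline would run such a check over every generated module and report the share of files with at least one hit, which is one plausible way to arrive at a figure like the 80% Architectural Violation Rate reported for Llama 3.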