I. Research Overview: Investigating the quality of code generated by LLMs (Large Language Models)! Visualizing technical debt (the stuff that comes back to bite you later) ✨
II. Research Details
Sparkly highlight ✨ #1: They detect "architecture violations" in LLM-generated code 🔎 and define an original metric, "Hallucinated Coupling"!
Sparkly highlight ✨ #2: They had models like GPT-5.1 and Llama 3 try something called Hexagonal Architecture, and compared the architectural quality of what each produced!
Sparkly highlight ✨ #3: They found that the open-weights LLM (Llama 3) produced lower architectural quality than commercial LLMs (like GPT-5.1)! Wild, right? 😱
Detailed Explanation
As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting them to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Utilizing Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 demonstrated an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," where open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that without automated architectural linting, utilizing smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.
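The abstract mentions using Abstract Syntax Tree (AST) parsing to flag illegal dependencies between the Domain and Infrastructure layers, and counting Logical Lines of Code (LLOC) to quantify "Implementation Laziness." The paper does not give its implementation; the sketch below is a minimal illustration of both ideas using Python's standard `ast` module. The layer rule (`domain` must never import `infrastructure`) and the example `BookRepositoryImpl`/`LendBook` names are our assumptions for a Book Lending service, not the authors' code.

```python
import ast
import textwrap

# Assumed Hexagonal Architecture rule (our simplification, not the paper's
# exact ruleset): the Domain layer must never import from Infrastructure;
# all access should go through ports/adapters.
FORBIDDEN = {"domain": {"infrastructure"}}

def architectural_violations(source: str, layer: str) -> list[str]:
    """Return imported module names in `source` that break the layer rules."""
    banned = FORBIDDEN.get(layer, set())
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in banned:  # top-level package decides the layer
                hits.append(name)
    return hits

def logical_lines(source: str) -> int:
    """Rough LLOC proxy: count statement nodes instead of physical lines."""
    return sum(isinstance(n, ast.stmt) for n in ast.walk(ast.parse(source)))

# Hypothetical Domain-layer file that bypasses the port interface.
domain_code = textwrap.dedent("""
    from infrastructure.db import BookRepositoryImpl  # illegal: skips the port

    class LendBook:
        def execute(self, book_id):
            return BookRepositoryImpl().mark_lent(book_id)
""")

print(architectural_violations(domain_code, "domain"))  # ['infrastructure.db']
print(logical_lines(domain_code))  # 4
```

A real pipeline would run such a check over every generated module and report the share of files with at least one hit, which is one plausible way to arrive at a figure like the 80% Architectural Violation Rate reported for Llama 3.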