Turbo-boost SE with Hugging Face PTMs! ✨

Published: 2025/12/3 14:18:58


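Super-short summary: Hugging Face models, sorted out for SE! The goal is a dev-efficiency boost 💖
Gal-style sparkle points ✨
- ● Solving SE (software engineering) headaches! A plan to make PTMs (Pre-Trained Models) easier to use 💖
- ● Shorter dev time, lower costs, and better quality are not just a dream! The power of AI is amazing 😍
- ● Feels like new business opportunities are on the way! An AI platform or something sounds fun, right? 😎
Detailed breakdown
- Background: Hugging Face has a huge number of AI models, but they weren't organized for SE 💦 So this research classifies and catalogues the models so that SE folks can actually use them!
- Method: The models were classified according to SE work (tasks) and the development flow (SDLC)! The authors collected models automatically through the API, checked them with an LLM (large language model), and seem to have put in a lot of other careful work too ✨
- Results: A PTM catalogue for SE is done (2,205 models)! You can see all kinds of info, so finding the model that fits you gets a lot quicker 💕 Dev efficiency could go way up, right?
- Significance (this is the crazy ♡ point): There's potential to put AI tech to work all over SE, raising dev efficiency and creating new businesses! The IT industry could get a lot more fun 😍
Real-world use ideas 💡
- Build an AI-powered code review (program checking) tool, so everyone can write high-quality code with less effort!
- Use a tool that automatically creates test cases (ways to test a program) to cut down testing time!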
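To make the "classify models by SE task and SDLC phase" idea a bit more concrete, here is a tiny sketch. Heads up: everything in it is an assumption for illustration. The paper's real taxonomy covers 147 SE tasks and its validation uses humans plus an LLM, while this toy version uses a made-up five-entry `TAXONOMY` and plain keyword matching. The names `classify` and `phase_counts` are hypothetical too.

```python
from collections import Counter

# Hypothetical mini-taxonomy: SE task -> (SDLC phase, trigger keywords).
# These entries are invented for illustration; they are NOT the paper's taxonomy.
TAXONOMY = {
    "code generation": ("implementation", ["code generation", "code completion", "text-to-code"]),
    "code review": ("implementation", ["code review"]),
    "test generation": ("testing", ["test generation", "unit test"]),
    "requirements classification": ("requirements", ["requirements"]),
    "bug localization": ("maintenance", ["bug localization", "fault localization"]),
}


def classify(card_text: str) -> list[tuple[str, str]]:
    """Return (SE task, SDLC phase) pairs whose keywords appear in the text."""
    text = card_text.lower()
    return [
        (task, phase)
        for task, (phase, keywords) in TAXONOMY.items()
        if any(keyword in text for keyword in keywords)
    ]


def phase_counts(cards: dict[str, str]) -> Counter:
    """Count models per SDLC phase (a model counts at most once per phase)."""
    counts = Counter()
    for text in cards.values():
        counts.update({phase for _, phase in classify(text)})
    return counts


if __name__ == "__main__":
    # Made-up model cards, just to exercise the functions.
    demo_cards = {
        "example/model-a": "A model fine-tuned for code completion and unit test writing.",
        "example/model-b": "Sentiment analysis for movie reviews.",
    }
    for model_id, text in demo_cards.items():
        print(model_id, classify(text))
    print(phase_counts(demo_cards))
```

Keyword matching is only a stand-in here: it is cheap and easy to inspect, but it would miss paraphrases and produce false hits, which is presumably why the actual study adds human and LLM validation on top.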


Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings

Alexandra González / Xavier Franch / David Lo / Silverio Martínez-Fernández

Context: Open-source Pre-Trained Models (PTMs) provide extensive resources for various Machine Learning (ML) tasks, yet these resources lack a classification tailored to Software Engineering (SE) needs to support the reliable identification and reuse of models for SE. Objective: To address this gap, we derive a taxonomy encompassing 147 SE tasks and apply an SE-oriented classification to PTMs in a popular open-source ML repository, Hugging Face (HF). Method: Our repository mining study followed a five-phase pipeline: (i) identification of SE tasks from the literature; (ii) collection of PTM data from the HF API, including model card descriptions and metadata, and the abstracts of the associated arXiv papers; (iii) text processing to ensure consistency; (iv) a two-phase validation of SE relevance, involving humans and LLM assistance, supported by five pilot studies with human annotators and a generalization test; and (v) data analysis. This process yielded a curated catalogue of 2,205 SE PTMs. Results: We find that most SE PTMs target code generation and coding, emphasizing implementation over early or late development stages. In terms of ML tasks, text generation dominates within SE PTMs. Notably, the number of SE PTMs has increased markedly since 2023 Q2, while evaluation remains limited: only 9.6% report benchmark results, mostly scoring below 50%. Conclusions: Our catalogue reveals documentation and transparency gaps, highlights imbalances across SDLC phases, and provides a foundation for automated SE scenarios, such as the sampling and selection of suitable PTMs.
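As a rough illustration of phase (ii), the sketch below queries the public Hugging Face Hub model listing with only the Python standard library. It is not the authors' pipeline. The endpoint URL, the `search` and `limit` query parameters, and the response fields `id`, `pipeline_tag`, and `tags` (with `arxiv:`-prefixed entries) are assumptions based on the Hub's public HTTP API and should be checked against the current documentation. The helper names `build_query_url`, `summarize`, and `fetch_models` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint for listing models on the Hugging Face Hub.
HF_MODELS_ENDPOINT = "https://huggingface.co/api/models"


def build_query_url(search: str, limit: int = 20) -> str:
    """Build a model-listing URL for a free-text search term."""
    return f"{HF_MODELS_ENDPOINT}?{urlencode({'search': search, 'limit': limit})}"


def summarize(record: dict) -> dict:
    """Keep the model id, the ML task, and any arXiv ids found in the tags."""
    tags = record.get("tags", [])
    return {
        "id": record.get("id") or record.get("modelId"),
        "ml_task": record.get("pipeline_tag"),
        # Assumption: linked papers appear as tags of the form "arxiv:<id>".
        "arxiv_ids": [t.split(":", 1)[1] for t in tags if t.startswith("arxiv:")],
    }


def fetch_models(search: str, limit: int = 20) -> list[dict]:
    """Fetch one page of matching models (performs a network request)."""
    with urlopen(build_query_url(search, limit), timeout=30) as response:
        return [summarize(record) for record in json.load(response)]


if __name__ == "__main__":
    # Requires network access; the search term is only an example.
    for model in fetch_models("code generation", limit=5):
        print(model)
```

Model card text and arXiv abstracts, which the abstract also lists as inputs, would need additional requests that are not shown here.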

cs / cs.SE / cs.LG
