Code-Mixed (Language-Mix) Models Level Up by Merging 🚀

Published: 2025/10/23 10:53:54

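Super summary: Handling code-mixed language the smart way with model merging ✨
Gal-style sparkle points:
- ● Code-mixing means several languages blended together in one text 💖
- ● Merging models makes better use of unlabeled data and beats plain fine-tuning by 2-5 F1 points!
- ● Super handy for services going global, right?
Detailed explanation:
- Background: "Code-mixed" text, where languages get all jumbled together, is a tough one for AI 💦 But people all over the world write this way, so handling it is a must!
- Method: Merge existing models! Take a multilingual base model (XLM-R or Llama-3.2-1B), continue pre-training it on unlabeled code-mixed text, merge that checkpoint back with the base model, then fine-tune on the task data ♪
- Result: On English-Hindi and English-Spanish sentiment and hate-speech classification, the merged models beat full fine-tuning by 2-5 F1 points and CPT-then-fine-tuning by about 1-2 points! Keeping the existing abilities too is the amazing part ✨
- Significance (the seriously cool ♡ point): This could lead to dream tools that let you communicate with people all over the world 😍
Real-world use ideas:
- 💡 Multilingual chatbots for chatting smoothly with friends around the world!
- 💡 Social media analysis to catch what's trending in all kinds of countries!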
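
To make the "merge" step a bit more concrete, here is a minimal sketch of task-vector merging on toy weights. It is an illustration only, not the authors' implementation: the dict-of-lists model format, the `task_vector` and `merge` helpers, and the `scale=0.5` value are all assumptions made for this example.

```python
# Toy sketch of task-vector ("task arithmetic") merging.
# Assumption: a "model" is a dict mapping parameter names to flat lists of floats.
# Real checkpoints would hold tensors, but the arithmetic is the same idea.

def task_vector(base, adapted):
    """Return the per-parameter difference adapted - base."""
    return {
        name: [a - b for a, b in zip(adapted[name], base[name])]
        for name in base
    }

def merge(base, adapted, scale=0.5):
    """Add a scaled task vector back onto the base weights.

    `scale` is a hypothetical merge coefficient chosen for illustration.
    """
    tv = task_vector(base, adapted)
    return {
        name: [b + scale * d for b, d in zip(base[name], tv[name])]
        for name in base
    }

# Hypothetical weights: `cpt` stands in for the checkpoint obtained by
# continued pre-training on unlabeled code-mixed text.
base = {"layer.weight": [1.0, 2.0, 3.0]}
cpt = {"layer.weight": [2.0, 2.0, 1.0]}

merged = merge(base, cpt, scale=0.5)
print(merged)  # {'layer.weight': [1.5, 2.0, 2.0]}
```

With `scale=0.5` the merged weights sit halfway between the base and the adapted checkpoint; in the recipe described above, the merged model would then be fine-tuned on the labeled task data.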

Adapting Multilingual Models to Code-Mixed Tasks via Model Merging

Prashant Kodali / Vaishnavi Shivkumar / Swarang Joshi / Monojit Choudhary / Ponnurangam Kumaraguru / Manish Shrivastava

We study model merging as a practical alternative to conventional adaptation strategies for code-mixed NLP. Starting from a multilingual base model, we: (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge the adapted checkpoint with the base model, and (iii) fine-tune (FT) on the downstream task data. We evaluate our approach on sentence classification tasks (sentiment and hate speech) in English-Hindi (En-Hi) and English-Spanish (En-Es) using XLM-R and Llama-3.2-1B models. Our results show that merged models consistently outperform full fine-tuning and CPT->FT. We observe gains of 2-5 points in F1 over full fine-tuning and ~1-2 points over CPT->FT, indicating that unlabeled data is leveraged more effectively via merging than via CPT alone. Zero-/few-shot prompting with larger LLMs (e.g., Llama-3.3-70B) lags behind fine-tuned and merged checkpoints, underscoring the limits of in-context learning for code-mixed inputs. We further test cross-pair transfer by training on En-Hi and evaluating on En-Ta and En-Ml: merged checkpoints transfer more strongly than monolingual-English baselines (e.g., TV/TIES variants reaching 0.65-0.68 F1 vs 0.61-0.63 for full fine-tuning), suggesting that code-mixed knowledge is a more reliable substrate for low-resource pairs. We conclude with adaptation recipes matched to common data regimes (labeled only; labeled+unlabeled; transfer-only) and discuss limitations and scaling considerations for broader tasks and larger models.
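
The abstract mentions TIES variants of merging. As a rough illustration of what a TIES-style merge does (trim small updates, elect a sign per parameter, average only the agreeing updates), here is a minimal sketch on flat lists of floats. The function names, the `keep_ratio` and `scale` defaults, and the toy weights are assumptions for this example; the abstract does not state how the authors configure the merge.

```python
# Toy sketch of a TIES-style merge on flat weight vectors.
# Assumption: every model is a flat list of floats of the same length.

def trim(vec, keep_ratio):
    """Keep the largest-magnitude entries of a task vector; zero the rest.

    Entries tied with the threshold magnitude are all kept.
    """
    k = max(1, round(len(vec) * keep_ratio))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]

def ties_merge(base, adapted_models, keep_ratio=0.5, scale=1.0):
    """Merge adapted models into the base with trim / elect-sign / disjoint mean."""
    # Step 1: build and trim one task vector (adapted - base) per model.
    task_vectors = [
        trim([a - b for a, b in zip(model, base)], keep_ratio)
        for model in adapted_models
    ]
    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in task_vectors]
        # Step 2: elect the sign carrying the larger total magnitude.
        total = sum(column)
        sign = 1.0 if total > 0 else -1.0 if total < 0 else 0.0
        # Step 3: average only the updates that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged

# Hypothetical weights for two adapted checkpoints.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [4.0, -1.0, 2.0, 0.5]
model_b = [2.0, 3.0, -6.0, 0.5]

result = ties_merge(base, [model_a, model_b], keep_ratio=0.5)
print(result)  # [4.0, 3.0, -6.0, 0.0]
```

In the third position the two checkpoints disagree in sign, so only the larger-magnitude update (-6.0) survives rather than being averaged against the conflicting one. With a single adapted checkpoint this sketch reduces to trimming the task vector and adding it back to the base.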

cs / cs.CL
