The Ultimate ICS Is Born! Customer Support Maxed Out with LLMs 💖

Published：2026/1/8 9:45:39

Super summary: Using LLMs (large language models) to build customer service that's smarter, safer, and sends customer satisfaction through the roof!

Gyaru-style sparkle points ✨

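● Existing customer service seriously has way too many issues! 😭 Customer satisfaction and risk handling weren't being covered at all, and LLMs are here to fix that!
● A new benchmark, "OlaBench," evaluates the service from multiple angles 💖 The aim is a level that actually works in the real world!
● They also developed "OlaMind," a technique for making the model as strong as it gets! ✨ An ICS that keeps getting smarter, exciting, right?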

Detailed explanation

Benchmarking and Learning Real-World Customer Service Dialogue

Tianhong Gao / Jundong Shen / Jiapeng Wang / Bei Shi / Ying Ju / Junfeng Yao / Huiyu Yu

Existing benchmarks and training pipelines for industrial intelligent customer service (ICS) remain misaligned with real-world dialogue requirements, overemphasizing verifiable task success while under-measuring subjective service quality and realistic failure modes, leaving a gap between offline gains and deployable dialogue behavior. We close this gap with a benchmark-to-optimization loop: we first introduce OlaBench, an ICS benchmark spanning retrieval-augmented generation, workflow-based systems, and agentic settings, which evaluates service capability, safety, and latency sensitivity; moreover, motivated by OlaBench results showing state-of-the-art LLMs still fall short, we propose OlaMind, which distills reusable reasoning patterns and service strategies from expert dialogues and applies rubric-aware staged exploration-exploitation reinforcement learning to improve model capability. OlaMind surpasses GPT-5.2 and Gemini 3 Pro on OlaBench (78.72 vs. 70.58/70.84) and, in online A/B tests, delivers an average +23.67% issue resolution and -6.6% human transfer rate versus the baseline, bridging offline gains to deployment. Together, OlaBench and OlaMind advance ICS systems toward more anthropomorphic, professional, and reliable deployment.
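
To make the headline numbers easier to read, here is a minimal Python sketch that turns the OlaBench scores quoted in the abstract into absolute and relative gaps. It assumes that "70.58/70.84" maps to GPT-5.2 and Gemini 3 Pro in that order, which the abstract implies but does not spell out. The `margins` helper and the `scores` dictionary are hypothetical names for illustration, not anything from the paper.

```python
# OlaBench scores as quoted in the abstract (higher is better).
# Assumption: 70.58 belongs to GPT-5.2 and 70.84 to Gemini 3 Pro,
# following the order in which the abstract lists the models.
scores = {"OlaMind": 78.72, "GPT-5.2": 70.58, "Gemini 3 Pro": 70.84}


def margins(scores, target="OlaMind"):
    """Return {baseline: (absolute_gap, relative_gap_percent)} for `target`.

    Hypothetical helper for illustration only.
    """
    top = scores[target]
    return {
        name: (round(top - s, 2), round(100 * (top - s) / s, 2))
        for name, s in scores.items()
        if name != target
    }


if __name__ == "__main__":
    for name, (abs_gap, rel_gap) in margins(scores).items():
        print(f"OlaMind vs {name}: +{abs_gap:.2f} points ({rel_gap:.2f}% relative)")
```

The online A/B figures (+23.67% issue resolution, -6.6% human transfer rate) are left out of the sketch because the abstract does not give the baseline rates they are measured against.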

cs / cs.CL