AI model evaluation is going user-participatory, "arena"-style, and it's a huge deal for business too!
Gal-style sparkle points ✨
● AI model evaluation is now decided by user preferences!
● Companies can improve their AI while listening to what users think!
● It totally feels like new business opportunities are on the way!
Background Evaluating the performance of AI (artificial intelligence) models used to be an expert's job, but lately "arena"-style methods, where users compare models by asking "which AI is better?", have become all the rage. It's just like a game 🎮✨
Method On platforms like LMArena, users actually try out multiple AI models and rate which one is better. Sounds fun, kind of like e-sports, right?
Innovation in artificial intelligence (AI) has always depended on technological infrastructures, from code repositories to computing hardware. Yet industry - rather than universities - has become increasingly influential in shaping AI innovation. As generative forms of AI powered by large language models (LLMs) have driven the breakout of AI into the wider world, the AI community has sought to develop new methods for independently evaluating the performance of AI models. How best, in other words, to compare the performance of AI models against other AI models - and how best to account for new models launched on a nearly daily basis? Building on recent work in media studies, STS, and computer science on benchmarking and the practices of AI evaluation, I examine the rise of so-called 'arenas' in which AI models are evaluated through gladiatorial-style 'battles'. Through a technography of a leading user-driven AI model evaluation platform, LMArena, I consider five themes central to the emerging 'arena-ization' of AI innovation. Accordingly, I argue that arena-ization is being powered by a 'viral' desire to capture attention both inside and outside the AI community, which is critical to the scaling and commercialization of AI products. In the discussion, I reflect on the implications of 'arena gaming', a phenomenon through which model developers hope to capture attention.