Wait, LLMs Fall for the Anchoring Effect Too!?

Published: 2025/11/7 23:35:19

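Ultra-short summary: Turns out LLMs (AI) get swayed by the first number they see, just like humans do! This paper digs into the details, and the idea is to put that to use in business too 💖
Gal-Style Sparkle Points ✨
- We're cracking open the LLM's head! The big deal: the anchoring effect isn't just surface-level copycatting!
- Shapley values put an actual number on how much the anchor sways the LLM's judgment!
- It means AI chatbots and other information-search tools can add safeguards so users don't get nudged in weird directions!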
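Detailed Breakdown
- Background: Recent AI (LLMs) is amazing, but it turns out to have the same weird quirks (cognitive biases) as humans, and that's becoming a problem! One of them is the "anchoring effect," where the first number you're shown (the anchor) sways the judgment that comes after 😳
- Method: Sneak a peek inside the LLM's head 👀 They used a log-probability-based behavioral analysis plus exact Shapley values over the fields of the prompt to work out how the anchoring effect shows up and how much it matters!
- Results: Anchoring showed up robustly in Gemma-2B, Phi-2, and Llama-2-7B, while smaller models like GPT-2, Falcon-RW-1B, and GPT-Neo-125M were more hit-or-miss! And it's not just copycatting: the anchor shifts the model's whole output probability distribution. So yeah, we need to actually understand LLMs and build countermeasures, or things could get ugly 🥺
- Significance: This can feed into service designs that account for anchoring and into new mitigation tech, all to make AI more trustworthy! That means supporting users' decisions and building better services!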
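Curious what "the whole distribution shifts" could look like in practice? Here's a minimal sketch, not the paper's actual code. The candidate answers and log-probability numbers are completely made up for illustration, and `normalize`, `expected_value`, and `total_variation` are hypothetical helper names. The idea: take a model's log-probabilities over the same candidate answers under a low-anchor prompt and a high-anchor prompt, then compare the two distributions instead of just the single top answer.

```python
import math

def normalize(logprobs):
    """Turn raw log-probabilities over candidates into a distribution summing to 1."""
    m = max(logprobs.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logprobs.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def expected_value(dist):
    """Probability-weighted mean of the numeric candidate answers."""
    return sum(k * p for k, p in dist.items())

def total_variation(p, q):
    """Half the L1 distance between two distributions over the same candidates."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

# Made-up log-probs for three candidate numeric answers (illustration only).
low_anchor = {10: -1.0, 50: -2.0, 90: -4.0}   # prompt showed a low number first
high_anchor = {10: -4.0, 50: -2.0, 90: -1.0}  # prompt showed a high number first

p_low = normalize(low_anchor)
p_high = normalize(high_anchor)

# Positive shift = the high anchor pulled the expected answer upward.
shift = expected_value(p_high) - expected_value(p_low)
tv = total_variation(p_low, p_high)
print(f"expected-answer shift: {shift:.2f}, total variation: {tv:.2f}")
```

Looking at the expected answer and the distance between distributions, rather than only the most likely answer, is one simple way to check whether an anchor moves probability mass across all candidates.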
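Real-World Use Ideas 💡
- Might help stop AI chatbots from flashing a high price up front to push you into paying more! 😡
- Search results could serve up fair info without steering you in one particular direction! 🥰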
For Those Who Want to Dig Deeper 🔍
- Anchoring effect
- Shapley value
- LLM (large language model)

Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLMs

Felipe Valencia-Clavijo

Large language models (LLMs) are increasingly examined as both behavioral subjects and decision systems, yet it remains unclear whether observed cognitive biases reflect surface imitation or deeper probability shifts. Anchoring bias, a classic human judgment bias, offers a critical test case. While prior work shows LLMs exhibit anchoring, most evidence relies on surface-level outputs, leaving internal mechanisms and attributional contributions unexplored. This paper advances the study of anchoring in LLMs through three contributions: (1) a log-probability-based behavioral analysis showing that anchors shift entire output distributions, with controls for training-data contamination; (2) exact Shapley-value attribution over structured prompt fields to quantify anchor influence on model log-probabilities; and (3) a unified Anchoring Bias Sensitivity Score integrating behavioral and attributional evidence across six open-source models. Results reveal robust anchoring effects in Gemma-2B, Phi-2, and Llama-2-7B, with attribution signaling that the anchors influence reweighting. Smaller models such as GPT-2, Falcon-RW-1B, and GPT-Neo-125M show variability, suggesting scale may modulate sensitivity. Attributional effects, however, vary across prompt designs, underscoring fragility in treating LLMs as human substitutes. The findings demonstrate that anchoring bias in LLMs is robust, measurable, and interpretable, while highlighting risks in applied domains. More broadly, the framework bridges behavioral science, LLM safety, and interpretability, offering a reproducible path for evaluating other cognitive biases in LLMs.
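As a rough illustration of contribution (2), the sketch below computes exact Shapley values over a handful of prompt fields by enumerating every subset. It is a minimal sketch under stated assumptions, not the paper's implementation: the field names in `FIELDS` and the scoring function `toy_logprob` are hypothetical stand-ins with made-up numbers. In a real setup the value function would presumably query a model for the log-probability of a target answer given only the included fields.

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, value):
    """Exact Shapley value of each player for a coalition value function.

    `value` maps a frozenset of players to a number. The cost grows as 2^n,
    so this is only practical for a small number of prompt fields.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Hypothetical structured prompt fields (assumption, not from the paper).
FIELDS = ["context", "anchor", "question"]

def toy_logprob(fields_present):
    """Made-up stand-in for a model's log-probability of a target answer."""
    score = -5.0
    if "context" in fields_present:
        score += 0.5
    if "question" in fields_present:
        score += 1.0
    if "anchor" in fields_present:
        score += 2.0
    if "anchor" in fields_present and "question" in fields_present:
        score += 1.0  # interaction term, shared between the two fields
    return score

shapley = exact_shapley(FIELDS, toy_logprob)
for field, phi in shapley.items():
    print(f"{field}: {phi:+.2f}")
```

By construction, the attributions sum to the difference between the score with all fields present and the score with none, which makes each field's share of the log-probability change directly comparable.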

cs / cs.AI / cs.CL / econ.GN / q-fin.EC
