Ultra-short summary: An algorithm that satisfies multiple goals at once 🚀 Smells like a big win for new ventures 🎵
✨ Gyaru-style sparkle points ✨
● Optimizing everything nicely while respecting priorities? Doesn't get better than that, right? ✨
● A/B tests hitting the ideal result in fewer trials, for real!? 😳
● A new business is about to be born, and the excitement won't stop 💖
Detailed explanation
In multi-objective decision-making with hierarchical preferences, lexicographic bandits provide a natural framework for optimizing multiple objectives in a prioritized order. In this setting, a learner repeatedly selects arms and observes reward vectors, aiming to maximize the reward for the highest-priority objective, then the next, and so on. While previous studies have primarily focused on regret minimization, this work bridges the gap between regret minimization and best arm identification under lexicographic preferences. We propose two elimination-based algorithms to address this joint objective. The first algorithm eliminates suboptimal arms sequentially, layer by layer, in accordance with the objective priorities, and achieves sample complexity and regret bounds comparable to those of the best single-objective algorithms. The second algorithm simultaneously leverages reward information from all objectives in each round, effectively exploiting cross-objective dependencies. Remarkably, it outperforms the known lower bound for the single-objective bandit problem, highlighting the benefit of cross-objective information sharing in the multi-objective setting. Empirical results further validate their superior performance over baselines.
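The first algorithm's layer-by-layer idea can be sketched in a few lines. The code below is a minimal toy illustration, not the paper's actual method: the arm set, noise model, and the fixed `slack` threshold (standing in for a proper confidence-radius test) are all assumptions made for the example. Arms far from the best on the current objective are dropped, and the survivors move on to the next objective in priority order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 arms, 2 objectives in priority order.
# True mean reward vectors (unknown to the learner).
true_means = np.array([
    [0.9, 0.2],   # ties for best on objective 0, poor on objective 1
    [0.9, 0.7],   # lexicographic optimum: ties on obj 0, best on obj 1
    [0.5, 0.9],   # strong on obj 1, but eliminated at the obj-0 layer
    [0.3, 0.5],
])
n_arms, n_objectives = true_means.shape

def lexicographic_elimination(pulls_per_phase=2000, slack=0.05):
    """Layer-by-layer elimination sketch: resolve objectives in priority
    order, keeping only arms whose empirical mean is within `slack` of
    the best surviving arm on the current objective."""
    active = list(range(n_arms))
    for obj in range(n_objectives):
        # Pull every active arm and record its empirical mean
        # for the current objective (Gaussian noise, std 0.1).
        means = {}
        for arm in active:
            samples = rng.normal(true_means[arm, obj], 0.1,
                                 size=pulls_per_phase)
            means[arm] = samples.mean()
        best = max(means.values())
        # Keep arms statistically indistinguishable from the best;
        # `slack` plays the role of the confidence radius here.
        active = [arm for arm in active if means[arm] >= best - slack]
    return active

print(lexicographic_elimination())  # → [1]
```

With enough pulls per phase, arms 0 and 1 survive the first layer (both near 0.9 on objective 0), and the second layer then separates them on objective 1, leaving only the lexicographic optimum. The paper's second algorithm differs by using reward observations from all objectives in every round rather than one layer at a time.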