EZ Preferences × Deep Reinforcement Learning for Hedging Strategies 💄✨

Published: 2025/12/17 8:16:29

Ultra-short summary: uncovering how an investor with EZ preferences hedges long-run risk, using deep reinforcement learning 💡

Gal-style sparkle points ✨

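● Analyzing EZ preferences (an investor's slightly complicated tastes) with AI: isn't that kind of moving? 🥺
● Taking on long-run risk (worries about future money) with AI feels crazy futuristic 💖
● Thinking the IT industry might evolve even further gets you excited, right?! 🥰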

Detailed explanation

Intertemporal Hedging Demand under Epstein-Zin Preferences in a Multi-Asset Long-Run Risk Model: Evidence from Projected Pontryagin-Guided Deep Policy Optimization

Wonchan Cho

I study intertemporal hedging demand in a continuous-time multi-asset long-run risk (LRR) model under Epstein-Zin (EZ) recursive preferences. The investor trades a risk-free asset and several risky assets whose drifts and volatilities depend on an Ornstein-Uhlenbeck type LRR factor. Preferences are described by EZ utility with risk aversion $R$, elasticity of intertemporal substitution $\psi$, and discount rate $\delta$, so that the standard time-additive CRRA case appears as a limiting benchmark. To handle the high-dimensional consumption-investment problem, I use a projected Pontryagin-guided deep policy optimization (P-PGDPO) scheme adapted to EZ preferences. The method starts from the continuous-time Hamiltonian implied by the Pontryagin maximum principle, represents the value and costate processes with neural networks, and updates the policy along the Hamiltonian gradient. Portfolio constraints and a lower bound on wealth are enforced by explicit projection operators rather than by adding ad hoc penalties. Three main findings emerge from numerical experiments in a five-asset LRR economy: (1) the P-PGDPO algorithm achieves stable convergence across multiple random seeds, validating its reliability for solving high-dimensional EZ problems; (2) wealth floors materially reduce hedging demand by limiting the investor's ability to exploit intertemporal risk-return tradeoffs; and (3) the learned hedging portfolios concentrate exposure in assets with high correlation to the LRR factor, confirming that EZ agents actively hedge long-run uncertainty rather than merely following myopic rules. Because EZ preferences nest time-additive CRRA in the limit $\psi \to 1/R$, I use CRRA as an explicit diagnostic benchmark and, when needed, a warm start to stabilize training in high dimensions.
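The statement that EZ preferences nest time-additive CRRA at $\psi = 1/R$ can be checked numerically. The following is a minimal sketch, assuming the Duffie-Epstein normalized aggregator commonly used for continuous-time EZ utility; the abstract does not state the paper's exact normalization, so the functional form, the function names `ez_aggregator` and `crra_aggregator`, and the parameter values are illustrative assumptions, not the author's implementation.

```python
import math


def ez_aggregator(c, v, R, psi, delta):
    """Normalized EZ aggregator f(c, v) in an assumed Duffie-Epstein form.

    f = delta / (1 - 1/psi) * (1 - R) * v
        * [ (c / ((1 - R) * v) ** (1 / (1 - R))) ** (1 - 1/psi) - 1 ]

    Requires R != 1, psi != 1, and (1 - R) * v > 0.
    """
    rho = 1.0 - 1.0 / psi            # exponent governed by the EIS
    scaled_v = (1.0 - R) * v         # positive for admissible (R, v) pairs
    if scaled_v <= 0.0:
        raise ValueError("need (1 - R) * v > 0")
    # Utility expressed in consumption units
    consumption_equiv = scaled_v ** (1.0 / (1.0 - R))
    return delta / rho * scaled_v * ((c / consumption_equiv) ** rho - 1.0)


def crra_aggregator(c, v, R, delta):
    """Time-additive CRRA benchmark: f = delta * (c^(1-R) / (1-R) - v)."""
    return delta * (c ** (1.0 - R) / (1.0 - R) - v)


# Hypothetical parameter values, chosen only for illustration.
R, delta = 5.0, 0.03
c, v = 1.2, -0.3                     # v < 0 because R > 1

# At psi = 1/R the two aggregators should agree up to rounding.
gap_at_limit = abs(ez_aggregator(c, v, R, 1.0 / R, delta)
                   - crra_aggregator(c, v, R, delta))

# Away from the limit (psi = 1.5 here) they differ.
gap_away = abs(ez_aggregator(c, v, R, 1.5, delta)
               - crra_aggregator(c, v, R, delta))
```

Under this assumed form, `gap_at_limit` is at rounding-error level while `gap_away` is not. A check of this kind is one simple way to confirm that an EZ implementation reproduces the CRRA benchmark before using CRRA as a diagnostic or warm start.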

cs / eess.SY / cs.SY / math.OC

View on arXiv