optimal policy

Every finite MDP has at least one optimal policy $π_{*}$ that is as good as or better than every other policy (in terms of the expected return the agent receives). All optimal policies achieve the optimal state-value function and the optimal action-value function, and the latter can be used to produce an optimal policy. A deterministic optimal policy always exists for a finite MDP.

$$
π_{*} (a ∣ s) =
\begin{cases}
1 & \text{if } a = \underset{a \in A}{\operatorname{argmax}}\ q_{*} (s, a) \\
0 & \text{otherwise}
\end{cases}
$$
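As a rough illustration of how a deterministic policy can be read off an action-value function, the sketch below puts probability 1 on the highest-valued action in each state and 0 on the rest. The table `q_star`, its state and action names, and the helper `greedy_policy` are all hypothetical and made up for this example; ties are broken by taking the first maximal action, which is an arbitrary choice.

```python
def greedy_policy(q):
    """Build a deterministic policy from an action-value table.

    q maps each state to a dict of {action: value}. The result maps each
    state to a dict of {action: probability}, with probability 1.0 on the
    argmax action and 0.0 on every other action.
    """
    policy = {}
    for state, action_values in q.items():
        # max() returns the first maximal action, so ties are broken by
        # insertion order (an arbitrary choice for this sketch).
        best_action = max(action_values, key=action_values.get)
        policy[state] = {
            action: 1.0 if action == best_action else 0.0
            for action in action_values
        }
    return policy


# Hypothetical optimal action values for a two-state, two-action MDP.
q_star = {
    "s0": {"left": 0.5, "right": 1.2},
    "s1": {"left": 2.0, "right": 0.3},
}

pi_star = greedy_policy(q_star)
print(pi_star)
```

If the table really holds the optimal action values, the resulting policy is optimal; with an arbitrary table it is only greedy with respect to that table.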

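A policy $π$ is at least as good as $π^{'}$ ( $π \geq π^{'}$ ) if $v_{π} (s) \geq v_{π^{'}} (s), \forall s$ . This is only a partial ordering: two policies can be incomparable when each one is better in some state.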
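The ordering on policies can be checked state by state once the value functions are known. The following is a minimal sketch under that assumption; the value tables `v_a`, `v_b`, `v_c` and the helper `at_least_as_good` are hypothetical names with made-up numbers, chosen so that one pair of policies is comparable and another is not.

```python
def at_least_as_good(v_pi, v_pi_prime):
    """Return True if v_pi(s) >= v_pi_prime(s) for every state s."""
    return all(v_pi[s] >= v_pi_prime[s] for s in v_pi)


# Hypothetical state values for three policies over the same two states.
v_a = {"s0": 1.0, "s1": 2.0}
v_b = {"s0": 0.5, "s1": 2.0}
v_c = {"s0": 1.5, "s1": 1.0}

a_vs_b = at_least_as_good(v_a, v_b)  # a is at least as good as b
b_vs_a = at_least_as_good(v_b, v_a)  # b is worse than a in s0
a_vs_c = at_least_as_good(v_a, v_c)  # a is worse than c in s0
c_vs_a = at_least_as_good(v_c, v_a)  # c is worse than a in s1

print(a_vs_b, b_vs_a, a_vs_c, c_vs_a)
```

Here the policies behind `v_a` and `v_c` are incomparable, since neither dominates the other in every state. An optimal policy is one that is at least as good as every other policy under this check.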
