An MDP must have an optimal policy that is either as good or better than all other policies (in terms of the rewards the agent receives). All optimal policies achieve the optimal state-value function and optimal action-value function (the latter of which can be used to produce the optimal policy). There is always a deterministic optimal policy for any MDP.

A policy is more optimal than () if .