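A policy function $\pi(a \mid s)$ describes the confidence and preference of an agent when taking an action $a$ given the current state $s$. At any state $s$, an agent following the policy greedily will take the action $a$ with the highest value $\pi(a \mid s)$; a stochastic agent instead samples its action from $\pi(\cdot \mid s)$. Note that the policy function is time-independent: it depends only on the current state, not on the time step.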
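As a rough illustration, a policy over a small discrete problem can be stored as a table of action probabilities per state. The sketch below assumes two states and three actions; the table values and the helper names `greedy_action` and `sample_action` are hypothetical and chosen only for this example.

```python
import random

# Hypothetical policy table: policy[s][a] = pi(a | s).
# Each row is a probability distribution over actions, so it sums to 1.
policy = [
    [0.7, 0.2, 0.1],  # state 0
    [0.1, 0.1, 0.8],  # state 1
]


def greedy_action(policy, state):
    """Return the action with the highest probability in the given state."""
    probs = policy[state]
    return max(range(len(probs)), key=lambda a: probs[a])


def sample_action(policy, state, rng):
    """Draw an action at random according to pi(. | state)."""
    probs = policy[state]
    return rng.choices(range(len(probs)), weights=probs)[0]


rng = random.Random(0)  # fixed seed so repeated runs behave the same
print("greedy action in state 0:", greedy_action(policy, 0))
print("sampled action in state 1:", sample_action(policy, 1, rng))
```

Neither helper takes a time step as an argument, which is what time-independence means in practice: the same state always yields the same action distribution.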
Effects of following a policy
- The state sequence $S_1, S_2, \dots$ behaves like a Markov chain.
- The state and reward sequence $S_1, R_2, S_2, R_3, \dots$ is a Markov reward process.
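One way to see why both statements hold is to average the environment's transition probabilities and rewards over the policy's action probabilities; once the actions are averaged out, only states and rewards remain. The following is a minimal sketch under stated assumptions: a tabular problem with two states and two actions, where `P`, `R`, `policy`, the function `induced_mrp`, and all numbers are hypothetical.

```python
# Hypothetical tabular model with 2 states and 2 actions.
# P[s][a][s2] = probability of moving from state s to state s2 under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
# R[s][a] = expected immediate reward for taking action a in state s.
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]
# policy[s][a] = pi(a | s).
policy = [
    [0.5, 0.5],
    [1.0, 0.0],
]


def induced_mrp(P, R, policy):
    """Average transitions and rewards over the policy's action probabilities.

    Returns (P_pi, R_pi), where
      P_pi[s][s2] = sum_a pi(a | s) * P[s][a][s2]
      R_pi[s]     = sum_a pi(a | s) * R[s][a]
    """
    n_states = len(P)
    n_actions = len(policy[0])
    P_pi = [
        [
            sum(policy[s][a] * P[s][a][s2] for a in range(n_actions))
            for s2 in range(n_states)
        ]
        for s in range(n_states)
    ]
    R_pi = [
        sum(policy[s][a] * R[s][a] for a in range(n_actions))
        for s in range(n_states)
    ]
    return P_pi, R_pi


P_pi, R_pi = induced_mrp(P, R, policy)
print("state transition matrix under the policy:", P_pi)
print("expected reward per state under the policy:", R_pi)
```

Each row of `P_pi` is a probability distribution over next states, so `P_pi` alone defines a Markov chain; pairing it with `R_pi` (and a discount factor, omitted here) gives the Markov reward process.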