A policy function π(s, a) describes the confidence and preference of an agent for taking an action a given the current state s. At any state s, an agent following the policy takes the action with the highest value π(s, a). Note that the policy function is time-independent: it depends only on the current state, not on the time step at which that state is reached.
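As a minimal sketch of the rule described above, assume a small discrete problem in which the policy is stored as a table of scores, one per state-action pair. The state names, action names, scores, and the helper `greedy_action` are all hypothetical and chosen only for illustration; they do not come from the original text.

```python
# Hypothetical policy table: policy[state][action] is the agent's
# preference for taking that action in that state (illustrative values).
policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.6, "right": 0.4},
}


def greedy_action(policy, state):
    """Return the action with the highest policy value in the given state."""
    action_values = policy[state]
    # max over the action names, ranked by their score in this state
    return max(action_values, key=action_values.get)


# The lookup takes only the state as input, never a time step,
# which is what time-independence means here.
chosen = greedy_action(policy, "s0")
```

Because the function receives no time argument, visiting the same state at two different moments always yields the same action.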

Effects of following a policy