The policy-indexed state-value function $V^\pi(s)$ is the expected total future reward (excludes ) obtained starting at state $s$ if the agent follows the policy $\pi$ from time $t$ until the end of the episode in an MDP. It reflects whether transitioning to state $s$ is rewarding in the long term.
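As a rough illustration, $V^\pi$ can be computed for a small episodic MDP with iterative policy evaluation. This is one standard technique, not necessarily the method these notes go on to use. It repeatedly applies the Bellman expectation backup $V(s) \leftarrow \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,[r + \gamma V(s')]$, with $\gamma = 1$ for an undiscounted total. The dictionary layout, the function name `evaluate_policy`, and the states `"A"`, `"B"`, `"end"` are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical MDP layout (assumption for illustration only):
# transitions[s][a] is a list of (probability, next_state, reward) triples,
# where the reward is received on the transition s -> next_state.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]
Policy = Dict[str, Dict[str, float]]  # policy[s][a] = pi(a | s)


def evaluate_policy(transitions: Transitions, policy: Policy, terminal: Set[str],
                    gamma: float = 1.0, tol: float = 1e-10) -> Dict[str, float]:
    """Estimate V^pi(s) for every state by iterative policy evaluation.

    With gamma = 1 this only converges if every episode eventually reaches
    a terminal state (an episodic MDP).
    """
    states = set(transitions) | set(terminal)
    V = {s: 0.0 for s in states}  # terminal states keep value 0
    while True:
        delta = 0.0
        for s in transitions:
            if s in terminal:
                continue
            # Bellman expectation backup: average over actions, then outcomes.
            new_v = sum(
                pi_a * sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])
                for a, pi_a in policy[s].items()
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update
        if delta < tol:
            return V


# Tiny hypothetical chain: A -> B (reward 1), then B -> end with reward 10 or 0.
transitions: Transitions = {
    "A": {"go": [(1.0, "B", 1.0)]},
    "B": {"go": [(0.5, "end", 10.0), (0.5, "end", 0.0)]},
}
policy: Policy = {"A": {"go": 1.0}, "B": {"go": 1.0}}

V = evaluate_policy(transitions, policy, terminal={"end"})
print(V["A"], V["B"])  # prints: 6.0 5.0
```

Here $V^\pi(B) = 5$ counts only the rewards collected after the agent is in `B`. $V^\pi(A) = 1 + 5 = 6$ adds the reward of the transition `A -> B` to the value of `B`.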