The policy-indexed action-value function $Q^\pi(s, a)$ is the expected total future reward an agent receives in an MDP if it takes action $a$ in state $s$ and then follows the policy function $\pi$ until the end (excludes ). It reflects whether an action has long-term benefits in a given state.
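
As a minimal sketch of this definition, the action-value of a small episodic MDP can be computed exactly with the recursion "expected immediate reward plus the policy-weighted value of the next state". Everything below is an assumption made for illustration: the toy transition table `P`, the stochastic policy `pi`, the state and action numbering, the absence of discounting, and the fact that the MDP has no cycles (so plain recursion terminates).

```python
from functools import lru_cache

TERMINAL = 2  # hypothetical terminal state; the episode ends here

# Hypothetical toy MDP: P[state][action] -> list of (probability, next_state, reward)
P = {
    0: {
        0: [(1.0, 1, 1.0)],                         # surely move to state 1, reward 1
        1: [(0.5, TERMINAL, 6.0), (0.5, 1, 0.0)],   # coin flip: end with reward 6, or move on
    },
    1: {
        0: [(1.0, TERMINAL, 2.0)],                  # end the episode with reward 2
        1: [(1.0, TERMINAL, 0.0)],                  # end the episode with reward 0
    },
}

# Hypothetical policy: pi[state][action] -> probability of choosing that action
pi = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 1.0, 1: 0.0},
}


def v(state):
    """Expected total future reward from `state` when every action follows pi."""
    if state == TERMINAL:
        return 0.0
    return sum(pi[state][a] * q(state, a) for a in P[state])


@lru_cache(maxsize=None)
def q(state, action):
    """Expected total future reward for taking `action` in `state`, then following pi."""
    return sum(prob * (reward + v(nxt)) for prob, nxt, reward in P[state][action])


if __name__ == "__main__":
    for s in P:
        for a in P[s]:
            print(f"q({s}, {a}) = {q(s, a)}")
```

With these made-up numbers, `q(0, 0)` evaluates to 3.0 and `q(0, 1)` to 4.0, so action 1 has the better long-term outcome in state 0. `q(1, 1)` is still defined (it is 0.0) even though `pi` never selects action 1 in state 1: the first action is an argument of the function, and only the later actions are drawn from the policy.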