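The optimal action-value function is the maximum of the action-value function over all policies: for every state-action pair, it gives the largest expected return that any policy can achieve. In other words, it is the action-value function that results when the agent follows the optimal policy (treat the policy as a parameter in the equation below).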
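As a rough illustration of "maximum over all policies", the sketch below builds a tiny MDP, evaluates the action-value function of every deterministic policy, and takes the maximum for each state-action pair. It then compares that against the result of iterating with a max over next actions. The two-state dynamics in `STEP`, the discount `GAMMA`, and all function names are assumptions made up for this example, not part of the original text.

```python
from itertools import product

GAMMA = 0.9          # assumed discount factor
STATES = (0, 1)
ACTIONS = (0, 1)

# Hypothetical deterministic dynamics: (state, action) -> (next_state, reward).
STEP = {
    (0, 0): (0, 0.0),
    (0, 1): (1, 1.0),
    (1, 0): (0, 2.0),
    (1, 1): (1, 0.0),
}


def q_for_policy(policy, sweeps=1000):
    """Action values when the agent follows `policy` after the first action.

    `policy` is a tuple giving the action chosen in each state.
    """
    q = {sa: 0.0 for sa in STEP}
    for _ in range(sweeps):
        q = {
            (s, a): r + GAMMA * q[(s2, policy[s2])]
            for (s, a), (s2, r) in STEP.items()
        }
    return q


def q_optimal(sweeps=1000):
    """Action values obtained by always taking the best next action."""
    q = {sa: 0.0 for sa in STEP}
    for _ in range(sweeps):
        q = {
            (s, a): r + GAMMA * max(q[(s2, a2)] for a2 in ACTIONS)
            for (s, a), (s2, r) in STEP.items()
        }
    return q


# Enumerate every deterministic policy and take the per-pair maximum.
all_policies = list(product(ACTIONS, repeat=len(STATES)))
q_by_policy = {pi: q_for_policy(pi) for pi in all_policies}
q_star_by_max = {sa: max(q[sa] for q in q_by_policy.values()) for sa in STEP}

q_star = q_optimal()

for sa in sorted(STEP):
    print(sa, round(q_star_by_max[sa], 4), round(q_star[sa], 4))
```

In this toy problem a single policy (action 1 in state 0, action 0 in state 1) attains the maximum for every state-action pair at once, which is why "maximum over all policies" and "follow the optimal policy" describe the same function here. Enumerating policies is only feasible for very small problems; it is used here purely to mirror the definition.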