The discount factor $\gamma \in [0, 1]$ describes how shortsighted an agent is in a Markov decision process (MDP). The lower the discount factor, the less weight the agent assigns to rewards in the distant future.
More specifically:
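- A discount factor of $\gamma = 0$ means the agent is extremely shortsighted: only the immediate reward counts.
- A discount factor of $\gamma = 1$ means the agent is extremely farsighted: all future rewards count equally (this is only possible if all state sequences properly terminate).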
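To make the two extremes concrete, here is a minimal sketch that computes the discounted return of one fixed, finite reward sequence for several values of $\gamma$. The function name `discounted_return` and the reward values are made up for illustration; they do not come from any particular library or problem.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    for k, reward in enumerate(rewards):
        # A reward that arrives k steps in the future is scaled by gamma**k.
        total += (gamma ** k) * reward
    return total


# Hypothetical episode: small rewards first, a large reward at the end.
rewards = [1.0, 1.0, 1.0, 10.0]

myopic = discounted_return(rewards, 0.0)      # only the first reward counts
balanced = discounted_return(rewards, 0.5)    # later rewards are shrunk
farsighted = discounted_return(rewards, 1.0)  # plain undiscounted sum

print(myopic, balanced, farsighted)
```

With $\gamma = 0$ the large final reward is invisible to the agent, while with $\gamma = 1$ it dominates the return. The undiscounted case is only well defined here because the sequence is finite.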
Why do we need a discount factor?
- Having a discount factor avoids infinite future rewards in cyclic processes.
- It is mathematically convenient to discount future rewards.
- When the reward is financial, a reward received sooner can be invested and earn interest, so it is worth more than the same reward received later.
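The first point can be made precise with a short bound. This is a sketch under two assumptions that are not stated above: rewards are bounded in magnitude by some constant $R_{\max}$, and the return $G_t$ is defined as the discounted sum of future rewards $R_{t+1}, R_{t+2}, \dots$ (both symbols are introduced here for illustration).

```latex
% Assumptions: |R_{t+k+1}| \le R_{\max} for all k, and 0 \le \gamma < 1.
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad
|G_t| \;\le\; \sum_{k=0}^{\infty} \gamma^{k} R_{\max}
      \;=\; \frac{R_{\max}}{1 - \gamma}.
```

For $\gamma = 1$ the geometric series no longer converges, so a cycle that yields a constant positive reward on every step would give an infinite return. This is why the undiscounted case requires every state sequence to terminate.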