The discount factor γ describes how nearsighted an agent is in an MDP. The lower the discount factor, the less significance the agent assigns to rewards in the distant future.

More specifically:

  • γ = 0 means the agent is extremely shortsighted: only the immediate reward counts.
  • γ = 1 means the agent is extremely farsighted: all future rewards count equally (this is only possible if all state sequences properly terminate).
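
A minimal sketch of the two extremes, assuming the standard definition of the return as the sum of γ^k times the reward received k steps ahead, with γ = 0 as the shortsighted case and γ = 1 as the farsighted case. The function name `discounted_return` and the reward sequence are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum(gamma**k * rewards[k]) for a finite reward sequence."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma
    # to everything that comes after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Illustrative episode: small rewards first, a large reward at the end.
rewards = [1.0, 1.0, 1.0, 10.0]

myopic = discounted_return(rewards, 0.0)      # only the first reward counts
balanced = discounted_return(rewards, 0.5)    # later rewards count less
farsighted = discounted_return(rewards, 1.0)  # plain undiscounted sum

print(myopic, balanced, farsighted)
```

With γ = 0 the large final reward is invisible to the agent; with γ = 1 it counts in full, which is only well defined here because the sequence terminates.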

Why do we need a discount factor?

  • Having a discount factor avoids infinite returns in cyclic processes, where an agent could otherwise keep accumulating reward forever.
  • It is mathematically convenient to discount future rewards.
  • When the reward is financial, a reward received now can earn more interest than the same reward received later, so an agent has reason to prefer immediate rewards.
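
The first point can be checked numerically. The sketch below assumes a hypothetical cyclic process that pays a constant reward of 1 at every step forever. With γ < 1 the return is a geometric series that converges to 1 / (1 − γ), while with γ = 1 it grows without bound as more steps are included. The name `partial_return` is illustrative.

```python
def partial_return(reward, gamma, steps):
    """Discounted return of a constant reward collected for `steps` steps."""
    return sum(reward * gamma**k for k in range(steps))


# Compare a discounted agent (gamma = 0.9) with an undiscounted one (gamma = 1.0)
# as the horizon of the never-ending cycle grows.
for steps in (10, 100, 1000):
    discounted = partial_return(1.0, 0.9, steps)
    undiscounted = partial_return(1.0, 1.0, steps)
    print(steps, discounted, undiscounted)
```

The discounted column approaches 1 / (1 − 0.9) = 10, whereas the undiscounted column simply equals the number of steps.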