reward function

In an MDP, the reward function $R$ gives the reward an agent expects to receive after taking an action $a$ in some state $s$.

More specifically, $R_{s}^{a} = E [R_{t + 1} ∣ S_{t} = s, A_{t} = a]$ is the expected value of the next reward $R_{t + 1}$ given the current state and action. In other words, the reward function in an MDP depends on both the state and the action taken in it.
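As a rough illustration of this expectation, the sketch below estimates $R_{s}^{a}$ as the sample mean of the rewards observed after each state-action pair. The function name `estimate_reward_function`, the state and action labels, and the sample data are all hypothetical, chosen only for this example.

```python
from collections import defaultdict


def estimate_reward_function(transitions):
    """Estimate R_s^a as the sample mean of observed next-step rewards.

    `transitions` is an iterable of (state, action, reward) tuples, where
    `reward` is the R_{t+1} observed after taking `action` in `state`.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in transitions:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    # Average the rewards seen for each (state, action) pair.
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical experience, each entry is (S_t, A_t, R_{t+1}).
samples = [
    ("s0", "left", 0.0),
    ("s0", "left", 2.0),
    ("s0", "right", 1.0),
    ("s1", "left", -1.0),
]

R = estimate_reward_function(samples)
print(R[("s0", "left")])  # mean of 0.0 and 2.0
```

The same pair `("s0", "left")` yields different rewards on different visits, which is why the definition uses an expectation rather than a single value.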

In contrast, in an MRP, the reward function is defined as $R_{s} = E [R_{t + 1} ∣ S_{t} = s]$. Note the absence of the action: an MRP has no actions, so the expected reward depends only on the current state.
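One common way to connect the two definitions, which this note does not state and is added here as an assumption, is to fix a policy $\pi(a ∣ s)$ and average the MDP reward over actions: $R_{s} = \sum_{a} \pi(a ∣ s) R_{s}^{a}$. A minimal sketch, with hypothetical names (`mrp_reward`, `mdp_reward`, `policy`) and made-up numbers:

```python
def mrp_reward(mdp_reward, policy):
    """Collapse an MDP reward R_s^a into a state-only reward R_s.

    `mdp_reward` maps (state, action) -> expected reward R_s^a.
    `policy` maps state -> {action: probability of choosing it}.
    """
    return {
        state: sum(
            prob * mdp_reward[(state, action)]
            for action, prob in action_probs.items()
        )
        for state, action_probs in policy.items()
    }


# Hypothetical MDP reward function and policy.
mdp_reward = {
    ("s0", "left"): 1.0,
    ("s0", "right"): 3.0,
    ("s1", "left"): -1.0,
    ("s1", "right"): 0.0,
}
policy = {
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 1.0, "right": 0.0},
}

R_mrp = mrp_reward(mdp_reward, policy)
print(R_mrp)  # rewards now indexed by state only
```

Once the policy is fixed, the action no longer appears in the result, which matches the MRP form of the reward function.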
