Security Memo

policy-indexed state-value function

created Jan 08, 2024 | updated May 19, 2024 | 1 min read

concept

$v_{\pi}(s) = \mathbb{E}_{\pi}[G_{t} \mid S_{t} = s]$

The policy-indexed state-value function $v_{\pi}(s)$ is the expected return $G_{t}$, the total future reward collected after time $t$ (so it excludes $R_{t}$), when the agent starts in state $s$ of an MDP and follows the policy $\pi$ from $s$ until the episode ends. It measures how rewarding it is, in the long run, to be in state $s$ under $\pi$.
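To make the definition concrete, here is a minimal sketch that computes $v_{\pi}(s)$ for a tiny episodic MDP by iterative policy evaluation. The MDP, its rewards, the names `transitions`, `TERMINAL`, and `evaluate_policy`, and the choice of no discounting (`gamma=1.0`) are all illustrative assumptions, not part of the note. The policy is assumed to be already folded into the transition table, so each entry gives the outcomes of following $\pi$ from that state.

```python
# Hypothetical episodic MDP under a fixed policy pi (all numbers are illustrative).
# transitions[s] lists (probability, next_state, reward) triples induced by pi.
TERMINAL = 3
transitions = {
    0: [(1.0, 1, 1.0)],
    1: [(0.5, 2, 0.0), (0.5, TERMINAL, 2.0)],
    2: [(1.0, TERMINAL, 4.0)],
}


def evaluate_policy(transitions, terminal, gamma=1.0, tol=1e-10):
    """Iterative policy evaluation: repeat expectation backups until values settle."""
    v = {s: 0.0 for s in transitions}
    v[terminal] = 0.0  # no reward can be collected after the episode ends
    while True:
        delta = 0.0
        for s, outcomes in transitions.items():
            # Expected immediate reward plus expected value of the successor state.
            new_v = sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


values = evaluate_policy(transitions, TERMINAL)
print(values)
```

In this toy example, state 2 always yields reward 4 before terminating, so its value is 4. State 1 averages the two branches, $0.5 \cdot (0 + 4) + 0.5 \cdot 2 = 3$, and state 0 adds its own reward of 1 on top of the value of state 1, giving 4.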

Sources

Reinforcement learning
