Security Memo

policy-indexed action-value function

created Jan 08, 2024, updated May 19, 2024, 1 min read

concept

$$q_{\pi}(s, a) = \mathbb{E}_{\pi}\left[G_{t} \mid S_{t} = s, A_{t} = a\right]$$

The policy-indexed action-value function $q_{\pi}(s, a)$ is the expected return $G_{t}$: the total future reward an agent in an MDP receives, in expectation, if it takes action $a$ in state $s$ and then follows the policy $\pi$ until the end of the episode. $G_{t}$ excludes $R_{t}$, so only rewards received after time $t$ count. The value reflects whether an action has long-term benefits in a given state.
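To make the definition concrete, here is a minimal sketch that computes this expectation exactly for a tiny episodic MDP. Everything in it is an assumption made for illustration: the states `A` and `B`, the actions `stay` and `go`, the rewards, the uniform random policy, and the choice to leave the return undiscounted. It averages the return over every possible continuation, weighting each by its probability under the transitions and the policy.

```python
# Hypothetical two-state episodic MDP; all names and numbers are illustrative.
# TRANSITIONS[(state, action)] -> list of (probability, next_state, reward)
TERMINAL = "END"

TRANSITIONS = {
    ("A", "stay"): [(1.0, TERMINAL, 1.0)],
    ("A", "go"): [(1.0, "B", 0.0)],
    ("B", "stay"): [(1.0, TERMINAL, 2.0)],
    ("B", "go"): [(0.5, TERMINAL, 10.0), (0.5, TERMINAL, 0.0)],
}

# POLICY[state][action] -> probability of choosing that action in that state
POLICY = {
    "A": {"stay": 0.5, "go": 0.5},
    "B": {"stay": 0.5, "go": 0.5},
}


def q_value(state, action):
    """Expected undiscounted return for taking `action` in `state`, then following POLICY."""
    total = 0.0
    for prob, next_state, reward in TRANSITIONS[(state, action)]:
        # `reward` is the reward received after acting, so it is part of the return.
        total += prob * (reward + expected_return_from(next_state))
    return total


def expected_return_from(state):
    """Expected return from `state` when every action is drawn from POLICY."""
    if state == TERMINAL:
        return 0.0  # no further reward once the episode has ended
    return sum(p * q_value(state, a) for a, p in POLICY[state].items())


print(q_value("A", "stay"))  # 1.0
print(q_value("A", "go"))    # 3.5
```

In this toy setup `go` pays nothing immediately in state `A`, yet its action value is higher than that of `stay` because of the rewards the policy is expected to collect afterwards from `B`. That is the long-term benefit the action value is meant to capture.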

Sources

Reinforcement learning

Backlinks

Bellman expectation equation for MDP value functions
