Security Memo

policy function

created Jan 08, 2024 · updated Apr 16, 2024

concept

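A policy function $π (a ∣ s) = P [A_{t} = a ∣ S_{t} = s]$ gives the probability that an agent takes action $a$ given the current state $s$, and so describes the agent's confidence in and preference among the available actions. At any state $s$, an agent following the policy $π$ draws its action from this distribution; an agent acting greedily takes the action $a$ with the highest $π (a ∣ s)$ value. Note that the policy function is time-independent (stationary): it depends only on the current state $s$, not on the time step $t$.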
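As a minimal sketch of the definition above, assuming a small finite set of states and actions, a policy can be stored as a table of probabilities. The names `policy`, `sample_action`, and `greedy_action`, and the states and actions themselves, are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical tabular policy: policy[state][action] = pi(a | s).
# Each row is a probability distribution over actions and sums to 1.
policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 1.0, "right": 0.0},
}


def sample_action(policy, state, rng=random):
    """Draw an action a with probability pi(a | state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def greedy_action(policy, state):
    """Return the action with the highest pi(a | state)."""
    return max(policy[state], key=policy[state].get)
```

Neither function takes a time step as an argument, which reflects the time-independence of the policy: the same state always yields the same distribution over actions.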

Effects of following a policy

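Once the policy $π$ is fixed, the state sequence $⟨ S, P^{π} ⟩$ is a Markov chain, where $P^{π}$ is the state transition matrix obtained by averaging the transition probabilities of the MDP over the actions chosen by $π$.

Likewise, $⟨ S, P^{π}, R^{π}, γ ⟩$ is a Markov reward process, where $R^{π}$ is the expected reward under $π$ and $γ$ is the discount factor.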
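The note does not spell out $P^{π}$ and $R^{π}$. Assuming the usual definitions for a finite MDP, $P^{π}(s' ∣ s) = \sum_{a} π (a ∣ s) P (s' ∣ s, a)$ and $R^{π}(s) = \sum_{a} π (a ∣ s) R (s, a)$, the sketch below computes both for a toy two-state MDP. The function `induced_mrp` and all of the numbers are hypothetical and serve only to illustrate the averaging.

```python
def induced_mrp(states, actions, policy, P, R):
    """Average the MDP dynamics and rewards over the policy.

    policy[s][a] = pi(a | s)
    P[s][a][s2]  = probability of moving from s to s2 under action a
    R[s][a]      = expected immediate reward for taking a in s
    """
    P_pi = {
        s: {
            s2: sum(policy[s][a] * P[s][a].get(s2, 0.0) for a in actions)
            for s2 in states
        }
        for s in states
    }
    R_pi = {s: sum(policy[s][a] * R[s][a] for a in actions) for s in states}
    return P_pi, R_pi


# Toy MDP (illustrative values only).
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}
policy = {
    "s0": {"stay": 0.5, "go": 0.5},
    "s1": {"stay": 1.0, "go": 0.0},
}

P_pi, R_pi = induced_mrp(states, actions, policy, P, R)
```

Each row of `P_pi` is again a probability distribution over next states, which is why the pair of states and `P_pi` forms a Markov chain; adding `R_pi` and a discount factor gives the Markov reward process.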
