PPO
State, Policy, Training
Proximal Policy Optimization.
Policy is a state-action mapping.
State = the AI's representation of the state of the world.
Action = what the agent does in a given state.
Policy = maps states to actions.
A basic problem in AI is how to maximize reward over time at some task.
- one strategy is to understand the system and predict the results of actions and their corresponding rewards.
- another strategy is to have the AI try a lot of things and record the results.
Either approach leads to a calculated policy.
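The second strategy (try a lot of things, record the results) can be sketched in a few lines. This is only a toy illustration, not PPO itself: the states, actions, and reward table below are made up, and it looks only at immediate reward rather than reward over time.

```python
import random
from collections import defaultdict

# Hypothetical toy world: three states, two actions.
STATES = ["cold", "mild", "hot"]
ACTIONS = ["heat", "cool"]

# Made-up average rewards, hidden from the agent; it only sees noisy samples.
TRUE_MEAN_REWARD = {
    ("cold", "heat"): 1.0, ("cold", "cool"): -1.0,
    ("mild", "heat"): -0.5, ("mild", "cool"): 0.5,
    ("hot", "heat"): -1.0, ("hot", "cool"): 1.0,
}

def try_action(state, action, rng):
    """Return a noisy reward for taking `action` in `state`."""
    return TRUE_MEAN_REWARD[(state, action)] + rng.gauss(0.0, 0.1)

def calculate_policy(trials_per_pair=200, seed=0):
    """Try every action in every state many times and record total reward."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for state in STATES:
        for action in ACTIONS:
            for _ in range(trials_per_pair):
                totals[(state, action)] += try_action(state, action, rng)
    # The policy maps each state to the action with the best recorded total.
    return {s: max(ACTIONS, key=lambda a: totals[(s, a)]) for s in STATES}

policy = calculate_policy()
print(policy)
```

All of the expensive work happens inside calculate_policy; the result is just a small state-to-action table.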
Once a policy is calculated, the heavy computation is finished.
- the agent just looks up what action to take in each state.
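As a rough sketch of what "just looks up" means, assume the calculated policy is stored as a plain table. The state and action names here are hypothetical, and a table is only one possible representation.

```python
# Hypothetical policy that was calculated earlier; names are illustrative only.
POLICY = {"cold": "heat", "mild": "cool", "hot": "cool"}

def act(state):
    """Look up the action for `state`; no learning or search happens here."""
    return POLICY[state]

# The agent observes a sequence of states and acts on each with one lookup.
observed = ["hot", "hot", "mild", "cold"]
actions = [act(s) for s in observed]
print(actions)
```

Each decision costs one dictionary lookup, which is why acting is cheap compared with calculating the policy.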