State, Policy, Training

Proximal Policy Optimization.

A policy is a mapping from states to actions.

State = the AI's representation of the world at a given moment.

Action = what the AI does in that state.

Policy = the mapping from states to actions.
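
A minimal sketch of these three definitions, assuming a small, discrete set of states and actions. The state and action names here are hypothetical and chosen only for illustration; in practice a policy is often a learned function rather than a table.

```python
# Hypothetical states and actions for a tiny thermostat-like task.
# The policy is simply a mapping from each state to an action.
policy = {
    "cold": "heat",
    "comfortable": "idle",
    "hot": "cool",
}


def act(state: str) -> str:
    """Return the action the policy prescribes for the given state."""
    return policy[state]
```

Calling `act("cold")` looks up the state and returns the action stored for it.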

The basic problem in AI is how to maximize the reward collected over time on some task.
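
One way to make "reward over time" concrete is to add up the reward received at each time step. The sketch below does that. The `discount` parameter is an assumption added for illustration (it is not in these notes): a value below 1 weights later rewards less, and the default of 1.0 gives a plain sum.

```python
def total_reward(rewards, discount=1.0):
    """Sum per-step rewards; step t is weighted by discount**t."""
    return sum(r * discount**t for t, r in enumerate(rewards))
```

For the reward sequence `[1, 0, 2]`, the plain sum is 3, while a discount of 0.5 gives 1 + 0 + 2 * 0.25 = 1.5.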

One strategy is to understand the system and predict the results of actions and their corresponding rewards.
Another strategy is to have the AI try a lot of things and record the results.
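
The second strategy can be sketched as follows. This is a simplified illustration, not PPO: it assumes a hypothetical one-step environment in which each state has one suitable action and rewards are noisy. All names (`STATES`, `ACTIONS`, `BEST`, `reward`, `learn_policy`) are made up for the example.

```python
import random
from collections import defaultdict

STATES = ["cold", "comfortable", "hot"]
ACTIONS = ["heat", "idle", "cool"]

# Hypothetical environment: the action that suits each state.
BEST = {"cold": "heat", "comfortable": "idle", "hot": "cool"}


def reward(state, action, rng):
    """Noisy reward: about 1 for the suitable action, about 0 otherwise."""
    base = 1.0 if BEST[state] == action else 0.0
    return base + rng.uniform(-0.2, 0.2)


def learn_policy(trials=200, seed=0):
    """Try every state-action pair many times and keep the best on average."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Try a lot of things and record the results.
    for _ in range(trials):
        for s in STATES:
            for a in ACTIONS:
                totals[(s, a)] += reward(s, a, rng)
                counts[(s, a)] += 1
    # The calculated policy picks, per state, the action with the
    # highest average recorded reward.
    return {
        s: max(ACTIONS, key=lambda a: totals[(s, a)] / counts[(s, a)])
        for s in STATES
    }
```

The AI never sees `BEST` directly here; it only sees rewards, and the recorded averages are enough to recover a policy.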

Either approach leads to a calculated policy.

Once a policy has been calculated, the heavy computation is done: acting only requires applying the policy to the current state.

PPO