PPO

State, Policy, Training

Proximal Policy Optimization.

Policy is a state-action mapping.

State = the AI's representation of the world at a given moment.

Action = what the agent does in that state.

Policy = maps states to actions.

The basic problem in AI is how to maximize reward over time at some task.
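
To make "reward over time" concrete, here is a minimal sketch that sums the rewards an agent collects step by step. The function name and the optional discount factor are assumptions for illustration, not something defined in these notes; with discount = 1.0 it is a plain sum.

```python
def total_reward(rewards, discount=1.0):
    """Sum a sequence of per-step rewards.

    `discount` is an assumed, optional weighting: values below 1.0
    make later rewards count less than earlier ones.
    """
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total


# Rewards collected over three time steps of some hypothetical task.
episode_rewards = [1.0, 2.0, 3.0]
print(total_reward(episode_rewards))
print(total_reward(episode_rewards, discount=0.5))
```

A policy is "better" when the actions it picks lead to a larger value of this sum.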

  • one strategy is to understand the system and predict the results of actions and their corresponding rewards.
  • another strategy is to have the AI try a lot of things and record the results.
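
The second strategy (try things, record results) can be sketched roughly as below. This is a toy, one-step illustration and not PPO itself: the thermostat-style states, actions, and reward table are all hypothetical, and a real task would have rewards that unfold over many steps.

```python
import random
from collections import defaultdict

# Hypothetical toy task: fixed reward for each (state, action) pair.
TOY_REWARDS = {
    ("cold", "heat"): 1.0,
    ("cold", "wait"): 0.0,
    ("warm", "heat"): -1.0,
    ("warm", "wait"): 1.0,
}
STATES = ["cold", "warm"]
ACTIONS = ["heat", "wait"]


def explore(trials=200, seed=0):
    """Try random actions in random states and record the average reward."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(trials):
        state = rng.choice(STATES)
        action = rng.choice(ACTIONS)
        totals[(state, action)] += TOY_REWARDS[(state, action)]
        counts[(state, action)] += 1
    return {pair: totals[pair] / counts[pair] for pair in counts}


def policy_from_records(averages):
    """For each state, keep the tried action with the best recorded average."""
    policy = {}
    for state in STATES:
        tried = [a for a in ACTIONS if (state, a) in averages]
        policy[state] = max(tried, key=lambda a: averages[(state, a)])
    return policy


averages = explore()
policy = policy_from_records(averages)
print(policy)
```

The loop in explore is the expensive part; the resulting policy is just a small table.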

Either approach leads to a calculated policy.

Once a policy is calculated, the heavy computation is finished.

  • the agent just looks up which action to take in each state.
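
As a minimal sketch of that lookup, assume the finished policy is stored as a plain table. The state and action names here are hypothetical, and policies over large or continuous state spaces are usually stored as a function rather than a literal table.

```python
# Hypothetical finished policy: one chosen action per state.
lookup_policy = {
    "cold": "heat",
    "warm": "wait",
}


def act(policy, state):
    """Return the action the policy prescribes for `state` (no learning here)."""
    return policy[state]


# The agent observes a state and simply looks up its action.
for observed_state in ["cold", "warm", "cold"]:
    print(observed_state, "->", act(lookup_policy, observed_state))
```

Each decision is a single dictionary lookup, which is why acting is cheap compared with calculating the policy.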