core.algorithms.onpolicy_sync.losses.ppo

Defining the PPO loss for actor-critic type models.

PPO

class PPO(AbstractActorCriticLoss)

Implementation of the Proximal Policy Optimization loss.

Attributes

  • clip_param: The clipping parameter to use.
  • value_loss_coef: Weight of the value loss.
  • entropy_coef: Weight of the entropy bonus (included to encourage exploration).
  • use_clipped_value_loss: Whether or not to also clip the value loss.
  • clip_decay: Optional callable mapping the number of steps taken to a multiplicative factor by which clip_param is scaled (no decay if omitted).

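To make these attributes concrete, here is a minimal sketch of how a PPO loss with these weights is typically assembled. It is an illustration under standard PPO conventions, not AllenAct's exact code; the function name and all tensor arguments (log_probs, old_log_probs, advantages, values, returns, entropy) are assumptions for the example.

```python
import torch

def ppo_loss_sketch(
    log_probs: torch.Tensor,      # log-probs of the taken actions under the current policy
    old_log_probs: torch.Tensor,  # log-probs under the (fixed) rollout policy
    advantages: torch.Tensor,     # advantage estimates, e.g. from GAE
    values: torch.Tensor,         # current value predictions
    returns: torch.Tensor,        # value targets (empirical returns)
    entropy: torch.Tensor,        # per-step policy entropy
    clip_param: float = 0.1,
    value_loss_coef: float = 0.5,
    entropy_coef: float = 0.01,
) -> torch.Tensor:
    # Clipped surrogate policy objective: take the pessimistic (minimum)
    # of the unclipped and clipped importance-weighted advantages.
    ratio = torch.exp(log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    action_loss = -torch.min(surr1, surr2).mean()

    # Plain squared-error value loss (see PPOValue below for the clipped variant).
    value_loss = 0.5 * (returns - values).pow(2).mean()

    # The entropy bonus is subtracted: maximizing entropy encourages exploration.
    return action_loss + value_loss_coef * value_loss - entropy_coef * entropy.mean()
```
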
PPO.__init__

 | __init__(clip_param: float, value_loss_coef: float, entropy_coef: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)

Initializer.

See the class documentation for parameter definitions.

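A usage sketch follows, assuming the import path given by this module's name; the hyperparameter values and the linear_clip_decay schedule (with its TOTAL_STEPS horizon) are illustrative choices, not library defaults.

```python
from core.algorithms.onpolicy_sync.losses.ppo import PPO

TOTAL_STEPS = 1_000_000  # illustrative training horizon for the schedule below

def linear_clip_decay(step: int) -> float:
    # Multiplier for clip_param, decaying linearly from 1 to 0 over training.
    return max(0.0, 1.0 - step / TOTAL_STEPS)

ppo_loss = PPO(
    clip_param=0.1,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    use_clipped_value_loss=True,
    clip_decay=linear_clip_decay,
)
```
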
PPOValue

class PPOValue(AbstractActorCriticLoss)

Implementation of the value-loss component of the Proximal Policy Optimization loss.

Attributes

  • clip_param: The clipping parameter to use.
  • use_clipped_value_loss: Whether or not to also clip the value loss.
  • clip_decay: Optional callable mapping the number of steps taken to a multiplicative factor by which clip_param is scaled.

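For reference, the value clipping enabled by use_clipped_value_loss is conventionally computed as sketched below, assuming old_values holds the predictions stored at rollout time; this shows the standard technique, not necessarily this class's exact implementation.

```python
import torch

def clipped_value_loss_sketch(
    values: torch.Tensor,      # current value predictions
    old_values: torch.Tensor,  # value predictions made when the rollout was collected
    returns: torch.Tensor,     # value targets
    clip_param: float = 0.1,
) -> torch.Tensor:
    # Keep the new prediction within clip_param of the rollout-time prediction.
    values_clipped = old_values + (values - old_values).clamp(-clip_param, clip_param)
    # Taking the elementwise maximum of the two squared errors means the update
    # cannot benefit from moving the value estimate further than the clip allows.
    loss_unclipped = (values - returns).pow(2)
    loss_clipped = (values_clipped - returns).pow(2)
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```
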
PPOValue.__init__

 | __init__(clip_param: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)

Initializer.

See the class documentation for parameter definitions.
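
A brief usage sketch, again assuming the module's import path (the clip_param value is illustrative):

```python
from core.algorithms.onpolicy_sync.losses.ppo import PPOValue

# Standalone value loss with the same clipping behavior as PPO.
value_loss = PPOValue(clip_param=0.1, use_clipped_value_loss=True)
```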