allenact.algorithms.onpolicy_sync.losses.ppo

Defines the PPO loss for actor-critic models.

PPO

class PPO(AbstractActorCriticLoss)

Implementation of the Proximal Policy Optimization loss.

Attributes

  • clip_param: The clipping parameter (epsilon) used in the clipped surrogate objective; see the sketch after this list.
  • value_loss_coef: Weight of the value loss.
  • entropy_coef: Weight of the entropy bonus (which encourages exploration).
  • use_clipped_value_loss: Whether or not to also clip the value loss.
  • clip_decay: Callable returning a decay factor for the clipping parameter, as a function of the current number of steps.
  • entropy_method_name: Name of the action distribution's entropy method. Defaults to "entropy"; "conditional_entropy" may be used for SequentialDistr.
  • show_ratios: If True, adds tracking of the PPO ratio (linear, clamped, and used) in each epoch, to be logged by the engine.
  • normalize_advantage: Whether or not to normalize the advantage. Default is True.
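
For reference, a minimal sketch of the clipped surrogate objective that these attributes parameterize. This follows the standard PPO formulation rather than reproducing AllenAct's internal implementation, and the tensor names are illustrative:

```python
import torch

def ppo_loss_sketch(log_probs, old_log_probs, advantages, values, returns,
                    entropy, clip_param=0.1, value_loss_coef=0.5,
                    entropy_coef=0.01, normalize_advantage=True):
    # Optionally normalize advantages across the batch (normalize_advantage).
    if normalize_advantage:
        advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-5)

    # Ratio between current and behavior policies (the "linear" ratio).
    ratio = torch.exp(log_probs - old_log_probs)
    # Ratio clamped to [1 - clip_param, 1 + clip_param] (the "clamped" ratio).
    clamped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param)

    # Clipped surrogate objective: pessimistic minimum over the two terms.
    action_loss = -torch.min(ratio * advantages, clamped * advantages).mean()

    # Squared-error value loss (the unclipped variant, for brevity).
    value_loss = 0.5 * (returns - values).pow(2).mean()

    # The entropy bonus is subtracted, so higher entropy lowers the loss.
    return action_loss + value_loss_coef * value_loss - entropy_coef * entropy.mean()
```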

PPO.__init__

 | __init__(clip_param: float, value_loss_coef: float, entropy_coef: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, entropy_method_name: str = "entropy", normalize_advantage: bool = True, show_ratios: bool = False, *args, **kwargs)

Initializer.

See the class documentation for parameter definitions.
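
A hedged construction example; the hyperparameter values and the decay schedule below are illustrative choices, not defaults prescribed by this class:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPO

# Linearly decay the clipping parameter over an assumed 1e6 training steps.
ppo_loss = PPO(
    clip_param=0.1,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    use_clipped_value_loss=True,
    clip_decay=lambda step: max(0.0, 1.0 - step / 1e6),
    normalize_advantage=True,
)
```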

PPOValue

class PPOValue(AbstractActorCriticLoss)

Implementation of the value-loss component of the Proximal Policy Optimization loss.

Attributes

  • clip_param: The clipping parameter used when clipping the value loss; see the sketch after this list.
  • use_clipped_value_loss: Whether or not to clip the value loss.
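
For reference, a minimal sketch of the clipped value loss these attributes control. This is the standard formulation with illustrative tensor names, not necessarily AllenAct's exact implementation:

```python
import torch

def clipped_value_loss_sketch(values, old_values, returns, clip_param=0.1):
    # Keep the new value prediction within clip_param of the rollout-time one.
    values_clipped = old_values + torch.clamp(
        values - old_values, -clip_param, clip_param
    )
    # Element-wise, take the worse (larger) of the two squared errors.
    unclipped = (values - returns).pow(2)
    clipped = (values_clipped - returns).pow(2)
    return 0.5 * torch.max(unclipped, clipped).mean()
```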

PPOValue.__init__

 | __init__(clip_param: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)

Initializer.

See the class documentation for parameter definitions.
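
A hedged construction example; the clipping value is illustrative:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPOValue

# Value-only loss, e.g. for updating a critic separately from the policy.
value_loss = PPOValue(clip_param=0.1, use_clipped_value_loss=True)
```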