# allenact.algorithms.onpolicy_sync.losses.ppo
Defines the PPO loss for actor-critic type models.
## PPO

`class PPO(AbstractActorCriticLoss)`
Implementation of the Proximal Policy Optimization loss.
**Attributes**

- `clip_param`: The clipping parameter to use.
- `value_loss_coef`: Weight of the value loss.
- `entropy_coef`: Weight of the entropy (encouraging) loss.
- `use_clipped_value_loss`: Whether or not to also clip the value loss.
- `clip_decay`: Callable returning the clip-parameter decay factor (a function of the current number of steps).
- `entropy_method_name`: Name of the distribution's entropy method. Default is `entropy`, but `conditional_entropy` may be used for `SequentialDistr`.
- `show_ratios`: If True, adds tracking of the PPO ratio (linear, clamped, and used) in each epoch, to be logged by the engine.
- `normalize_advantage`: Whether or not to normalize the advantage. Default is True.
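The attributes above map directly onto the standard clipped-PPO computation. The following is a minimal sketch of that computation, assuming flat tensors of per-step quantities; the function name `ppo_terms` and all argument names are illustrative, not part of AllenAct's API.

```python
# Illustrative sketch only: how the attributes above typically enter the PPO
# computation. Tensor names and shapes are hypothetical, not AllenAct's API.
import torch


def ppo_terms(
    log_probs,        # log pi_theta(a|s) under the current policy
    old_log_probs,    # log pi_theta_old(a|s) stored at rollout time
    advantages,
    values,           # current value predictions
    old_values,       # value predictions stored at rollout time
    returns,
    clip_param=0.1,
    use_clipped_value_loss=True,
    normalize_advantage=True,
):
    if normalize_advantage:
        advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-5)

    # Probability ratio between the current and the behavior policy, clamped by clip_param.
    ratio = torch.exp(log_probs - old_log_probs)
    clamped_ratio = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param)

    # Pessimistic (clipped) surrogate objective, negated to give a loss.
    action_loss = -torch.min(ratio * advantages, clamped_ratio * advantages).mean()

    if use_clipped_value_loss:
        # Keep the value update close to the old value estimates.
        values_clipped = old_values + torch.clamp(values - old_values, -clip_param, clip_param)
        value_loss = 0.5 * torch.max(
            (values - returns) ** 2, (values_clipped - returns) ** 2
        ).mean()
    else:
        value_loss = 0.5 * (values - returns).pow(2).mean()

    # The total loss combines these terms with an entropy bonus:
    #   total = action_loss + value_loss_coef * value_loss - entropy_coef * entropy
    return action_loss, value_loss
```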
### PPO.__init__

`__init__(clip_param: float, value_loss_coef: float, entropy_coef: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, entropy_method_name: str = "entropy", normalize_advantage: bool = True, show_ratios: bool = False, *args, **kwargs)`
Initializer.
See the class documentation for parameter definitions.
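A hedged usage sketch: the import path follows the module name at the top of this page, while the hyperparameter values shown are illustrative choices, not prescribed defaults.

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPO

# Illustrative hyperparameters; tune for your task.
ppo_loss = PPO(
    clip_param=0.1,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    use_clipped_value_loss=True,
    normalize_advantage=True,
)
```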
## PPOValue

`class PPOValue(AbstractActorCriticLoss)`
Implementation of the value-loss component of the Proximal Policy Optimization loss.
**Attributes**

- `clip_param`: The clipping parameter to use.
- `use_clipped_value_loss`: Whether or not to also clip the value loss.
### PPOValue.__init__

`__init__(clip_param: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)`
Initializer.
See the class documentation for parameter definitions.
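A matching usage sketch for the value-only loss; again, the import path comes from the module header and the clip value is illustrative.

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPOValue

# Value-loss-only variant; shares clip_param semantics with PPO above.
value_loss = PPOValue(clip_param=0.1, use_clipped_value_loss=True)
```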