allenact.algorithms.onpolicy_sync.losses.ppo
Defines the PPO loss for actor-critic models.
PPO
class PPO(AbstractActorCriticLoss)
Implementation of the Proximal Policy Optimization loss.
Attributes
clip_param
: The clipping parameter (epsilon in the PPO objective) to use.
value_loss_coef
: Weight of the value loss.
entropy_coef
: Weight of the entropy bonus (which encourages exploration).
use_clipped_value_loss
: Whether or not to also clip the value loss.
clip_decay
: Optional callable mapping the current training step count to a factor by which clip_param is scaled.
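For orientation, the sketch below shows, in generic PyTorch rather than AllenAct's actual implementation, how these weighted terms are conventionally combined in the PPO loss; all tensor arguments and default values are illustrative:

```python
import torch

def ppo_loss_sketch(
    log_probs: torch.Tensor,      # log pi_theta(a|s) under the current policy
    old_log_probs: torch.Tensor,  # log pi_theta_old(a|s) stored during the rollout
    advantages: torch.Tensor,     # estimated advantages A_t
    values: torch.Tensor,         # current value predictions V_theta(s)
    old_values: torch.Tensor,     # value predictions stored during the rollout
    returns: torch.Tensor,        # empirical returns (value targets)
    entropy: torch.Tensor,        # per-step policy entropy
    clip_param: float = 0.1,
    value_loss_coef: float = 0.5,
    entropy_coef: float = 0.01,
    use_clipped_value_loss: bool = True,
) -> torch.Tensor:
    # Clipped surrogate policy objective: clip the probability ratio
    # r_t = pi_theta / pi_theta_old to [1 - clip_param, 1 + clip_param].
    ratio = torch.exp(log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    action_loss = -torch.min(surr1, surr2).mean()

    if use_clipped_value_loss:
        # Keep the value update within clip_param of the old predictions and
        # take the elementwise maximum of the two squared errors.
        values_clipped = old_values + (values - old_values).clamp(
            -clip_param, clip_param
        )
        value_loss = 0.5 * torch.max(
            (values - returns) ** 2, (values_clipped - returns) ** 2
        ).mean()
    else:
        value_loss = 0.5 * ((values - returns) ** 2).mean()

    # Total loss: policy term + weighted value term - weighted entropy bonus.
    return (
        action_loss
        + value_loss_coef * value_loss
        - entropy_coef * entropy.mean()
    )
```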
PPO.__init__
| __init__(clip_param: float, value_loss_coef: float, entropy_coef: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)
Initializer.
See the class documentation for parameter definitions.
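A minimal construction sketch follows; the hyperparameter values are illustrative rather than the library's defaults, and the `clip_decay` lambda assumes the callable is given the current training step count and returns a multiplier applied to `clip_param`:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPO

ppo_loss = PPO(
    clip_param=0.1,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    use_clipped_value_loss=True,
    # Assumption: clip_decay maps the current step count to a factor that
    # scales clip_param, letting the clipping range shrink over training.
    clip_decay=lambda steps: max(0.1, 1.0 - steps / 1_000_000),
)
```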
PPOValue
class PPOValue(AbstractActorCriticLoss)
Implementation of the value-loss term of the Proximal Policy Optimization loss, without the policy and entropy terms.
Attributes
clip_param
: The clipping parameter to use.
use_clipped_value_loss
: Whether or not to clip the value loss.
clip_decay
: Optional callable mapping the current training step count to a factor by which clip_param is scaled.
PPOValue.__init__
| __init__(clip_param: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)
Initializer.
See the class documentation for parameter definitions.
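As with `PPO`, a minimal construction sketch with illustrative values; using the value-only loss for critic-only updates is an assumed use case rather than one this page documents:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPOValue

# Trains only the (optionally clipped) value loss; there are no policy
# or entropy terms in this variant.
value_loss = PPOValue(
    clip_param=0.1,
    use_clipped_value_loss=True,
)
```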