
core.algorithms.onpolicy_sync.policy#


ActorCriticModel#

class ActorCriticModel(Generic[DistributionType], nn.Module)


Abstract class defining a deep (recurrent) actor-critic agent.

When defining a new agent, you should subclass this class and implement the abstract methods.

Attributes

  • action_space: The space of actions available to the agent. Currently only discrete actions are allowed (so this space will always be of type gym.spaces.Discrete).
  • observation_space: The observation space expected by the agent. This is of type gym.spaces.Dict.
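
To make the interface concrete, here is a hedged sketch of the actor-critic pattern this class defines, written against plain nn.Module rather than the real ActorCriticModel (which additionally takes gym action/observation spaces). The class and parameter names (TinyActorCritic, obs_dim, num_actions, hidden) are illustrative, not part of the library:

```python
import torch
import torch.nn as nn

class TinyActorCritic(nn.Module):
    """Illustrative sketch of the actor-critic interface.

    The real ActorCriticModel stores gym spaces; here we take plain
    integer sizes to keep the example self-contained.
    """

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.actor = nn.Linear(hidden, num_actions)  # logits over discrete actions
        self.critic = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs: torch.Tensor):
        x = torch.relu(self.encoder(obs))
        return self.actor(x), self.critic(x)

# Inputs follow the [steps, samplers, ...] shape convention used below.
steps, samplers, obs_dim, num_actions = 4, 2, 8, 5
model = TinyActorCritic(obs_dim, num_actions)
logits, values = model(torch.randn(steps, samplers, obs_dim))
assert logits.shape == (steps, samplers, num_actions)
assert values.shape == (steps, samplers, 1)
```

The two heads mirror the two outputs the abstract forward method must produce: a distribution over actions and a value for the state.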

ActorCriticModel.__init__#

 | __init__(action_space: gym.spaces.Discrete, observation_space: SpaceDict)


Initializer.

Parameters

  • action_space: The space of actions available to the agent.
  • observation_space: The observation space expected by the agent.

ActorCriticModel.recurrent_memory_specification#

 | @property
 | recurrent_memory_specification() -> Optional[FullMemorySpecType]


The memory specification for the ActorCriticModel. See docs for _recurrent_memory_specification.

Returns

The memory specification from _recurrent_memory_specification.
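
As a hedged sketch of what such a specification can look like: in AllenAct, a FullMemorySpecType maps each memory key to a pair of (dimension spec, dtype), where each dimension is a (name, size) tuple and a size of None marks the sampler dimension, filled in at runtime. The key "rnn" and the sizes below are illustrative, not required:

```python
import torch

# Illustrative memory specification for a single recurrent state tensor.
memory_spec = {
    "rnn": (
        (
            ("layer", 1),       # number of recurrent layers
            ("sampler", None),  # one slot per rollout sampler, set at runtime
            ("hidden", 512),    # recurrent hidden-state size
        ),
        torch.float32,         # dtype of the stored memory tensor
    )
}

dims, dtype = memory_spec["rnn"]
assert [name for name, _ in dims] == ["layer", "sampler", "hidden"]
assert dtype == torch.float32
```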

ActorCriticModel.forward#

 | @abc.abstractmethod
 | forward(observations: ObservationType, memory: Memory, prev_actions: torch.Tensor, masks: torch.FloatTensor) -> Tuple[ActorCriticOutput[DistributionType], Optional[Memory]]


Transforms input observations (and the previous hidden state, if any) into action probabilities and the state value.

Parameters

  • observations : Multi-level map from key strings to tensors of shape [steps, samplers, (agents,) ...] with the current observations.
  • memory : Memory object with recurrent memory. The shape of each tensor is determined by the corresponding entry in _recurrent_memory_specification.
  • prev_actions : tensor of shape [steps, samplers, agents, ...] with the previous actions.
  • masks : tensor of shape [steps, samplers, agents, 1] with zeros indicating steps where a new episode/task starts.

Returns

A tuple whose first element is an object of class ActorCriticOutput which stores the agent's probability distribution over possible actions (shape [steps, samplers, agents, num_actions]), the agent's value for the state (shape [steps, samplers, agents, 1]), and any extra information needed for loss computations. The second element is an optional Memory, which is only used in models with recurrent memory.
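
The masks tensor described above is what lets a single forward pass span episode boundaries. A hedged sketch of the typical pattern (agents dimension omitted for brevity; the GRUCell and variable names are illustrative): at any step where a new episode begins (mask == 0), the recurrent state carried over from the previous episode is zeroed before being fed back into the recurrent unit.

```python
import torch

steps, samplers, hidden = 3, 2, 4
rnn = torch.nn.GRUCell(hidden, hidden)

obs_features = torch.randn(steps, samplers, hidden)
masks = torch.ones(steps, samplers, 1)
masks[1, 0, 0] = 0.0  # sampler 0 begins a new episode at step 1

h = torch.zeros(samplers, hidden)  # initial recurrent memory
for t in range(steps):
    # Multiplying by masks[t] zeros the hidden state exactly at the
    # steps where a new episode/task starts.
    h = rnn(obs_features[t], h * masks[t])

assert h.shape == (samplers, hidden)
```

Storing the state per sampler in this way is what allows the Memory returned by forward to be reused as the input memory on the next rollout step.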