allenact.algorithms.onpolicy_sync.policy
ActorCriticModel
class ActorCriticModel(Generic[DistributionType], nn.Module)
Abstract class defining a deep (recurrent) actor critic agent.
When defining a new agent, you should subclass this class and implement the abstract methods.
Attributes
- action_space : The space of actions available to the agent. This is of type gym.spaces.Space.
- observation_space : The observation space expected by the agent. This is of type gym.spaces.dict.
ActorCriticModel.__init__
| __init__(action_space: gym.Space, observation_space: SpaceDict)
Initializer.
Parameters
- action_space : The space of actions available to the agent.
- observation_space: The observation space expected by the agent.
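For illustration, a minimal memoryless subclass might look like the sketch below. The class name, the sensor key "obs", and the hidden size are assumptions made for this example, not part of the library.

```python
import gym
import torch.nn as nn

from allenact.algorithms.onpolicy_sync.policy import ActorCriticModel
from allenact.base_abstractions.distributions import CategoricalDistr
from allenact.base_abstractions.misc import ActorCriticOutput


class TinyMemorylessActorCritic(ActorCriticModel[CategoricalDistr]):
    """Hypothetical minimal agent over a flat vector observation stored under
    the (assumed) sensor key "obs" and a discrete action space."""

    def __init__(self, action_space: gym.spaces.Discrete, observation_space: gym.spaces.Dict):
        super().__init__(action_space=action_space, observation_space=observation_space)
        obs_dim = observation_space["obs"].shape[0]
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.actor = nn.Linear(64, action_space.n)  # action logits
        self.critic = nn.Linear(64, 1)              # state-value estimate

    def _recurrent_memory_specification(self):
        return None  # no recurrent memory in this memoryless sketch

    def forward(self, observations, memory, prev_actions, masks):
        x = self.encoder(observations["obs"])  # [steps, samplers, 64]
        return (
            ActorCriticOutput(
                distributions=CategoricalDistr(logits=self.actor(x)),
                values=self.critic(x),  # [steps, samplers, 1]
                extras={},
            ),
            None,  # nothing to carry over between steps
        )
```

Because the model keeps no recurrent state, _recurrent_memory_specification returns None and forward returns None in place of a Memory.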
ActorCriticModel.recurrent_memory_specification
| @property
| recurrent_memory_specification() -> Optional[FullMemorySpecType]
The memory specification for the ActorCriticModel. See docs for _recurrent_memory_shape.
Returns
The memory specification from _recurrent_memory_shape.
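As a hedged sketch of the expected format, a single-layer recurrent model could declare one memory entry under the key "rnn". The key, layer count, and hidden size below are assumptions of the example; the sampler dimension is left as None so it can be sized by the training engine.

```python
import torch

# Hypothetical override inside an ActorCriticModel subclass.
def _recurrent_memory_specification(self):
    return {
        "rnn": (
            (
                ("layer", 1),       # number of recurrent layers
                ("sampler", None),  # left as None; sized by the training engine
                ("hidden", 64),     # hidden state size
            ),
            torch.float32,          # dtype of the stored memory tensors
        )
    }
```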
ActorCriticModel.forward
| @abc.abstractmethod
| forward(observations: ObservationType, memory: Memory, prev_actions: ActionType, masks: torch.FloatTensor) -> Tuple[ActorCriticOutput[DistributionType], Optional[Memory]]
Transforms input observations (& previous hidden state) into action probabilities and the state value.
Parameters
- observations : Multi-level map from key strings to tensors of shape [steps, samplers, (agents,) ...] with the current observations.
- memory : Memory object with recurrent memory. The shape of each tensor is determined by the corresponding entry in _recurrent_memory_specification.
- prev_actions : ActionType with tensors of shape [steps, samplers, ...] with the previous actions.
- masks : tensor of shape [steps, samplers, agents, 1] with zeros indicating steps where a new episode/task starts.
Returns
A tuple whose first element is an object of class ActorCriticOutput which stores
the agents' probability distribution over possible actions (shape [steps, samplers, ...]),
the agents' value for the state (shape [steps, samplers, ..., 1]), and any extra information needed for
loss computations. The second element is an optional Memory, which is only used in models with recurrent memory.
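A hedged sketch of a recurrent forward is given below. It assumes the memoryless example above has been extended with self.gru_cell = nn.GRUCell(64, 64) and with the "rnn" memory specification from the previous sketch, that there is a single agent per sampler, and that Memory exposes the tensor/set_tensor accessors used by the library's baseline models.

```python
import torch

from allenact.base_abstractions.distributions import CategoricalDistr
from allenact.base_abstractions.misc import ActorCriticOutput, Memory


# Hypothetical forward for a recurrent subclass (see the assumptions above).
def forward(self, observations, memory: Memory, prev_actions, masks):
    x = self.encoder(observations["obs"])        # [steps, samplers, 64]
    steps, samplers = x.shape[:2]

    # Memory tensor shaped [layer=1, samplers, hidden] per the specification sketch.
    hidden = memory.tensor("rnn").squeeze(0)     # [samplers, hidden]
    flat_masks = masks.view(steps, samplers, 1)  # assumes a single agent per sampler

    outputs = []
    for t in range(steps):
        # Zero the hidden state wherever a new episode/task begins at step t.
        hidden = self.gru_cell(x[t], hidden * flat_masks[t])
        outputs.append(hidden)
    x = torch.stack(outputs, dim=0)              # [steps, samplers, 64]

    actor_critic_output = ActorCriticOutput(
        distributions=CategoricalDistr(logits=self.actor(x)),  # [steps, samplers, n_actions]
        values=self.critic(x),                                  # [steps, samplers, 1]
        extras={},
    )
    # Write the final hidden state back so it is carried into the next rollout.
    return actor_critic_output, memory.set_tensor("rnn", hidden.unsqueeze(0))
```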