



class ActiveNeuralSLAM(nn.Module)


Active Neural SLAM module.

This is an implementation of the Active Neural SLAM module from:

Chaplot, D.S., Gandhi, D., Gupta, S., Gupta, A. and Salakhutdinov, R., 2020.
Learning To Explore Using Active Neural SLAM.
In International Conference on Learning Representations (ICLR).

Note that this is purely the mapping component and does not include the planning components from the above paper.

This implementation is adapted from ; we have extended it to allow for an arbitrary number of output map channels (enabling semantic mapping).

At a high level, this model takes as input RGB egocentric images and outputs metric map tensors of shape (# channels) x height x width where height/width correspond to the ground plane of the environment.


 | __init__(frame_height: int, frame_width: int, n_map_channels: int, resolution_in_cm: int = 5, map_size_in_cm: int = 2400, vision_range_in_cm: int = 300, use_pose_estimation: bool = False, pretrained_resnet: bool = True, freeze_resnet_batchnorm: bool = True, use_resnet_layernorm: bool = False)


Initialize an Active Neural SLAM module.


  • frame_height : The height of the RGB images given to this module on calls to forward.
  • frame_width : The width of the RGB images given to this module on calls to forward.
  • n_map_channels : The number of output channels in the output maps.
  • resolution_in_cm : The resolution of the output map, i.e. the side length in centimeters of each map cell; see map_size_in_cm.
  • map_size_in_cm : The height & width of the map in centimeters. The height and width of the map tensor returned on calls to forward will be map_size_in_cm/resolution_in_cm. Note that map_size_in_cm must be divisible by resolution_in_cm.
  • vision_range_in_cm : Given an RGB image input, this module will transform the image into an "egocentric map" whose height and width equal vision_range_in_cm/resolution_in_cm. This egocentric map corresponds to the area of the world directly in front of the agent. It is then rotated/translated into the allocentric reference frame and used to update the larger allocentric map, whose height and width equal map_size_in_cm/resolution_in_cm. Thus this parameter controls how much of the map is updated on every step.
  • use_pose_estimation : Whether or not we should estimate the agent's change in position/rotation. If False, you'll need to provide the ground truth changes in position/rotation.
  • pretrained_resnet : Whether or not to use ImageNet pre-trained model weights for the ResNet18 backbone.
  • freeze_resnet_batchnorm : Whether or not the batch normalization layers in the ResNet18 backbone should be frozen and batchnorm updates disabled. You almost certainly want this to be True as using batch normalization during RL training results in all sorts of issues unless you're very careful.
  • use_resnet_layernorm : If you've enabled freeze_resnet_batchnorm (recommended) you'll likely want to normalize the output from the ResNet18 model as we've found that these values can otherwise grow quite large harming learning.
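The size arithmetic described for map_size_in_cm and vision_range_in_cm can be sketched as follows. The helper name map_dims is hypothetical and is not part of the module; it only reproduces the divisions stated above, using the default argument values of __init__.

```python
from typing import Tuple


def map_dims(map_size_in_cm: int = 2400,
             vision_range_in_cm: int = 300,
             resolution_in_cm: int = 5) -> Tuple[int, int]:
    """Hypothetical helper: (allocentric map side, egocentric map side) in cells."""
    # map_size_in_cm must be divisible by resolution_in_cm (see above).
    if map_size_in_cm % resolution_in_cm != 0:
        raise ValueError("map_size_in_cm must be divisible by resolution_in_cm")
    # Side length of the full allocentric map, in cells.
    allocentric_side = map_size_in_cm // resolution_in_cm
    # Side length of the egocentric patch in front of the agent, in cells.
    egocentric_side = vision_range_in_cm // resolution_in_cm
    return allocentric_side, egocentric_side


print(map_dims())  # (480, 60)
```

With the defaults, each step updates a 60 x 60 egocentric patch that is then placed into a 480 x 480 allocentric map (per output channel).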


 | forward(images: Optional[torch.Tensor], last_map_probs_allocentric: Optional[torch.Tensor], last_xzrs_allocentric: Optional[torch.Tensor], dx_dz_drs_egocentric: Optional[torch.Tensor], last_map_logits_egocentric: Optional[torch.Tensor], return_allocentric_maps=True, resnet_image_features: Optional[torch.Tensor] = None) -> Dict[str, Any]


Create allocentric/egocentric map predictions given RGB image inputs.

Here it is assumed that last_xzrs_allocentric has been re-centered so that (x, z) == (0,0) corresponds to the top left of the returned map (with increasing x/z moving to the bottom right of the map).

Note that all maps are oriented so that:
  • Increasing x values correspond to increasing columns in the map(s).
  • Increasing z values correspond to increasing rows in the map(s).

Note that this may seem a bit weird as:
  • "north" is pointing downwards in the map;
  • if you picture yourself as the agent facing north (i.e. down) in the map, then moving to the right from the agent's perspective corresponds to increasing the agent's column index:

    agent facing downwards -> (dir. to the right of the agent, i.e. moving right corresponds to +cols)
        |
        v (dir. agent faces, i.e. moving ahead corresponds to +rows)

This may be the opposite of what you expect.
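A minimal sketch of the orientation convention above: converting a re-centered allocentric (x, z) position into a (row, col) map cell. It assumes positions are in meters (the units of last_xzrs_allocentric are not stated here) and that cells are indexed by flooring; the function name xz_to_row_col is hypothetical and not part of the module.

```python
import math
from typing import Tuple


def xz_to_row_col(x_m: float, z_m: float, resolution_in_cm: int = 5) -> Tuple[int, int]:
    """Hypothetical helper mapping a re-centered (x, z) position to a map cell.

    Assumes x/z are in meters with (0, 0) at the top-left of the map.
    """
    cells_per_meter = 100.0 / resolution_in_cm
    # Increasing x -> increasing column; increasing z -> increasing row.
    col = math.floor(x_m * cells_per_meter)
    row = math.floor(z_m * cells_per_meter)
    return row, col


# Moving 1 m in +z ("north", which points down the map) adds 20 rows at 5 cm/cell.
print(xz_to_row_col(0.5, 1.0))  # (20, 10)
print(xz_to_row_col(0.5, 2.0))  # (40, 10)
```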


  • images : A (# batches) x 3 x height x width tensor of RGB images. These should be normalized for use with a ResNet model. See here for more information (see also the use_resnet_normalization parameter of the allenact.base_abstractions.sensor.RGBSensor sensor).
  • last_map_probs_allocentric : A (# batches) x (map channels) x (map height) x (map width) tensor representing the collection of allocentric maps to be updated.
  • last_xzrs_allocentric : A (# batches) x 3 tensor where last_xzrs_allocentric[:, 0] are the agent's (allocentric) x-coordinates from the previous step, last_xzrs_allocentric[:, 1] are the agent's (allocentric) z-coordinates from the previous step, and last_xzrs_allocentric[:, 2] are the agent's rotations (allocentric, in degrees) from the previous step.
  • dx_dz_drs_egocentric : A (# batches) x 3 tensor representing the agent's change in x (in meters), z (in meters), and rotation (in degrees) from the previous step. Note that these changes are "egocentric": if the agent moved 1 meter ahead from its own perspective, this should correspond to a dz of +1.0 regardless of the agent's orientation (similarly, moving right would result in a dx of +1.0). This is ignored (and thus can be None) if you are using pose estimation (i.e. self.use_pose_estimation is True) or if return_allocentric_maps is False.
  • last_map_logits_egocentric : The "egocentric_update" output from calling this function on the agent's previous step, i.e. the agent's egocentric map view from the last step. This is used to compute the change in the agent's position and rotation. This is ignored (and thus can be None) if you do not wish to estimate the agent's pose (i.e. self.use_pose_estimation is False).
  • return_allocentric_maps : Whether or not to generate new allocentric maps given last_map_probs_allocentric and the new map estimates. Creating these new allocentric maps is expensive, so it is best avoided when not needed.
  • resnet_image_features : Sometimes you may wish to compute the ResNet image features yourself for use in another part of your model. Rather than recomputing them multiple times, you can compute them once and pass them into this forward call (in which case the images parameter is ignored). If you're using the self.resnet_l5 module to compute these features, be sure to also normalize them with self.resnet_normalizer if you opted to use_resnet_layernorm when initializing this module.
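The relationship between dx_dz_drs_egocentric and the allocentric pose can be sketched as below. This is not the module's code: it assumes that a rotation of 0 degrees means the agent faces +z and a rotation of 90 degrees means it faces +x (consistent with "moving right corresponds to +cols" when facing +z), and that the translation is expressed in the frame of the previous rotation. The helper name update_xzr is hypothetical.

```python
import math
from typing import Tuple


def update_xzr(
    last_xzr: Tuple[float, float, float],
    dx_dz_dr: Tuple[float, float, float],
) -> Tuple[float, float, float]:
    """Hypothetical helper: apply an egocentric (dx, dz, dr) step to an allocentric (x, z, r) pose.

    Assumptions (not taken from the module): rotation 0 faces +z, rotation 90 faces +x,
    translation uses the previous rotation, and angles are in degrees.
    """
    x, z, r = last_xzr
    dx, dz, dr = dx_dz_dr
    theta = math.radians(r)
    # In allocentric (x, z): forward = (sin r, cos r), right = (cos r, -sin r).
    new_x = x + dx * math.cos(theta) + dz * math.sin(theta)
    new_z = z - dx * math.sin(theta) + dz * math.cos(theta)
    # Keep the rotation in [0, 360).
    new_r = (r + dr) % 360.0
    return new_x, new_z, new_r


# Facing +x (r = 90) and moving 1 m ahead (dz = +1.0) increases x by 1 m.
x, z, r = update_xzr((2.0, 3.0, 90.0), (0.0, 1.0, 0.0))
print(round(x, 6), round(z, 6), r)  # 3.0 3.0 90.0
```

When pose estimation is enabled, the module predicts these deltas itself (see "dx_dz_dr_egocentric_preds" below), so a conversion like this is only relevant for reasoning about the inputs.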


A dictionary with the following keys/values:
  • "egocentric_update" : The egocentric map view for the given RGB image. This is what should generally be used for computing losses.
  • "map_logits_probs_update_no_grad" : The egocentric map view after it has been rotated, translated, and moved into a full-sized allocentric map. This map has been detached from the computation graph and so should not be used for gradient computations. This will be None if return_allocentric_maps was False.
  • "map_logits_probs_no_grad" : The newly updated allocentric map; this corresponds to taking the pointwise maximum of last_map_probs_allocentric and the "map_logits_probs_update_no_grad" map above. This will be None if return_allocentric_maps was False.
  • "dx_dz_dr_egocentric_preds" : The predicted change in x, z, and rotation of the agent (from the egocentric perspective of the agent).
  • "xzr_allocentric_preds" : The allocentric (x, z) position and rotation of the agent (predicted if self.use_pose_estimation == True). This will be None if self.use_pose_estimation == False and dx_dz_drs_egocentric is None.
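The "map_logits_probs_no_grad" merge described above (a pointwise maximum between the previous allocentric map and the detached, re-projected update) can be sketched with PyTorch as follows. The tensors are toy stand-ins and the function name merge_allocentric is hypothetical; the rotation/translation step that produces the update is not shown.

```python
import torch


def merge_allocentric(last_map_probs: torch.Tensor, update_probs: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: fuse a new allocentric update into the running map.

    Both tensors are (# batches) x (map channels) x (map height) x (map width).
    """
    # The update is detached, so no gradients flow through the merged map.
    return torch.maximum(last_map_probs, update_probs.detach())


last = torch.tensor([[[[0.2, 0.9], [0.0, 0.5]]]])  # 1 x 1 x 2 x 2
update = torch.tensor([[[[0.7, 0.1], [0.0, 0.6]]]], requires_grad=True)
merged = merge_allocentric(last, update)
print(merged)  # elementwise max of the two maps
print(merged.requires_grad)  # False
```

Because the merge is a maximum, a cell's value in the allocentric map can only stay the same or increase from step to step.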