# allenact.embodiedai.mapping.mapping_models.active_neural_slam

## ActiveNeuralSLAM

`class ActiveNeuralSLAM(nn.Module)`
Active Neural SLAM module.

This is an implementation of the Active Neural SLAM module from:

Chaplot, D.S., Gandhi, D., Gupta, S., Gupta, A. and Salakhutdinov, R., 2020. Learning To Explore Using Active Neural SLAM. In International Conference on Learning Representations (ICLR).

Note that this is purely the mapping component and does not include the planning components from the above paper.

This implementation is adapted from https://github.com/devendrachaplot/Neural-SLAM; we have extended it to allow for an arbitrary number of output map channels (enabling semantic mapping).

At a high level, this model takes as input egocentric RGB images and outputs metric map tensors of shape `(# channels) x height x width`, where height/width correspond to the ground plane of the environment.
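For example, with the default `map_size_in_cm=2400` and `resolution_in_cm=5` (see `__init__` below), the allocentric map tensor has spatial size 2400 / 5 = 480, i.e. shape `(# channels) x 480 x 480`.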
## ActiveNeuralSLAM.\_\_init\_\_

`__init__(frame_height: int, frame_width: int, n_map_channels: int, resolution_in_cm: int = 5, map_size_in_cm: int = 2400, vision_range_in_cm: int = 300, use_pose_estimation: bool = False, pretrained_resnet: bool = True, freeze_resnet_batchnorm: bool = True, use_resnet_layernorm: bool = False)`
Initialize an Active Neural SLAM module.
Parameters
- `frame_height` : The height of the RGB images given to this module on calls to `forward`.
- `frame_width` : The width of the RGB images given to this module on calls to `forward`.
- `n_map_channels` : The number of channels in the output maps.
- `resolution_in_cm` : The resolution of the output map in centimeters per map cell, see `map_size_in_cm`.
- `map_size_in_cm` : The height & width of the map in centimeters. The height/width of the map tensor returned on calls to `forward` will be `map_size_in_cm / resolution_in_cm`. Note that `map_size_in_cm` must be divisible by `resolution_in_cm`.
- `vision_range_in_cm` : Given an RGB image input, this module will transform the image into an "egocentric map" with height and width equaling `vision_range_in_cm / resolution_in_cm`. This egocentric map corresponds to the area of the world directly in front of the agent. It will be rotated/translated into the allocentric reference frame and used to update the larger allocentric map, whose height and width equal `map_size_in_cm / resolution_in_cm`. Thus this parameter controls how much of the map is updated on every step.
- `use_pose_estimation` : Whether or not we should estimate the agent's change in position/rotation. If `False`, you'll need to provide the ground-truth changes in position/rotation.
- `pretrained_resnet` : Whether or not to use ImageNet pre-trained weights for the ResNet18 backbone.
- `freeze_resnet_batchnorm` : Whether or not the batch normalization layers in the ResNet18 backbone should be frozen and batchnorm updates disabled. You almost certainly want this to be `True`, as using batch normalization during RL training results in all sorts of issues unless you're very careful.
- `use_resnet_layernorm` : If you've enabled `freeze_resnet_batchnorm` (recommended), you'll likely want to normalize the output from the ResNet18 model, as we've found that these values can otherwise grow quite large and harm learning.
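As a minimal sketch of constructing this module (the frame size and channel count below are illustrative assumptions, not defaults):

```python
from allenact.embodiedai.mapping.mapping_models.active_neural_slam import (
    ActiveNeuralSLAM,
)

# 2400cm x 2400cm map at 5cm resolution -> 2400 / 5 = 480 cells per side.
ans = ActiveNeuralSLAM(
    frame_height=224,           # assumed input image size
    frame_width=224,
    n_map_channels=2,           # e.g. one occupancy + one "explored" channel
    resolution_in_cm=5,
    map_size_in_cm=2400,
    vision_range_in_cm=300,
    use_pose_estimation=False,  # we will provide ground-truth pose deltas
    pretrained_resnet=True,
    freeze_resnet_batchnorm=True,
    use_resnet_layernorm=True,
)
```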
## ActiveNeuralSLAM.forward

`forward(images: Optional[torch.Tensor], last_map_probs_allocentric: Optional[torch.Tensor], last_xzrs_allocentric: Optional[torch.Tensor], dx_dz_drs_egocentric: Optional[torch.Tensor], last_map_logits_egocentric: Optional[torch.Tensor], return_allocentric_maps=True, resnet_image_features: Optional[torch.Tensor] = None) -> Dict[str, Any]`
Create allocentric/egocentric map predictions given RGB image inputs.

Here it is assumed that `last_xzrs_allocentric` has been re-centered so that (x, z) == (0, 0) corresponds to the top left of the returned map (with increasing x/z moving to the bottom right of the map).

Note that all maps are oriented so that:

* Increasing x values correspond to increasing columns in the map(s).
* Increasing z values correspond to increasing rows in the map(s).

This may seem a bit weird as:

* "north" is pointing downwards in the map, and
* if you picture yourself as the agent facing north (i.e. down) in the map, then moving to the right from the agent's perspective corresponds to increasing which column the agent is at:

    agent facing downwards - - > (dir. to the right of the agent, i.e. moving right corresponds to +cols)
    |
    |
    v (dir. agent faces, i.e. moving ahead corresponds to +rows)

This may be the opposite of what you expect.
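To illustrate this convention, a hypothetical helper (not part of the module's API; the name and flooring behavior are assumptions) converting an allocentric (x, z) position in centimeters into (row, col) indices of the map tensor could look like:

```python
def xz_to_map_indices(x_cm: float, z_cm: float, resolution_in_cm: int = 5):
    """Convert an allocentric (x, z) position, re-centered so that (0, 0) is
    the map's top left, into (row, col) indices of the map tensor."""
    row = int(z_cm // resolution_in_cm)  # z indexes rows ("north" is down)
    col = int(x_cm // resolution_in_cm)  # x indexes columns
    return row, col
```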
Parameters
- `images` : A `(# batches) x 3 x height x width` tensor of RGB images. These should be normalized for use with a ResNet model (see the `use_resnet_normalization` parameter of the `allenact.base_abstractions.sensor.RGBSensor` sensor).
- `last_map_probs_allocentric` : A `(# batches) x (map channels) x (map height) x (map width)` tensor representing the collection of allocentric maps to be updated.
- `last_xzrs_allocentric` : A `(# batches) x 3` tensor where `last_xzrs_allocentric[:, 0]` are the agent's (allocentric) x-coordinates from the previous step, `last_xzrs_allocentric[:, 1]` are the agent's (allocentric) z-coordinates from the previous step, and `last_xzrs_allocentric[:, 2]` are the agent's rotations (allocentric, in degrees) from the previous step.
- `dx_dz_drs_egocentric` : A `(# batches) x 3` tensor representing the agent's change in x (in meters), z (in meters), and rotation (in degrees) from the previous step. Note that these changes are "egocentric" so that if the agent moved 1 meter ahead from its perspective this should correspond to a dz of +1.0 regardless of the agent's orientation (similarly, moving right would result in a dx of +1.0). This is ignored (and thus can be `None`) if you are using pose estimation (i.e. `self.use_pose_estimation` is `True`) or if `return_allocentric_maps` is `False`.
- `last_map_logits_egocentric` : The `"egocentric_update"` output when calling this function on the agent's last step, i.e. the egocentric map view of the agent from the last step. This is used to compute the change in the agent's position/rotation. It is ignored (and thus can be `None`) if you do not wish to estimate the agent's pose (i.e. `self.use_pose_estimation` is `False`).
- `return_allocentric_maps` : Whether or not to generate new allocentric maps given `last_map_probs_allocentric` and the new map estimates. Creating these new allocentric maps is expensive, so this is better avoided when not needed.
- `resnet_image_features` : Sometimes you may wish to compute the ResNet image features yourself for use in another part of your model. Rather than having to recompute them multiple times, you can compute them once and pass them into this forward call (in this case the input `images` parameter is ignored). Note that if you're using the `self.resnet_l5` module to compute these features, be sure to also normalize them with `self.resnet_normalizer` if you opted to `use_resnet_layernorm` when initializing this module.
Returns

A dictionary with keys/values:

* `"egocentric_update"` - The egocentric map view for the given RGB image. This is what should be used for computing losses in general.
* `"map_logits_probs_update_no_grad"` - The egocentric map view after it has been rotated, translated, and moved into a full-sized allocentric map. This map has been detached from the computation graph and so should not be used for gradient computations. This will be `None` if `return_allocentric_maps` was `False`.
* `"map_logits_probs_no_grad"` - The newly updated allocentric map; this corresponds to performing a pointwise maximum between `last_map_probs_allocentric` and the above returned `map_logits_probs_update_no_grad`. This will be `None` if `return_allocentric_maps` was `False`.
* `"dx_dz_dr_egocentric_preds"` - The predicted change in x, z, and rotation of the agent (from the egocentric perspective of the agent).
* `"xzr_allocentric_preds"` - The (predicted, if `self.use_pose_estimation == True`) allocentric (x, z) position and rotation of the agent. This will equal `None` if `self.use_pose_estimation == False` and `dx_dz_drs_egocentric` is `None`.
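Tying the inputs and outputs together, here is a hedged sketch of a single mapping step with ground-truth pose deltas, continuing from the constructor sketch above (so `use_pose_estimation` is `False`); the batch size, image size, and map initialization are illustrative assumptions:

```python
import torch

batch = 4
map_side = 2400 // 5  # map_size_in_cm / resolution_in_cm = 480 cells per side

images = torch.rand(batch, 3, 224, 224)                # stand-in for normalized RGB frames
last_maps = torch.zeros(batch, 2, map_side, map_side)  # initially empty allocentric maps
last_xzrs = torch.zeros(batch, 3)                      # re-centered: (0, 0) is the map's top left
dx_dz_drs = torch.tensor([[0.0, 0.25, 0.0]] * batch)   # ground truth: each agent moved 0.25m ahead

out = ans.forward(
    images=images,
    last_map_probs_allocentric=last_maps,
    last_xzrs_allocentric=last_xzrs,
    dx_dz_drs_egocentric=dx_dz_drs,
    last_map_logits_egocentric=None,  # only needed when self.use_pose_estimation is True
    return_allocentric_maps=True,
)

ego_update = out["egocentric_update"]       # use this when computing losses
new_maps = out["map_logits_probs_no_grad"]  # detached; feed back in as last_map_probs_allocentric
new_xzrs = out["xzr_allocentric_preds"]     # feed back in as last_xzrs_allocentric
```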