# `allenact.embodiedai.mapping.mapping_utils.point_cloud_utils`


### `camera_space_xyz_to_world_xyz`

``````
camera_space_xyz_to_world_xyz(camera_space_xyzs: torch.Tensor, camera_world_xyz: torch.Tensor, rotation: float, horizon: float) -> torch.Tensor
``````


Transforms xyz coordinates in the camera's coordinate frame to the world-space (global) xyz frame.

This code has been adapted from https://github.com/devendrachaplot/Neural-SLAM.

IMPORTANT: We use the conventions from the Unity game engine. In particular:

• A rotation of 0 corresponds to facing north.
• Positive rotations correspond to CLOCKWISE rotations. That is a rotation of 90 degrees corresponds to facing east. THIS IS THE OPPOSITE CONVENTION OF THE ONE GENERALLY USED IN MATHEMATICS.
• When facing NORTH (rotation==0) moving ahead by 1 meter results in the z coordinate increasing by 1. Moving to the right by 1 meter corresponds to increasing the x coordinate by 1. Finally, moving upwards by 1 meter corresponds to increasing the y coordinate by 1. Having x,z as the ground plane in this way is common in computer graphics but is different from the usual mathematical convention of having z be "up".
• The horizon corresponds to how far below the horizontal the camera is facing. I.e. a horizon of 30 corresponds to the camera being angled downwards at an angle of 30 degrees.

Parameters

• camera_space_xyzs : A 3xN matrix of xyz coordinates in the camera's reference frame.
• Here `x, y, z = camera_space_xyzs[:, i]` should equal the xyz coordinates for the ith point.
• camera_world_xyz : The camera's xyz position in the world reference frame.
• rotation : The world-space rotation (in degrees) of the camera.
• horizon : The horizon (in degrees) of the camera.

Returns

A 3xN tensor whose entry `[:, i]` is the xyz world-space coordinate corresponding to the camera-space coordinate `camera_space_xyzs[:, i]`.
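Under the conventions above, the transform amounts to undoing the camera's horizon (a rotation about the x-axis), applying the clockwise yaw rotation about the y-axis, and then translating by the camera's world position. A minimal NumPy sketch of that math (an illustrative re-implementation for clarity, not the library's torch code):

```python
import numpy as np

def camera_to_world_sketch(camera_xyzs, camera_world_xyz, rotation, horizon):
    """Map 3xN camera-space points to world space (Unity conventions).

    camera_xyzs: 3xN array; rotation/horizon in degrees.
    Illustrative NumPy sketch of the documented math, not the torch API.
    """
    # Undo the camera's downward pitch (horizon) about the x-axis.
    psi = -np.radians(horizon)
    pitch = np.array([
        [1, 0, 0],
        [0, np.cos(psi), np.sin(psi)],
        [0, -np.sin(psi), np.cos(psi)],
    ])
    # Apply the agent's clockwise yaw about the y-axis.
    phi = -np.radians(rotation)
    yaw = np.array([
        [np.cos(phi), 0, -np.sin(phi)],
        [0, 1, 0],
        [np.sin(phi), 0, np.cos(phi)],
    ])
    # Rotate, then translate by the camera's world position.
    return yaw @ (pitch @ camera_xyzs) + np.asarray(camera_world_xyz)[:, None]
```

For example, a point 1 meter straight ahead of a camera at the origin facing east (rotation=90, horizon=0) lands at world coordinate (1, 0, 0), matching the clockwise-rotation convention above.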

### `depth_frame_to_camera_space_xyz`

``````
depth_frame_to_camera_space_xyz(depth_frame: torch.Tensor, mask: Optional[torch.Tensor], fov: float = 90) -> torch.Tensor
``````


Transforms an input depth map into a collection of xyz points (i.e. a point cloud) in the camera's coordinate frame.

Parameters

• depth_frame : A square depth map, i.e. an MxM matrix with entry `depth_frame[i, j]` equaling the distance from the camera to the nearest surface at pixel (i,j).
• mask : An optional boolean mask of the same size (MxM) as the input depth frame. Only pixels where this mask is true will be included in the returned matrix of xyz coordinates. If `None` then no pixels will be masked out (so the returned matrix of xyz points will have dimension 3x(M*M)).
• fov : The field of view, in degrees, of the camera.

Returns

A 3xN matrix whose entry `[:, i]` equals the xyz coordinates (in the camera's coordinate frame) of a point in the point cloud corresponding to the input depth frame.
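The unprojection itself is standard pinhole-camera math: each pixel center is mapped to a ray direction on the z=1 image plane, scaled so the plane spans tan(fov/2), and then multiplied by the pixel's depth. A hedged NumPy sketch of this idea, with `mask=None` (the sign conventions follow the description above; details may differ from the library's implementation):

```python
import numpy as np

def depth_to_camera_xyz_sketch(depth_frame, fov=90.0):
    """Unproject a square MxM depth map to a 3x(M*M) camera-space point cloud.

    Illustrative NumPy sketch of standard pinhole unprojection,
    not the library's torch implementation.
    """
    m = depth_frame.shape[0]
    # Pixel-center offsets from the image center, in pixels.
    ii, jj = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    y = -(ii.ravel() + 0.5 - m / 2.0)  # rows increase downward, so flip y
    x = jj.ravel() + 0.5 - m / 2.0
    # Scale so the image plane spans tan(fov/2) at z == 1.
    scale = (2.0 / m) * np.tan(np.radians(fov / 2.0))
    directions = np.stack([x * scale, y * scale, np.ones(m * m)])
    # Scale each unit-z ray by its pixel's depth value.
    return directions * depth_frame.ravel()[None, :]
```

Note that because each ray has z=1 before scaling, the z row of the output simply equals the flattened depth values.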

### `depth_frame_to_world_space_xyz`

``````
depth_frame_to_world_space_xyz(depth_frame: torch.Tensor, camera_world_xyz: torch.Tensor, rotation: float, horizon: float, fov: float)
``````


Transforms an input depth map into a collection of xyz points (i.e. a point cloud) in the world-space coordinate frame.

IMPORTANT: We use the conventions from the Unity game engine. In particular:

• A rotation of 0 corresponds to facing north.
• Positive rotations correspond to CLOCKWISE rotations. That is a rotation of 90 degrees corresponds to facing east. THIS IS THE OPPOSITE CONVENTION OF THE ONE GENERALLY USED IN MATHEMATICS.
• When facing NORTH (rotation==0) moving ahead by 1 meter results in the z coordinate increasing by 1. Moving to the right by 1 meter corresponds to increasing the x coordinate by 1. Finally, moving upwards by 1 meter corresponds to increasing the y coordinate by 1. Having x,z as the ground plane in this way is common in computer graphics but is different from the usual mathematical convention of having z be "up".
• The horizon corresponds to how far below the horizontal the camera is facing. I.e. a horizon of 30 corresponds to the camera being angled downwards at an angle of 30 degrees.

Parameters

• depth_frame : A square depth map, i.e. an MxM matrix with entry `depth_frame[i, j]` equaling the distance from the camera to the nearest surface at pixel (i,j).
• mask : An optional boolean mask of the same size (MxM) as the input depth frame. Only pixels where this mask is true will be included in the returned matrix of xyz coordinates. If `None` then no pixels will be masked out (so the returned matrix of xyz points will have dimension 3x(M*M)).
• camera_space_xyzs : A 3xN matrix of xyz coordinates in the camera's reference frame.
• Here `x, y, z = camera_space_xyzs[:, i]` should equal the xyz coordinates for the ith point.
• camera_world_xyz : The camera's xyz position in the world reference frame.
• rotation : The world-space rotation (in degrees) of the camera.
• horizon : The horizon (in degrees) of the camera.
• fov : The field of view, in degrees, of the camera.

Returns

A 3xN matrix whose entry `[:, i]` equals the xyz coordinates (in the world coordinate frame) of a point in the point cloud corresponding to the input depth frame.
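Conceptually this function composes the two functions above: unproject the depth frame into camera space, then rotate and translate into world space. A compact, self-contained NumPy sketch of the whole pipeline (illustrative only; the actual torch implementation may differ in details):

```python
import numpy as np

def depth_to_world_xyz_sketch(depth_frame, camera_world_xyz, rotation, horizon, fov):
    """Square depth map -> 3x(M*M) world-space point cloud (Unity conventions).

    Illustrative NumPy sketch composing pinhole unprojection with the
    horizon/rotation transform described above; not the torch API.
    """
    m = depth_frame.shape[0]
    # Unproject: pixel-center rays on the z == 1 plane, scaled by depth.
    ii, jj = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    scale = (2.0 / m) * np.tan(np.radians(fov / 2.0))
    x = (jj.ravel() + 0.5 - m / 2.0) * scale
    y = -(ii.ravel() + 0.5 - m / 2.0) * scale
    cam = np.stack([x, y, np.ones(m * m)]) * depth_frame.ravel()[None, :]
    # Undo horizon (pitch about x), apply clockwise yaw (about y), translate.
    psi, phi = -np.radians(horizon), -np.radians(rotation)
    pitch = np.array([[1, 0, 0],
                      [0, np.cos(psi), np.sin(psi)],
                      [0, -np.sin(psi), np.cos(psi)]])
    yaw = np.array([[np.cos(phi), 0, -np.sin(phi)],
                    [0, 1, 0],
                    [np.sin(phi), 0, np.cos(phi)]])
    return yaw @ (pitch @ cam) + np.asarray(camera_world_xyz)[:, None]
```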

### `project_point_cloud_to_map`

``````
project_point_cloud_to_map(xyz_points: torch.Tensor, bin_axis: str, bins: Sequence[float], map_size: int, resolution_in_cm: int, flip_row_col: bool)
``````


Bins an input point cloud into a map tensor with the bins equaling the channels.

This code has been adapted from https://github.com/devendrachaplot/Neural-SLAM.

Parameters

• xyz_points : (x,y,z) pointcloud(s) as a torch.Tensor of shape (... x height x width x 3). All operations are vectorized across the `...` dimensions.
• bin_axis : Either "x", "y", or "z", the axis which should be binned by the values in `bins`. If you have generated your point clouds with any of the other functions in the `point_cloud_utils` module you almost certainly want this to be "y" as this is the default upwards dimension.
• bins: The values by which to bin along `bin_axis`, see the `bins` parameter of `np.digitize` for more info.
• map_size : The axes not specified by `bin_axis` will be divided by `resolution_in_cm / 100` and then rounded to the nearest integer. They are then expected to have their values within the interval [0, ..., map_size - 1].
• resolution_in_cm : The resolution, in cm, of the map output from this function. Every grid square of the map corresponds to a (`resolution_in_cm` x `resolution_in_cm`) square in space.
• flip_row_col: Should the rows/cols of the map be flipped? See the 'Returns' section below for more info.

Returns

A collection of maps of shape (... x map_size x map_size x (len(bins)+1)). Note that `bin_axis` has been moved to the last index of this returned map; the other two axes stay in their original order unless `flip_row_col` is true, in which case they are reversed (useful as rows should often correspond to y or z instead of x).
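The binning can be pictured with `np.digitize`: each point's `bin_axis` value selects a channel, while its remaining two coordinates (divided by the map resolution) select a grid cell. A simplified NumPy sketch for a single (N x 3) point cloud binned along y (illustrative only; it ignores the batching and `flip_row_col` handling of the real function):

```python
import numpy as np

def project_to_map_sketch(xyz_points, bins, map_size, resolution_in_cm):
    """Bin an (N, 3) point cloud into a (map_size, map_size, len(bins)+1) map.

    Simplified NumPy sketch: bins along y, no batching, no row/col flipping.
    """
    # Which height channel each point falls into (values in 0 .. len(bins)).
    channel = np.digitize(xyz_points[:, 1], bins)
    # Convert x/z (in meters) to integer grid indices at the map resolution.
    xz = np.round(xyz_points[:, [0, 2]] / (resolution_in_cm / 100.0)).astype(int)
    world_map = np.zeros((map_size, map_size, len(bins) + 1))
    for (gx, gz), c in zip(xz, channel):
        if 0 <= gx < map_size and 0 <= gz < map_size:
            world_map[gx, gz, c] += 1  # count points per cell and channel
    return world_map
```

For instance, with `bins=[0.5, 1.5]` a point with y=0.2 lands in channel 0, y=1.0 in channel 1, and y=9.9 in channel 2, so the channels can be read as "below 0.5m", "0.5m to 1.5m", and "above 1.5m" occupancy counts.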