allenact.embodiedai.mapping.mapping_utils.point_cloud_utils
camera_space_xyz_to_world_xyz
camera_space_xyz_to_world_xyz(camera_space_xyzs: torch.Tensor, camera_world_xyz: torch.Tensor, rotation: float, horizon: float) -> torch.Tensor
Transforms xyz coordinates in the camera's coordinate frame to the world-space (global) xyz frame.
This code has been adapted from https://github.com/devendrachaplot/Neural-SLAM.
IMPORTANT: We use the conventions from the Unity game engine. In particular:
- A rotation of 0 corresponds to facing north.
- Positive rotations correspond to CLOCKWISE rotations. That is, a rotation of 90 degrees corresponds to facing east. THIS IS THE OPPOSITE CONVENTION OF THE ONE GENERALLY USED IN MATHEMATICS.
- When facing NORTH (rotation==0), moving ahead by 1 meter results in the z coordinate increasing by 1. Moving to the right by 1 meter corresponds to increasing the x coordinate by 1. Finally, moving upwards by 1 meter corresponds to increasing the y coordinate by 1. Having x,z as the ground plane in this way is common in computer graphics but differs from the usual mathematical convention of having z be "up".
- The horizon corresponds to how far below the horizontal the camera is facing. I.e. a horizon of 30 corresponds to the camera being angled downwards at an angle of 30 degrees.
Parameters
- camera_space_xyzs : A 3xN matrix of xyz coordinates in the camera's reference frame. Here x, y, z = camera_space_xyzs[:, i] should equal the xyz coordinates for the ith point.
- camera_world_xyz : The camera's xyz position in the world reference frame.
- rotation : The world-space rotation (in degrees) of the camera.
- horizon : The horizon (in degrees) of the camera.
Returns
A 3xN tensor whose entry [:, i] is the xyz world-space coordinate corresponding to the camera-space
coordinate camera_space_xyzs[:, i].
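The conventions above can be checked on a single point with a short sketch. This is not the library's implementation: it is a plain-Python illustration that assumes camera-space axes of (right, up, forward), and the helper name camera_to_world_point is hypothetical.

```python
import math


def camera_to_world_point(camera_xyz, camera_world_xyz, rotation, horizon):
    """Map one camera-space point (right, up, forward) into world space.

    Illustrative only: assumes camera-space x = right, y = up, z = forward.
    """
    x_c, y_c, z_c = camera_xyz
    yaw = math.radians(rotation)   # clockwise when viewed from above
    pitch = math.radians(horizon)  # positive means the camera looks downwards

    # Undo the downward tilt: rotate about the camera's x (right) axis.
    y_level = y_c * math.cos(pitch) - z_c * math.sin(pitch)
    z_level = y_c * math.sin(pitch) + z_c * math.cos(pitch)

    # Apply the clockwise heading about the world's y (up) axis.
    x_w = x_c * math.cos(yaw) + z_level * math.sin(yaw)
    z_w = -x_c * math.sin(yaw) + z_level * math.cos(yaw)

    cx, cy, cz = camera_world_xyz
    return (cx + x_w, cy + y_level, cz + z_w)


# A point 1 m straight ahead of a level camera facing east (rotation 90)
# ends up 1 m further along +x.
east = camera_to_world_point((0.0, 0.0, 1.0), (1.0, 1.5, 2.0), rotation=90, horizon=0)

# A point 1 m ahead of a north-facing camera tilted 30 degrees downwards
# loses sin(30 deg) = 0.5 m of height.
tilted = camera_to_world_point((0.0, 0.0, 1.0), (0.0, 1.5, 0.0), rotation=0, horizon=30)
```

With rotation 0 and horizon 0 the function reduces to adding the camera position, which matches the "facing north" bullet: forward is +z, right is +x, up is +y.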
depth_frame_to_camera_space_xyz
depth_frame_to_camera_space_xyz(depth_frame: torch.Tensor, mask: Optional[torch.Tensor], fov: float = 90) -> torch.Tensor
Transforms an input depth map into a collection of xyz points (i.e. a point cloud) in the camera's coordinate frame.
Parameters
- depth_frame : A square depth map, i.e. an MxM matrix with entry depth_frame[i, j] equaling the distance from the camera to the nearest surface at pixel (i,j).
- mask : An optional boolean mask of the same size (MxM) as the input depth. Only values where this mask is true will be included in the returned matrix of xyz coordinates. If None then no pixels will be masked out (so the returned matrix of xyz points will have dimension 3x(M*M)).
- fov: The field of view of the camera.
Returns
A 3xN matrix with entry [:, i] equaling the xyz coordinates (in the camera's coordinate
frame) of a point in the point cloud corresponding to the input depth frame.
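As a rough illustration of how a depth map becomes a 3xN point cloud, the sketch below unprojects a tiny depth map with a pinhole camera model. Several things here are assumptions rather than facts from this page: fov is taken to be in degrees, each depth value is treated as distance along the optical axis, pixel centres sit at half-integer offsets, and the helper name depth_to_camera_points is hypothetical.

```python
import math


def depth_to_camera_points(depth, mask=None, fov=90.0):
    """Unproject an MxM depth map (a list of rows) into [xs, ys, zs].

    Illustrative pinhole model; assumes fov in degrees and z-depth values.
    """
    m = len(depth)
    # Half-width of the image plane at unit distance from the camera.
    half_extent = math.tan(math.radians(fov) / 2.0)
    xs, ys, zs = [], [], []
    for i in range(m):      # row index, increases downwards in the image
        for j in range(m):  # column index, increases to the right
            if mask is not None and not mask[i][j]:
                continue  # masked-out pixels contribute no point
            # Pixel centre in [-1, 1] image coordinates; y is negated so
            # that "up" in the image is positive.
            u = ((j + 0.5) - m / 2.0) * (2.0 / m)
            v = -((i + 0.5) - m / 2.0) * (2.0 / m)
            d = depth[i][j]
            xs.append(u * half_extent * d)
            ys.append(v * half_extent * d)
            zs.append(d)
    return [xs, ys, zs]  # 3 x N: column k holds one point


# Without a mask every pixel yields a point, so N = M * M = 4.
points = depth_to_camera_points([[1.0, 1.0], [2.0, 2.0]], fov=90.0)

# With a mask only the pixels marked True are kept, so N = 2.
masked = depth_to_camera_points(
    [[1.0, 1.0], [2.0, 2.0]], mask=[[True, False], [False, True]], fov=90.0
)
```

The nested lists stand in for a tensor: points[0], points[1] and points[2] play the role of the x, y and z rows of the 3xN result.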
depth_frame_to_world_space_xyz
depth_frame_to_world_space_xyz(depth_frame: torch.Tensor, camera_world_xyz: torch.Tensor, rotation: float, horizon: float, fov: float)
Transforms an input depth map into a collection of xyz points (i.e. a point cloud) in the world-space coordinate frame.
IMPORTANT: We use the conventions from the Unity game engine. In particular:
- A rotation of 0 corresponds to facing north.
- Positive rotations correspond to CLOCKWISE rotations. That is, a rotation of 90 degrees corresponds to facing east. THIS IS THE OPPOSITE CONVENTION OF THE ONE GENERALLY USED IN MATHEMATICS.
- When facing NORTH (rotation==0), moving ahead by 1 meter results in the z coordinate increasing by 1. Moving to the right by 1 meter corresponds to increasing the x coordinate by 1. Finally, moving upwards by 1 meter corresponds to increasing the y coordinate by 1. Having x,z as the ground plane in this way is common in computer graphics but differs from the usual mathematical convention of having z be "up".
- The horizon corresponds to how far below the horizontal the camera is facing. I.e. a horizon of 30 corresponds to the camera being angled downwards at an angle of 30 degrees.
Parameters
- depth_frame : A square depth map, i.e. an MxM matrix with entry depth_frame[i, j] equaling the distance from the camera to the nearest surface at pixel (i,j).
- mask : An optional boolean mask of the same size (MxM) as the input depth. Only values where this mask is true will be included in the returned matrix of xyz coordinates. If None then no pixels will be masked out (so the returned matrix of xyz points will have dimension 3x(M*M)).
- camera_space_xyzs : A 3xN matrix of xyz coordinates in the camera's reference frame. Here x, y, z = camera_space_xyzs[:, i] should equal the xyz coordinates for the ith point.
- camera_world_xyz : The camera's xyz position in the world reference frame.
- rotation : The world-space rotation (in degrees) of the camera.
- horizon : The horizon (in degrees) of the camera.
- fov: The field of view of the camera.
Returns
A 3xN matrix with entry [:, i] equaling the xyz coordinates (in the world coordinate
frame) of a point in the point cloud corresponding to the input depth frame.
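To see how rotation and horizon combine, it can help to look at just the camera's optical axis. The sketch below is a hand-written illustration of the stated conventions, not code from this module, and the helper names view_direction and point_on_optical_axis are hypothetical. It computes where a surface lying on the optical axis at a given depth would land in world space.

```python
import math


def view_direction(rotation, horizon):
    """Unit vector (x, y, z) along the camera's optical axis in world space."""
    yaw = math.radians(rotation)
    pitch = math.radians(horizon)
    return (
        math.sin(yaw) * math.cos(pitch),  # east (+x) component
        -math.sin(pitch),                 # up (+y) component, negative when looking down
        math.cos(yaw) * math.cos(pitch),  # north (+z) component
    )


def point_on_optical_axis(depth, camera_world_xyz, rotation, horizon):
    """World position of a surface `depth` metres along the optical axis."""
    direction = view_direction(rotation, horizon)
    return tuple(c + depth * d for c, d in zip(camera_world_xyz, direction))


# Camera 1.5 m above the origin, facing east, tilted 30 degrees downwards,
# looking at a surface 2 m away along the optical axis.
hit = point_on_optical_axis(2.0, (0.0, 1.5, 0.0), rotation=90, horizon=30)
```

Here the surface sits 2 * cos(30 deg) metres to the east and 2 * sin(30 deg) = 1 metre below the camera, which is a quick sanity check to run against a real point cloud.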
project_point_cloud_to_map
project_point_cloud_to_map(xyz_points: torch.Tensor, bin_axis: str, bins: Sequence[float], map_size: int, resolution_in_cm: int, flip_row_col: bool)
Bins an input point cloud into a map tensor with the bins equaling the channels.
This code has been adapted from https://github.com/devendrachaplot/Neural-SLAM.
Parameters
- xyz_points : (x,y,z) pointcloud(s) as a torch.Tensor of shape (... x height x width x 3). All operations are vectorized across the ... dimensions.
- bin_axis : Either "x", "y", or "z", the axis which should be binned by the values in bins. If you have generated your point clouds with any of the other functions in the point_cloud_utils module you almost certainly want this to be "y" as this is the default upwards dimension.
- bins: The values by which to bin along bin_axis, see the bins parameter of np.digitize for more info.
- map_size : The axes not specified by bin_axis will be divided by resolution_in_cm / 100 and then rounded to the nearest integer. They are then expected to have their values within the interval [0, ..., map_size - 1].
- resolution_in_cm: The resolution, in cm, of the map output from this function. Every grid square of the map corresponds to a (resolution_in_cm x resolution_in_cm) square in space.
- flip_row_col: Should the rows/cols of the map be flipped? See the 'Returns' section below for more info.
Returns
A collection of maps of shape (... x map_size x map_size x (len(bins)+1)). Note that bin_axis
has been moved to the last index of this returned map; the other two axes stay in their original
order unless flip_row_col is True, in which case they are reversed (useful as often
rows should correspond to y or z instead of x).
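The binning described above can be sketched for a flat list of points in plain Python. This is an illustration under stated assumptions, not the module's vectorized implementation: the helper name bin_points_to_map is hypothetical, each cell is assumed to hold a count of the points that fall into it, points outside the map are simply skipped, and bisect_right stands in for np.digitize with its default right=False on increasing bins.

```python
from bisect import bisect_right

AXES = {"x": 0, "y": 1, "z": 2}


def bin_points_to_map(points, bin_axis, bins, map_size, resolution_in_cm,
                      flip_row_col=False):
    """Count points per (row, col, bin) cell; illustrative sketch only."""
    n_channels = len(bins) + 1
    grid = [[[0] * n_channels for _ in range(map_size)] for _ in range(map_size)]

    bin_index = AXES[bin_axis]
    # The two remaining axes, kept in their original order.
    plane = [k for k in range(3) if k != bin_index]
    if flip_row_col:
        plane.reverse()

    cell = resolution_in_cm / 100.0  # cell side length in metres
    for p in points:
        row = round(p[plane[0]] / cell)
        col = round(p[plane[1]] / cell)
        if not (0 <= row < map_size and 0 <= col < map_size):
            continue  # assumption: out-of-range points are ignored
        # Channel k collects values v with bins[k-1] <= v < bins[k].
        channel = bisect_right(bins, p[bin_index])
        grid[row][col][channel] += 1
    return grid


cloud = [(0.10, 0.2, 0.30), (0.10, 1.7, 0.30), (0.22, 0.9, 0.04)]
height_map = bin_points_to_map(cloud, "y", [0.5, 1.5], map_size=4, resolution_in_cm=10)
flipped = bin_points_to_map(cloud, "y", [0.5, 1.5], map_size=4, resolution_in_cm=10,
                            flip_row_col=True)
```

With bin_axis "y" the first map is indexed as height_map[x_index][z_index][height_bin]; setting flip_row_col swaps the first two indices so that rows follow z instead of x.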