Rendering of a point cloud, showing a desktop.

All right: in the previous tutorial you installed OpenNI and PCL. You managed to get your device working. You compiled and ran the example program, watched as it fetched one frame after another, and even saved a couple of point clouds to disk. Now you may be wondering, "What now? What use are depth sensors? What can I do with point clouds?"

Well, depth cameras have a lot of useful applications. For example, you could use them as motion tracking hardware and save yourself thousands of dollars in professional gear. The original Kinect was actually designed as an alternative to physical controllers (i.e., gamepads) in games. Hand gestures can also be recognized, so people have managed to implement control interfaces similar to the ones seen in Minority Report.

Working low-level with point clouds offers interesting possibilities too. As they are essentially a 3D "scan" of the scene, you could capture your whole room as a continuous 3D mesh, then import it into your modelling software of choice and use it as a reference for a better reconstruction. You could use it to measure distances, or to detect obstacles for a robot navigation system. But the most exciting alternative by far, and the one that most research focuses on, is to perform 3D object recognition and pose estimation, even in real time. This allows a robot to fulfill tasks like going to the kitchen to get you a cup of coffee.

Most of the advanced stuff that can be done with a point cloud requires some previous steps, like filtering, reconstruction or normal estimation. In this tutorial, I will introduce you to the basics of point cloud processing, and leave the complicated methods for the next tutorial. I will explain what every technique does and what it should be used for, and include PCL code snippets so you can check how to implement it in your programs.

Point clouds

First, a little explanation. A point cloud as taken from a depth sensor consists of a series of points in 3D space, as simple as that. A "pcl::PointCloud<PointT>" object stores the points inside a "std::vector<PointT>" structure. The cloud object exposes some functions that let you get information about them, like the point count. As you can see, the class is templated so you can instantiate it for many types of points. Two examples are "PointXYZ", which stores three floats for the X, Y and Z coordinates, and "PointXYZRGB", which also stores color (texture) information for each point, the kind we would get with Kinect or Xtion RGB-D cameras. A list of point types and a basic description of the cloud object are available on PCL's website.

Once you have a cloud in memory, you can perform many operations on it. You can, of course, save it to disk, read it back, or concatenate several clouds together. To select a subset of points, you can make use of indices: a list of the positions that the chosen points occupy in the original cloud. Many algorithms work with indices, or produce them as output. With the "ExtractIndices" class you can create a second cloud that contains only the referenced points, or invert the selection and copy those not indexed.
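To make the idea of indices concrete, here is a minimal sketch in plain C++ of what selecting (or inverting a selection of) points by index amounts to. It does not use PCL: "Point3" and "extractByIndices()" are hypothetical names made up for this illustration, and the real "ExtractIndices" class works on "pcl::PointCloud" objects and may differ in details such as output order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain stand-in for pcl::PointXYZ so the sketch compiles without PCL.
struct Point3 { float x, y, z; };

// Hypothetical helper (not a PCL function): copies the points whose positions
// are listed in "indices", or all the other points when "negative" is true.
// The output follows the order of the input cloud.
std::vector<Point3> extractByIndices(const std::vector<Point3>& cloud,
                                     const std::vector<int>& indices,
                                     bool negative = false)
{
	// Mark which positions of the cloud are referenced by the index list.
	std::vector<bool> selected(cloud.size(), false);
	for (int index : indices)
	{
		assert(index >= 0 && static_cast<std::size_t>(index) < cloud.size());
		selected[index] = true;
	}

	// Keep the marked points, or the unmarked ones if the selection is inverted.
	std::vector<Point3> output;
	for (std::size_t i = 0; i < cloud.size(); ++i)
	{
		if (selected[i] != negative)
			output.push_back(cloud[i]);
	}
	return output;
}
```

Calling "extractByIndices(cloud, indices)" returns the referenced points, while "extractByIndices(cloud, indices, true)" returns everything else, which mirrors the two modes described above.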

When clouds are saved to disk, a PCD (Point Cloud Data) file is produced. You can choose to do so in binary or plain text format. The first one is faster to work with, whereas the second one lets you inspect the cloud by opening it with a common text editor. If you do so, you may see something like this:

# .PCD v.7 - Point Cloud Data file format
VERSION .7
FIELDS x y z
SIZE 4 4 4
TYPE F F F
WIDTH 2
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 2
DATA ascii
0.73412 -1.12643 0.82218
0.44739 -0.34735 -0.04624

This tells us that the cloud consists of 2 points of type "PointXYZ" (because each point has only 3 fields: the X, Y and Z coordinates, each stored in a 4-byte float), and lists their coordinates. It also stores the viewpoint information (the pose of the sensor relative to the points) as a translation followed by a rotation quaternion. Knowing the position of the camera can be useful for certain procedures like normal estimation.
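To show how little there is to the plain text format, the following sketch reads such a file using only the C++ standard library. It is an illustration under narrow assumptions, not a replacement for PCL's reader: it only accepts ASCII files whose fields are exactly "x y z", and "Point3" and "readAsciiXYZ()" are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Plain stand-in for pcl::PointXYZ so the sketch compiles without PCL.
struct Point3 { float x, y, z; };

// Hypothetical helper: reads an ASCII PCD stream whose fields are exactly "x y z".
// Returns false if the header is not what this sketch expects, or if fewer
// points than announced could be read.
bool readAsciiXYZ(std::istream& input, std::vector<Point3>& points)
{
	std::string line;
	std::size_t expectedPoints = 0;
	bool xyzFields = false;
	bool asciiData = false;

	// Header: one "KEY values..." entry per line, ending at the DATA line.
	while (!asciiData && std::getline(input, line))
	{
		std::istringstream entry(line);
		std::string key;
		entry >> key;
		if (key == "FIELDS")
		{
			std::string a, b, c, extra;
			entry >> a >> b >> c;
			xyzFields = (a == "x" && b == "y" && c == "z" && !(entry >> extra));
		}
		else if (key == "POINTS")
		{
			entry >> expectedPoints;
		}
		else if (key == "DATA")
		{
			std::string format;
			entry >> format;
			if (format != "ascii")
				return false;
			asciiData = true;
		}
	}
	if (!xyzFields || !asciiData)
		return false;

	// Body: one point per line, coordinates separated by spaces.
	points.clear();
	Point3 p;
	while (points.size() < expectedPoints && input >> p.x >> p.y >> p.z)
		points.push_back(p);
	return points.size() == expectedPoints;
}
```

Feeding it the example file above (for instance through a "std::ifstream") should yield a vector with the two listed points. The other header entries, such as SIZE, TYPE and VIEWPOINT, are simply skipped here.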

Feature estimation

Normal estimation

As you may remember from geometry class, the normal of a plane is a unit vector that is perpendicular to it. The normal of a surface at a point is defined as the vector that is perpendicular to the plane that is tangent to the surface at the point.

Surface normals can be calculated for the points of a cloud, too. They are considered a feature of the point (some characteristic that can help us to distinguish between two points), albeit not a very discriminative one. 3D features will be important later in our tutorials, when we talk about descriptors (more detailed "signatures" of points used for close matching).

I will not go into detail with the math of the estimation method, but you just have to know that it uses the nearest neighbors (the points that are closest to the one we are calculating the normal for) to find out the tangent plane and the normal vector. You can customize the method with the search radius (think about a sphere of that radius, centered on the point; all neighboring points that lie within will be used for the computation) and the viewpoint (by default, the sign of the output normals is ambiguous, as they may point to either side of the surface; by supposing that all vectors must point towards the camera, because otherwise they would belong to surfaces that are not visible from the sensor, they can all be re-oriented accordingly).
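The two ideas above, a tangent plane built from neighbors and a re-orientation towards the viewpoint, can be sketched in plain C++ as follows. This is a simplified illustration, not PCL's method: three non-collinear points stand in for the whole neighborhood (PCL fits a plane to all the neighbors found), and "Vec3", "planeNormal()" and "orientTowardsViewpoint()" are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Unit normal of the plane through three points: the normalized cross product
// of two edge vectors. Its sign depends on the order of the points.
Vec3 planeNormal(const Vec3& a, const Vec3& b, const Vec3& c)
{
	const Vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
	const Vec3 v = { c.x - a.x, c.y - a.y, c.z - a.z };
	Vec3 n = { u.y * v.z - u.z * v.y,
	           u.z * v.x - u.x * v.z,
	           u.x * v.y - u.y * v.x };
	const float length = std::sqrt(dot(n, n));
	assert(length > 0.0f); // The three points must not be collinear.
	n.x /= length;
	n.y /= length;
	n.z /= length;
	return n;
}

// Flips "normal" in place if it points away from the viewpoint, so that
// visible surfaces always end up with normals facing the sensor.
void orientTowardsViewpoint(const Vec3& point, const Vec3& viewpoint, Vec3& normal)
{
	const Vec3 toViewpoint = { viewpoint.x - point.x,
	                           viewpoint.y - point.y,
	                           viewpoint.z - point.z };
	if (dot(normal, toViewpoint) < 0.0f)
	{
		normal.x = -normal.x;
		normal.y = -normal.y;
		normal.z = -normal.z;
	}
}
```

For a patch of surface lying one meter in front of a sensor placed at the origin, "planeNormal()" may return a vector pointing away from the sensor; "orientTowardsViewpoint()" then reverses it, and calling it a second time changes nothing.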

Normals are also important because they give us information about the curvature of the surface at some point, which can be used to our advantage. Many of PCL's algorithms will require us to provide the normals of the input cloud. To estimate them, you can use the following code:

#include <pcl/io/pcd_io.h>
#include <pcl/features/normal_3d.h>
#include <boost/thread/thread.hpp>
#include <pcl/visualization/pcl_visualizer.h>

int
main(int argc, char** argv)
{
	// Object for storing the point cloud.
	pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
	// Object for storing the normals.
	pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
	
	// Read a PCD file from disk.
	// loadPCDFile() reports failure through its return value (0 means success).
	if (pcl::io::loadPCDFile<pcl::PointXYZ>(argv[1], *cloud) != 0)
	{
		return -1;
	}
	
	// Object for normal estimation.
	pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> normalEstimation;
	normalEstimation.setInputCloud(cloud);
	// For every point, use all neighbors in a radius of 3cm.
	normalEstimation.setRadiusSearch(0.03);
	// A kd-tree is a data structure that makes searches efficient. More about it later.
	// The normal estimation object will use it to find nearest neighbors.
	pcl::search::KdTree<pcl::PointXYZ>::Ptr kdtree(new pcl::search::KdTree<pcl::PointXYZ> ());
	normalEstimation.setSearchMethod(kdtree);
	
	// Calculate the normals.
	normalEstimation.compute(*normals);
	
	// Visualize them.
	boost::shared_ptr<pcl::visualization::PCLVisualizer> viewer(new pcl::visualization::PCLVisualizer ("Normals"));
	viewer->addPointCloud<pcl::PointXYZ> (cloud, "cloud");
	// Display one normal out of 20, as a line of length 3cm.
	viewer->addPointCloudNormals<pcl::PointXYZ, pcl::Normal>(cloud, normals, 20, 0.03, "normals");
	while (!viewer->wasStopped())
	{
		viewer->spinOnce(100);
		boost::this_thread::sleep(boost::posix_time::microseconds(100000));
	}
	
	return 0;
}

As you can see, normals are stored in "PointCloud" objects too, instantiated to the "Normal" type. If you ever extract some indices from a cloud before feeding them to an algorithm, be sure to do the same with the normals. I have not bothered to include proper error/argument checking, nor have I made use of the "setViewPoint()" function, but you get the idea. Be sure to check the PCL API every time I introduce you to one of its classes or methods.

Visualization of the normals computed for a point cloud.

Input: Points, Search method, [Search radius], [Search surface], [Indices], [Viewpoint]
Output: Normals
Tutorial: Estimating Surface Normals in a Point Cloud

Decomposition

k-d tree

A k-d tree (k-dimensional tree) is a data structure that organizes a set of points in a k-dimensional space, in a way that makes spatial search operations very efficient (for example, finding the nearest neighbor of a point, which is the point that is closest to it in space; or finding all neighbors within a radius).

It is a binary tree, that is, every non-leaf node in it has at most two child nodes, the "left" and "right" ones. Each level splits space on a specific dimension. For example, in 3-dimensional space, at the root node (first level) all points would be split based on the first dimension, X (points with greater X values would go to the right subtree, points with lesser values to the left one). At the second level (the nodes we just created), the split would be done on the Y axis, following the same criterion. At the third level (the grandchildren), we would use the Z axis. At the fourth level, we would get back to the X axis, and so on. Usually, the median point along the current axis is chosen as the splitting node, which keeps the tree balanced.
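The construction just described, and the nearest neighbor search it enables, can be sketched with the standard library alone. This is a minimal illustration and not how PCL implements its k-d tree: the tree is stored implicitly by reordering a vector so that the node of every range is its middle element, and "buildKdTree()", "nearestNeighbor()" and "findNearest()" are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<float, 3>;

float squaredDistance(const Point& a, const Point& b)
{
	float sum = 0.0f;
	for (int d = 0; d < 3; ++d)
		sum += (a[d] - b[d]) * (a[d] - b[d]);
	return sum;
}

// Reorders points[first, last) so that the median along the current axis sits
// in the middle, with lesser values before it and greater values after it.
// The axis cycles X, Y, Z, X... with the depth of the recursion.
void buildKdTree(std::vector<Point>& points, std::size_t first, std::size_t last, int depth = 0)
{
	if (last - first <= 1)
		return;
	const int axis = depth % 3;
	const std::size_t middle = first + (last - first) / 2;
	std::nth_element(points.begin() + first, points.begin() + middle, points.begin() + last,
		[axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
	buildKdTree(points, first, middle, depth + 1);
	buildKdTree(points, middle + 1, last, depth + 1);
}

// Recursive search over a range previously arranged by buildKdTree().
void nearestNeighbor(const std::vector<Point>& points, std::size_t first, std::size_t last,
                     int depth, const Point& query, std::size_t& best, float& bestDistance)
{
	if (first >= last)
		return;
	const int axis = depth % 3;
	const std::size_t middle = first + (last - first) / 2;
	const float distance = squaredDistance(points[middle], query);
	if (distance < bestDistance)
	{
		bestDistance = distance;
		best = middle;
	}
	// Descend first into the side of the splitting plane that contains the query.
	const float offset = query[axis] - points[middle][axis];
	const bool queryOnLeft = offset < 0.0f;
	if (queryOnLeft)
		nearestNeighbor(points, first, middle, depth + 1, query, best, bestDistance);
	else
		nearestNeighbor(points, middle + 1, last, depth + 1, query, best, bestDistance);
	// The other side can only hold a closer point if the plane itself is closer
	// than the best match found so far; otherwise the whole subtree is skipped.
	if (offset * offset < bestDistance)
	{
		if (queryOnLeft)
			nearestNeighbor(points, middle + 1, last, depth + 1, query, best, bestDistance);
		else
			nearestNeighbor(points, first, middle, depth + 1, query, best, bestDistance);
	}
}

// Returns the position (in the reordered vector) of the point closest to "query".
std::size_t findNearest(const std::vector<Point>& tree, const Point& query)
{
	assert(!tree.empty());
	std::size_t best = 0;
	float bestDistance = std::numeric_limits<float>::max();
	nearestNeighbor(tree, 0, tree.size(), 0, query, best, bestDistance);
	return best;
}
```

After calling "buildKdTree(points, 0, points.size())" once, every "findNearest(points, query)" call can discard whole subtrees instead of comparing the query against each point, which is where the efficiency comes from.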