22 March 2013

Polarization of two Kinects for interference

In this post, I will describe a more complete experiment on how the orientation of the polarization filters affects the interference.
For each polarization filter I defined two orthogonal directions in 2-D space: up and right; light polarized along one direction cannot pass through a filter oriented along the other. Each Kinect has two filters, one attached to the IR emitter and one to the IR camera, and each filter can be oriented in either direction. This gives 16 combinations in total (two directions for each of the four filters: K1-E, the IR emitter of the first Kinect; K1-C, its IR camera; and K2-E and K2-C, the emitter and camera of the second Kinect). For each combination, the interference level observed by each Kinect was recorded and classified by severity; the results are shown in the table below.


Combination | Kinect 1 result | Kinect 2 result
1  | Bad       | Moderate
2  | Bad       | Moderate
3  | Very Bad  | Good
4  | Very Bad  | Good
5  | Bad       | Moderate
6  | Bad       | Moderate
7  | Very Bad  | Good
8  | Very Bad  | Good
9  | Good      | Bad
10 | Good      | Bad
11 | Moderate  | Moderate
12 | Moderate  | Moderate
13 | Good      | Bad
14 | Good      | Bad
15 | Moderate  | Moderate
16 | Moderate  | Moderate


Here is a figure that briefly shows how the different interference levels are classified as Good, Moderate and Bad.
Figure 1 Upper: good; Middle: moderate; Lower: bad
From the table we can see that there is no combination in which both Kinects achieve a good result. More importantly, a good result only occurs when the up-direction filters are attached to that Kinect and the right-direction filters are attached to the opposite one.
Hence, we can infer that the good result is not produced by the filters on the same Kinect, but by the filters in the orthogonal direction (in this case right) blocking the IR light from the opposite Kinect. According to some discussions online (link), the light produced by the Kinect's emitter is already polarized (in my case, mostly along the up direction).

Then I rotated one Kinect 90 degrees relative to the other and attached the filters as well, so that one Kinect produced horizontally polarized light and the other produced vertically polarized light. However, there was no noticeable improvement on either Kinect. This is possibly because the polarization is destroyed once the light reflects off arbitrary surfaces.

19 March 2013

Learning PCL - Basic data structure

In general, point clouds can be divided into organized and unorganized categories in terms of their structure.
An organized point cloud dataset is the name given to point clouds that resemble an organized image (or matrix) like structure, where the data is split into rows and columns. Examples of such point clouds include data coming from stereo cameras or Time-of-Flight cameras. The advantage of an organized dataset is that, by knowing the relationship between adjacent points (e.g. pixels), nearest-neighbour operations are much more efficient, which speeds up the computation and lowers the cost of certain algorithms in PCL.
In an unorganized point cloud, the points are simply stored in sequence without any row/column indexing. In PCL, the width and height attributes indicate the structure of a point cloud: if the height is 1, the cloud is unorganized and its size is given by its width; for an organized point cloud, width * height is the number of points.
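As a small illustration (my own sketch, not from the original tutorial), the width and height fields distinguish the two cases:

 #include <pcl/point_cloud.h>
 #include <pcl/point_types.h>
 #include <iostream>

 int main ()
 {
  pcl::PointCloud<pcl::PointXYZ> cloud;

  // Unorganized: just a sequence of points, so height stays 1.
  cloud.resize (640 * 480);
  std::cout << cloud.isOrganized () << std::endl;  // 0 (false)

  // Organized: image-like layout with explicit rows and columns.
  cloud.width  = 640;
  cloud.height = 480;
  cloud.points.resize (cloud.width * cloud.height);
  std::cout << cloud.isOrganized () << std::endl;  // 1 (true)
  return (0);
 }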

In order to declare and use the point cloud type in our own class, we need to define a pointer to the point cloud rather than an object itself. PCL uses boost::shared_ptr, which works a little differently from a raw C++ pointer. Here is an example of how to declare a boost::shared_ptr to a class:
#include <boost/shared_ptr.hpp>

struct MyClass{};

int main (int argc, char** argv)
{
 // Declare a shared pointer to MyClass, then point it at a new instance.
 boost::shared_ptr<MyClass> MyClassPtr;
 MyClassPtr = boost::shared_ptr<MyClass>(new MyClass);
 return(0);
}
Following the same route, we can define a shared pointer to a pcl::PointCloud of pcl::PointXYZ points (PCL provides the Ptr typedef for this) in this way:
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

int main (int argc, char** argv)
{
 // Ptr is PCL's typedef for boost::shared_ptr<pcl::PointCloud<pcl::PointXYZ> >.
 pcl::PointCloud<pcl::PointXYZ>::Ptr CloudPtr;
 CloudPtr = pcl::PointCloud<pcl::PointXYZ>::Ptr(new pcl::PointCloud<pcl::PointXYZ>);
 return(0);
}
In the initialization of our class, we can 'initialize' the point cloud by:
 CloudPtr->resize(640*480);
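Note that resize(640*480) on its own produces an unorganized cloud (height = 1). If the 640x480 row/column indexing used below is wanted, a small variation (my own sketch) is to set the dimensions explicitly:
 CloudPtr->width  = 640;
 CloudPtr->height = 480;
 CloudPtr->points.resize(CloudPtr->width * CloudPtr->height);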
It is also easy to access a point in the point cloud with these snippets:
  // Output the (0,0) point
  std::cout << (*CloudPtr)(0,0) << std::endl;

  // Set the (0,0) point
  (*CloudPtr)(0,0).x = 1;
  (*CloudPtr)(0,0).y = 2;
  (*CloudPtr)(0,0).z = 3;
  // Confirm that the point was set
  std::cout << (*CloudPtr)(0,0) << std::endl;
The result would be:
(0,0,0)
(1,2,3)

18 March 2013

Experiments on interference problem of two Kinects

In this post, I will show the results of two experiments aimed at mitigating the cross-talk between two Kinects. One is a time-multiplexing method and the other polarizes the infrared light with polarizing film.

- Time multiplexing method
As mentioned in another post, the MS Kinect SDK only provides a function (NuiSetForceInfraredEmitterOff) to turn the IR emitter on/off, without specifying how quickly the switch takes effect. Nevertheless, it is worth experimenting with the performance of this function.
The idea is to turn off the other Kinect's IR emitter while the current Kinect is capturing its depth frame. The code looks like this:
 // -- process 1st kinect
 kinect2->turnOffIR(true);
 kinect1->processDepthFrame();
 kinect2->turnOffIR(false);
 kinect1->processColorFrame();
 kinect1->processPointCloud(); 
 // -- process 2nd kinect
 kinect1->turnOffIR(true);
 kinect2->processDepthFrame();
 kinect1->turnOffIR(false);
 kinect2->processColorFrame();
 kinect2->processPointCloud();
turnOffIR is a member function of the DepthSensor class that calls NuiSetForceInfraredEmitterOff.
The result of the above code is unstable because the SDK has insufficient time to turn off the IR emitter before the depth frame is retrieved. Therefore, I added different delays after calling turnOffIR, and the results are shown below.
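As a rough sketch of where such a delay goes (my own illustration, assuming the Windows Sleep() call; the DepthSensor method names are those used in the code above):

 #include <Windows.h>   // for Sleep()

 const DWORD delayMs = 200;       // delay under test (0/100/150/200 ms below)

 kinect2->turnOffIR(true);
 Sleep(delayMs);                  // give the other emitter time to actually switch off
 kinect1->processDepthFrame();
 kinect2->turnOffIR(false);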
Figure 1 No delay.
Figure 2 100 ms delay.
Figure 3 150 ms delay.
Figure 4 200 ms delay.
As the delay increases, fewer holes are caused by the interference. The best and most stable result is achieved with a delay of about 200 ms per Kinect. For smaller delays, such as 50 ms, the point cloud becomes unstable.
Figure 5 50 ms delay.
Figure 5 shows point clouds captured at different times with the same delay. The quality of the point cloud varies a lot. I guess this is because the time required to turn off the IR emitter varies, so sometimes the emitter has been switched off in time and sometimes it has not.
With the 200 ms delay, the frame rate of the point cloud drops to roughly 2.5 fps (about 400 ms per frame, since each Kinect adds its own delay).

- Polarization method
The idea is to distinguish the IR light from the different emitters by polarizing it in different directions. Figure 6 shows the principle of polarization. In the upper-right image we can see the letter 'A' easily, as the two polarizing films are placed in the same direction. The lower-left image shows that the light is blocked completely when one film is rotated 90 degrees from the other.
Figure 6 Polarization using two polarizing films. 
Finally, two films with the same polarization direction are attached to each Kinect, one on the IR emitter and one on the IR camera, as shown in Figure 7; the other Kinect gets two films rotated by 90 degrees. Hence, each IR camera should only see the IR light from its own emitter.
Figure 7 Two films with same direction of polarization are attached to IR emitter and camera. The blue stripes are just for illustration.
Here are some results of the point cloud with polarization. Compared to the result without polarization, the polarization method does reduce the holes in the interference area while still maintaining a real-time frame rate. However, the point cloud is unstable: as shown in Figures 8 and 9, the appearance of the holes changes all the time. This happens less within the shared field of view and is more likely to occur for distant scenes and steep surfaces. I guess the reason is that the IR light received by the camera in those situations is too weak and gets ignored.
Figure 8 Top view of point cloud with polarization at different time.
Figure 9 Side view of point cloud with polarization at different time.
To sum up, for a near field of interest the polarization method works relatively well. It is a non-invasive and simple way to mitigate the interference problem.
UPDATE: A more detailed experiment described in another post (link) shows that the polarization method hardly improves the overall interference between two Kinects.

16 March 2013

Point clouds generation and alignment for two Kinects

Finally, they are aligned.

In this post, I will introduce coloured point cloud generation based on the depth (disparity) and RGB images captured by the IR and RGB cameras respectively. Afterwards, the point clouds generated from the two Kinects are aligned based on the stereo calibration results from a previous post: link.

- Point cloud generation
As the colour camera and depth camera are not at the same location and their fields of view are different, the depth frame and colour frame are not aligned without a transformation. It is easy to implement the alignment between the colour and depth frames with the Kinect SDK, which provides methods to map from one frame to the other. Here are the steps with the API I used (a code sketch follows the list):

i. Given a pixel coordinate in the depth image, use the function NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution to get its corresponding pixel coordinate (x, y) in the colour image.
ii. If the pixel (x, y) falls outside the boundary of the colour image, set this point to nothing (placed at the origin, without colour); otherwise, colour the depth pixel.
iii. Given a pixel coordinate in the depth image, use NuiTransformDepthImageToSkeleton to get its 3D coordinate in skeleton space. According to the SDK documentation, the skeleton system mirrors the scene, so a person facing the Kinect is considered to be looking in the -z direction in skeleton space. Here we need to invert the x value in order to get a normal, non-mirrored image.
iv. The function NuiTransformDepthImageToSkeleton returns a 4-element vector [x, y, z, w]. To get the real coordinates of the point, divide the first three elements by the fourth: X = x/w, Y = y/w and Z = z/w.
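A minimal sketch of these steps for a single depth pixel (my own illustration, not from the original post; the 640x480 resolutions and the packed depth value passed to the SDK are assumptions):

 #include <Windows.h>
 #include <NuiApi.h>

 // depthX, depthY: pixel coordinates in the depth image.
 // packedDepth:    the USHORT depth value as stored in the depth frame
 //                 (assumed to be the packed value the SDK expects).
 void DepthPixelTo3D(LONG depthX, LONG depthY, USHORT packedDepth)
 {
  // Step i: corresponding pixel in the colour image.
  LONG colorX = 0, colorY = 0;
  NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution(
   NUI_IMAGE_RESOLUTION_640x480,   // colour resolution (assumed)
   NUI_IMAGE_RESOLUTION_640x480,   // depth resolution (assumed)
   NULL, depthX, depthY, packedDepth, &colorX, &colorY);

  // Step ii would check colorX/colorY against the colour image bounds here.

  // Step iii: 3D coordinate in skeleton space.
  Vector4 v = NuiTransformDepthImageToSkeleton(
   depthX, depthY, packedDepth, NUI_IMAGE_RESOLUTION_640x480);

  // Step iv: divide by w, and negate x to undo the mirroring.
  float X = -v.x / v.w;
  float Y =  v.y / v.w;
  float Z =  v.z / v.w;
  // (X, Y, Z) is the point in metres; colour it with the pixel at (colorX, colorY).
 }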

- Point cloud alignment
This is a straightforward problem if we can correctly calibrate the extrinsic parameters and clearly define the coordinate systems. One thing I was stuck on for a long time was the different coordinate definitions for the Kinect. As shown in Figure 1, the SDK defines a right-handed coordinate system that places the Kinect at the origin, with the positive z-axis extending in the direction in which the Kinect is pointed, the positive y-axis extending upward, and the positive x-axis extending to the left. However, the MATLAB calibration toolbox also uses a right-handed system, but one rotated 180 degrees around the Z axis relative to the skeleton coordinate system.

Figure 1 Kinect skeleton coordinate (blue) and camera coordinate (red) in MATLAB calibration toolbox.
Therefore, if we use the calibration result from the MATLAB toolbox, we need to transform each point from skeleton coordinates to MATLAB camera coordinates, apply the transformation from one Kinect to the other, and then transform the result back, as sketched below.
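A minimal sketch of this chain (my own illustration, using Eigen for brevity; it assumes the first Kinect corresponds to the left camera and that the 180-degree rotation about Z simply negates x and y):

 #include <Eigen/Dense>

 // p1:   a point in the first Kinect's skeleton coordinates.
 // R, T: extrinsics of the right camera w.r.t. the left camera from the stereo calibration
 //       (note that the toolbox reports T in millimetres while skeleton space is in metres,
 //        so the units must be made consistent first).
 Eigen::Vector3f AlignPoint(const Eigen::Vector3f &p1,
                            const Eigen::Matrix3f &R,
                            const Eigen::Vector3f &T)
 {
  // Skeleton frame -> MATLAB camera frame: rotate 180 degrees about Z.
  Eigen::Vector3f c1(-p1.x(), -p1.y(), p1.z());

  // Apply the stereo extrinsics: XR = R * XL + T.
  Eigen::Vector3f c2 = R * c1 + T;

  // MATLAB camera frame -> skeleton frame of the second Kinect.
  return Eigen::Vector3f(-c2.x(), -c2.y(), c2.z());
 }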
Figure 2 An animation for the point cloud alignment result.
As shown in Figure 2, the alignment result is promising, but the interference problem still exists.

12 March 2013

Calibration of Kinect IR camera


Since the SDK function NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution gives the pixel coordinates in colour space that correspond to specified pixel coordinates in depth space, I use the IR camera (depth space) as the reference.
I used Bouguet's MATLAB calibration toolbox to calibrate each IR camera individually and then find the extrinsic parameters between them. In this post I will focus on the results and discussion; a comprehensive step-by-step tutorial of the toolbox can be found on its website.
Before taking IR images for the calibration I turned off the IR emitter with the SDK function NuiSetForceInfraredEmitterOff(TRUE), so that the projected IR pattern would not affect the quality of the IR images. The IR cameras of the first and second Kinect are referred to as the left and right camera respectively. For both cameras, I chose 12 different IR images representing different poses of the calibration pattern, as shown in Figure 1.
Figure 1 12 calibration images
Let's start with the left IR camera. After reading the images into MATLAB, we need to extract the corners of the pattern. This is done by selecting the four corners of the calibration pattern in each image; the toolbox then detects the grid corners within the selected area automatically, as shown in Figure 2. When selecting the four boundary points, I would suggest selecting inner points rather than the four outer boundaries of the pattern. The toolbox can then detect the corners more reliably, which makes the later re-computation stage easier.
Figure 2 Grid corners extraction
After all grid corners have been extracted, we can visualise the extrinsic results (Figure 3) and reprojection error.
Figure 3 Extrinsic parameters (camera-centred)
The intrinsic parameters of the left camera are computed with uncertainties, as shown below; the larger the uncertainties, the larger the error. To reduce the error, we can recompute the parameters by choosing the images that contribute the most error in the error diagram and re-extracting the corners in a smaller, less distorted area. As shown in Figure 4, the corners are extracted within a 5x5 area instead of the original 5x7 area in the selected images.

Calibration results after optimization (with uncertainties):
Focal Length:          fc = [ 569.92842   571.44089 ] ± [ 16.23688   14.06564 ]
Principal point:       cc = [ 278.08552   258.82575 ] ± [ 25.86257   24.07492 ]
Skew:             alpha_c = [ 0.00000 ] ± [ 0.00000  ]   => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion:            kc = [ -0.28627   0.90685   0.00121   -0.01097  0.00000 ] ± [ 0.18908   1.79441   0.01032   0.00938  0.00000 ]
Pixel error:          err = [ 0.28284   0.62366 ]
Figure 4 Grid corner extraction for re-calibration
Next, we can see that the errors are smaller than in the first calibration results:

Calibration results after optimization (with uncertainties):
Focal Length:          fc = [ 581.12084   581.34563 ] ± [ 10.53285   9.09579 ]
Principal point:       cc = [ 281.39427   253.27383 ] ± [ 16.64980   15.86515 ]
Skew:             alpha_c = [ 0.00000 ] ± [ 0.00000  ]   => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion:            kc = [ -0.26651   0.92108   0.00301   -0.01151  0.00000 ] ± [ 0.13097   1.44999   0.00626   0.00590  0.00000 ]
Pixel error:          err = [ 0.22538   0.35812 ]

Repeating this procedure until an acceptable result is achieved gives:

Calibration results after optimization (with uncertainties):
Focal Length:          fc = [ 579.39816   580.78490 ] ± [ 9.58715   8.48105 ]
Principal point:       cc = [ 279.82800   257.11149 ] ± [ 14.13074   12.45446 ]
Skew:             alpha_c = [ 0.00000 ] ± [ 0.00000  ]   => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion:            kc = [ -0.32611   1.61254   0.00131   -0.01085  0.00000 ] ± [ 0.10937   1.22361   0.00600   0.00512  0.00000 ]
Pixel error:          err = [ 0.21153   0.29514 ]

For the right camera, the procedure is exactly the same, but note that the extracted corners in the left camera must correspond to the extracted corners in the right camera. Therefore, we need to be careful about the order in which we select the boundary of the calibration pattern in the images captured by the right camera.

Finally, the extrinsic parameters, in other words the relative position and orientation between the two IR cameras, can be computed from the individual calibration results. The extrinsic result is:


Extrinsic parameters (position of right camera wrt left camera):
Rotation vector:             om = [ 0.12700   1.83078  2.45355 ]
Translation vector:           T = [ 37.16528   -993.13499  676.55744 ]


The om vector is a rotation vector, which can be converted to a rotation matrix by Rodrigues' rotation formula. The three elements of the translation vector are the rigid translation along the x, y and z directions, in millimetres (mm). Figure 5 visualises the relation between the left camera, the right camera and the calibration pattern.
Say we have a point in the world and know its coordinates XL in the left camera reference frame; its coordinates XR in the right camera reference frame can then be calculated as:
XR = R * XL + T
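As a small sketch of this step (my own illustration, using OpenCV's cv::Rodrigues; the numbers are the calibration output above and the sample point XL is arbitrary):

 #include <opencv2/core/core.hpp>
 #include <opencv2/calib3d/calib3d.hpp>

 int main ()
 {
  // Rotation vector om and translation vector T from the calibration output (T in mm).
  cv::Mat om = (cv::Mat_<double>(3,1) << 0.12700, 1.83078, 2.45355);
  cv::Mat T  = (cv::Mat_<double>(3,1) << 37.16528, -993.13499, 676.55744);

  // Rodrigues' formula turns the rotation vector into a 3x3 rotation matrix.
  cv::Mat R;
  cv::Rodrigues(om, R);

  // An example point XL in the left camera frame (mm), mapped into the right camera frame.
  cv::Mat XL = (cv::Mat_<double>(3,1) << 0.0, 0.0, 1000.0);
  cv::Mat XR = R * XL + T;
  return 0;
 }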
Figure 5 Extrinsic parameters of stereo calibration

07 March 2013

Interference problem of multiple Kinects (2)

For the time-multiplexing method, there is currently no feasible software-only way to do it. The Kinect SDK does not allow rapid toggling of the laser; it can only turn the laser off, without specifying how quickly. Several hardware methods have been implemented using shutters and revolving disks, but such hardware time-multiplexing implementations are too complicated for this project, as a micro-controller is needed for motor synchronisation.

For the polarization method, which polarises the light by putting filters on the IR emitter and camera, there is a discussion on the issue available here. In the discussion, someone points out that the polarization is not well preserved when the light reflects off arbitrary surfaces. Furthermore, the luminance of the IR light is reduced in exchange for better separation, which results in a decreased SNR (signal-to-noise ratio).

Based on the hardware information about the Kinect (link), its infrared laser diode works at 830 nm (presumably not perfectly monochromatic, but centred around that wavelength, which corresponds to an optical frequency of roughly 360 THz). Therefore, if we want to use a frequency multiplexing method, we may need to physically replace the Kinect's laser diode with one working at a different wavelength and put matching filters in front of the IR cameras. This is an expensive and invasive approach that requires a complete disassembly of every single device.

A comprehensive presentation about the MS Kinect is available here, in which it is pointed out that the interference is minimal with two Kinects. I did some experiments and found that if part of the shared area shows up as holes in one Kinect, it may still be visible in the other Kinect.
Figure 1 Depth frames captured by two Kinects facing the same area, (upper row) with interference and (lower row) without interference.
Figure 2 Depth frames of a box captured from two viewpoints, showing that the holes in one frame are complementary to those in the other (indicated by red rectangles).
Overall, I suggest using a more non-invasive method, such as introducing vibration to the Kinect, or skipping the interference problem for now and coming back later if the reconstruction results are not satisfactory.

05 March 2013

Multiple Kinects display with OpenCV

It is straightforward to retrieve colour/depth frames from multiple Kinects by initialising and handling each device in the normal way. However, it is a little tricky to visualise multiple frames from multiple Kinects in multiple windows with OpenGL. Therefore, I decided to convert the program to OpenCV, which I am more familiar with than OpenGL.
In order to handle multiple Kinects more easily, I created a class called DepthSensor which can be used to retrieve OpenCV images directly. Figure 1 shows a UML class diagram for the DepthSensor class. Kinect initialisation is done in the method init(), whose input argument is an index indicating which Kinect is being initialised; it also handles the error case where the index exceeds the number of Kinects detected on the computer. Before each call to getColorImg() or getDepthImg(), processColorFrame() or processDepthFrame() needs to be called. In processColorFrame() and processDepthFrame() I retrieve the corresponding frame from the corresponding stream and convert it into the OpenCV image format (IplImage). A sketch of the resulting interface is shown after Figure 1.

Figure 1 UML of class DepthSensor
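Based on the description above and the UML in Figure 1, the interface presumably looks roughly like the sketch below (member names not mentioned in the post are guesses):

 #include <Windows.h>
 #include <NuiApi.h>
 #include <opencv2/core/core_c.h>   // IplImage

 class DepthSensor
 {
 public:
  // Initialise the Kinect with the given index; returns false on failure
  // (e.g. if the index exceeds the number of detected Kinects).
  bool init(int index);

  // Grab the next depth/colour frame from its stream and convert it to IplImage.
  void processDepthFrame();
  void processColorFrame();

  // Return the most recently processed frames as OpenCV images.
  IplImage& getColorImg();
  IplImage& getDepthImg();

 private:
  INuiSensor* sensor;     // guessed member: handle to the underlying Kinect
  IplImage*   colorImg;   // guessed member: converted colour frame
  IplImage*   depthImg;   // guessed member: converted depth frame
 };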
The main running program can be as simple as:
#include <Windows.h>
#include <NuiApi.h>
#include <opencv2/highgui/highgui.hpp>   // cvShowImage, cvWaitKey
#include "DepthSensor.h"                 // the DepthSensor class above (header name assumed)

int main(int argc, char* argv[])
{
 IplImage *image1, *image2;
 DepthSensor kinect1, kinect2;

 // Initialise both Kinects; bail out if either fails.
 if(!kinect1.init(1))
 {
  NuiShutdown();
  return 1;
 }
 if(!kinect2.init(2))
 {
  NuiShutdown();
  return 1;
 }

 while (1)
 {
  if (cvWaitKey(30) == 27)   // exit on Esc
   break;
  else
  {
   // Grab the latest depth and colour frames from both sensors.
   kinect1.processDepthFrame();
   kinect2.processDepthFrame();
   kinect1.processColorFrame();
   kinect2.processColorFrame();

   // Show the colour images.
   image1 = &kinect1.getColorImg();
   image2 = &kinect2.getColorImg();
   cvShowImage("Color Kinect1", image1);
   cvShowImage("Color Kinect2", image2);

   // Show the depth images.
   image1 = &kinect1.getDepthImg();
   image2 = &kinect2.getDepthImg();
   cvShowImage("Depth Kinect1", image1);
   cvShowImage("Depth Kinect2", image2);
  }
 }
 NuiShutdown();
 return 0;
}

The results are shown below:
Figure 2 Two Kinect colour/ depth display with OpenCV
Further work on DepthSensor:
In the following days, I will continue implementing the DepthSensor class so that it can also handle 3D point cloud data.