15 April 2013

Study of Active Shape Model

Here is a definition of ASM from link:
Active Shape Models are statistical models of the shapes of objects which iteratively deform to fit to an example of the object in a new image. The shapes are constrained by a Statistical Shape Model to vary only in ways seen in a training set of labelled examples. 
A tutorial MATLAB program from here is used to learn the programming part of ASM.

First, the landmarks of the target in each training sample are labelled. A smooth contour is then obtained by uniformly interpolating between the landmarks and connecting all the resulting points. For each shape (one shape per sample), we need to translate it to the origin and correct its rotation so that the shapes from different samples share the same mean position (the origin) and orientation. This can be done by subtracting the mean position and the mean rotation from the original shape, as in the following MATLAB code:
% Center data to remove translation 
offsetv = -mean(Vertices,1);
Vertices(:,1) = Vertices(:,1) + offsetv(1);
Vertices(:,2) = Vertices(:,2) + offsetv(2);

% Correct for rotation
% Calculate angle to center of all points
rot = atan2(Vertices(:,2),Vertices(:,1));
% Subtract the mean angle
offsetr=-mean(rot(1:round(end/2)));
rot = rot+offsetr;
% Make the new points, which all have the same rotation
dist = sqrt(Vertices(:,1).^2+Vertices(:,2).^2);
Vertices(:,1) = dist.*cos(rot);
Vertices(:,2) = dist.*sin(rot);
Vertices is an n-by-2 matrix (in 2D), where n is the number of points (landmarks plus interpolated points) in each sample.

Secondly, the normalised points of all samples are analysed by PCA (principal component analysis), which yields the eigenvectors and eigenvalues of the shape variation over the whole training set.
[Evalues, Evectors, x_mean]=PCA(x);

% Keep only enough eigenvectors to cover 98% of the variance (removes contour noise)
i=find(cumsum(Evalues)>sum(Evalues)*0.98,1,'first'); 
Evectors=Evectors(:,1:i);
Evalues=Evalues(1:i);
In the above code, x is a 2n-by-N matrix, where n is the number of points in each sample and N is the number of training samples. We only retain the largest eigenvalues, which correspond to the most significant modes of shape variation. In this case, 98% of the variation is captured by keeping the 6 largest eigenvalues (out of 10).

Thirdly, an appearance model is built by sampling an intensity profile (a line of pixels) perpendicular to the contour at each contour point in every training sample. In this step, the normal direction at each contour point is approximated from its neighbouring points. More specifically, the normal at point 2 is perpendicular to the line connecting points 1 and 3. Since the contour is closed (the last point connects back to the first point), the normal at the first point is calculated from the last point and the second point.
Based on the normals, we can sample the pixel profile of every landmark (contour point) perpendicular to the contour. For both the positive and negative normal direction, we can set how many pixels we want to sample (the length of the profile).
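As an illustration of the neighbour-based normal (a sketch in C++ rather than the MATLAB toolbox code; the Point2D type and function name are my own assumptions):
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2D { double x, y; };

// Approximate the unit normal at contour point i from its two neighbours:
// the tangent runs from the previous point to the next point (indices wrap
// around because the contour is closed), and the normal is the tangent
// rotated by 90 degrees.
Point2D normalAt(const std::vector<Point2D>& contour, std::size_t i)
{
    const std::size_t n = contour.size();
    const Point2D& prev = contour[(i + n - 1) % n];
    const Point2D& next = contour[(i + 1) % n];
    const double tx = next.x - prev.x;
    const double ty = next.y - prev.y;
    const double len = std::sqrt(tx * tx + ty * ty);
    return { -ty / len, tx / len };
}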
If we want to use a PCA-based search on the intensities at the later application stage, we should apply PCA to the intensity profiles first. Again, we retain only the eigenvectors whose eigenvalues are large enough to cover 98% of the total eigenvalue sum. This removes noise around the contour.
-----------------------------------------------------------------------------------------------------------
The training process of the ASM is now finished, and we can apply the model to a test sample containing the target object. We need to provide an initial approximation of the target contour manually (an automatic solution is achievable, e.g. using a Genetic Algorithm). In each iteration, a suggested movement for each model point is calculated by searching along its profile for the position whose intensity profile is closest to the mean profile of the appearance model; the shape model then constrains the updated contour.
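One part of this fitting step that the MATLAB snippets above do not show is the shape constraint: after the suggested movements are found, each shape parameter is typically limited to about three standard deviations of its mode, so the contour only deforms in ways seen during training. A minimal C++ sketch of that clamping, with hypothetical variable names:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Clamp each shape parameter b[i] to +/- 3*sqrt(lambda[i]), where lambda[i]
// is the eigenvalue of mode i, so the fitted shape stays within the
// variation observed in the training set.
void constrainShapeParameters(std::vector<double>& b, const std::vector<double>& lambda)
{
    for (std::size_t i = 0; i < b.size(); ++i) {
        const double limit = 3.0 * std::sqrt(lambda[i]);
        b[i] = std::max(-limit, std::min(limit, b[i]));
    }
}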
-----------------------------------------------------------------------------------------------------------
Effect of parameters:
Length of the landmark intensity profile: determines how many pixels are recorded along the normal (perpendicular) direction at each landmark.


12 April 2013

Target point cloud extraction from background

In this post, I will introduce how I extract the target (human body) point cloud from the background.
Here is a flow chart describing the process:
Figure 1 Flow chart for point cloud extraction.
In order to align cloud 1 and cloud 2, I perform two transformations on cloud 1. The rough transformation uses pre-calibration information describing the relative position between the two cameras. As the calibration was not perfect, the two clouds did not align perfectly, so I applied ICP (iterative closest point) to cloud 1 to obtain a well-aligned cloud and the corresponding transformation. In the later process, I used this transformation from ICP directly rather than recomputing it each time.
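For reference, a minimal sketch of that ICP refinement with PCL (assuming a recent PCL version, XYZRGB clouds and placeholder parameter values, not my exact settings):
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

// Refine the rough alignment of cloud 1 against cloud 2 and return the
// resulting transformation, which can be stored and reused for later frames.
Eigen::Matrix4f refineAlignment(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& cloud1,
                                const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& cloud2,
                                pcl::PointCloud<pcl::PointXYZRGB>& cloud1_refined)
{
  pcl::IterativeClosestPoint<pcl::PointXYZRGB, pcl::PointXYZRGB> icp;
  icp.setInputSource(cloud1);          // cloud 1 after the rough (pre-calibration) transform
  icp.setInputTarget(cloud2);          // cloud 2 stays fixed
  icp.setMaximumIterations(50);        // placeholder iteration budget
  icp.align(cloud1_refined);           // cloud 1 after ICP refinement
  return icp.getFinalTransformation(); // reused directly in the later process
}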

The combined cloud still contains both the target and the background at this stage. As the point cloud is not aligned with the camera coordinate system, it is not intuitive to crop the cloud directly. I apply another transformation to the combined cloud manually so that the horizontal plane of the point cloud is aligned with the x-z plane (it could be any other orthogonal plane) of the camera coordinate system. Then the point cloud is cropped along the x, y and z coordinates in sequence. The cloud remaining after the cropping is our target point cloud.
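A hedged sketch of this step with PCL, using a single CropBox instead of the three sequential crops described above; the rotation angle and box limits below are made-up placeholders, whereas my actual transform and thresholds were tuned by hand:
#include <Eigen/Geometry>
#include <pcl/point_types.h>
#include <pcl/common/transforms.h>
#include <pcl/filters/crop_box.h>

// Rotate the combined cloud so its horizontal plane lines up with the camera's
// x-z plane, then crop a box around the target region.
void extractTarget(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& combined,
                   pcl::PointCloud<pcl::PointXYZRGB>::Ptr& target)
{
  Eigen::Affine3f align = Eigen::Affine3f::Identity();
  align.rotate(Eigen::AngleAxisf(0.3f, Eigen::Vector3f::UnitX())); // manual tilt correction (placeholder)

  pcl::PointCloud<pcl::PointXYZRGB>::Ptr aligned(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::transformPointCloud(*combined, *aligned, align);

  pcl::CropBox<pcl::PointXYZRGB> crop;                     // crops along x, y and z at once
  crop.setInputCloud(aligned);
  crop.setMin(Eigen::Vector4f(-0.5f, -1.0f, 0.5f, 1.0f));  // placeholder limits (metres)
  crop.setMax(Eigen::Vector4f( 0.5f,  1.0f, 2.0f, 1.0f));
  crop.filter(*target);
}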
Here are some results:
Figure 2 Point cloud of body model (rotated around)

Figure 3 Yang's model
Figure 4 Lin's model

Although the left and right cameras gave colour images of different brightness, we can see that the alignment result was promising.

Future Improvements:
1. The transformation from the object coordinate system to the camera coordinate system was found manually. It could be found automatically by detecting a plane (table/ground) in the scene and then calculating the transformation from that plane to an orthogonal plane of the camera coordinate system (see the sketch after this list).
2. The cropping thresholds were found manually as well. In future, I could detect the landmarks (boundary) of the bed and derive the thresholds from them.
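For the first improvement, a possible starting point is RANSAC plane segmentation in PCL. The sketch below only finds the plane coefficients; the alignment transform would still have to be derived from them:
#include <pcl/point_types.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/sample_consensus/method_types.h>
#include <pcl/sample_consensus/model_types.h>
#include <pcl/segmentation/sac_segmentation.h>

// Detect the dominant plane (e.g. table or ground) in a cloud with RANSAC and
// return its coefficients a, b, c, d such that ax + by + cz + d = 0.
pcl::ModelCoefficients::Ptr detectDominantPlane(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& cloud)
{
  pcl::ModelCoefficients::Ptr coefficients(new pcl::ModelCoefficients);
  pcl::PointIndices::Ptr inliers(new pcl::PointIndices);

  pcl::SACSegmentation<pcl::PointXYZRGB> seg;
  seg.setOptimizeCoefficients(true);
  seg.setModelType(pcl::SACMODEL_PLANE);
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setDistanceThreshold(0.01);   // 1 cm inlier tolerance (placeholder)
  seg.setInputCloud(cloud);
  seg.segment(*inliers, *coefficients);

  return coefficients;
}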

The models shown above can be downloaded here: bodyModel.ply; Yang.ply; Lin.ply
You can view these models in MeshLab directly.

11 April 2013

Kinect Fusion with MS SDK

Recently, Microsoft updated their Kinect SDK to support Kinect Fusion and some interaction interfaces. In this post I will introduce the minimum code required to implement Kinect Fusion with the Kinect SDK.
Before the code, I'd like to start with an overview of the processing pipeline of Kinect Fusion. Here is an image from the MS website (link):
Figure 1 Process pipeline for Kinect Fusion

In order to make use of the Kinect Fusion API, we need to include NuiKinectFusionApi.h and also link the project against KinectFusion170_32.lib. The files are located in ${Kinect_SDK DIR}/Developer Toolkit v1.7.0/inc/ and ${Kinect_SDK DIR}/Developer Toolkit v1.7.0/Lib/x86/ respectively.
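For example, one way to pull the API into an MSVC project (header and library names as shipped with Developer Toolkit v1.7.0; adjust them to your installation):
#include <Windows.h>
#include <NuiApi.h>              // core Kinect API (sensor and streams)
#include <NuiKinectFusionApi.h>  // Kinect Fusion API

#pragma comment(lib, "KinectFusion170_32.lib")  // or add the .lib via the project's linker settings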

I wrote a depthSensor class to manage the process more easily. First, the declaration of the class:
class depthSensor
{
private:
 static const int WIDTH = 640;
 static const int HEIGHT = 480;
 static const int UPDATE_WEIGHT = 10;
 int trackingErrorCount;

 INuiSensor*  mNuiSensor;
 // -- 3D volume of reconstruction
 INuiFusionReconstruction* m_pVolume;

 NUI_FUSION_IMAGE_FRAME*     m_pDepthFloatImage;
 NUI_FUSION_IMAGE_FRAME*     m_pPointCloud;
 NUI_FUSION_IMAGE_FRAME*     m_pShadedSurface;

 HANDLE   mNextDepthFrameEvent;
 HANDLE   mNextColorFrameEvent;
 HANDLE   mDepthStreamHandle;
 HANDLE   mColorStreamHandle;

 void initKinectFusion();
 void createInstance(int const kinect_index);
 
public:
 depthSensor();
 void init(int const kinect_index);
 void processDepth(cv::Mat& mat);
 void processKinectFusion(const NUI_DEPTH_IMAGE_PIXEL* depthPixel, int depthPixelSize, cv::Mat& mat);
 const Matrix4& IdentityMatrix();
 ~depthSensor();
};
The most important member of the class is:
 // -- 3D volume of reconstruction
 INuiFusionReconstruction* m_pVolume;
It holds the 3D volumetric reconstruction of the scene, which is cumulatively built up from new depth data from the Kinect, as shown in Figure 1.
These two member functions initialise the Kinect instance and set up its basic state:
 void createInstance(int const kinect_index);
 void init(int const kinect_index);
At the end of the init() function, we call initKinectFusion(), which initialises the parameters and state related to the Kinect Fusion algorithm. Here is the definition of initKinectFusion():
void depthSensor::initKinectFusion()
{
 HRESULT hr = S_OK;

 NUI_FUSION_RECONSTRUCTION_PARAMETERS reconstructionParams;
 reconstructionParams.voxelsPerMeter = 256; // 1000mm / 256vpm = ~3.9mm/voxel
 reconstructionParams.voxelCountX = 512;  // 512 / 256vpm = 2m wide reconstruction
 reconstructionParams.voxelCountY = 384;  // Memory = 512*384*512 voxels * 4 bytes per voxel
 reconstructionParams.voxelCountZ = 512;  // = ~384MB for the reconstruction volume

 hr = NuiFusionCreateReconstruction(&reconstructionParams,
          NUI_FUSION_RECONSTRUCTION_PROCESSOR_TYPE_CPU,
          -1, &IdentityMatrix(), &m_pVolume);
 if (FAILED(hr))
  throw std::runtime_error("NuiFusionCreateReconstruction failed.");

 // DepthFloatImage
 hr = NuiFusionCreateImageFrame(NUI_FUSION_IMAGE_TYPE_FLOAT, WIDTH, HEIGHT, nullptr, &m_pDepthFloatImage);
 if (FAILED(hr))
  throw std::runtime_error("NuiFusionCreateImageFrame failed (Float).");

 // PointCloud
 hr = NuiFusionCreateImageFrame(NUI_FUSION_IMAGE_TYPE_POINT_CLOUD, WIDTH, HEIGHT, nullptr, &m_pPointCloud);
 if (FAILED(hr))
  throw std::runtime_error("NuiFusionCreateImageFrame failed (PointCloud).");

 // ShadedSurface
 hr = NuiFusionCreateImageFrame(NUI_FUSION_IMAGE_TYPE_COLOR, WIDTH, HEIGHT, nullptr, &m_pShadedSurface);
 if (FAILED(hr))
  throw std::runtime_error("NuiFusionCreateImageFrame failed (Color).");

 m_pVolume->ResetReconstruction( &IdentityMatrix(), nullptr);
}
Within this function, we set up the volume's parameters, such as its size and resolution. Depending on the memory available on the GPU, the voxel size and voxel count can be chosen for a higher or lower spatial resolution. Three kinds of data are produced by the Kinect Fusion process. The depth float image contains the depth data from the Kinect in floating-point format. From the depth data, we can create point cloud data. The shaded surface image shows a ray-cast, shaded view of the volume's surface from the current camera pose. The following code shows the process from depth image to volume reconstruction and to the ray-cast image used for visualisation.
void depthSensor::processKinectFusion( const NUI_DEPTH_IMAGE_PIXEL* depthPixel, int depthPixelSize, cv::Mat& mat ) 
{
 // Depth pixel (mm) to float depth (m)
 HRESULT hr = NuiFusionDepthToDepthFloatFrame( depthPixel, WIDTH, HEIGHT, m_pDepthFloatImage,
  NUI_FUSION_DEFAULT_MINIMUM_DEPTH, NUI_FUSION_DEFAULT_MAXIMUM_DEPTH, TRUE );
 if (FAILED(hr)) {
  throw std::runtime_error( "::NuiFusionDepthToDepthFloatFrame failed." );
 }

 // Update volume based on new depth frame
 Matrix4 worldToCameraTransform;
 m_pVolume->GetCurrentWorldToCameraTransform( &worldToCameraTransform );
 // ProcessFrame aligns the new depth frame to the reconstruction and
 // integrates it into the volume in one call
 hr = m_pVolume->ProcessFrame( m_pDepthFloatImage, NUI_FUSION_DEFAULT_ALIGN_ITERATION_COUNT,
  UPDATE_WEIGHT, &worldToCameraTransform );
 if (FAILED(hr)) {
  // If tracking failed, count the failure;
  // the reconstruction is reset once enough failures have accumulated
  ++trackingErrorCount;
  if ( trackingErrorCount >= 100 ) {
   trackingErrorCount = 0;
   m_pVolume->ResetReconstruction( &IdentityMatrix(), nullptr );
  }

  return;
 }

 // Get PointCloud from the volume
 hr = m_pVolume->CalculatePointCloud( m_pPointCloud, &worldToCameraTransform );
 if (FAILED(hr)) {
  throw std::runtime_error( "CalculatePointCloud failed." );
 }

 // Ray-casting shaded PointCloud based on current camera pose
 hr = ::NuiFusionShadePointCloud( m_pPointCloud, &worldToCameraTransform,
  nullptr, m_pShadedSurface, nullptr );
 if (FAILED(hr)) {
  throw std::runtime_error( "::NuiFusionShadePointCloud failed." );
 }

 // Get 2D frame to be shown
 INuiFrameTexture * pShadedImageTexture = m_pShadedSurface->pFrameTexture;
 NUI_LOCKED_RECT ShadedLockedRect;
 hr = pShadedImageTexture->LockRect(0, &ShadedLockedRect, nullptr, 0);
 if (FAILED(hr)) {
  throw std::runtime_error( "LockRect failed." );
 }

 mat = cv::Mat( HEIGHT, WIDTH, CV_8UC4, ShadedLockedRect.pBits );

 // We're done with the texture so unlock it
 pShadedImageTexture->UnlockRect(0);
}
At each frame, the camera pose is tracked so that the new point cloud can be integrated into the current volume correctly. If the camera is static, we don't need to track the camera position and the transformation matrix can simply be set to the identity.
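The IdentityMatrix() helper declared in the class is not shown above (it is included in the full code linked below); a minimal version, assuming the SDK's Matrix4 struct with members M11..M44, could look like this:
const Matrix4& depthSensor::IdentityMatrix()
{
 // Build the 4x4 identity once and hand out a reference to it
 static Matrix4 identity = { 1.0f, 0.0f, 0.0f, 0.0f,
                             0.0f, 1.0f, 0.0f, 0.0f,
                             0.0f, 0.0f, 1.0f, 0.0f,
                             0.0f, 0.0f, 0.0f, 1.0f };
 return identity;
}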

Full code download: depthSensor.cpp; depthSensor.h

04 April 2013

PCL - point cloud filtering

Filtering a point cloud using a Pass Through filter:
In PCL, there is a basic filter class for filtering a point cloud along a specified dimension. Here is a snippet of example code:
#include <pcl/point_types.h>
#include <pcl/filters/passthrough.h>

int main (int argc, char** argv)
{
 pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
 pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_filtered (new pcl::PointCloud<pcl::PointXYZ>);

 // Fill in the cloud data here

 // Create the filtering object
 pcl::PassThrough<pcl::PointXYZ> pass;
 pass.setInputCloud (cloud);
 pass.setFilterFieldName ("y");
 pass.setFilterLimits (0.0, 500.0);
 //pass.setFilterLimitsNegative (true);
 pass.filter (*cloud_filtered);

 return (0);
}
The above code filters out the points whose y coordinate lies outside the range 0.0 to 500.0.
These two lines configure which axis and which limits the filter uses:
 pass.setFilterFieldName ("z");
 pass.setFilterLimits (0.0, 100.0);
If we want to filter out the points inside the interval instead (i.e. keep the points outside the min and max limits), simply pass true to setFilterLimitsNegative:
 pass.setFilterLimitsNegative (true);

---
Removing outliers using a StatisticalOutlierRemoval filter:
Point clouds constructed from the Kinect always contain noise, due to varying received IR light and measurement errors. The noise can complicate the estimation of local point cloud characteristics such as surface normals and curvature, which might lead to registration failure. We can apply a statistical analysis to each point's neighbourhood and filter out the points that do not meet certain criteria. PCL provides a filter that removes outliers whose mean distance to their own neighbours falls outside an interval defined by the global mean and standard deviation of these distances.
Here is the code:
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/filters/statistical_outlier_removal.h>
#include <pcl/visualization/cloud_viewer.h>

int main (int argc, char** argv)
{
 pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
 pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_filtered (new pcl::PointCloud<pcl::PointXYZ>);

 // Fill in the cloud data
 pcl::PCDReader reader;
 // Read input point cloud
 reader.read ("table_scene_lms400.pcd", *cloud);

 std::cerr << "Cloud before filtering: " << std::endl;
 std::cerr << *cloud << std::endl;
 // Visualise input cloud
 pcl::visualization::CloudViewer viewer ("Input");
 viewer.showCloud(cloud);
 while (!viewer.wasStopped()){}

 // Create the filtering object
 pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
 sor.setInputCloud (cloud);
 sor.setMeanK (10);
 sor.setStddevMulThresh (1.0);
 sor.filter (*cloud_filtered);

 std::cerr << "Cloud after filtering: " << std::endl;
 std::cerr << *cloud_filtered << std::endl;

 // Visualise output cloud
 pcl::visualization::CloudViewer viewer1 ("Output");
 viewer1.showCloud(cloud_filtered);
 while (!viewer1.wasStopped()){}

 return (0);
}
The function:
 sor.setMeanK (10);
sets the number of neighbouring points (k) used for the mean-distance estimation. With a larger k, outliers spread over a larger area are removed, but if k is too big, valid points in low-density regions may be removed incorrectly. With a small k, some noisy outliers may not be removed.
Another function:
 sor.setStddevMulThresh (1.0);
means that any point whose mean distance to its neighbours is more than 1 standard deviation above the global mean distance will be classified as an outlier and removed.
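To make the criterion concrete, here is a small illustration (not PCL's internal code) of the keep/remove decision, assuming each point's mean distance to its k neighbours has already been computed:
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Keep a point only if its mean neighbour distance is no more than
// (global mean + stddevMult * global standard deviation) of those distances.
std::vector<bool> keepMask(const std::vector<double>& meanDists, double stddevMult)
{
    const double mean = std::accumulate(meanDists.begin(), meanDists.end(), 0.0) / meanDists.size();
    double sumSq = 0.0;
    for (double d : meanDists)
        sumSq += (d - mean) * (d - mean);
    const double stddev = std::sqrt(sumSq / meanDists.size());

    std::vector<bool> keep(meanDists.size());
    for (std::size_t i = 0; i < meanDists.size(); ++i)
        keep[i] = meanDists[i] <= mean + stddevMult * stddev;
    return keep;
}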
---
Downsampling a point cloud using a VoxelGrid filter:
The model extracted from the Kinect still contains many redundant points, which lead to unnecessary computation. For a real-time shape model application, such a dense point cloud needs to be downsampled very quickly while preserving the vertex variation of the model's shape. Here is the code:
#include <pcl/point_types.h>
#include <pcl/common/io.h>
#include <pcl/io/ply_io.h>
#include <pcl/filters/voxel_grid.h>

int main (int argc, char** argv)
{
 pcl::PointCloud<pcl::PointXYZ>::Ptr cloud (new pcl::PointCloud<pcl::PointXYZ>);
 pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_filtered (new pcl::PointCloud<pcl::PointXYZ>);

 // Fill in the cloud data
 pcl::PLYReader reader;
 // Replace the path below with the path where you saved your file
 reader.read ("bodyModel.ply", *cloud); // Remember to download the file first!

 std::cerr << "PointCloud before filtering: " << cloud->width * cloud->height 
  << " data points (" << pcl::getFieldsList (*cloud) << ").";

 // Create the filtering object
 pcl::VoxelGrid<pcl::PointXYZ> sor;
 sor.setInputCloud (cloud);
 sor.setLeafSize (0.01f, 0.01f, 0.01f);
 sor.filter (*cloud_filtered);

 std::cerr << "PointCloud after filtering: " << cloud_filtered->width * cloud_filtered->height 
  << " data points (" << pcl::getFieldsList (*cloud_filtered) << ").";

 pcl::io::savePLYFileBinary("bodyModelfiltered.ply",*cloud_filtered);


 return (0);
}
The function setLeafSize controls the resolution of the filtered cloud in the x, y and z directions; larger values give a coarser (lower-resolution) result.
Here is a figure showing the model before and after downsampling:
Figure 1 Result of downsampling, input model (left) and filtered model (middle & right). Middle one used leaf size of 0.01 while right one used leaf size of 0.005.
From the left model to the middle one, the number of points is reduced from 123999 to 6120, and the program finishes almost immediately.