12 June 2013

Solve linear least-squares problem using OpenCV (Reference)

C++: bool solve(InputArray src1, InputArray src2, OutputArray dst, int flags=DECOMP_LU)
Parameters:

  • src1 – input matrix on the left-hand side of the system.
  • src2 – input matrix on the right-hand side of the system.
  • dst – output solution.
  • flags – solution (matrix inversion) method:
  • DECOMP_LU Gaussian elimination with optimal pivot element chosen.
  • DECOMP_CHOLESKY Cholesky LL^T factorization; the matrix src1 must be symmetric and positive definite.
  • DECOMP_EIG eigenvalue decomposition; the matrix src1 must be symmetric.
  • DECOMP_SVD singular value decomposition (SVD) method; the system can be over-defined and/or the matrix src1 can be singular.
  • DECOMP_QR QR factorization; the system can be over-defined and/or the matrix src1 can be singular.
  • DECOMP_NORMAL while all the previous flags are mutually exclusive, this flag can be used together with any of the previous; it means that the normal equations src1^T · src1 · dst = src1^T · src2 are solved instead of the original system src1 · dst = src2.
The function solve solves a linear system or least-squares problem (the latter is possible with SVD or QR methods, or by specifying the flag DECOMP_NORMAL ):
dst = argmin_X || src1 · X - src2 ||
If DECOMP_LU or DECOMP_CHOLESKY method is used, the function returns 1 if src1 (or src1^T · src1) is non-singular. Otherwise, it returns 0. In the latter case, dst is not valid. Other methods find a pseudo-solution in case of a singular left-hand side part.

In my case, I have the equation min_b || P·b - (Xnew - Xm) || to solve, where P is the m×p matrix of eigenvectors, Xnew is the target shape, Xm is the mean shape of the trained model, and b is the shape parameter vector we want to obtain. Here m is the number of attributes (number of points × 3) and p is the number of retained eigenvalues.
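Since m > p the system is over-determined, so cv::solve needs DECOMP_SVD (or DECOMP_QR, possibly with DECOMP_NORMAL) to return the least-squares solution. A minimal sketch, assuming P, Xnew and Xm are already stored as cv::Mat; the function name solveShapeParams is my own:

#include <opencv2/core/core.hpp>

// Solve min_b || P*b - (Xnew - Xm) || for the shape parameter vector b.
// P is m x p, Xnew and Xm are m x 1; the returned b is p x 1.
cv::Mat solveShapeParams(const cv::Mat& P, const cv::Mat& Xnew, const cv::Mat& Xm)
{
    cv::Mat b;
    cv::Mat rhs = Xnew - Xm;                  // right-hand side of the system
    // DECOMP_SVD handles the over-determined (and possibly singular) case
    // by returning the pseudo-solution.
    bool ok = cv::solve(P, rhs, b, cv::DECOMP_SVD);
    CV_Assert(ok);
    return b;
}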

Reference: http://docs.opencv.org/modules/core/doc/operations_on_arrays.html#solve

11 June 2013

Octree based point cloud stream compression in PCL

The point cloud produced by a Kinect-like sensor is dense and updated at a real-time frame rate (30 Hz). In the usual case, 240k ~ 270k points are generated per frame per Kinect, depending on the complexity of the scene. As the dynamic point clouds are sampled at 30 Hz, this amounts to a total of about 7.5 million points per second. Since two Kinects are used in this project, twice that amount of point cloud data has to be transmitted.
The Point Cloud Library (PCL) implements an octree-based compression class that enables both spatial and temporal compression at the same time. The full framework of the algorithm is introduced in [1].

Spatial compression:
The point cloud is decomposed into nodes using an octree structure, briefly illustrated in Figure 1. When constructing the octree, every node occupied by points is marked non-empty (filled black in the figure). Each node that contains points is further subdivided into 8 octants until the number of levels or the node size reaches a predefined threshold. The child node configuration can be efficiently represented in a single byte (8 bits for 8 octants). The whole octree structure is therefore encoded into a stream whose size in bytes equals the number of branch nodes.
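As a toy illustration (my own sketch, not PCL's actual implementation) of packing one branch node's child configuration into a byte:

// Bit i of the byte is set iff child octant i contains at least one point.
unsigned char encodeBranch(const bool occupied[8])
{
    unsigned char config = 0;
    for (int i = 0; i < 8; ++i)
        if (occupied[i])
            config |= static_cast<unsigned char>(1 << i);
    return config; // one byte per branch node in the serialized stream
}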

Figure 1 Schematic overview of the octree data structure and its serialization [1].

Temporal compression:
Since changes in the captured scene often occur only in confined areas of the point cloud, the correlation between the currently sampled points and the previously encoded point cloud can be exploited to improve compression performance. In [1] the authors propose a double-buffered octree for this. Instead of a single set of eight child pointers per branch node, the double-buffered octree keeps two sets of eight child pointers in every branch node; as shown in Figure 2, let's call them buffers A and B. The two buffers are alternately assigned to each newly captured point cloud. As the paper points out, keeping consecutive point clouds in one octree structure allows us to spatially match, reference and analyse the structural changes between consecutive octrees [1].

Figure 2 Illustration of differential encoding technique when using double-buffering octree [1].
In order to obtain and output the structural changes for every branch of two consecutive point clouds, the authors apply an exclusive disjunction (XOR) to the two bytes representing the child configuration of that branch, over the whole octree structure.
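A toy illustration of the XOR step (my own sketch; the byte values are made up), building on the encodeBranch byte above:

#include <cstdio>

int main()
{
    unsigned char branchA = 0x98;              // 1001 1000: occupancy, frame t-1 (buffer A)
    unsigned char branchB = 0x9A;              // 1001 1010: occupancy, frame t   (buffer B)
    unsigned char diff    = branchA ^ branchB; // 0000 0010: only octant 1 changed
    std::printf("differential byte: 0x%02X\n", diff);
    return 0;
}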

Compression architecture:
In addition to the spatially and temporally encoded information introduced above, the authors also encode point detail and component information in their compression architecture, shown in Figure 3, for a more precise and complete point representation. For instance, the offset of each point from the origin of the leaf node it belongs to is also calculated during octree construction. Other point properties such as colour and normals are recorded as well. All information that is transmitted and recovered at the decoder is compressed by a range encoder [2].

Figure 3 The framework of compression and decompression [1].
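Before the numbers, a hedged sketch of how the evaluation parameters map onto PCL's pcl::io::OctreePointCloudCompression class (PCL 1.x API; the MANUAL_CONFIGURATION profile exposes pointRes, octreeRes, the I-frame rate and colour encoding directly). The values correspond to the second row of Config 2 in the table below; filling the input cloud from the Kinect grabber is elided:

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/compression/octree_pointcloud_compression.h>
#include <sstream>

int main()
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    // ... fill `cloud` with one frame from the Kinect grabber ...

    pcl::io::OctreePointCloudCompression<pcl::PointXYZ> encoder(
        pcl::io::MANUAL_CONFIGURATION,
        false,  // showStatistics
        0.01,   // pointResolution: precision of the point detail coding
        0.01,   // octreeResolution: octree leaf (voxel) size
        false,  // doVoxelGridDownSampling
        30,     // iFrameRate: full octree resynchronization every 30 frames
        false,  // doColorEncoding (Color = 0 in the table)
        6);     // colorBitResolution (unused while colour is disabled)

    std::stringstream compressed;  // byte stream to be transmitted
    encoder.encodePointCloud(cloud, compressed);

    // At the receiver, a second instance decodes the stream.
    pcl::io::OctreePointCloudCompression<pcl::PointXYZ> decoder;
    pcl::PointCloud<pcl::PointXYZ>::Ptr restored(new pcl::PointCloud<pcl::PointXYZ>);
    decoder.decodePointCloud(compressed, restored);
    return 0;
}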

Performance evaluation:
Config     pointRes  octreeRes  FrameRate  Color  Compression (%)       Size of compressed cloud (kBytes)
Config 1   0.001     0.001      30         0      ~5.0                  ~200
           0.01      0.001      30         0      ~5.0                  ~200
           0.1       0.001      30         0      ~5.0                  ~200
Config 2   0.001     0.01       30         0      ~8.9                  ~370
           0.01      0.01       30         0      ~1.4                  ~19
           0.1       0.01       30         0      ~1.18                 ~50
Config 3   0.001     0.1        30         0      ~15.5                 ~640
           0.01      0.1        30         0      ~8.0                  ~330
           0.1       0.1        30         0      ~5.4                  ~1
Config 4   0.01      0.01       5          0      ~1.4 (2.1)            ~19 (28)
Config 5   0.01      0.01       30         1      ~13.9; ~50.5 (color)  ~190

The table shows the compression performance of the framework in terms of compression percentage and compressed cloud size under different configurations. The compression percentage equals size of compressed cloud / size of uncompressed cloud, so a lower compression percentage means better compression performance; for example, ~19 kBytes compressed at ~1.4% corresponds to an uncompressed cloud of roughly 1.4 MB.

Considering the best choice of configuration, we pick the one with a low compression percentage and an acceptable decompressed cloud resolution. For the requirements of this project, the second row of Config 2 gives a small compressed cloud (faster to transmit) while retaining a useful cloud resolution.
The compression framework also supports other information such as colour. Although I haven't used any colour information in the shape model method so far, I experimented with colour encoding. As shown in Config 5, the compression percentage of the colour information is relatively large (50.5%), meaning the colour data is only compressed to about half its size. Compared with the second row of Config 2, turning on colour encoding makes the total compression percentage and the compressed cloud size about 10 times larger than the non-colour configuration.

It's worth noting the frame rate parameter (iFrameRate in the sketch above), which controls the interval at which the octree structure is completely resynchronized.

[1] Kammerl J, Blodow N, Rusu R B, et al. Real-time compression of point cloud streams[C]//Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012: 778-785.

[2] Martin G N N. Range encoding: an algorithm for removing redundancy from a digitised message[C]//Video and Data Recording Conference. 1979: 24-27.

03 June 2013

Manual shape alignment

From the built shape model, we retrieve the mean shape (called the source shape), which is used for initial alignment with the target shape captured from the Kinect.

Step 1: Scale the source shape and the target shape into the same metric so that only a rigid transformation needs to be considered. The shapes are shown in Figure 1.
Figure 1 Target shapes (left) and source shapes (right)
Step 2: Manually select at least 3 key-point correspondences on both shapes. A larger number of correspondences is recommended, as it reduces the alignment error. For offline processing, I select the key points using MeshLab, as shown in Figure 2.
Figure 2 Select key points correspondence in source shape (left) and target shape (right)
Step 3: From the selected correspondences, we can use the RANSAC method to estimate the 6-DoF rigid transformation. I also tried the estimateAffine3D() function provided by OpenCV, but the returned matrix was useless; the failure may have been caused by applying too few correspondences (3). For now, I obtained the transformation using MeshLab, and the result is shown in Figure 3. In a later stage, which combines 3D reconstruction and recognition, I will use the corresponding function in PCL to estimate the transformation (a sketch follows Figure 3) and hope that works.
Figure 3 Result after two shapes are aligned based on the estimated transformation
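For reference, a hedged sketch of the PCL route mentioned above, using pcl::registration::TransformationEstimationSVD (a least-squares estimator over given correspondences; a RANSAC layer could be wrapped around it to reject outliers). The function name and the assumption that the two clouds store the picked key points in matching order are mine:

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/transformation_estimation_svd.h>

// Estimate the 6-DoF rigid transform that maps the source key points onto
// the target key points; source_pts[i] must correspond to target_pts[i].
Eigen::Matrix4f estimateRigid(const pcl::PointCloud<pcl::PointXYZ>& source_pts,
                              const pcl::PointCloud<pcl::PointXYZ>& target_pts)
{
    pcl::registration::TransformationEstimationSVD<pcl::PointXYZ, pcl::PointXYZ> svd;
    Eigen::Matrix4f transform;
    svd.estimateRigidTransformation(source_pts, target_pts, transform);
    return transform;  // apply with pcl::transformPointCloud(src, out, transform)
}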
Based on this alignment, I will then finely adapt the source shape to the target shape under the constraints of the statistical shape model.