18 February 2013

Kinect Coordinate Space

A Kinect provides a colour frame, a depth frame and a skeleton frame at a time. This post introduces the coordinate space of each frame; in a later post I will use the SDK to transform one frame into another, for example aligning the depth frame with the colour frame and creating coloured point clouds, because the Kinect SDK provides a more accurate mapping of depth data into 3D space (or "skeleton space", in the parlance of this particular SDK).

Colour space (2D)
A colour frame is composed of pixels. Each pixel consists of four channels: red, green, blue and alpha (transparency). Every pixel has a particular location (x, y) in the colour frame's coordinate system.
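Reading a single pixel, then, is just an index computation into the frame's byte buffer. The sketch below is illustrative only: it assumes 4 bytes per pixel stored in blue/green/red/alpha order (the byte layout of the SDK's 32-bit colour format), with rows laid out one after another.

#include <cstddef>
#include <cstdint>
#include <vector>

// Channel values of one colour pixel. I'm assuming BGRA byte order here;
// in the Kinect SDK's 32-bit colour format the fourth byte is unused
// rather than a true alpha value.
struct ColourPixel { std::uint8_t b, g, r, a; };

ColourPixel colourAt(const std::vector<std::uint8_t>& frame,
                     int width, int x, int y)
{
    // Rows are stored one after another, 4 bytes per pixel.
    const std::size_t offset = (static_cast<std::size_t>(y) * width + x) * 4;
    return { frame[offset], frame[offset + 1], frame[offset + 2], frame[offset + 3] };
}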

Depth space (2D)
Each pixel in the depth frame contains the Cartesian distance, in millimetres, from the depth camera plane to the nearest object at that particular (x, y) coordinate, as shown in the figure. The (x, y) coordinates of a depth frame do not represent physical units in the room; they are simply the location of a pixel within the depth frame.
[Figure: Depth stream values]
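To make that concrete, here is a small sketch of reading the distance at a pixel. I'm assuming the packed 16-bit layout in which the low 3 bits carry a player index and the high 13 bits carry the depth in millimetres (the layout used when player data is interleaved with depth); adjust the shift if your frame format differs.

#include <cstddef>
#include <cstdint>
#include <vector>

// Distance in millimetres at depth-frame coordinate (x, y), assuming the
// packed format described above.
std::uint16_t depthMillimetresAt(const std::vector<std::uint16_t>& frame,
                                 int width, int x, int y)
{
    const std::uint16_t packed = frame[static_cast<std::size_t>(y) * width + x];
    return static_cast<std::uint16_t>(packed >> 3);  // drop the 3 player-index bits
}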
Skeleton space (3D)
For each frame, the depth image captured by the Kinect is converted into skeleton data, which contains 3D position data for human skeletons. A 3D position in skeleton space is represented as (x, y, z), as shown in the figure. The origin of the coordinate system is placed at the Kinect's depth camera, and the system is right-handed: the positive y-axis extends upward, the positive x-axis extends to the left of the Kinect (from the sensor's point of view), and the positive z-axis points in the direction the Kinect is looking.
[Figure: Skeleton space]
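Under the hood, mapping a depth pixel into skeleton space is essentially a pinhole-camera back-projection. The sketch below only illustrates the geometry: the focal length is an assumed nominal value for a 320x240 depth image, and the sign of x depends on whether the depth image is mirrored, so in real code prefer the SDK's own mapping call (for example NuiTransformDepthImageToSkeleton), which also accounts for the sensor's calibration.

#include <cstdint>

// A point in skeleton space: metres, right-handed, origin at the depth camera.
struct SkeletonPoint { float x, y, z; };

SkeletonPoint depthPixelToSkeleton(int px, int py, std::uint16_t depthMm,
                                   int width = 320, int height = 240,
                                   float focalLengthPixels = 285.63f)  // assumed nominal value
{
    const float z = depthMm / 1000.0f;                               // mm -> metres
    const float x = (px - width  * 0.5f) * z / focalLengthPixels;    // flip the sign if the image is mirrored
    const float y = (height * 0.5f - py) * z / focalLengthPixels;    // image y grows down, skeleton y grows up
    return { x, y, z };
}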
