27 January 2013

Hardware design of Kinect

The key components of Kinect are indicated in figure 1. They are:
  1. Multi-array microphone: An array of four microphones that can isolate the user's voice from the noise in the room. By comparing the arrival delay at each microphone, the voice source can be located.
  2. IR laser emitter: Actively projects near-infrared light onto the scene. Uneven surfaces distort the light, and the reflections form random speckles that the infrared camera (No. 3 in the figure) receives.
  3. IR camera: Captures the infrared signal, which can be converted into a depth map.
  4. Motorized tilt: The motor can be programmed to achieve the best viewing angle.
  5. USB cable: Transmits the video stream, depth stream and audio stream. An external power supply must be connected to use all of Kinect's functions, because Kinect draws 12 W while a standard USB port supplies only 2.5 W.
  6. RGB camera: Captures the colour video stream.

Figure 1 Structure of Kinect and its key components
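
The delay comparison mentioned for the microphone array can be illustrated with a small two-microphone sketch. It is only an illustration of the idea, not Kinect's actual algorithm: the far-field approximation, the speed of sound, the 20 cm spacing and the function name arrival_angle are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_angle(delay_s, mic_spacing_m):
    """Far-field angle of arrival (degrees) from the delay between two microphones.

    0 degrees means the source is straight ahead (both microphones hear it
    at the same time); the sign of the angle follows the sign of the delay.
    """
    # Extra distance the sound travels to reach the later microphone.
    path_diff = SPEED_OF_SOUND * delay_s
    # Clamp so that measurement noise cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, path_diff / mic_spacing_m))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.2  # hypothetical 20 cm between the two microphones
    for delay in (0.0, 0.0002, -0.0004):
        angle = arrival_angle(delay, spacing)
        print(f"delay {delay * 1e3:+.1f} ms -> {angle:+.1f} degrees")
```

With four microphones, several such pairs are available, which is what makes it possible to narrow down the direction of the speaker more reliably than with a single pair.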
The "heart" of Kinect is the PS1080 system on chip (SoC) produced by PrimeSense. It is a multi-sensor system which can provide a depth image, a colour image and an audio signal at the same time. As shown in figure 2, the PS1080 encodes the IR light and projects it onto the scene, while the IR camera captures the reflected IR light and sends the signal back to the PS1080. The PS1080 processes the signal, retrieves the depth image and combines it with the corresponding colour image. Since the audio part is not yet important to this project, it is not described in detail here.

Figure 2 Recommended design of PrimeSense chip
The field of view of the system is 57 degrees horizontal by 43 degrees vertical, and the operational range in normal mode is between 0.8 meters and 4 meters. In near mode, Kinect can detect objects as close as 0.4 meters and as far as 3 meters (as shown in figure 3).
Figure 3 Types of values returned by the runtime
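
To get a feeling for what these numbers mean in a room, the field of view and the range limits can be turned into a short calculation. This is a rough sketch using simple pinhole geometry; the function names view_extent and in_range are made up for this example and are not part of any Kinect SDK.

```python
import math

H_FOV_DEG = 57.0  # horizontal field of view, from the text above
V_FOV_DEG = 43.0  # vertical field of view, from the text above

# Operational ranges in meters (min, max), from the text above.
RANGES_M = {"normal": (0.8, 4.0), "near": (0.4, 3.0)}


def view_extent(distance_m):
    """Width and height (meters) of the area visible at the given distance."""
    width = 2.0 * distance_m * math.tan(math.radians(H_FOV_DEG / 2.0))
    height = 2.0 * distance_m * math.tan(math.radians(V_FOV_DEG / 2.0))
    return width, height


def in_range(distance_m, mode="normal"):
    """True if an object at this distance lies inside the range of the mode."""
    nearest, farthest = RANGES_M[mode]
    return nearest <= distance_m <= farthest


if __name__ == "__main__":
    for d in (0.5, 2.0, 3.5):
        w, h = view_extent(d)
        print(f"{d} m: {w:.2f} m x {h:.2f} m, "
              f"normal={in_range(d)}, near={in_range(d, 'near')}")
```

For example, an object at 0.5 meters is too close for normal mode but is inside the near-mode range, while an object at 3.5 meters is only covered by normal mode.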
Specification of Kinect (referenced from here)

Kinect: Array specifications
Viewing angle: 43° vertical by 57° horizontal field of view
Vertical tilt range: ±27°
Frame rate (depth and color stream): 30 frames per second (FPS)
Audio format: 16-kHz, 24-bit mono pulse code modulation (PCM)
Audio input characteristics: A four-microphone array with 24-bit analog-to-digital converter (ADC) and Kinect-resident signal processing including acoustic echo cancellation and noise suppression
Accelerometer characteristics: A 2G/4G/8G accelerometer configured for the 2G range, with a 1° accuracy upper limit.
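
A few figures in the specification can be converted into numbers that are handy when writing software for the sensor. The sketch below only does arithmetic on the values listed above; the function names are invented for this example and do not come from the Kinect SDK.

```python
SAMPLE_RATE_HZ = 16_000   # audio format: 16 kHz
BITS_PER_SAMPLE = 24      # audio format: 24-bit
CHANNELS = 1              # audio format: mono
FRAME_RATE_FPS = 30       # depth and color stream
TILT_LIMIT_DEG = 27       # vertical tilt range: +/-27 degrees


def audio_bytes_per_second():
    """Raw PCM data rate implied by the listed audio format."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS // 8


def frame_interval_ms():
    """Time between two consecutive depth or color frames."""
    return 1000.0 / FRAME_RATE_FPS


def clamp_tilt(angle_deg):
    """Limit a requested tilt angle to the motor's listed range."""
    return max(-TILT_LIMIT_DEG, min(TILT_LIMIT_DEG, angle_deg))


if __name__ == "__main__":
    print(audio_bytes_per_second(), "bytes of audio per second")
    print(round(frame_interval_ms(), 1), "ms between frames")
    print(clamp_tilt(40), "degrees after clamping a 40 degree request")
```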


