How You Become the Controller
"You are the controller." If you've been following the buzz surrounding Kinect, you've probably heard this phrase tossed around. From plugging leaks with your hands and feet in Kinect Adventures! to changing songs with the flick of a wrist in Zune, Kinect opens up a new way to naturally experience your entertainment.
But once you get over the magic of opening the disc tray with the wave of a hand like you're a Jedi, you might start to wonder how it all works under the hood. In this blog post, I'll focus on the secret sauce behind the human tracking system and how it allows game developers to produce Kinect-enabled experiences. Then Arjun Dayal, a program manager on the team, will show you how Kinect enables a gesture-based approach to navigating the Xbox Dashboard and Kinect Hub. But before we get into any of that, let’s start with the conceptual principles that guided Kinect’s development.
It’s an Analog World We Live In
Traditional programming is based on sets of rules and heuristics: cause and effect, zero and one, true and false. Fortunately, this approach works well when modeling simple systems with a limited number of inputs and outputs. Take Halo, for example (not a simple system by any means, but suitable for proving the point): pressing A causes Master Chief to jump; moving one stick causes him to walk forward; moving the other causes him to look up. If A, then B. Unfortunately, the real world is not digital, but analog.
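To make the contrast concrete, here is a minimal sketch of that rule-based style. None of this is actual Halo or Xbox code; the names (`GamepadState`, `Player`, and so on) are invented for illustration:

```cpp
// Illustrative sketch only -- not real engine code.
#include <cstdio>

struct GamepadState {
    bool  aPressed;
    float leftStickY;   // -1.0 (down) .. +1.0 (up)
    float rightStickY;
};

struct Player {
    void Jump()            { std::puts("jump"); }
    void Walk(float speed) { std::printf("walk %.2f\n", speed); }
    void Look(float pitch) { std::printf("look %.2f\n", pitch); }
};

// Deterministic rule set: every input maps to exactly one effect.
void UpdatePlayer(Player& chief, const GamepadState& pad) {
    if (pad.aPressed)            chief.Jump();   // If A, then jump.
    if (pad.leftStickY != 0.0f)  chief.Walk(pad.leftStickY);
    if (pad.rightStickY != 0.0f) chief.Look(pad.rightStickY);
}

int main() {
    Player chief;
    UpdatePlayer(chief, {true, 0.8f, -0.2f});
}
```

Every branch is a hand-written rule: the program can only ever respond to cases someone explicitly anticipated.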
In the analog world, it's not just about yes and no; it's about maybe. It's not just about true or false; it's about probability. Think briefly about all the possible variations of a person waving a hand: the range of physical proportions of the body, the global diversity of environmental conditions, differences in clothing properties, cultural nuances in performing even a simple gesture. You quickly end up with a search space on the order of 10^23 possibilities, an unrealistic problem domain to solve through rule-based programming.
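A quick back-of-the-envelope calculation shows how fast those factors multiply. The per-factor counts below are invented purely for illustration; only the rough 10^23 order of magnitude comes from the post:

```cpp
// Why enumerating every variation by hand fails: the factors multiply.
// All counts below are hypothetical stand-ins.
#include <cstdio>

int main() {
    const double bodyProportions = 1e5;  // height, limb length, build, ...
    const double environments    = 1e5;  // lighting, room layout, furniture
    const double clothing        = 1e4;  // sleeves, coats, reflectivity
    const double gestureStyles   = 1e4;  // speed, arc, handedness, culture
    const double posesPerGesture = 1e5;  // intermediate joint configurations

    const double searchSpace = bodyProportions * environments * clothing
                             * gestureStyles * posesPerGesture;

    // Prints ~1e+23: far too many cases for hand-written if/then rules.
    std::printf("%.0e combinations\n", searchSpace);
}
```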
We knew early on that we had to invent a new way of approaching this problem, one that works the way the human brain does. When you encounter someone in the world, your brain instantly focuses on them and recognizes them based on years of prior training. It doesn't crawl a decision tree hundreds of levels deep to discern one human from another. It just knows. Where a baby would have a hard time telling one person from another, you've learned to do so in a split second. In fact, you could probably make a reasonable guess about their age, gender, ethnicity, mood, or even their identity (but that's a blog post for another day). This is part of what makes us human.
Kinect was created in the same way. It sees the world around it. It focuses on you. And even though it has never seen you wave your hands before, it instantly matches your movements against the terabytes of information it has already learned from.
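Conceptually, that flips the logic of the rule-based sketch above: instead of branching on hand-written conditions, the system scores competing interpretations of what it sees and picks the most probable one. A toy sketch of that idea, with entirely hypothetical labels and scores standing in for whatever Kinect actually learned:

```cpp
// Recognition by probability rather than by rules (conceptual sketch).
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct Hypothesis {
    std::string label;
    double probability;  // produced by a trained model, not by if/then rules
};

// Pick the most probable interpretation of what the sensor just saw.
const Hypothesis& MostLikely(const std::vector<Hypothesis>& scored) {
    return *std::max_element(scored.begin(), scored.end(),
        [](const Hypothesis& a, const Hypothesis& b) {
            return a.probability < b.probability;
        });
}

int main() {
    std::vector<Hypothesis> scored = {
        {"waving right hand",  0.87},
        {"raising right hand", 0.09},
        {"stretching",         0.04},
    };
    const Hypothesis& best = MostLikely(scored);
    std::printf("%s (p = %.2f)\n", best.label.c_str(), best.probability);
}
```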
The Kinect Sensor
At the heart of the skeletal tracking pipeline is a CMOS infrared sensor that allows Kinect to perceive the world regardless of ambient lighting conditions. Think of this as seeing the environment in a monochrome spectrum of black and white: black being infinitely far away and white being infinitely close. The shades of gray in between correspond to physical distance from the sensor. The sensor measures each point in its field of view and assembles those measurements into a depth image that represents the world. A stream of these depth images is produced at 30 frames per second, creating a real-time 3-D representation of the environment. Another way to think of this is like those pinpoint impression toys that used to be all the rage: by pushing up with your hands (or your face, if you were really adventurous), you could create a simple 3-D model of a piece of your body.
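In code, you can picture one of those depth images as a simple grid of distances arriving 30 times per second. Here is a minimal sketch, assuming a 320x240 frame with per-pixel distances in millimeters; the resolution, units, and range are assumptions for illustration, not the sensor's documented format:

```cpp
// A depth frame as a grid of distances, plus the black/white mapping
// described above. Resolution and units are assumed, not documented.
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int kWidth  = 320;
constexpr int kHeight = 240;
constexpr int kFramesPerSecond = 30;  // one new frame every ~33 ms

// One depth image: each element is the distance (in mm) from the sensor
// to the nearest surface along that pixel's line of sight.
using DepthFrame = std::array<std::uint16_t, kWidth * kHeight>;

// Map a distance to the monochrome scale from the post:
// near -> white (255), far -> black (0).
std::uint8_t ToGray(std::uint16_t depthMm, std::uint16_t maxRangeMm = 4000) {
    if (depthMm == 0 || depthMm >= maxRangeMm) return 0;  // too far / unknown
    return static_cast<std::uint8_t>(255 - (255u * depthMm) / maxRangeMm);
}

int main() {
    std::printf("gray at 0.5 m: %u\n", static_cast<unsigned>(ToGray(500)));   // bright
    std::printf("gray at 3.5 m: %u\n", static_cast<unsigned>(ToGray(3500)));  // dark
}
```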
Finding the Moving Parts