Hand Gestures
July 26, 2006
Posted by gordonwatts in computers, university.
Every graduate thesis defense committee at the UW has to have one person from outside the department of the student defending. These so-called Graduate School Representatives are there to make sure that the proceedings are fair and rigorous and the department isn’t trying to pass off a student, or giving a student an unduly hard time. I’m on a few of these, and a few weeks ago I attended Habib Abi-Rached’s defense: “Stereo-based Hand Gesture Tracking and Recognition in Immersive Stereoscopic Displays.”
Ever heard of Second Life? It is a virtual reality with a virtual economy (whose Linden dollars translate to real cash). You make money by building things. One uses CAD programs and the like to do this. You can make things and then you can sell them. But what if you could put on a pair of stereo glasses and use your hands to build these 3D objects, rather than using a CAD program with a 2D interface? Rendering the environment in a 3D display is understood, but recognizing what your hands are doing and connecting them with the actions of building an object is only just starting.
Recognizing the hand movements is a two-step process, according to Habib’s research. First you recognize what position your hand is in, and second you link together a series of these positions to get a gesture. Pattern recognition – figuring out where the hand is – is hard. Habib does it by taking a silhouette picture of the hand, and then drawing an outline. He then plots the distance to the outline as a function of distance along the outline, and looks for the peaks and valleys. For example, imagine your hand outstretched – there would be 5 peaks (one for each finger), and 4 valleys (not counting the beginning and end). Now, close the fingers into a fist and you’ll get one peak. Voila – you can tell the difference between an open hand and a fist by looking at a single plot.
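The peak-counting idea is easy to sketch. This is my own toy illustration, not Habib’s code: suppose you already have the 1-D “signature” of the outline (the distance from the hand’s centroid to each point along the silhouette), and you just count the local maxima that rise above some threshold.

```python
def count_peaks(signature, min_height):
    """Count local maxima in a 1-D contour signature above min_height.

    A point is a peak if it exceeds min_height, is strictly greater than
    its left neighbor, and at least as large as its right neighbor.
    """
    peaks = 0
    for i in range(1, len(signature) - 1):
        if (signature[i] > min_height
                and signature[i] > signature[i - 1]
                and signature[i] >= signature[i + 1]):
            peaks += 1
    return peaks

# Toy signatures (arbitrary units of distance along the outline):
open_hand = [1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1]  # five finger peaks
fist      = [1, 2, 3, 4, 3, 2, 1]              # one rounded peak

print(count_peaks(open_hand, min_height=3))  # 5
print(count_peaks(fist, min_height=3))       # 1
```

A real system would of course need smoothing and a sensible threshold, but the point stands: one plot, and the open hand and the fist fall out immediately.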
Once that is done, you can string several hand positions together to make a gesture. Of course, not everyone is going to cleanly form the hand gestures – think of how much trouble and how long it has taken us to get speech recognition without training – and so you need some sort of discriminant to tell the difference between various gestures. Habib uses a support vector machine (SVM) to do this.
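To make the SVM step concrete, here is a hypothetical sketch (the gestures, features, and labels are mine, not from the thesis): encode each gesture as a fixed-length vector – say, the peak count seen in each of three successive hand positions – and let an SVM separate the classes.

```python
from sklearn.svm import SVC

# Toy training data for two made-up gestures:
#   "grab" = open hand closing into a fist (peaks go 5 -> 1)
#   "wave" = open hand staying open (peaks stay near 5)
X = [
    [5, 3, 1],  # grab
    [5, 2, 1],  # grab
    [5, 5, 5],  # wave
    [5, 4, 5],  # wave
]
y = ["grab", "grab", "wave", "wave"]

clf = SVC(kernel="linear")  # a linear SVM is plenty for this toy data
clf.fit(X, y)

print(clf.predict([[5, 3, 1]])[0])  # a grab-like sequence
```

The real problem is much messier – sequences vary in length and timing, and sloppy poses blur the features – which is exactly why a trained discriminant is needed rather than a hand-written rule.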
The final goal was to enable manipulation in a 3D environment of blocks, 3D puzzles, etc., but sadly the research didn’t make it that far this time. That will be up to the next graduate student.
I’m not sure how long it will be before this sort of thing moves beyond entertainment. Imagine joining physics meetings in various rooms. Or classes in a virtual room (which I guess they do already in Second Life – but whoever heard of a sick teacher!?).