Major Research Interests
Computer Analysis of Dance | Human-Computer Interaction | 3D reconstruction | Depth Sensors
Computer Analysis and Interpretation of Indian Classical Dance
The dance is the mother of the arts.
Music and poetry exist in time; painting and architecture in space.
But the dance lives at once in time and space.
- Curt Sachs
India has a rich cultural heritage in which classical dance has been interwoven into the social fabric. Indian Classical Dance (ICD) has many variants, including Bharatanatyam, Kathak, Odissi, Manipuri, and Kuchipudi. Each variant of ICD creates a sequence of rhythmic patterns synchronized with music. With the passage of time, these variants have become associated with a rich set of body postures and gestures, and grammars for performance have emerged.
Indian Classical Dance (ICD), an ancient heritage of India, consists of visual (posture, movement, and expression), auditory (music, tempo, rhythm, and intonation), and textual (the lyrics of the song) information that tells a story through body movements (anga-bheda, uro-bheda, parsva-bheda, shiro-bheda, drishti-bheda, and pada-bheda), hand gestures or hasta-mudras (sanjukta-hasta and asanjukta-hasta), vocal and instrumental music, facial expressions or emotions (nava-rasas), costume, and make-up. With time, these dance forms have been interpreted and performed by different artists in different ways, and various sets of complex rules have emerged for body postures and gestures.
The body should catch up to the time,
the hand must explain the meaning,
the eyes must speak the emotion,
and the feet must beat the time-measure.
- Natyashastra
Our objective is to analyze and interpret the multimedia aspects of ICD using the Kinect. Analysis of dance involves analyzing its component parts, such as video, audio, and text, and their relations in making up a complete dance. Interpretation of dance involves comprehending its multi-dimensional aspects in the context of culture, story, emotion, and gesture.
Automated analysis and interpretation of dance can be useful in several ways. For example, it can help to:
1. Preserve cultural heritage by dance transcription,
2. Synthesize and create animated avatars,
3. Interpret the story of an ICD recital, and so on.
Human-Computer Interaction
Emotion Recognition using Kinect Library
We are also working on the emotion recognition problem using Kinect data and the Kinect Face Tracking Library (KFTL). A generative approach based on facial muscle movements is used to classify emotions. We detect various Action Units (AUs) of the face from the feature points extracted by the KFTL and then recognize emotions using Artificial Neural Networks (ANNs) operating on the detected AUs.
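For illustration, a minimal sketch of the AU-to-emotion classification step is given below, assuming the AU intensities have already been obtained from the KFTL feature points; the emotion label set, AU count, network size, and use of scikit-learn are illustrative assumptions, not the exact configuration of our system.

```python
# Sketch: classify emotions from facial Action Unit (AU) intensities with a small ANN.
# Assumes AU values have already been derived from the KFTL feature points; the
# emotion labels, AU count, and network shape are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["neutral", "happy", "sad", "surprise", "anger", "fear"]  # assumed label set

def train_emotion_ann(au_features: np.ndarray, labels: np.ndarray) -> MLPClassifier:
    """au_features: (n_samples, n_aus) AU intensities; labels: indices into EMOTIONS."""
    ann = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                        max_iter=2000, random_state=0)
    ann.fit(au_features, labels)
    return ann

def predict_emotion(ann: MLPClassifier, au_vector: np.ndarray) -> str:
    """Classify a single frame's AU vector into one of the assumed emotion labels."""
    return EMOTIONS[int(ann.predict(au_vector.reshape(1, -1))[0])]
```

In practice the AU vectors would be streamed frame by frame from the face tracker, and the per-frame predictions smoothed over a short window before reporting an emotion.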
Hands-free Control of PowerPoint Presentation using Kinect Skeletons
We use the depth imaging technology of the Microsoft Kinect to control PowerPoint presentations with gestures in a touchless manner. We design a system in which the presenter can start and end a PowerPoint presentation, navigate between slides, capture or release control of the cursor, and move it, all through natural gestures. Such a system is not only helpful for controlling PowerPoint presentations but is also useful and hygienic in kitchens, lavatories, hospital ICUs, touch-less surgery, and the like. The challenge is to extract meaningful gestures from continuous hand motions. We propose a system that recognizes isolated gestures from continuous hand motions for multiple gestures in real time. Experimental results show that the system has 96.48% precision (at 96.00% recall) and performs better than the Microsoft Gesture Recognition library for swipe gestures.
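As a rough illustration of the gesture-spotting idea, the sketch below flags a horizontal swipe in a continuous stream of hand-joint positions; the frame rate, window length, and distance thresholds are placeholder assumptions, not the tuned values of our system.

```python
# Sketch: spot an isolated left/right swipe in a continuous stream of hand positions.
# The frame rate, window length, and thresholds are placeholder assumptions.
from collections import deque
from typing import Optional

FPS = 30                    # assumed skeleton frame rate
WINDOW = FPS // 2           # look at roughly the last 0.5 s of motion
MIN_SWIPE_X = 0.25          # metres of horizontal travel required
MAX_DRIFT_Y = 0.10          # metres of vertical drift tolerated

class SwipeDetector:
    def __init__(self) -> None:
        self.history = deque(maxlen=WINDOW)     # recent (x, y) hand positions

    def update(self, hand_x: float, hand_y: float) -> Optional[str]:
        """Feed one skeleton frame; return 'swipe_left'/'swipe_right' when one is spotted."""
        self.history.append((hand_x, hand_y))
        if len(self.history) < WINDOW:
            return None
        dx = self.history[-1][0] - self.history[0][0]
        dy = abs(self.history[-1][1] - self.history[0][1])
        if abs(dx) >= MIN_SWIPE_X and dy <= MAX_DRIFT_Y:
            self.history.clear()                # suppress repeated detections of one motion
            return "swipe_right" if dx > 0 else "swipe_left"
        return None
```

In a full system, a detected swipe would then be translated into the corresponding slide-navigation key event sent to PowerPoint.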
Fast Gait Recognition from Kinect Skeleton Data Set
We attempt to use the Kinect's skeleton stream for gait recognition. Various types of gait features are extracted from the joint points in the stream, and appropriate classifiers are used to compute effective matching scores. To test our system and compare performance, we create a benchmark data set of 5 walks each for 29 subjects and implement a state-of-the-art gait recognizer for RGB videos. Tests show a moderate accuracy of 65% for our system. This is low compared to the accuracy of the RGB-based method (which achieves 83% on the same data set) but high compared to similar skeleton-based approaches (usually below 50%). Further, we compare the execution times of various parts of our system to highlight the efficiency advantages of our method and its potential as a real-time recognizer if an optimized implementation is developed.
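To give a flavour of skeleton-based gait features, here is a minimal sketch using a few inter-joint distances averaged over a walk and a nearest-mean matcher; the joint names follow the Kinect v1 skeleton, while the feature choice and matcher are simplifications rather than our actual feature set and classifiers.

```python
# Sketch: simple gait features from Kinect skeleton joints plus a nearest-mean matcher.
# The chosen distances and the matcher are simplified, illustrative assumptions.
import numpy as np

def frame_features(joints: dict) -> np.ndarray:
    """joints maps a Kinect joint name to its (x, y, z) position in metres."""
    def dist(a, b):
        return float(np.linalg.norm(np.asarray(joints[a]) - np.asarray(joints[b])))
    return np.array([
        dist("AnkleLeft", "AnkleRight"),   # stride spread
        dist("KneeLeft", "KneeRight"),
        dist("HandLeft", "HandRight"),     # arm swing spread
        dist("Head", "HipCenter"),         # torso length as a scale cue
    ])

def gait_signature(walk_frames: list) -> np.ndarray:
    """Average the per-frame features over one complete walk."""
    return np.mean([frame_features(f) for f in walk_frames], axis=0)

def identify(probe_signature: np.ndarray, gallery: dict) -> str:
    """gallery maps subject id -> stored signature; return the closest subject."""
    return min(gallery, key=lambda sid: np.linalg.norm(probe_signature - gallery[sid]))
```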
3D reconstruction
We have developed a method for omni-directional 3D reconstruction of a human figure using a single Kinect and two mirrors. We obtain three views from a single depth frame (and its corresponding RGB frame) using these two mirrors: one is the real view of the human and the other two are virtual views generated through the mirrors. We segment the three views of the human and create their point clouds. Since the virtual objects appear at nearly twice the depth of the actual object and are oriented at an angle, a set of transformations is required to bring the virtual views into the same coordinate system as the real view. After these transformations, the views are aligned by estimating the Kinect-mirror geometry, and we use the Iterative Closest Point (ICP) algorithm for fine alignment by minimizing the error between the overlapping surface portions of the real and virtual views. Experiments with 5 subjects show good reconstruction of human figures in MeshLab. Using the reconstruction of a regular geometric object, we also quantitatively illustrate the high accuracy of our reconstructed models. Our proposed system is efficient, accurate, and robust, as it can reconstruct the 360° view of any object (though it is particularly designed for human figures) from single depth and RGB images. The system overcomes the difficulties of synchronization and removes the problem of interference noise in multi-Kinect systems. The methodology can also be used with a non-Kinect RGB-D camera and can be improved in several ways in the future.
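The core of the alignment can be sketched as follows, assuming each mirror plane has been estimated as a unit normal n and offset d (so the plane is n·x = d): the virtual view is reflected about that plane to land roughly in the real view's frame, and ICP then refines the fit. The use of Open3D and the parameter values are assumptions for illustration, not our exact implementation.

```python
# Sketch: bring a mirror (virtual) view into the real view's frame by reflecting it
# about the estimated mirror plane, then refine with ICP. The Open3D calls and the
# plane parameters (unit normal n, offset d, plane n·x = d) are illustrative assumptions.
import numpy as np
import open3d as o3d

def reflect_about_plane(points: np.ndarray, n: np.ndarray, d: float) -> np.ndarray:
    """Reflect Nx3 points about the plane n·x = d (n must be a unit vector)."""
    return points - 2.0 * ((points @ n) - d)[:, None] * n[None, :]

def align_virtual_to_real(virtual_pts: np.ndarray, real_pts: np.ndarray,
                          n: np.ndarray, d: float, icp_threshold: float = 0.02):
    # Coarse step: the mirror reflection places the virtual view roughly in position.
    reflected = reflect_about_plane(virtual_pts, n, d)
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(reflected))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(real_pts))
    # Fine step: ICP minimizes the error over the overlapping surface portions.
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, icp_threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    src.transform(result.transformation)
    return src
```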
Depth Sensors
Characterizations of Noise in Kinect Depth Images
Studies on noise in Kinect depth images are scattered across several publications, and there is no comprehensive treatise on the subject. We characterize the noise in Kinect depth images based on multiple factors and introduce a uniform nomenclature for the types of noise. In the process, we briefly survey the noise models of the Kinect and relate them to the factors of characterization. We also address the noise in multi-Kinect set-ups and summarize techniques for minimizing interference noise. The characterization would help to selectively eliminate noise from depth images, either by filtering or by adopting appropriate methodologies for image capture. In addition to the characterization based on results reported by others, we also conduct independent experiments in a number of cases to fill gaps in the characterization and to validate the reported results.
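The independent experiments mentioned above rest on measurements of the following kind. This is a hedged sketch of estimating per-pixel temporal noise from a stack of depth frames of a static scene; treating a depth value of 0 as "invalid" follows the usual Kinect convention and is an assumption here.

```python
# Sketch: per-pixel temporal noise statistics from depth frames of a static scene.
# Treating depth value 0 as "no reading" follows the usual Kinect convention and is
# an assumption here; the number of frames captured is arbitrary.
import numpy as np

def temporal_noise_stats(depth_frames: np.ndarray):
    """depth_frames: (n_frames, H, W) depth maps of a static scene, in millimetres."""
    frames = np.ma.masked_equal(depth_frames.astype(np.float64), 0.0)  # mask invalid pixels
    mean = frames.mean(axis=0)                          # per-pixel mean depth
    sigma = frames.std(axis=0)                          # per-pixel temporal noise (std-dev)
    dropout = np.ma.getmaskarray(frames).mean(axis=0)   # fraction of invalid readings
    return mean.filled(np.nan), sigma.filled(np.nan), dropout
```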
Study of Interference Noise in Multi-Kinect Set-up
The Kinect, a low-cost multimedia sensing device, has revolutionized human-computer interaction (HCI) by making various applications of human activity tracking affordable and widely available. Multiple Kinects are often used in imaging applications to improve on the field of view, depth of field, and uni-directional vision of a single Kinect. Unfortunately, multiple Kinects lead to IR interference noise (IR noise, in short) in the depth map. Hence, we analyze the estimators for interference noise, survey various imaging techniques that mitigate the interference at the source, and characterize them in parallel with a well-known classification system from the telecom industry. Finally, we compare their performance from the reported literature and outline our ongoing research on controlling interference noise by software shuttering.
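As a flavour of the estimators under discussion, the sketch below measures interference as the rise in invalid depth pixels when a second Kinect illuminates the same scene; the zero-as-invalid convention and the simple differencing scheme are illustrative assumptions, not a specific estimator from the literature.

```python
# Sketch: a simple interference estimate -- the rise in invalid depth pixels when a
# second Kinect illuminates the same scene. Zero-as-invalid and the differencing
# scheme are illustrative assumptions.
import numpy as np

def invalid_ratio(depth: np.ndarray) -> float:
    """Fraction of pixels with no depth reading (value 0) in a single depth frame."""
    return float(np.count_nonzero(depth == 0)) / depth.size

def interference_increase(frame_single_kinect: np.ndarray,
                          frame_two_kinects: np.ndarray) -> float:
    """Extra invalid-pixel fraction attributable to the second Kinect."""
    return invalid_ratio(frame_two_kinects) - invalid_ratio(frame_single_kinect)
```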