Area of Research : Music Signal Processing
Research objective:
Email : pradeeprengaswamy@gmail.com
Area of Research : Biomedical Signal Processing
Research objective:
Email : tanum.dets@gmail.com
Area of Research : Speech Processing
Research objective:
Human speech is a non-linear and non-stationary signal. Apart from the content of the spoken message, the speech signal is characterized by various speaker-dependent attributes such as age, gender, accent, emotion, presence of ailment, etc. Moreover, speech produced by humans can be corrupted by noise present in the ambience as well as in the recording setup. Current state-of-the-art systems treat the speech signal as linear and stationary over small intervals and extract features representing each of these short frames. MFCC, LPC and PLP are the most widely used features in speech processing tasks such as speech recognition, speaker identification, emotion recognition, etc. However, conventional feature extraction methods do not always provide the most discriminative features for a given recognition task. In the recent past, time-frequency domain features of the speech signal have been explored, both in lieu of and alongside conventional feature extraction methods, to improve the performance of speech processing systems. Time-frequency domain analysis involves decomposing the speech signal into a finite number of narrowband components that add up to the original signal. My research objective is to explore various decomposition methods so as to retain the narrowband components that mostly carry the spoken content of the message and to discard the components containing mostly noise and speaker-dependent information. A speech signal decomposed in this manner is desirable for speaker-independent, noise-robust speech recognition.
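As a concrete illustration of the conventional short-time analysis described above, the following minimal Python sketch extracts framewise MFCCs with librosa; the file name utterance.wav, the 16 kHz sampling rate and the 25 ms / 10 ms framing are illustrative assumptions, not details of this work.

    import librosa

    # Load the speech signal (hypothetical file); sr is the sampling rate in Hz.
    y, sr = librosa.load("utterance.wav", sr=16000)

    # 13 MFCCs per frame: each column describes one short frame over which
    # the signal is treated as approximately linear and stationary.
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=13,
        n_fft=int(0.025 * sr),       # 25 ms analysis window
        hop_length=int(0.010 * sr),  # 10 ms frame shift
    )
    print(mfcc.shape)  # (13, number_of_frames)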
Two methods used extensively in the recent past for time-frequency analysis of non-linear, non-stationary signals are empirical mode decomposition (EMD) and variational mode decomposition (VMD). EMD is a completely data-driven algorithm, whereas VMD needs certain parameters (such as the number of modes) to be provided by the user prior to decomposition, so the user needs some prior knowledge of the nature of the signal to decompose it into physically significant components using VMD. On the other hand, EMD is not well defined mathematically, is less noise-robust than VMD, and suffers from a problem known as mode mixing. Along with using VMD for meaningful decomposition of speech, my goal is also to modify its algorithm so that it incorporates the benefit of being completely data-driven like EMD, so that the algorithm does not rely on the user providing the correct input arguments to decompose the signal.
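The core of VMD can be sketched compactly: each mode is repeatedly re-estimated by a Wiener-like narrowband filter centred at its current centre frequency, and the centre frequency is then moved to the spectral centroid of that mode. The numpy sketch below implements only this inner loop (the Lagrangian multiplier is omitted, i.e. tau = 0), so it is an illustration of the idea rather than a full implementation; K and alpha are exactly the user-supplied inputs discussed above.

    import numpy as np

    def vmd(x, K=3, alpha=2000.0, n_iter=200):
        # Minimal VMD sketch: decompose a real 1-D signal x into K modes.
        N = len(x)
        F = np.fft.fft(x)
        F[N // 2 + 1:] = 0.0                    # keep positive frequencies only
        omega = np.fft.fftfreq(N)               # normalised frequency axis
        u_hat = np.zeros((K, N), dtype=complex) # mode spectra
        w = 0.5 * (np.arange(K) + 1) / (K + 1)  # initial centre frequencies
        for _ in range(n_iter):
            for k in range(K):
                residual = F - u_hat.sum(axis=0) + u_hat[k]
                # Wiener-like update: narrowband filter centred at w[k]
                u_hat[k] = residual / (1.0 + 2.0 * alpha * (omega - w[k]) ** 2)
                # move w[k] to the spectral centroid of mode k
                p = np.abs(u_hat[k, :N // 2]) ** 2
                w[k] = (omega[:N // 2] @ p) / (p.sum() + 1e-12)
        # 2 * real(.) restores the discarded negative-frequency half
        modes = np.array([2.0 * np.real(np.fft.ifft(uh)) for uh in u_hat])
        return modes, w

For a speech segment x, modes, w = vmd(x, K=5) would return five narrowband components and their centre frequencies; published EMD and VMD implementations additionally handle boundary effects and convergence checks that this sketch omits.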
Email : bsutapaece@gmail.com
Area of Research : Speech Processing
Research objective:
Email : kiran.reddy889@gmail.com
Area of Research : Bioinformatics
Research objective:
Email : s.sin443@gmail.com
Area of Research : Speech Processing
Research objective: A massive volume of audio data is piling up from several day-to-day sources, such as news channels, entertainment, education, etc. Organizing these data and retrieving the audio content relevant to a queried audio clip remains a challenging task. My objective is to segregate an entire speech corpus into meaningful groups at a broader semantic level. Matches of standard domain keywords between pairs of speech utterances are discovered in order to cluster the utterances. Given the obtained clusters, retrieval is carried out by detecting the keywords that match the queried audio, and the relevant speech utterances associated with those keywords are retrieved. The final clusters may represent broad classes of information such as politics, sports, weather, etc.
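As an illustrative sketch of this keyword-match idea, assume each utterance has already been reduced to a keyword transcript (e.g. by an ASR front end); the transcripts, cluster count and query below are all hypothetical:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import cosine_similarity

    transcripts = [
        "election parliament vote minister",   # politics
        "cricket score wicket innings",        # sports
        "rainfall temperature forecast wind",  # weather
        "batsman bowler match stadium",        # sports
    ]

    # Represent each utterance by its domain keywords.
    vec = TfidfVectorizer()
    X = vec.fit_transform(transcripts)

    # Group utterances into broad semantic clusters.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Retrieval: match the queried audio's keywords against the corpus and
    # return the best-matching utterance (and hence its cluster).
    query = vec.transform(["weather forecast and wind"])
    scores = cosine_similarity(query, X).ravel()
    print(labels, scores.argmax())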
Email : rskishorekumar@gmail.com
Area of Research : Speech Processing
Research objective: Speech is the natural way of communication for humans. Therefore, researchers have explored automatic speech recognition (ASR) systems to provide natural interaction between human and machine. As the demand for speech recognition in multiple languages grows, the development of multilingual ASR systems, which combine the phonetic units of all the languages to be recognized into one single global acoustic model set, is of increasing importance. Traditional ASR systems are developed for the read mode of speech. However, speech can be broadly classified into three modes: read, extempore, and conversational. The performance of ASR degrades when the input utterance belongs to a different mode of speech, because of the mismatch in the acoustic and linguistic characteristics of speech signals across modes. Therefore, my research objective is focused on developing a framework for automatically recognizing the phonetic units present in a speech utterance of any language spoken in any mode.
Email : kumudtripathi.cs@gmail.com
Area of Research : Multi-modal lecture video analysis
Research objective:
With the advent of internet technologies and the popularity of Massive Open Online Courses (MOOCs), a large number of educational videos are available on the internet in the form of e-learning courses hosted at web portals such as edX, Coursera, Udemy and NPTEL, which offer courses from diverse domains. This has truly democratised learning and has extended quality education to the masses beyond the ambit of traditional classrooms. However, in most video sharing platforms, such as YouTube, Vimeo and Dailymotion, video search is based on textual meta-data. This meta-data is limited and is generally entered by humans; such manual entry is error-prone, laborious and expensive. Also, finding a specific point of interest requires a search engine that can analyze the content of a video and automatically identify important semantic segments and keywords. It is therefore desirable to generate meta-data automatically for indexing and retrieval of lecture videos.
The overall objective of the work is to perform automatic semantic segmentation of video lectures and spoken word recognition by exploiting audio-visual features from the video file. With the video lectures segmented semantically and the spoken words recognized from the audio tracks, a system is proposed that facilitates indexing and retrieval of video lectures, with a utility to perform text-based search at two levels: segment level and word level. To address these issues, deep learning techniques are used to perform semantic segmentation of video lectures, and forced alignment techniques are explored for spoken word recognition. After the multi-modal analysis of lecture videos, a web-based system is proposed that performs indexing and retrieval.
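A minimal sketch of the two-level index such a system could maintain is shown below; the video ids, timestamps and topic labels are hypothetical, standing in for the output of the segmentation and forced-alignment stages:

    from collections import defaultdict

    word_index = defaultdict(list)     # word  -> [(video_id, time_sec)]
    segment_index = defaultdict(list)  # topic -> [(video_id, start, end)]

    def add_word(word, video_id, time_sec):
        # one entry per spoken word located by forced alignment
        word_index[word.lower()].append((video_id, time_sec))

    def add_segment(topic, video_id, start, end):
        # one entry per semantic segment found by the segmentation stage
        segment_index[topic.lower()].append((video_id, start, end))

    add_word("entropy", "lec01", 412.8)
    add_segment("information theory", "lec01", 390.0, 710.0)

    def search(query):
        # text-based search at word level and at segment level
        q = query.lower()
        return word_index.get(q, []) + segment_index.get(q, [])

    print(search("entropy"))  # [('lec01', 412.8)]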
Email : dnabhijit@gmail.com
Area of Research : Music Signal Processing
Research objective:
Email : mgurunathreddy@gmail.com
Area of Research : Speech Processing
Research objective:
In the evolving field of human–computer interaction (HCI), there are a large number of modes by which humans communicate with computers. Modes like speech, text, and GUI-based interaction using a mouse or touchscreen are the most common. Among these, speech is one of the most intuitive and natural modes of communication. Embedded in speech is a very important aspect of the message one intends to convey: emotions. For more natural HCI through speech, this emotional aspect needs to be incorporated into machines. Computers should be able both to understand the emotion conveyed in human speech and to generate speech in response that corresponds to both the message and the identified emotion. This requires machines to be able to recognize emotions from human speech.
My work focuses on this aspect, i.e., identifying emotions automatically from human speech. There have been many previous works in this area, and it is still an active topic of research. Many common signal processing techniques have been explored to extract features from emotional speech, and many pattern recognition algorithms have also been tried. However, the use of deep neural networks (DNNs), which have the potential to derive features from the raw speech itself, has not yet been adequately explored in this area. This is a relatively new paradigm and needs more investigation. Therefore, I am trying to employ different types of DNNs with various configurations to identify emotions from speech. Also, speech is not always noise-free, so the robustness of these techniques to noise will also be investigated. Apart from that, an analysis of some important emotions (like happiness, anger and sadness) will be attempted to find out which features of speech contribute to these emotions the most.
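As one example of the kind of configuration under study, the following PyTorch sketch defines a small 1-D CNN that maps raw waveforms directly to emotion classes; the architecture, the four emotion labels and the input length are illustrative assumptions, not a description of the final system:

    import torch
    import torch.nn as nn

    class RawSpeechEmotionNet(nn.Module):
        # 1-D CNN that learns features directly from the raw waveform
        def __init__(self, n_emotions=4):  # e.g. happy, angry, sad, neutral
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
                nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),   # length-independent pooling
            )
            self.classifier = nn.Linear(32, n_emotions)

        def forward(self, x):              # x: (batch, 1, n_samples)
            h = self.features(x).squeeze(-1)
            return self.classifier(h)      # emotion logits

    model = RawSpeechEmotionNet()
    waveform = torch.randn(2, 1, 16000)    # two 1-second clips at 16 kHz
    print(model(waveform).shape)           # torch.Size([2, 4])

Noise robustness could then be probed by corrupting the waveforms at training or test time.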
Email : rjlhq05@gmail.com
Area of Research : Speech Processing
Research objective:
Email : priyagdarshi@gmail.com
Area of Research : Speech Processing
Research objective:
Video is a visual multimedia source that combines sequences of images to form moving pictures. There are different types of videos, such as TV series, movies, music, educational and sports videos. Nowadays people have less time in hand and want an abstract of a video, so research on video summarization has gained momentum, with work ongoing in both academia and industry. Video summarization has two kinds of approaches: static video summarization (the key-frame based approach) and dynamic video summarization (video skimming). The static approach produces a set of key-frames as a summary of the given video, identifying distinct frames and using them to prepare the summary. The dynamic approach produces a short video containing the key events of the given video, using semantic content such as color, texture and motion.
My objective is to develop a model that will automatically generate a dynamic summary of any video. Developing this model requires a few steps: shot detection, detection of semantic connections between shots based on the video text and audio, evaluation of the semantic importance of each shot, and inclusion of shots in the summary based on their semantic importance while preserving the semantic connections between them.
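A minimal sketch of the first step, shot detection, is given below using OpenCV: a shot boundary is declared when the colour-histogram correlation between adjacent frames drops below a threshold (the file name and the threshold value are assumed for illustration):

    import cv2

    def detect_shots(path, threshold=0.6):
        cap = cv2.VideoCapture(path)
        boundaries, prev_hist, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # hue-saturation histogram as a simple colour descriptor
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [50, 60],
                                [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is not None:
                sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
                if sim < threshold:      # abrupt change => shot boundary
                    boundaries.append(idx)
            prev_hist, idx = hist, idx + 1
        cap.release()
        return boundaries

    print(detect_shots("input_video.mp4"))  # frame indices of shot changes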
Email : soumya.majumdar92@gmail.com
Area of Research : Speech Processing
Research objective:
Email : aravindareddy.27@gmail.com
Area of Research : Speech Processing
Research objective:
Email : madhu.keerthu@gmail.com
Area of Research : Speech Processing
Research objective: Speech is one of the most natural and easiest forms of communication between human beings. Though speech communication is natural, devices that respond to spoken queries have been limited. Many researchers are working on Automatic Speech Recognition (ASR) and speech synthesis (TTS) systems to bridge this communication gap. Being a part of the speech processing group, my focus will be on developing Text-to-Speech (TTS) systems for Indian languages (low-resource languages) that seamlessly produce intelligible, human-like speech from input text.
Email : sudhakar.asp@gmail.com