Acquisition and incorporation of prosody »
Appropriate models were proposed for capturing prosodic information, and signal processing
methods were developed to incorporate it into speech.
Expressive speech processing »
Emotional speech databases were developed in Hindi and Telugu.
Implicit and explicit excitation source features, pitch-synchronous and sub-syllabic spectral features,
and multi-level global and local prosodic features were proposed for characterizing emotions.
Hierarchical models were proposed for improving the accuracy of emotion recognition.
Acquisition and incorporation of emotion-specific knowledge for developing emotion-aware speech systems.
Robust signal processing methods were proposed for detecting speech events in expressive speech.
Signal processing and machine learning methods were proposed for voice as well as expression conversion.
Speech/Speaker/Language Recognition »
Signal processing methods were developed to detect important speech events, such as vowel onset
and offset points, in the presence of speech coding and background noise.
Efficient speech and speaker recognition systems were developed for mobile environments by
exploiting the crucial speech events and hybrid recognition models.
Articulatory and excitation source features were proposed for recognition of speech in read,
extempore, and conversational modes.
Robust speaker recognition techniques were proposed based on stochastic feature compensation
and total variability speaker modeling frameworks.
Multi-SNR speaker models were proposed for speaker recognition in varying background noise conditions.
Emotion compensation techniques were proposed for speaker recognition in emotional environments.
Robust language recognition systems were developed using spectral features extracted from glottal
closure regions, multi-level prosodic features, and implicit and explicit excitation source features.
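The zero-frequency filtering (ZFF) idea, which appears later in this page's patents and underlies much of the speech-event (epoch) detection work in this section, can be sketched as follows. The window length, the number of trend-removal passes, and the synthetic test signal are illustrative assumptions, not the exact settings used in the work described here.

```python
import numpy as np

def zero_frequency_filter(speech, fs, win_ms=10.0):
    """Sketch of zero-frequency filtering (ZFF): pass the differenced
    speech through two resonators with poles at zero frequency, then
    remove the resulting polynomial trend by repeated local-mean
    subtraction.  Positive-going zero crossings of the output mark
    glottal epochs.  win_ms (~average pitch period) is an assumed
    setting."""
    x = np.diff(speech, prepend=speech[0])   # differencing removes DC bias
    y = x.astype(float)
    for _ in range(2):                       # two cascaded zero-frequency resonators
        out = np.zeros(len(y))
        for n in range(len(y)):
            y1 = out[n - 1] if n >= 1 else 0.0
            y2 = out[n - 2] if n >= 2 else 0.0
            out[n] = y[n] + 2.0 * y1 - y2
        y = out
    w = int(win_ms * 1e-3 * fs)
    kernel = np.ones(2 * w + 1) / (2 * w + 1)
    zff = y.copy()
    for _ in range(3):                       # repeated trend removal
        zff = zff - np.convolve(zff, kernel, mode="same")
    # positive-going zero crossings = epoch locations
    epochs = np.where((zff[:-1] < 0) & (zff[1:] >= 0))[0] + 1
    return zff, epochs
```

For a clean 100 Hz tone sampled at 8 kHz, the detected epochs in the interior of the signal come out one pitch period (about 80 samples) apart; real speech additionally needs care at voicing onsets and offsets, which is where the robust methods above come in.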
Text-to-speech synthesis »
Syllable based Bengali text-to-speech system was developed.
Efficient text analysis and phrase break prediction models were developed.
Accurate prosody models were developed using feedforward neural networks.
Appropriate syllable-specific features, unit-selection cost functions, and weight selection criteria were proposed.
A Bengali screen reader was developed and demonstrated to visually challenged people at NAB
Kolkata through a five-week workshop.
Laughter synthesis and incorporation of appropriate laughter segments for generating happy synthesized speech.
Storyteller-style speech synthesis systems were developed in four Indian languages (Hindi, Telugu,
Bengali, and Malayalam).
Story-specific prosody models were proposed for enhancing the storyteller-style synthesized speech.
Effective source models were proposed and integrated in statistical parametric speech synthesis for
generating both modal and creaky voices.
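As an illustration of the feedforward neural network prosody modeling mentioned above, the sketch below trains a tiny one-hidden-layer network to predict a syllable duration from positional features. The feature set, the synthetic target function, and the network size are invented for the example; they are not the actual models or features used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical syllable features: [position in word, position in phrase,
# number of phones] -- placeholders, not the real feature set.
X = rng.uniform(0, 1, size=(200, 3))
# Synthetic "duration" target (seconds) with a known relationship
y = (0.08 + 0.12 * X[:, 0] - 0.05 * X[:, 1] + 0.03 * X[:, 2])[:, None]

# One hidden layer of tanh units, linear output
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

lr = 0.1
for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)        # hidden activations
    pred = h @ W2 + b2              # predicted durations
    err = pred - y
    # Backpropagation of the mean-squared error
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((pred - y) ** 2))
```

After training, the mean-squared error falls well below the variance of the target, i.e. the network does better than predicting the average duration; real prosody models predict duration, intonation, and energy from much richer linguistic context.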
Voice conversion »
Autoassociative neural network models were proposed for mapping speaker-specific characteristics
between source and target speakers.
Analysis and synthesis of vocal folds activity »
Accurate and robust detection of significant instants within a glottal cycle using phase information
Accurate parameterization of vocal folds activity using phase information
Analysis and investigation of vocal disorders using the proposed parameters
Simulation of vocal folds activity using EGG and speech signals.
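A common baseline for locating the significant instants within a glottal cycle (glottal closure instants) from the EGG signal uses peaks of the differentiated EGG rather than the phase-based processing proposed above. The sketch below shows that baseline; the polarity assumption (increasing vocal-fold contact increases EGG amplitude), the relative threshold, and the test waveform are all illustrative.

```python
import numpy as np

def gci_from_egg(egg, fs, max_f0=500.0, rel_thresh=0.3):
    """Sketch: glottal closure instants (GCIs) from an EGG signal via
    the differentiated EGG (dEGG).  Assumes the polarity in which
    closures appear as sharp positive peaks of dEGG.  rel_thresh and
    max_f0 are illustrative parameters."""
    degg = np.diff(egg)
    thresh = rel_thresh * degg.max()
    min_gap = int(fs / max_f0)           # at most one GCI per pitch period
    gcis = []
    for c in np.where(degg > thresh)[0]:
        left = degg[c - 1] if c >= 1 else -np.inf
        right = degg[c + 1] if c + 1 < len(degg) else -np.inf
        # keep local maxima that are far enough from the previous GCI
        if degg[c] >= left and degg[c] >= right and (not gcis or c - gcis[-1] >= min_gap):
            gcis.append(c)
    return np.array(gcis)
```

On a synthetic EGG-like waveform with a sharp rise once per 80-sample cycle, the detected GCIs land once per cycle; phase-based detection, as proposed above, aims to be more robust than this peak picking on real, noisy EGG recordings.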
Music signal processing »
Signal processing methods were proposed for extracting the predominant melody from polyphonic music.
Accurate melody extraction from singing voice
Automatic note transcription
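A minimal frame-level pitch estimator by autocorrelation is a common building block behind melody extraction and note transcription. The sketch below illustrates it; the search range and frame length are assumptions, and full predominant-melody extraction from polyphonic audio additionally needs salience computation and pitch-contour tracking, which this sketch omits.

```python
import numpy as np

def frame_pitch(frame, fs, f0_min=80.0, f0_max=1000.0):
    """Sketch: pitch of one (voiced) frame from the autocorrelation
    peak.  The search range f0_min..f0_max is an illustrative choice."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    lag = lo + int(np.argmax(ac[lo:hi]))   # lag of strongest periodicity
    return fs / lag
```

For a 220 Hz sine at 16 kHz with a 1024-sample frame, the estimate lands within a few Hz of 220; note transcription would then quantize such frame-level pitch tracks to note events.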
Big-data Framework for Audio and Multimedia applications »
Automatic document clustering using posterior features
Prototype systems developed »
Text-to-speech synthesis system for Bengali, an Indian language
Bi-lingual (English and Bengali) screen reader for visually challenged people
Multi-stage storyteller style speech synthesizers in Hindi, Telugu, Bengali and Malayalam.
Vocal fold activity synthesizer developed using phase information of EGG
Online Hindustani music tutoring system (basic SARGAM)
Automatic Tanpura tuner system
Title - Indian Institute of Technology Kharagpur - Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC)
Brief Description - This speech database covers 27 Indian languages; the 16 widely spoken languages among them were considered in this study for analyzing language identification performance.
IITKGP-SEHSC »
Title - Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC)
Brief Description - An emotional speech corpus (IITKGP-SEHSC) recorded in Hindi. The emotions considered for developing IITKGP-SEHSC are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise.
IITKGP-SESC »
Title - Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC)
Brief Description - An emotional speech corpus (IITKGP-SESC) in Telugu. The basic emotions considered for developing IITKGP-SESC are anger, compassion, disgust, fear, happy, neutral, sarcastic and surprise.
IITKGP-SDUC »
Title - Indian Institute of Technology Kharagpur Speech Database for Unsupervised Clustering (IITKGP-SDUC)
Brief Description - This database was recorded with a single speaker and covers broad topics such as politics, sports and weather.
Title - System and Method for Synchronizing Acoustic Signal of Voiced Speech and its Corresponding Electroglottography Signal
Ref. No. - 805/KOL/2014
Patent 2 »
Title - Method and apparatus to detect voice activity using Harmonics of Phase of Zero Frequency Filtered Speech Signal
Ref. No. - 1237/KOL/2015