Appropriate models are proposed for capturing the prosodic information, and signal processing
methods are developed to incorporate the prosodic information into speech.
Emotional speech databases were developed in Hindi and Telugu.
Implicit and explicit excitation source features, pitch synchronous and sub-syllabic spectral features
and multi-level global and local prosodic features are proposed for characterizing the emotions.
Hierarchical models are proposed for improving the accuracy of emotion recognition.
Acquisition and incorporation of emotion-specific knowledge for developing emotion-aware
speech systems
Robust signal processing methods were proposed for detecting the speech events in expressive
speech.
Signal processing and machine learning methods were proposed for voice as well as expression
transformations.
Signal processing methods are developed to detect important speech events such as vowel onset
and offset points in the presence of speech coding and background noise.
Efficient speech and speaker recognition systems are developed for mobile environments by
exploiting the crucial speech events and hybrid recognition models.
Articulatory and excitation source features are proposed for recognition of speech in read,
extempore and conversation modes.
Robust speaker recognition techniques were proposed based on stochastic feature compensation
and total variability speaker modeling frameworks.
Multi-SNR speaker models are proposed for speaker recognition in varying background
environments.
Emotion compensation techniques are proposed for speaker recognition in emotional
environments.
Robust language recognition systems were developed using spectral features extracted from glottal
closure regions, multi-level prosodic features, and implicit and explicit excitation source features.
A syllable-based Bengali text-to-speech system was developed.
Efficient text analysis and phrase break prediction models are developed.
Accurate prosody models are developed using feedforward neural networks.
Appropriate syllable-specific features, unit-selection cost functions and weight selection criteria
are proposed.
A Bengali screen reader was developed and demonstrated to visually challenged people at NAB
Kolkata through a 5-week workshop.
Laughter synthesis and incorporation of appropriate laughter segments for generating the Happy
emotion.
Storyteller style speech synthesis systems were developed in four Indian languages (Hindi, Telugu,
Bengali and Malayalam).
Story-specific prosody models were proposed for enhancing the storyteller style synthesized speech
quality.
Effective source models were proposed and integrated in statistical parametric speech synthesis for
generating both modal and creaky voices.
Autoassociative neural network models are proposed for mapping speaker-specific characteristics
between source and target speakers.
Accurate and robust detection of significant instants within a glottal cycle using phase information
Accurate parameterization of vocal folds activity using phase information
Analysis and investigation of vocal disorders using the proposed parameters
Simulation of vocal folds activity using EGG and speech signals
Signal processing methods were proposed for extracting the predominant melody from polyphonic
music
Accurate melody extraction from singing voice
Automatic note transcription
Automatic document clustering using posterior features
Text-to-Speech synthesis system for an Indian language Bengali
Bi-lingual (English and Bengali) screen reader for visually challenged people
Multi-stage storyteller style speech synthesizers in Hindi, Telugu, Bengali and Malayalam.
Vocal folds activity synthesizer using phase information of EGG
Online Hindustani music tutoring system (basic SARGAM)
Automatic Tanpura tuner system
Research Databases
Title - Indian Institute of Technology Kharagpur - Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC)
Brief Description - This speech database covers 27 Indian languages, of which 16 widely spoken languages were considered in this study for analyzing language identification performance.
Title - Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC)
Brief Description - An emotional speech corpus (IITKGP-SEHSC) recorded in Hindi. The emotions considered for developing IITKGP-SEHSC are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise.
Title - Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC)
Brief Description - An emotional speech corpus (IITKGP-SESC) in Telugu. The basic emotions considered for developing IITKGP-SESC are Anger, Compassion, Disgust, Fear, Happy, Neutral, Sarcastic and Surprise.