Research

Appropriate models are proposed for capturing the prosodic information, and signal processing methods are developed to incorporate the prosodic information into speech.

Emotional speech databases were developed in Hindi and Telugu.
Implicit and explicit excitation source features, pitch-synchronous and sub-syllabic spectral features, and multi-level global and local prosodic features are proposed for characterizing emotions.
Hierarchical models are proposed for improving the accuracy of emotion recognition.
Emotion-specific knowledge was acquired and incorporated for developing emotion-aware speech systems.
Robust signal processing methods were proposed for detecting the speech events in expressive speech.
Signal processing and machine learning methods were proposed for voice as well as expression transformations.

Signal processing methods are developed to detect important speech events, such as vowel onset and offset points, in the presence of speech coding and background noise.
Efficient speech and speaker recognition systems are developed for mobile environments by exploiting the crucial speech events and hybrid recognition models.
Articulatory and excitation source features are proposed for recognition of speech in read, extempore and conversational modes.
Robust speaker recognition techniques were proposed based on stochastic feature compensation and total variability speaker modeling frameworks.
Multi-SNR speaker models are proposed for speaker recognition in varying background environments.
Emotion compensation techniques are proposed for speaker recognition in emotional environments.
Robust language recognition systems were developed using spectral features extracted from glottal closure regions, multi-level prosodic features, and implicit and explicit excitation source features.

A syllable-based Bengali text-to-speech system was developed.
Efficient text analysis and phrase break prediction models are developed.
Accurate prosody models are developed using feedforward neural networks.
Appropriate syllable-specific features, unit-selection cost functions and weight selection criteria are proposed.
A Bengali screen reader was developed and demonstrated to visually challenged people at NAB Kolkata during a 5-week workshop.
Laughter synthesis and incorporation of appropriate laughter segments for generating the Happy emotion.
Storyteller-style speech synthesis systems were developed in four Indian languages (Hindi, Telugu, Bengali and Malayalam).
Story-specific prosody models were proposed for enhancing the quality of storyteller-style synthesized speech.
Effective source models were proposed and integrated in statistical parametric speech synthesis for generating both modal and creaky voices.

Autoassociative neural network models are proposed for mapping speaker-specific characteristics between source and target speakers.

Accurate and robust detection of significant instants within a glottal cycle using phase information
Accurate parameterization of vocal folds activity using phase information
Analysis and investigation of vocal disorders using the proposed parameters
Simulation of vocal folds activity using EGG and speech signals

Signal processing methods were proposed for extracting the predominant melody from polyphonic music.
Accurate melody extraction from singing voice
Automatic note transcription
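
As a rough illustration of what note transcription involves once a melody (pitch) contour is available, the sketch below maps frame-wise pitch values to note labels and merges consecutive frames with the same label. It is not the method developed in this work. It assumes equal temperament with A4 = 440 Hz and Western note names, which is a simplification for Hindustani music where notes are named relative to the performer's tonic. The function names (hz_to_midi, midi_to_name, transcribe) are hypothetical.

```python
import math

# Western pitch-class names; an assumption made for illustration only.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz, ref_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number.

    Uses the standard equal-temperament relation with A4 (MIDI 69) at ref_hz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / ref_hz)


def midi_to_name(midi_number):
    """Round a MIDI note number to the nearest semitone and name it, e.g. 'A4'."""
    n = int(round(midi_number))
    octave = n // 12 - 1
    return f"{NOTE_NAMES[n % 12]}{octave}"


def transcribe(f0_track):
    """Collapse a frame-wise pitch track into (note, frame_count) runs.

    f0_track holds one pitch value in Hz per frame; values <= 0 mark
    unvoiced frames, which separate notes and are dropped from the output.
    """
    runs = []
    for f0 in f0_track:
        label = None if f0 <= 0 else midi_to_name(hz_to_midi(f0))
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)
        else:
            runs.append((label, 1))
    return [(label, count) for label, count in runs if label is not None]
```

For example, transcribe([440.0, 441.0, 0.0, 261.63, 262.0]) groups the first two frames into one A4 note and the last two into one C4 note. A practical system would also need onset detection and smoothing of the pitch contour, which this sketch leaves out.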

Text-to-speech synthesis system for Bengali, an Indian language
Bilingual (English and Bengali) screen reader for visually challenged people
Multi-stage storyteller-style speech synthesizers in Hindi, Telugu, Bengali and Malayalam
Vocal fold activity synthesizer developed using phase information of EGG
Online Hindustani music tutoring system (basic SARGAM)
Automatic Tanpura tuner system

Title - Indian Institute of Technology Kharagpur - Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC)
Brief Description - This speech database covers 27 Indian languages. Sixteen of them, which are widely spoken, were considered in this study for analyzing language identification performance.

Title - Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC)
Brief Description - An emotional speech corpus (IITKGP-SEHSC) recorded in Hindi. The emotions considered for developing IITKGP-SEHSC are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise.

Title - Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC)
Brief Description - An emotional speech corpus (IITKGP-SESC) in Telugu. The basic emotions considered for developing IITKGP-SESC are Anger, Compassion, Disgust, Fear, Happy, Neutral, Sarcastic and Surprise.

Title - Indian Institute of Technology Kharagpur Speech Database for Unsupervised Clustering (IITKGP-SDUC)
Brief Description - This database was recorded with a single speaker. It covers broad topics such as politics, sports and weather.

Title - System and Method for Synchronizing Acoustic Signal of Voiced Speech and its Corresponding Electroglottography Signal
Ref. No. - 805/KOL/2014

Title - Method and apparatus to detect voice activity using Harmonics of Phase of Zero Frequency Filtered Speech Signal
Ref. No. - 1237/KOL/2015