Research

Research Contributions

  • Appropriate models were proposed for capturing prosodic information, and signal processing methods were developed to incorporate this information into speech.
  • Emotional speech databases were developed in Hindi and Telugu.
  • Implicit and explicit excitation source features, pitch-synchronous and sub-syllabic spectral features, and multi-level global and local prosodic features were proposed for characterizing emotions (a prosodic-feature sketch follows this list).
  • Hierarchical models were proposed for improving the accuracy of emotion recognition (a two-stage classifier sketch follows this list).
  • Acquisition and incorporation of emotion-specific knowledge for developing emotion-aware speech systems.
  • Robust signal processing methods were proposed for detecting speech events in expressive speech.
  • Signal processing and machine learning methods were proposed for both voice and expression transformation.
  • Signal processing methods were developed to detect important speech events, such as vowel onset and offset points, in the presence of speech coding and background noise.
  • Efficient speech and speaker recognition systems were developed for mobile environments by exploiting crucial speech events and hybrid recognition models.
  • Articulatory and excitation source features were proposed for recognition of speech in read, extempore, and conversational modes.
  • Robust speaker recognition techniques were proposed based on stochastic feature compensation and total variability speaker modeling frameworks.
  • Multi-SNR speaker models were proposed for speaker recognition in varying background environments.
  • Emotion compensation techniques were proposed for speaker recognition in emotional environments.
  • Robust language recognition systems were developed using spectral features extracted from glottal closure regions, multi-level prosodic features, and implicit and explicit excitation source features.
  • A syllable-based Bengali text-to-speech system was developed.
  • Efficient text analysis and phrase break prediction models were developed.
  • Accurate prosody models were developed using feedforward neural networks.
  • Appropriate syllable-specific features, unit-selection cost functions, and weight selection criteria were proposed (a unit-selection search sketch follows this list).
  • A Bengali screen reader was developed and demonstrated to visually challenged people at NAB Kolkata through a five-week workshop.
  • Laughter synthesis and incorporation of appropriate laughter segments for generating happy emotional speech.
  • Storyteller-style speech synthesis systems were developed in four Indian languages (Hindi, Telugu, Bengali, and Malayalam).
  • Story-specific prosody models were proposed for enhancing the quality of storyteller-style synthesized speech.
  • Effective source models were proposed and integrated into statistical parametric speech synthesis for generating both modal and creaky voices.
  • Autoassociative neural network models were proposed for mapping speaker-specific characteristics between source and target speakers.
  • Accurate and robust detection of significant instants within a glottal cycle using phase information (an epoch-detection sketch follows this list).
  • Accurate parameterization of vocal fold activity using phase information.
  • Analysis and investigation of vocal disorders using the proposed parameters.
  • Simulation of vocal fold activity using EGG and speech signals.
  • Signal processing methods were proposed for extracting the predominant melody from polyphonic music.
  • Accurate melody extraction from the singing voice.
  • Automatic note transcription (a note-mapping sketch follows this list).
  • Automatic document clustering using posterior features.
  • Text-to-speech synthesis system for Bengali, an Indian language.
  • Bilingual (English and Bengali) screen reader for visually challenged people.
  • Multi-stage storyteller-style speech synthesizers in Hindi, Telugu, Bengali, and Malayalam.
  • Vocal fold activity synthesizer developed using phase information of the EGG signal.
  • Online Hindustani music tutoring system (basic SARGAM).
  • Automatic Tanpura tuner system.
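
A minimal sketch, in Python with NumPy, of the global prosodic features referenced above: frame the signal, estimate F0 on high-energy (assumed voiced) frames by autocorrelation, and summarize F0 and energy statistics. The frame sizes, voicing threshold, and pitch search range are illustrative assumptions, not the settings used in this work.

    import numpy as np

    def frame_signal(x, frame_len, hop):
        # Slice the signal into overlapping frames (rows of the result).
        n = 1 + max(0, (len(x) - frame_len) // hop)
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
        return x[idx]

    def autocorr_f0(frame, fs, fmin=75.0, fmax=400.0):
        # Crude autocorrelation pitch estimate for one voiced frame.
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(fs / fmax), int(fs / fmin)
        return fs / (lo + np.argmax(ac[lo:hi]))

    def global_prosody(speech, fs):
        # 30 ms frames with a 10 ms hop; the voicing threshold is assumed.
        frames = frame_signal(speech, int(0.03 * fs), int(0.01 * fs))
        energy = (frames ** 2).sum(axis=1)
        voiced = frames[energy > 0.1 * energy.max()]
        f0 = np.array([autocorr_f0(f, fs) for f in voiced])
        return {"f0_mean": f0.mean(), "f0_range": f0.max() - f0.min(),
                "energy_mean": energy.mean(), "energy_std": energy.std()}

Local prosodic features would apply the same measurements over syllable- or word-sized segments rather than the whole utterance.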
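
A two-stage classifier sketch for the hierarchical emotion recognition models mentioned above, using scikit-learn SVMs. The arousal-based grouping of emotions and the choice of SVMs are assumptions for illustration only; they are not the models of this work.

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical grouping: stage 1 separates broad arousal classes,
    # stage 2 resolves the individual emotion within each group.
    HIGH = {"anger", "happy", "surprise"}

    class HierarchicalEmotionClassifier:
        def __init__(self):
            self.stage1 = SVC()
            self.stage2 = {g: SVC() for g in ("high", "low")}

        def fit(self, X, y):
            groups = np.array(["high" if lbl in HIGH else "low" for lbl in y])
            self.stage1.fit(X, groups)
            for g, clf in self.stage2.items():
                mask = groups == g
                clf.fit(X[mask], np.asarray(y)[mask])
            return self

        def predict(self, X):
            groups = self.stage1.predict(X)
            return [self.stage2[g].predict(x.reshape(1, -1))[0]
                    for g, x in zip(groups, X)]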
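
A dynamic-programming (Viterbi) sketch of the unit-selection search referenced above: the chosen candidate sequence minimizes weighted target plus concatenation costs. The cost matrices and the weights w_t and w_c are hypothetical inputs; the actual cost functions and weight selection criteria of this work are not reproduced here.

    import numpy as np

    def select_units(target_costs, concat_costs, w_t=1.0, w_c=1.0):
        # target_costs[i][j]: target cost of candidate j for unit i.
        # concat_costs[i][j][k]: join cost between candidate j of unit i
        # and candidate k of unit i + 1.
        best = [w_t * np.asarray(target_costs[0])]
        back = []
        for i in range(1, len(target_costs)):
            tc = w_t * np.asarray(target_costs[i])
            cc = w_c * np.asarray(concat_costs[i - 1])
            total = best[-1][:, None] + cc + tc[None, :]
            back.append(total.argmin(axis=0))  # best predecessor per candidate
            best.append(total.min(axis=0))
        path = [int(best[-1].argmin())]
        for bp in reversed(back):
            path.append(int(bp[path[-1]]))
        return path[::-1]                      # one candidate index per unit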
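
An epoch-detection sketch for the glottal-cycle instants mentioned above, based on zero-frequency filtering (the ZFF signal also appears in the patents below). The phase-based refinements proposed in this work are not shown, and the trend-removal window derived from an assumed average pitch is illustrative.

    import numpy as np

    def zff_epochs(speech, fs, avg_pitch_hz=150.0):
        # Difference the signal to remove any DC offset.
        x = np.diff(speech, prepend=speech[0])
        # Cascade of two zero-frequency resonators (two cumulative sums).
        y = np.cumsum(np.cumsum(x))
        # Remove the polynomial trend by repeated local-mean subtraction
        # over a window of about 1.5 average pitch periods.
        win = int(1.5 * fs / avg_pitch_hz) | 1  # force an odd window length
        kernel = np.ones(win) / win
        for _ in range(3):
            y = y - np.convolve(y, kernel, mode="same")
        # Epochs: negative-to-positive zero crossings of the ZFF signal.
        return np.where((y[:-1] < 0) & (y[1:] >= 0))[0]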
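
A note-mapping sketch relevant to the note transcription, SARGAM tutoring, and Tanpura tuner systems listed above: an estimated fundamental frequency is mapped to the nearest svara and a deviation in cents. The tonic value sa_hz and the note-name spellings are illustrative assumptions.

    import numpy as np

    # Twelve semitones from the tonic Sa; spellings are illustrative.
    NOTES = ["Sa", "re", "Re", "ga", "Ga", "ma", "Ma'",
             "Pa", "dha", "Dha", "ni", "Ni"]

    def nearest_note(f0_hz, sa_hz=240.0):
        # Distance from the tonic in cents (100 cents per semitone).
        cents = 1200.0 * np.log2(f0_hz / sa_hz)
        semitone = round(cents / 100.0)
        deviation = cents - 100.0 * semitone   # tuning error in cents
        return NOTES[semitone % 12], deviation

For example, with the assumed tonic of 240 Hz, nearest_note(254.0) returns ("re", about -1.8), i.e. the pitch is roughly 2 cents flat of komal Re; a tuner reports this deviation, while a transcriber keeps the note label per detected note segment.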

Research Databases

  • Title - Indian Institute of Technology Kharagpur - Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC)
  • Brief Description - This speech database consists of 27 Indian languages; 16 widely spoken languages among them were considered in this study for analyzing language identification performance.
  • Title - Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC)
  • Brief Description - An emotional speech corpus (IITKGP-SEHSC) recorded in Hindi. The emotions considered for developing IITKGP-SEHSC are anger, disgust, fear, happy, neutral, sad, sarcastic and surprise.
  • Title - Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC)
  • Brief Description - An emotional speech corpus (IITKGP-SESC) recorded in Telugu. The basic emotions considered for developing IITKGP-SESC are anger, compassion, disgust, fear, happy, neutral, sarcastic and surprise.
  • Title - Indian Institute of Technology Kharagpur Speech Database for Unsupervised Clustering (IITKGP-SDUC)
  • Brief Description - This database was recorded with a single speaker and covers broad topics such as politics, sports, and weather.

Patents Filed

  • Title - System and Method for Synchronizing Acoustic Signal of Voiced Speech and its Corresponding Electroglottography Signal
  • Ref. No. - 805/KOL/2014
  • Title - Method and apparatus to detect voice activity using Harmonics of Phase of Zero Frequency Filtered Speech Signal
  • Ref. No. - 1237/KOL/2015