Major Research Projects

Noise Correction in Natural Language Sentences using a Language Model with Dr. Sudeshna Sarkar (CSE, IIT Kharagpur) (M. Tech dissertation, August 2010-present)

Details: The aim of this work is to devise methods for removing noise, especially structural errors, from natural language sentences. To do this, we use a language model and apply a dynamic programming algorithm to find the sequence of replacements that yields the best-scoring sentence according to that model.
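
The description above does not specify the language model order or how replacement candidates are generated. As a minimal sketch, assuming a bigram model and a precomputed list of candidate words for each position (both hypothetical), the dynamic programming search can be written as a Viterbi-style pass that keeps, for every candidate at the current position, the best-scoring sentence prefix ending in it:

```python
import math
from typing import Callable, Dict, List, Tuple


def best_correction(
    candidates: List[List[str]],
    bigram_logprob: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one word per position so that the whole sentence has the highest
    total bigram log-probability (Viterbi-style dynamic programming).

    `candidates[i]` holds the hypothetical replacement options for position i
    (the original word should normally be one of them).
    """
    # best[w] = (score of best prefix ending in w, that prefix)
    best: Dict[str, Tuple[float, List[str]]] = {
        w: (bigram_logprob("<s>", w), [w]) for w in candidates[0]
    }
    for options in candidates[1:]:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for w in options:
            # Extend the best predecessor of w by one bigram transition.
            nxt[w] = max(
                ((s + bigram_logprob(prev, w), p + [w]) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    # Close the sentence with an end-of-sentence transition.
    score, path = max(
        ((s + bigram_logprob(w, "</s>"), p) for w, (s, p) in best.items()),
        key=lambda t: t[0],
    )
    return path, score


# Toy bigram log-probabilities (hypothetical numbers, for illustration only).
TOY_LM = {
    ("<s>", "he"): math.log(0.5),
    ("he", "goes"): math.log(0.4),
    ("he", "go"): math.log(0.05),
    ("goes", "home"): math.log(0.3),
    ("go", "home"): math.log(0.3),
    ("home", "</s>"): math.log(0.6),
}


def toy_logprob(prev: str, word: str) -> float:
    # Unseen bigrams get a small fixed probability instead of zero.
    return TOY_LM.get((prev, word), math.log(1e-4))


sentence, score = best_correction([["he"], ["go", "goes"], ["home"]], toy_logprob)
print(" ".join(sentence))  # he goes home
```

Storing whole prefixes keeps the sketch short; for long sentences one would store backpointers instead, which gives the same result in less memory.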

Factored Language Models for Translation into Morphologically Complex Languages with Dr. Anoop Sarkar (School of Computing Science, Simon Fraser University) (May-July 2010, Summer Internship)

Details: The aim of this work was to develop a C++ wrapper around the Factored Language Model functions in the SRILM language modeling toolkit and to integrate it with the "Kriya" system being developed at SFU. We also implemented several Factored Language Models using morphological information, which improved translation quality.

Extracting a Bilingual Dictionary from Comparable Corpora with Dr. Sudeshna Sarkar (CSE, IIT Kharagpur) (B. Tech dissertation, August 2009-present)

Details: We model the comparable corpora as complex networks and use two approaches to map words between the two corpora: eigenvector alignment and word co-occurrence. Eigenvector alignment turned out to be poorly suited to this task because it fails to capture the inherent ambiguity present in languages, whereas the word co-occurrence based approach gave good results.

Understanding the Nature of the Dorogovtsev-Mendes Model with Dr. Monojit Choudhury (Microsoft Research India, Bangalore) (May-July 2009, Summer Internship)

Details: In this work, we studied the structure of large linguistic networks and analyzed the Dorogovtsev-Mendes model (2001) in detail, focusing on identifying its shortcomings. The work involved a thorough background study and extensive simulations in C++ and MATLAB.

Psycholinguistic Experiments to Correlate Phrase Structure, Performance and Prosody in Bangla and Hindi with Kalika Bali, Monojit Choudhury, Sankalan Prasad and Arpit Maheshwari (Microsoft Research India) (June-July 2009)

Details: We replicated the chunking experiments of Grosjean and Lane for Hindi and Bangla and analyzed the results to determine the correlations between phrase structure, prosody and performance structures in these languages.

Music Information Retrieval System with Arka Aloke Bhattacharya, Pavan Nithin and Aniket Nayak under Dr. Pabitra Mitra and Dr. Arun Kumar Majumdar (CSE, IIT Kharagpur) (February-April 2009)

Details: In this project, we developed a system that retrieves the best-matching music tracks and their metadata from a database, given an audio input. We used windowed MFCCs (Mel-frequency cepstral coefficients) to characterize the tracks and a Dynamic Time Warping (DTW) algorithm to retrieve the tracks that best match the query audio signal. The system was implemented in C++ and MATLAB.
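
MFCC extraction and the database layer are not described here, so the sketch below assumes each track has already been converted into a sequence of MFCC frames (lists of floats). It shows the standard DTW recurrence with Euclidean frame distance, followed by a linear scan over the library for the lowest-cost track. The function names and the toy one-dimensional "frames" are hypothetical illustrations, not the project's C++/MATLAB code. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]  # one MFCC vector (assumed precomputed)


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Classic DTW cost between two frame sequences (Euclidean frame distance)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query: Sequence[Frame], library: Dict[str, Sequence[Frame]]) -> str:
    """Return the name of the track with the lowest DTW cost to the query."""
    return min(library, key=lambda name: dtw_distance(query, library[name]))


# Toy example with 1-D "frames": the query is track_a played with a stretched start.
library = {
    "track_a": [[0.0], [1.0], [2.0], [3.0]],
    "track_b": [[3.0], [2.0], [1.0], [0.0]],
}
query = [[0.0], [0.0], [1.0], [2.0], [3.0]]
print(best_match(query, library))  # track_a
```

DTW is a natural fit here because the same passage can be played or recorded at slightly different speeds; the warping path absorbs that tempo difference, as the repeated first frame in the toy query illustrates.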

Multi-Document Opinion Summarization with Sushant Kumar and Dr. Sudeshna Sarkar (CSE, IIT Kharagpur) (May-August 2008)

Details: In this project, we took a statistical approach to document summarization. We used the LexRank method of sentence ranking, but with a different ranking metric. Rather than focusing on Natural Language Generation, we built extractive summaries by selecting sentences from the documents according to their ranking scores. The entire program was written in Python and made extensive use of NLTK.
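
The modified ranking metric used in the project is not specified, so the sketch below shows only standard LexRank as a reference point: sentences become nodes, an edge joins two sentences whose cosine similarity exceeds a threshold, and scores come from a PageRank-style power iteration with damping. As simplifying assumptions, it uses plain term-frequency cosine (original LexRank uses idf-modified cosine), takes already tokenized sentences with stopwords removed, and uses only the standard library rather than NLTK.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank(
    sentences: List[List[str]],
    threshold: float = 0.1,
    damping: float = 0.85,
    iters: int = 100,
) -> List[float]:
    """Thresholded LexRank scores for pre-tokenized sentences."""
    n = len(sentences)
    vecs = [Counter(s) for s in sentences]
    # Unweighted similarity graph; self-loops arise because cosine(s, s) = 1.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence shares its score equally among its neighbours.
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / degree[j] for j in range(n) if degree[j])
            for i in range(n)
        ]
    return scores


sentences = [
    ["camera", "battery", "screen"],  # overlaps with every other sentence
    ["battery", "lasts", "long"],
    ["camera", "focus", "fast"],
    ["screen", "too", "dim"],
]
scores = lexrank(sentences)
top = max(range(len(sentences)), key=scores.__getitem__)
print(top)  # 0
```

An extractive summary then simply takes the top-ranked sentences in order of score; swapping in a different similarity function in place of `cosine` is where a modified metric would plug in.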



Term Projects

1. Design of a set of cognitive experiments with Dr. Anupam Basu (CSE, IIT Kharagpur) (Design Lab project, August-November 2010)

Details: In this work, I designed and developed a GUI for administering a number of cognitive tests.

2. Developed hotel management software, a Snakes and Ladders game, a college student management system and a multiple-choice test-taking system as part of the Software Engineering course. All of these applications had GUIs built with Java Swing.

3. Developed a rule-based English-Hindi transliteration system as part of the Speech and NLP course. Tested on 3000 words, the system's output matched one of the top 2 outputs of Google Indic Transliteration for 91.6% of the words.

4. Developed a 16-bit processor with a controller as part of the Computer Organization and Architecture Laboratory course.

5. Developed a compiler for a subset of the C language as part of the Compilers Laboratory course.