This webpage contains the development and test sets that were originally part of the query labelling subtask in the FIRE 2013 Track on Transliterated Search.
All the data available on this website must be used for non-commercial and research purposes only.
We release 500 labelled English-Hindi queries as the development set. These contain 1056 distinct word transliteration pairs. Because the data is small, we recommend using it not to train algorithms but as a development set for tuning model parameters and for understanding and analyzing word transliteration pairs. We also release 406 queries as a test set, with language labels at the word level.
The development set can be found here. An example labelled query is shown below. Each word carries a language tag ('H' : Hindi; 'E' : English), and each Hindi word is also transliterated into the Devanagari script.
banarasi\H=बनारसी silk\E sarees\E
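A minimal sketch of reading one development-set line in the format shown above. It assumes tokens are separated by whitespace, that a backslash separates each word from its tag, and that an optional '=' introduces the Devanagari form; the function name parse_dev_query is hypothetical and not part of the released data.

```python
def parse_dev_query(line):
    """Split a labelled query into (word, tag, devanagari_or_None) triples.

    Assumed token shape: word\\TAG or word\\TAG=devanagari.
    """
    triples = []
    for item in line.split():
        # Text before the first backslash is the Roman-script word.
        word, _, rest = item.partition("\\")
        # The tag runs up to an optional '=', after which comes the Devanagari form.
        tag, _, native = rest.partition("=")
        triples.append((word, tag, native or None))
    return triples


dev_example = r"banarasi\H=बनारसी silk\E sarees\E"
for triple in parse_dev_query(dev_example):
    print(triple)
```

Returning None for English words keeps the triples uniform, so the Hindi transliteration pairs can be collected by filtering on the third field.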
The aforementioned test set can be found here. An example labelled query is shown below. Here, each word is language tagged ('hi' : Hindi; 'en' : English; 'NE*' : Named Entity).
bharat\hi ka\hi bharosa\hi dravid\NE zimbabwe\NE tour\en
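A similar sketch for the test-set format, again assuming whitespace-separated tokens with a backslash between word and tag. Tags are kept exactly as they appear in the file; the helper names parse_test_query and tag_counts are hypothetical.

```python
from collections import Counter


def parse_test_query(line):
    """Split a test-set query into (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last backslash so the tag is whatever follows it.
        word, sep, tag = item.rpartition("\\")
        if not sep:
            raise ValueError(f"token has no tag: {item!r}")
        pairs.append((word, tag))
    return pairs


def tag_counts(pairs):
    """Count how many words carry each tag."""
    return Counter(tag for _, tag in pairs)


test_example = r"bharat\hi ka\hi bharosa\hi dravid\NE zimbabwe\NE tour\en"
pairs = parse_test_query(test_example)
print(pairs)
print(tag_counts(pairs))
```

Counting tags per query is one simple way to check how much of the test set is Hindi, English, or named entities before evaluating a labelling model.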