Anshuman Tripathi

Gender Prediction of Indian Names.

Anshuman Tripathi, Manaal Faruqui

Department of Computer Science and Engineering, IIT Kharagpur

GENERAL : In this project we tried to identify features in first names of Indian origin that help classify the names by gender. We first built a corpus of names labelled with the corresponding gender by crawling websites that suggest baby names. We then trained an SVM model on several features, such as vowel ending, sonorance, length of the name and n-gram suffixes, and used it to classify a test set.
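
A minimal sketch of how the features listed above might be extracted from a name. The function name extract_features, the letter set used as a rough orthographic proxy for sonorance, and the maximum suffix length of 3 are assumptions made for illustration; they are not taken from the original work.

```python
VOWELS = set("aeiou")
# Assumption: these letters (nasals, liquids, glides) stand in for
# sonorant sounds; the original work may define sonorance differently.
SONORANTS = set("mnlrwy")


def extract_features(name, max_suffix=3):
    """Map a first name to a dictionary of simple features."""
    n = name.strip().lower()
    if not n:
        raise ValueError("name must be non-empty")
    feats = {
        "ends_in_vowel": n[-1] in VOWELS,
        "length": len(n),
        "sonorant_count": sum(ch in SONORANTS for ch in n),
    }
    # Character n-gram suffixes of length 1..max_suffix.
    for k in range(1, max_suffix + 1):
        if len(n) >= k:
            feats[f"suffix_{k}"] = n[-k:]
    return feats


print(extract_features("Priya"))
```

Feature dictionaries of this shape could then be turned into vectors (for example with scikit-learn's DictVectorizer) and passed to an SVM classifier; that pairing is one possible setup, not necessarily the one used here.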

END RESULT : With a small training set of 2000 names we obtained an F1 score of 95% on a test set of 220 names. Most of the misclassified names were of Punjabi origin, where the same name, such as Jasbinder, is used for both boys and girls.
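
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for one class from gold and predicted labels. The function name, the label strings "F" and "M", and the choice of "F" as the positive class are assumptions for illustration; the source does not say which class was treated as positive or whether the score was averaged over both classes.

```python
def f1_score(gold, predicted, positive="F"):
    """F1 for the `positive` class, given parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented purely to exercise the function.
gold = ["F", "F", "M", "M"]
pred = ["F", "M", "M", "F"]
print(f1_score(gold, pred))
```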

STATUS : The work was presented at, and published in, the IEEE Students' Technology Symposium held at IIT Kharagpur in 2011.

FUTURE WORK:
  • The model may be extended to support names that are not of Indian origin.

RESULTS:
  • Based on our study, we found that around 91% of female names in India end with a vowel.
  • Female names are, in general, longer and have more syllables and stronger sonorance characteristics.
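
A statistic such as the share of vowel-ending names can be computed from a labelled corpus along the following lines. This is only a sketch: the function name vowel_ending_share, the (name, gender) tuple layout and the four sample names are assumptions for illustration, and the toy sample is not meant to reproduce the 91% figure.

```python
def vowel_ending_share(labelled_names, gender):
    """Fraction of names with the given gender label that end in a vowel."""
    names = [n.strip().lower() for n, g in labelled_names if g == gender]
    if not names:
        raise ValueError(f"no names labelled {gender!r}")
    return sum(n[-1] in "aeiou" for n in names) / len(names)


# Tiny illustrative sample, not data from the original corpus.
sample = [("Anjali", "F"), ("Kajal", "F"), ("Rohan", "M"), ("Aditya", "M")]
print(vowel_ending_share(sample, "F"))
```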