Rishiraj Saha Roy

About me

I have been a Ph.D. student (Microsoft Research India Ph.D. Fellow) in the Department of Computer Science and Engineering, IIT Kharagpur, since December 2009. My Ph.D. is jointly advised by Prof. Niloy Ganguly (IIT Kharagpur) and Dr. Monojit Choudhury (Microsoft Research India). I am part of the Complex Networks Research Group (CNeRG) at IIT Kharagpur. While I am passionate about search and my Ph.D. is on Web search query analysis, my broader research interests include Information Retrieval, Text Mining, Machine Learning, Natural Language Processing, Complex Networks, and Linguistics. I joined Adobe Research Labs India, Bangalore, on 1st April 2014, as a Computer Scientist.

This page is no longer updated [last update: 13 December 2014]. See: My Google Scholar profile | My Adobe home page | Contact




My Ph.D. research focuses on various aspects of query analysis, applied to the domain of Web search. The thesis explores the proposition that search queries have evolved into a distinct language, as a result of continuous two-way interactions between users and the search engine. Our results quantify similarities and differences between queries and their parent natural language (English in our case). Throughout, however, the focus has been on solving practical open problems in information retrieval. Specifically, I have worked on flat and hierarchical query segmentation (algorithms and evaluation), intent analysis, language modelling, complex network modelling, and cognitive experiments conducted through crowdsourcing.
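To make the difference between flat and hierarchical (nested) segmentation concrete, here is a minimal Python sketch of how the two outputs could be represented. The example query, the nested-list representation, and the helper names `leaf_words` and `top_level_segments` are illustrative assumptions; this is not the segmentation algorithm from the thesis, only a picture of what its output might look like.

```python
# Illustrative query (an assumption, not taken from any query log):
# "windows xp home edition hd video playback"

# Flat segmentation: the query words are partitioned into non-overlapping segments.
flat = [["windows", "xp"], ["home", "edition"], ["hd", "video"], ["playback"]]

# Nested segmentation: segments may be embedded inside bigger segments,
# so the result is a tree whose leaves are the query words.
nested = [
    [["windows", "xp"], ["home", "edition"]],
    [["hd", "video"], "playback"],
]


def leaf_words(node):
    """Return the words under a (possibly nested) segment, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node for word in leaf_words(child)]


def top_level_segments(tree):
    """Collapse a nested segmentation to a flat one by keeping only its top level."""
    return [leaf_words(child) for child in tree]


print(top_level_segments(nested))
```

Both structures cover the same words in the same order; the nested form simply keeps the grouping information that a flat partition discards.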

General philosophy

The co-evolution of the Web and commercial search engines, and the inability of such search engines to process natural language (NL) questions, have resulted in search queries being formulated in a syntax which is more complex than a bag-of-words model, but more flexibly structured than sentences conforming to NL grammar. In this thesis, we take the first steps to understand this unique syntactic structure of Web search queries in an unsupervised setup, and apply the acquired knowledge to make important contributions to Information Retrieval (IR). First, we develop a query segmentation algorithm that uses query logs to discover syntactic units in queries. We find that our algorithm detects several syntactic constructs that differ from NL phrases. We proceed to augment our method with Wikipedia titles for identifying long named entities. Next, we develop an IR-based evaluation framework for query segmentation which is superior to previously employed evaluation schemes against human annotations. Here, we show that substantial IR improvements are possible due to query segmentation. We then develop an algorithm that uses only query logs to generate a nested query segmentation, where segments can be embedded inside bigger segments. Importantly, we also devise a technique for directly applying nested segmentation to improve document ranking. Subsequently, we use segment co-occurrence statistics computed from query logs to find that query segments broadly fall into two classes - content and intent. While content units must match exactly in the documents, intent units can be used in more intelligent ways to improve the quality of search results. More generally, the relationship between content and intent segments within the query is vital to query understanding. Finally, we generate large volumes of artificial query logs constrained by n-gram model probabilities estimated from real query logs. 
We perform corpus-level and query-level comparisons of model-generated logs with the real query log, based on complex network statistics and (crowdsourced) user intuition of real query syntax, respectively. Together, the two approaches provide a holistic view of the syntactic complexity of Web queries: their syntax is more complex than what n-grams can capture, yet more predictable than NL.
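As a rough illustration of generating artificial queries from n-gram statistics, the sketch below estimates a word bigram model from a tiny made-up log and samples new queries from it. The toy log, the boundary markers, the `max_len` cap, and the function names are all assumptions made for this example; the thesis's actual models, data, and constraints are not specified on this page.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical query boundary markers


def train_bigram_counts(queries):
    """Count word bigrams (including boundary markers) over a list of query strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for query in queries:
        tokens = [START] + query.split() + [END]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def generate_query(counts, rng, max_len=10):
    """Sample one artificial query, word by word, in proportion to bigram counts."""
    words, prev = [], START
    while len(words) < max_len:
        successors = counts[prev]
        curr = rng.choices(list(successors), weights=list(successors.values()))[0]
        if curr == END:
            break
        words.append(curr)
        prev = curr
    return " ".join(words)


# Made-up toy log, standing in for a real query log.
toy_log = [
    "cheap flights to london",
    "cheap hotels in london",
    "flights to paris",
    "hotels in paris reviews",
]

rng = random.Random(0)  # seeded so that runs are repeatable
counts = train_bigram_counts(toy_log)
artificial_log = [generate_query(counts, rng) for _ in range(5)]
print(artificial_log)
```

Every adjacent word pair in a generated query has been seen in the toy log, but the query as a whole may be new. That is the property that makes such logs useful as a baseline: they match real queries on local statistics while possibly missing longer-range structure.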

PUBLICATIONS (LONG PAPERS) [Google Scholar Profile]










ACADEMICS

Curriculum vitae [as on 13 December 2014]


Internships and visits

Awards, Fellowships and Grants

Teaching Assistantships


My webpage as maintained by IIT Kharagpur.


CONTACT ME

LinkedIn: Rishiraj Saha Roy | Twitter: @RishirajSahaRoy

Date modified: 13 December 2014
