Information Retrieval (CS60092)

Autumn semester 2021-22

Announcements


  • Class Test 1 will be on Thursday, September 09, during class hours. Syllabus: all topics covered up to September 03.

  • Every registered student should create an account on the Moodle system of the CSE department. This system will be used for the submission and grading of class tests and the project. If you do not already have an account on the CSE department Moodle, create one by following the procedure stated on the same webpage. Log in to the system and follow the link "Autumn Semester (2021-22)". Choose the course "CS60092_2021-22 Information Retrieval" and join it as "Student", using the Student Enrolment Key: CSTU60092.

  • The first class is on Wednesday, August 11, at 12:00. Join the class Information-Retrieval-2021A on MS Teams (IITKGP domain) using the Team Code b2gvupk.

  • Every registered student should join the Google mailing list ir2021a@googlegroups.com. Log in with your Google account (e.g., a Gmail account) and use this link: https://groups.google.com/u/1/g/ir2021a. All announcements about the course will henceforth be made on this website, via the Teams class, or through this mailing list.

  • The course requires a good knowledge of algorithms and data structures, probability and statistics, and the basics of Natural Language Processing, Machine Learning and graph algorithms. This is a research-oriented course that requires students to understand several CS research papers. There will be a term project that must be done in Python/Java. It is advisable to take this course only if you have the necessary background.


Instructor

Saptarshi Ghosh (Email: saptarshi @ cse . iitkgp . ac . in)

Teaching Assistants

  • Paheli Bhattacharya (paheli . cse . iitkgp @ gmail . com)
  • Abhisek Dash (assignmentad @ gmail . com)
  • Rajdeep Mukherjee (rajdeep1989 . iitkgp @ gmail . com)
  • Paramita Koley (paramita2000 @ gmail . com)

Class Timings and Mode of Teaching

Class timings:
  • Wednesday 12:00--12:55
  • Thursday 11:00--11:55
  • Friday 09:00--09:55

Classes will either be online over MS Teams (IITKGP domain), or pre-recorded videos will be uploaded. Students are required to join the class Information-Retrieval-2021A on MS Teams (IITKGP domain) using the Team Code b2gvupk.

Rules for students while attending online classes:
  • Please keep your video off and microphone muted, unless specifically asked to do otherwise.
  • Only the teacher/designated presenter will share his/her screen and present.
  • If you have any questions about the topic being taught, type them in the CHAT. Please do NOT start talking suddenly. The teacher/presenter will answer the questions typed in the chat at intervals.
  • If a question requires further discussion, the teacher/presenter will ask the student who wrote that question to speak. Only then should that student unmute and speak. After the discussion, please mute yourself again.
  • If the teacher/presenter gets disconnected, please wait at least 10 minutes for the teacher/presenter to reconnect.
I shall attempt to record the classes taken via MS Teams and make the recordings available on MS Teams itself. Note that class recording videos on MS Teams expire 20 days after the date they are created; hence videos need to be downloaded before they expire. Recordings may not be available for some lectures, e.g., in case of a technical snag, or if I forget to record a session. The availability of recordings is not a valid reason for not attending regular classes. If you miss a class and its recording is not available for any reason, I will not be able to do anything about it.

Pre-requisites for the course

  • Data structures and algorithms
  • Probability and Statistics
  • Basics of Machine Learning
  • Basics of Natural Language Processing
  • Basics of Graph algorithms
  • Programming in Python/Java (there will be a programming-based term project)


Broad topics

  1. Boolean retrieval
  2. The term vocabulary & postings lists
  3. Dictionaries and tolerant retrieval
  4. Index construction and compression
  5. Scoring, term weighting & the vector space model
  6. Computing scores in a complete search system
  7. Evaluation in information retrieval
  8. Relevance feedback & query expansion
  9. Probabilistic information retrieval
  10. Language models for information retrieval
  11. Web search and applications such as query auto-completion
  12. Link analysis -- HITS, PageRank
  13. Summarization
  14. Learning to Rank
  15. Neural IR
  16. Domain-specific IR


Text and Reference Literature

  1. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008.
  2. Research papers and reading materials to be pointed out in class

Course evaluation [tentative]

  • Three (3) online tests: 60% (20% weightage for each test)
  • One term project: 40%
Note: The evaluation plan is subject to change based on institute/department policies and other related factors.

Each student must attempt every test individually. Plagiarism in any form -- copying from other students or from online resources -- will be severely penalized.