Machine Learning (CS60050)

Spring semester 2019-20

Announcements

Revised deadline for Assignment 4 - May 03, 2020, end of day. This is a hard deadline.

Assignment 4 declared - deadline April 14 - see below. Since summer vacation has been declared by the institute, the deadline is postponed. The revised deadline will be notified later. Since the assignment has already been declared, you are advised to do it now, when you have more time. The revised deadline is likely to allow less time than the 2 weeks allotted now.

We understand that some students may be working from home. Hence we are designing the assignments such that they do not need much internet connectivity or computing resources. Sufficient time is being given for the assignments. It is disconcerting that there are still requests to extend the deadline for every assignment. If someone knows that his/her progress from home will be slow, it is his/her responsibility to start working on the assignments sufficiently early.

Assignment 3 declared - deadline March 29 - see below. Deadline extended to April 1 end of day. There will be no further extension.

CSE Moodle can be accessed via https://moodlecse.iitkgp.ac.in/moodle. If a student is unable to access Moodle, assignment submissions can be mailed to the specific TA who is supposed to grade the submission of that student (as specified below).

A point about the slide narrations being posted -- during the narrations, I am speaking at a quick pace, in order to reduce the duration and keep the file sizes small (since I understand some students can have problems in downloading large files). Also, many things that I would have explained on the blackboard are being hand-written and put in the slides. To my understanding, the material covered in a narration of 20-30 minutes would have been covered in a 1-hour lecture in a physical class. Students are advised to spend the requisite time on the narrations and the slides, in order to assimilate the material well.

Due to the suspension of physical classes as a result of the COVID outbreak, lectures are being posted on this website. Specifically, narrations of slides are posted as PowerPoint Show files (.ppsx); you should be able to see the slides and hear the narration using MS PowerPoint or any equivalent software. Note that, on some platforms, you may have to do the slide transition manually, e.g., by pressing the spacebar or the page-down key. Lectures are broken into several parts, in order to keep the files relatively small in size. Also, slides (pdf) are given as usual.
Questions about the slides can be mailed to Prof. S. Ghosh. Online doubt-clearing sessions can be arranged if there are too many doubts on some topic. As always, you are expected to read textbooks (see below) and relevant online materials, and not depend only on the slides.

Assignment 2 declared - deadline March 01 - see below. Solutions to be submitted via Moodle. Deadline extended to March 04 due to several requests. There will not be any further extension.

Assignment 1 declared - deadline Feb 02 - see below. Solutions to be submitted via Moodle.

Every registered student should create an account on the Moodle submission system of the CSE department. This system will be used for submission and grading of assignments. Go to this link and follow the link "Moodle" (bottom-left on page). Create a new account for yourself (unless you have an account already), giving username, password, email id. After creating an account, log in to the system, and follow the link "Spring Semester (2019-20)". Choose the course "Machine Learning". Join this course as "Student"; use Student Enrolment Key: STUMLSG.

Every registered student should join the Google mailing list https://groups.google.com/forum/#!forum/machinelearning-iitkgp-2020. To join the mailing list, you need to log in with your Google account (e.g., Gmail account) and then apply to join. Your application will be approved. All announcements about the course will be made on this website or through this mailing list.

First class was on Wednesday, January 8, 2020

05-Jan-2020 -- All UG and PG slots are now filled up, and no further registration to this course is possible. There is no need to mail me about registration to the course. I will not be able to reply to individual mails.

Instructor

Saptarshi Ghosh (Contact: saptarshi @ cse . iitkgp . ac . in)

Teaching Assistants

Paheli Bhattacharya (pahelibhattacharya @ gmail . com)
Shalmoli Ghosh (shalmolighosh94 @ gmail . com)
Anurag Roy (anu15roy @ gmail . com)
Soham Poddar (sohampoddar26 @ gmail . com)
Paramita Koley (paramita2000 @ gmail . com)
Pranesh Santikellur (pra.net061 @ gmail . com)

Course Timings (3 lectures per week)

Wednesday 11:00 - 11:55
Thursday 12:00 - 12:55
Friday 08:00 - 08:55

Class venue: NR422 (Nalanda complex)

Pre-requisites

Probability and Statistics
Some knowledge of Linear Algebra
Algorithms
Programming knowledge necessary for assignments (in C/C++/Java/Python)

Course evaluation

Assignments: 40% (There will be 4-5 assignments that will involve programming in C/C++/Java/Python)

Mid-semester exam: 20%

End-semester exam: 40%

Topics (outline)

Introduction: Basic principles, Applications, Challenges
Supervised learning: Linear Regression (with one variable and multiple variables), Gradient Descent, Classification -- Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machines, Artificial Neural Networks (Perceptrons, Multilayer networks, back-propagation)
Unsupervised learning: Clustering (K-means, Hierarchical), Dimensionality reduction
Ensemble learning: Bagging, boosting
Theory of Generalization: In-sample and out-of-sample error, Bias and Variance analysis, Overfitting, Regularization, Introduction to VC inequality
Advanced topics: Bias and fairness in Machine Learning

Text and Reference Literature

Christopher M. Bishop. Pattern Recognition and Machine Learning (Springer)
David Barber. Bayesian Reasoning and Machine Learning (Cambridge University Press). Online version available here.
Tom Mitchell. Machine Learning (McGraw Hill)
Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (John Wiley & Sons)

Assignments

Assignment 1 (Linear regression): Question
Deadline: February 2, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Shalmoli | 28-54: Anurag | 55-82: Paramita | 83-110: Pranesh
[Serial numbers according to the list of students given below]

Assignment 2 (Classification using Logistic Regression and Decision Trees): Question
Deadline: March 04, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Pranesh | 28-54: Shalmoli | 55-82: Anurag | 83-109: Paramita
[Serial numbers according to the list of students given below]

Assignment 3 (Clustering and Dimensionality reduction): Question
Deadline (extended): April 01, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Paramita | 28-54: Pranesh | 55-82: Shalmoli | 83-109: Anurag
[Serial numbers according to the list of students given below]

Assignment 4 (Neural Networks): Question
Deadline: May 03, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Anurag | 28-54: Paramita | 55-82: Pranesh | 83-109: Shalmoli
[Serial numbers according to the list of students given below]

List of students in the course: pdf

Slides


Topic	Slides	References / Comments
Introduction	Slides	Introduction to the course, utility of ML, applications of ML
Linear Regression	Slides	Linear regression in one variable and multiple variables, concept of cost function, gradient descent, polynomial regression
Logistic Regression	Slides	Binary classification; logistic regression; multi-class classification
Evaluation and Overfitting	Slides	Evaluation and error analysis; Bias and Variance; Overfitting, validation and regularization
Demo of ML tools	Material	Demonstration of ML tools on Feb 05: slides, datasets, sample scripts
Decision Trees	Slides	Classification using Decision Trees, Hunt's algorithm, Impurity measures (Gini index, entropy), overfitting and pruning a Decision Tree
Clustering (Part 1)	Slides	Prototype based clustering (K-means), hierarchical clustering, graph clustering, density-based clustering, K-means as an Expectation Maximization algorithm
Clustering (Part 2)	No slides	Maximum Likelihood Estimation, Soft clustering using mixture models
Dimensionality Reduction	Slides	Supervised and unsupervised techniques for dimensionality reduction, Principal Component Analysis
Naive Bayes classifier	Slides	Bayesian learning, Naive Bayes classifier
Neural networks (Parts: slide narration recordings as .ppsx files)	Part1 Part2 Part3 Part4 Part5 Slides(pdf)	Part 1: Linear models, Perceptrons, Non-linear models, Multilayer Perceptrons, Neurons; Part 2: History of neural networks, neural network architecture; Part 3: Neural network architecture (contd.), forward propagation; Part 4: Learning on neural networks: Stochastic Gradient Descent, Backpropagation algorithm; Part 5: Concluding discussion
Word embeddings: Application of Neural networks (Parts: slide narration recordings as .ppsx files)	Part1 Part2 Part3 Part4 Slides(pdf)	Part 1: One-hot representations vs. word embeddings, intuitive idea of Word2vec embeddings; Parts 2 and 3: Skip-gram neural network architecture and functionality; Part 4: Optimizations and extensions; Additional resources: Original papers by Mikolov et al.: paper 1, paper 2; A good tutorial
Support vector machines (Parts: slide narration recordings as .ppsx files)	Part1 Part2 Part3 Slides(pdf)	Part 1: Concept of margin, how to compute the margin, formulating the margin maximization problem; Part 2: Solving the optimization problem, support vectors; Part 3: Non-linear transforms and kernel functions; Additional resources: Online lectures by Prof. Abu-Mostafa, Caltech - Lectures 14, 15
Ensemble learning (Parts: slide narration recordings as .ppsx files)	Part1 Part2 Part3 Slides(pdf)	Part 1: Motivation of ensemble learning, types of algorithms; Part 2: Bagging; Part 3: Boosting; Additional resources: A tutorial; A YouTube video on AdaBoost
Advanced topics - I Bias-variance trade-off	Narration(ppsx) Slides(pdf)	Additional resources: A tutorial
Advanced topics - II Fairness in ML (Parts: slide narration recordings as .ppsx files)	Part1 Part2 Part3 Slides(pdf)	Parts 1 and 2: Motivating need for fairness; Part 3: Fairness definitions; how ML models can be unfair; how to make ML models fair; Additional resources: A tutorial
Advanced topics - III Introduction to Theory of Generalization	Notes(pdf)	Additional resources: A discussion; Online lectures by Prof. Abu-Mostafa, Caltech - Lectures 5, 6, 7 [These lectures by Prof. Abu-Mostafa contain advanced material, and are not part of this course]
Advanced topics - IV Bayesian Networks		Lecture videos: Lecture by Prof. Sudeshna Sarkar, IIT Kharagpur Lecture by Prof. Bert Huang, Virginia Tech