Machine Learning (CS60050)
Spring semester 2019-20
Announcements
 Revised deadline for Assignment 4: May 03, 2020, end of day. This is a hard deadline.
 Assignment 4 declared (deadline April 14; see below). Since summer vacation has been declared by the institute, the deadline is postponed; the revised deadline will be notified later. Since the assignment has already been declared, you are advised to work on it now, while you have more time. The revised extension is likely to be shorter than the two-week period allotted now.

We understand that some students may be working from home. Hence we are designing the assignments so that they do not need much internet connectivity or computing resources, and sufficient time is being given for them. It is disconcerting that there are still requests to extend the deadline for every assignment. If someone knows that his/her progress from home will be slow, it is his/her responsibility to start working on the assignments sufficiently early.
 Assignment 3 declared (deadline March 29; see below). Deadline extended to April 1, end of day. There will be no further extension.
 CSE Moodle can be accessed via https://moodlecse.iitkgp.ac.in/moodle. If a student is unable to access Moodle, assignment submissions can be mailed to the specific TA who is supposed to grade that student's submission (as specified below).
 A point about the slide narrations being posted: during the narrations, I speak at a quick pace, in order to reduce the duration and keep the file sizes small (since I understand some students may have problems downloading large files). Also, many things that I would have explained on the blackboard are handwritten into the slides. To my understanding, the material covered in a narration of 20-30 minutes would have been covered in a one-hour lecture in a physical class. Students are advised to spend the requisite time on the narrations and the slides, in order to assimilate the material well.

Due to the suspension of physical classes as a result of the COVID-19 outbreak, lectures are being posted on this website. Specifically, narrations of slides are posted as PowerPoint Show files (.ppsx); you should be able to see the slides and hear the narration using MS PowerPoint or any equivalent software. Note that, on some platforms, you may have to do the slide transitions manually, e.g., by pressing the spacebar or Page Down key.
Lectures are broken into several parts, in order to keep the files relatively small. Slides (pdf) are also provided, as usual.
Questions about the slides can be mailed to Prof. S. Ghosh. Online doubt-clearing sessions can be arranged if there are many doubts on some topic. As always, you are expected to read textbooks (see below) and relevant online materials, and not depend only on the slides.
 Assignment 2 declared (deadline March 01; see below). Solutions to be submitted via Moodle. Deadline extended to March 04 due to several requests. There will not be any further extension.
 Assignment 1 declared (deadline Feb 02; see below). Solutions to be submitted via Moodle.
 Every registered student should create an account on the Moodle submission system of the CSE department. This system will be used for submission and grading of assignments. Go to this link and follow the link "Moodle" (bottom-left on the page). Create a new account for yourself (unless you already have one), giving a username, password, and email ID. After creating an account, log in to the system and follow the link "Spring Semester (2019-20)". Choose the course "Machine Learning". Join this course as "Student"; use the Student Enrolment Key: STUMLSG.
 Every registered student should join the Google mailing list https://groups.google.com/forum/#!forum/machinelearningiitkgp2020. To join the mailing list, log in with your Google account (e.g., a Gmail account) and then apply to join; your application will be approved. All announcements about the course will be made on this website or through this mailing list.
 First class was on Wednesday, January 8, 2020
 05-Jan-2020: All UG and PG slots are now filled, and no further registration for this course is possible. There is no need to mail me about registration; I will not be able to reply to individual mails.
Instructor
Saptarshi Ghosh
(Contact: saptarshi @ cse . iitkgp . ac . in)
Teaching Assistants
 Paheli Bhattacharya (pahelibhattacharya @ gmail . com)
 Shalmoli Ghosh (shalmolighosh94 @ gmail . com)
 Anurag Roy (anu15roy @ gmail . com)
 Soham Poddar (sohampoddar26 @ gmail . com)
 Paramita Koley (paramita2000 @ gmail . com)
 Pranesh Santikellur (pra.net061 @ gmail . com)
Course Timings (3 lectures per week)
Wednesday 11:00–11:55
Thursday 12:00–12:55
Friday 08:00–08:55
Class venue: NR422 (Nalanda complex)
Prerequisites
Probability and Statistics
Some knowledge of Linear Algebra
Algorithms
Programming knowledge necessary for assignments (in C/C++/Java/Python)
Course evaluation
Assignments: 40% (There will be 4-5 assignments, involving programming in C/C++/Java/Python)
Mid-semester exam: 20%
End-semester exam: 40%
Topics (outline)
 Introduction: Basic principles, Applications, Challenges
 Supervised learning: Linear Regression (with one variable and multiple variables), Gradient Descent, Classification – Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machines, Artificial Neural Networks (Perceptrons, Multilayer networks, Backpropagation)
 Unsupervised learning: Clustering (K-means, Hierarchical), Dimensionality reduction
 Ensemble learning: Bagging, boosting
 Theory of Generalization: In-sample and out-of-sample error, Bias and Variance analysis, Overfitting, Regularization, Introduction to the VC inequality
 Advanced topics: Bias and fairness in Machine Learning
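As a small taste of the first supervised-learning topic above (and of the kind of programming the assignments involve, here in Python), the following is a minimal sketch of batch gradient descent for linear regression in one variable. The dataset, learning rate, and iteration count are illustrative choices, not taken from the course material.

```python
# Minimal illustration: batch gradient descent for one-variable linear
# regression, minimizing the cost J(w, b) = (1/2n) * sum((w*x + b - y)^2).
# Synthetic data generated from y = 2x + 1 (purely illustrative).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0   # parameters, initialized at zero
alpha = 0.05      # learning rate (an assumed value)
n = len(xs)

for _ in range(2000):  # fixed iteration budget for simplicity
    # Partial derivatives of J with respect to w and b
    dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= alpha * dw
    b -= alpha * db

print(round(w, 2), round(b, 2))  # prints: 2.0 1.0
```

With a small enough learning rate, the parameters converge to the line that generated the data; the same update rule extends directly to multiple variables.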
Text and Reference Literature
 Christopher M. Bishop. Pattern Recognition and Machine Learning (Springer)
 David Barber, Bayesian Reasoning and Machine Learning (Cambridge University Press). Online version available here.
 Tom Mitchell. Machine Learning (McGraw Hill)
 Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (John Wiley & Sons)
Assignments
Assignment 1 (Linear regression): Question
Deadline: February 2, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01–27: Shalmoli; 28–54: Anurag; 55–82: Paramita; 83–110: Pranesh
[Serial numbers according to the list of students given below]
Assignment 2 (Classification using Logistic Regression and Decision Trees): Question
Deadline: March 04, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01–27: Pranesh; 28–54: Shalmoli; 55–82: Anurag; 83–109: Paramita
[Serial numbers according to the list of students given below]
Assignment 3 (Clustering and Dimensionality reduction): Question
Deadline (extended): April 01, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01–27: Paramita; 28–54: Pranesh; 55–82: Shalmoli; 83–109: Anurag
[Serial numbers according to the list of students given below]
Assignment 4 (Neural Networks): Question
Deadline: May 03, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01–27: Anurag; 28–54: Paramita; 55–82: Pranesh; 83–109: Shalmoli
[Serial numbers according to the list of students given below]
List of students in the course: pdf
Slides
Topic 
Slides 
References / Comments 
Introduction 
Slides 
Introduction to the course, utility of ML, applications of ML 
Linear Regression 
Slides 
Linear regression in one variable and multiple variables, concept of cost function, gradient descent, polynomial regression 
Logistic Regression 
Slides 
Binary classification; logistic regression; multiclass classification 
Evaluation and Overfitting 
Slides 
Evaluation and error analysis; Bias and Variance; Overfitting, validation and regularization 
Demo of ML tools 
Material 
Demonstration of ML tools on Feb 05: slides, datasets, sample scripts 
Decision Trees 
Slides 
Classification using Decision Trees, Hunt's algorithm, Impurity measures (Gini index, entropy), overfitting and pruning a Decision Tree 
Clustering (Part 1) 
Slides 
Prototype-based clustering (K-means), hierarchical clustering, graph clustering, density-based clustering, K-means as an Expectation Maximization algorithm 
Clustering (Part 2) 
No slides 
Maximum Likelihood Estimation, Soft clustering using mixture models 
Dimensionality Reduction 
Slides 
Supervised and unsupervised techniques for dimensionality reduction, Principal Component Analysis 
Naive Bayes classifier 
Slides 
Bayesian learning, Naive Bayes classifier 
Neural networks (Parts: slide narration recordings as .ppsx files) 
Part 1
Part 2
Part 3
Part 4
Part 5
Slides (pdf)

Part 1: Linear models, Perceptrons, Nonlinear models, Multilayer Perceptrons, Neurons
Part 2: History of neural networks, neural network architecture
Part 3: Neural network architecture (contd.), forward propagation
Part 4: Learning on neural networks: Stochastic Gradient Descent, Backpropagation algorithm
Part 5: Concluding discussion

Word embeddings: Application of Neural networks (Parts: slide narration recordings as .ppsx files) 
Part 1
Part 2
Part 3
Part 4
Slides (pdf)

Part 1: One-hot representations vs. word embeddings, intuitive idea of Word2vec embeddings
Parts 2 and 3: Skip-gram neural network architecture and functionality
Part 4: Optimizations and extensions
Additional resources:
Original papers by Mikolov et al:
paper 1
paper 2
A good tutorial

Support vector machines (Parts: slide narration recordings as .ppsx files) 
Part 1
Part 2
Part 3
Slides (pdf)

Part 1: Concept of margin, how to compute the margin, formulating the margin maximization problem
Part 2: Solving the optimization problem, support vectors
Part 3: Nonlinear transforms and kernel functions
Additional resources:
Online lectures by Prof. Abu-Mostafa, Caltech: Lectures 14, 15

Ensemble learning (Parts: slide narration recordings as .ppsx files) 
Part 1
Part 2
Part 3
Slides (pdf)

Part 1: Motivation of ensemble learning, types of algorithms
Part 2: Bagging
Part 3: Boosting
Additional resources:
A tutorial
A YouTube video on AdaBoost

Advanced topics I: Bias-variance tradeoff 
Narration (ppsx)
Slides (pdf)

Additional resources:
A tutorial

Advanced topics II: Fairness in ML (Parts: slide narration recordings as .ppsx files) 
Part 1
Part 2
Part 3
Slides (pdf)

Parts 1 and 2: Motivating need for fairness
Part 3: Fairness definitions; how ML models can be unfair; how to make ML models fair
Additional resources:
A tutorial

Advanced topics III: Introduction to the Theory of Generalization 
Notes (pdf)

Additional resources:
A discussion
Online lectures by Prof. Abu-Mostafa, Caltech: Lectures 5, 6, 7 [These lectures by Prof. Abu-Mostafa contain advanced material, and are not part of this course]

Advanced topics IV: Bayesian Networks 

Lecture videos:
Lecture by Prof. Sudeshna Sarkar, IIT Kharagpur
Lecture by Prof. Bert Huang, Virginia Tech

Other interesting stuff
 10 Things Everyone Should Know About Machine Learning, by Daniel Tunkelang
 Ali Rahimi's Test-of-time award presentation at NIPS 2017 (comparing Machine Learning with Alchemy)
 Machine Learning resources
 Datasets for Machine Learning
