Machine Learning (CS60050)

Spring semester 2019-20

Announcements


  • Revised deadline for Assignment 4 - May 03, 2020, end of day. This is a hard deadline.

  • Assignment 4 declared - deadline April 14 - see below. Since the institute has declared summer vacation, the deadline is postponed; the revised deadline will be notified later. Since the assignment has already been declared, you are advised to work on it now, when you have more time. The revised deadline is likely to allow less time than the 2 weeks allotted now.

  • We understand that some students may be working from home. Hence we are designing the assignments so that they do not need much internet connectivity or computing resources, and sufficient time is being given for each one. It is disconcerting that there are still requests to extend the deadline for every assignment. If a student knows that his/her progress from home will be slow, it is his/her responsibility to start working on the assignments sufficiently early.

  • Assignment 3 declared - deadline March 29 - see below. Deadline extended to April 1 end of day. There will be no further extension.

  • CSE Moodle can be accessed via https://moodlecse.iitkgp.ac.in/moodle. If a student is unable to access Moodle, assignment submissions can be mailed to the TA who is assigned to grade that student's submission (as specified below).

  • A point about the slide narrations being posted -- during the narrations, I speak quickly, in order to reduce the duration and keep the file sizes small (since I understand some students may have problems downloading large files). Also, many things that I would have explained on the blackboard are hand-written and included in the slides. By my estimate, the material covered in a narration of 20-30 minutes would have taken a 1-hour lecture in a physical class. Students are advised to spend the requisite time on the narrations and the slides, in order to assimilate the material well.

  • Due to the suspension of physical classes as a result of the COVID outbreak, lectures are being posted on this website. Specifically, narrations of slides are posted as PowerPoint Show files (.ppsx); you should be able to see the slides and hear the narration using MS PowerPoint or any equivalent software. Note that, on some platforms, you may have to advance the slides manually, e.g., by pressing the spacebar or page-down key. Lectures are broken into several parts, in order to keep the files relatively small. Also, slides (pdf) are given as usual.
    Questions about the slides can be mailed to Prof. S. Ghosh. Online doubt-clearing sessions can be arranged if there are too many doubts on some topic. As always, you are expected to read textbooks (see below) and relevant online materials, and not depend only on the slides.

  • Assignment 2 declared - deadline March 01 - see below. Solutions to be submitted via Moodle. Deadline extended to March 04 due to several requests. There will not be any further extension.

  • Assignment 1 declared - deadline Feb 02 - see below. Solutions to be submitted via Moodle.

  • Every registered student should create an account on the Moodle submission system of the CSE department. This system will be used for submission and grading of assignments. Go to this link and follow the link "Moodle" (bottom-left on page). Create a new account for yourself (unless you have an account already), giving username, password, email id. After creating an account, log in to the system, and follow the link "Spring Semester (2019-20)". Choose the course "Machine Learning". Join this course as "Student"; use Student Enrolment Key: STUMLSG.

  • Every registered student should join the Google mailing list https://groups.google.com/forum/#!forum/machinelearning-iitkgp-2020. To join the mailing list, you need to log in with your Google account (e.g., Gmail account) and then apply to join. Your application will be approved. All announcements about the course will be made on this website or through this mailing list.

  • First class was on Wednesday, January 8, 2020

  • 05-Jan-2020 -- All UG and PG slots are now filled up, and no further registration to this course is possible. There is no need to mail me about registration to the course. I will not be able to reply to individual mails.

Instructor

Saptarshi Ghosh (Contact: saptarshi @ cse . iitkgp . ac . in)

Teaching Assistants

  1. Paheli Bhattacharya (pahelibhattacharya @ gmail . com)
  2. Shalmoli Ghosh (shalmolighosh94 @ gmail . com)
  3. Anurag Roy (anu15roy @ gmail . com)
  4. Soham Poddar (sohampoddar26 @ gmail . com)
  5. Paramita Koley (paramita2000 @ gmail . com)
  6. Pranesh Santikellur (pra.net061 @ gmail . com)

Course Timings (3 lectures per week)

Wednesday 11:00 - 11:55
Thursday 12:00 - 12:55
Friday 08:00 - 08:55

Class venue: NR422 (Nalanda complex)


Pre-requisites

Probability and Statistics
Some knowledge of Linear Algebra
Algorithms
Programming knowledge necessary for assignments (in C/C++/Java/Python)

Course evaluation

Assignments: 40% (There will be 4-5 assignments that will involve programming in C/C++/Java/Python)

Mid-semester exam: 20%

End-semester exam: 40%


Topics (outline)

  1. Introduction: Basic principles, Applications, Challenges
  2. Supervised learning: Linear Regression (with one variable and multiple variables), Gradient Descent, Classification -- Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machines, Artificial Neural Networks (Perceptrons, Multilayer networks, back-propagation)
  3. Unsupervised learning: Clustering (K-means, Hierarchical), Dimensionality reduction
  4. Ensemble learning: Bagging, boosting
  5. Theory of Generalization: In-sample and out-of-sample error, Bias and Variance analysis, Overfitting, Regularization, Introduction to VC inequality
  6. Advanced topics: Bias and fairness in Machine Learning

Text and Reference Literature

  1. Christopher M. Bishop. Pattern Recognition and Machine Learning (Springer)
  2. David Barber. Bayesian Reasoning and Machine Learning (Cambridge University Press). Online version available here.
  3. Tom Mitchell. Machine Learning (McGraw Hill)
  4. Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (John Wiley & Sons)

Assignments

Assignment 1 (Linear regression): Question
Deadline: February 2, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Shalmoli | 28-54: Anurag | 55-82: Paramita | 83-110: Pranesh
[Serial numbers according to the list of students given below]

Assignment 2 (Classification using Logistic Regression and Decision Trees): Question
Deadline: March 04, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Pranesh | 28-54: Shalmoli | 55-82: Anurag | 83-109: Paramita
[Serial numbers according to the list of students given below]

Assignment 3 (Clustering and Dimensionality reduction): Question
Deadline (extended): April 01, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Paramita | 28-54: Pranesh | 55-82: Shalmoli | 83-109: Anurag
[Serial numbers according to the list of students given below]

Assignment 4 (Neural Networks): Question
Deadline: May 03, 2020, 23:59 IST
Evaluation by TAs: Sl. no. 01-27: Anurag | 28-54: Paramita | 55-82: Pranesh | 83-109: Shalmoli
[Serial numbers according to the list of students given below]

List of students in the course: pdf

Slides

Topic | Slides | References / Comments
Introduction | Slides | Introduction to the course, utility of ML, applications of ML
Linear Regression | Slides | Linear regression in one variable and multiple variables, concept of cost function, gradient descent, polynomial regression
Logistic Regression | Slides | Binary classification; logistic regression; multi-class classification
Evaluation and Overfitting | Slides | Evaluation and error analysis; Bias and Variance; Overfitting, validation and regularization
Demo of ML tools | Material | Demonstration of ML tools on Feb 05: slides, datasets, sample scripts
Decision Trees | Slides | Classification using Decision Trees, Hunt's algorithm, Impurity measures (Gini index, entropy), overfitting and pruning a Decision Tree
Clustering (Part 1) | Slides | Prototype based clustering (K-means), hierarchical clustering, graph clustering, density-based clustering, K-means as an Expectation Maximization algorithm
Clustering (Part 2) | No slides | Maximum Likelihood Estimation, Soft clustering using mixture models
Dimensionality Reduction | Slides | Supervised and unsupervised techniques for dimensionality reduction, Principal Component Analysis
Naive Bayes classifier | Slides | Bayesian learning, Naive Bayes classifier
Neural networks
(Parts: slide narration recordings as .ppsx files)
Part1
Part2
Part3
Part4
Part5
Slides(pdf)
Part 1: Linear models, Perceptrons, Non-linear models, Multilayer Perceptrons, Neurons
Part 2: History of neural networks, neural network architecture
Part 3: Neural network architecture (contd.), forward propagation
Part 4: Learning on neural networks: Stochastic Gradient Descent, Backpropagation algorithm
Part 5: Concluding discussion
Word embeddings: Application of Neural networks
(Parts: slide narration recordings as .ppsx files)
Part1
Part2
Part3
Part4
Slides(pdf)
Part 1: One-hot representations vs. word embeddings, intuitive idea of Word2vec embeddings
Parts 2 and 3: Skip-gram neural network architecture and functionality
Part 4: Optimizations and extensions

Additional resources:
Original papers by Mikolov et al: paper 1 paper 2
A good tutorial
Support vector machines
(Parts: slide narration recordings as .ppsx files)
Part1
Part2
Part3
Slides(pdf)
Part 1: Concept of margin, how to compute the margin, formulating the margin maximization problem
Part 2: Solving the optimization problem, support vectors
Part 3: Non-linear transforms and kernel functions

Additional resources:
Online lectures by Prof. Abu-Mostafa, Caltech - Lectures 14, 15
Ensemble learning
(Parts: slide narration recordings as .ppsx files)
Part1
Part2
Part3
Slides(pdf)
Part 1: Motivation of ensemble learning, types of algorithms
Part 2: Bagging
Part 3: Boosting

Additional resources:
A tutorial
A YouTube video on AdaBoost
Advanced topics - I
Bias-variance trade-off
Narration(ppsx)
Slides(pdf)
Additional resources:
A tutorial
Advanced topics - II
Fairness in ML
(Parts: slide narration recordings as .ppsx files)
Part1
Part2
Part3
Slides(pdf)
Parts 1 and 2: Motivating need for fairness
Part 3: Fairness definitions; how ML models can be unfair; how to make ML models fair

Additional resources:
A tutorial
Advanced topics - III
Introduction to Theory of Generalization
Notes(pdf)
Additional resources:
A discussion
Online lectures by Prof. Abu-Mostafa, Caltech - Lectures 5, 6, 7 [These lectures by Prof. Abu-Mostafa contain advanced material, and are not part of this course]
Advanced topics - IV
Bayesian Networks
Lecture videos:
Lecture by Prof. Sudeshna Sarkar, IIT Kharagpur
Lecture by Prof. Bert Huang, Virginia Tech


Other interesting stuff

  1. 10 Things Everyone Should Know About Machine Learning - Daniel Tunkelang
  2. Ali Rahimi's Test-of-time award presentation at NIPS 2017 (comparing Machine Learning with Alchemy)
  3. Machine Learning resources
  4. Datasets for Machine Learning