Machine Learning (CS60050)

Spring semester 2018-19

Announcements

  • End-semester syllabus includes all topics taught in the course.

  • Assignment 4 declared - deadline April 19 - see below.

  • Assignment 3 declared - deadline March 31 - see below.

  • Assignment 2 declared - deadline March 15 - see below.

  • Assignment 1 declared - deadline Feb 15 - see below.

  • Every student should create an account on the Moodle submission system of the CSE department. This system will be used for submission and grading of assignments. Go to this link and follow the link "Moodle" (bottom-left of the page). Create a new account for yourself (unless you already have one), giving a username, password, and email ID. After creating the account, log in to the system and follow the link "Spring Semester (2018-19)". Choose the course "Machine Learning". Join this course as "Student"; use the Student Enrolment Key: STUML.

  • All registered students should join the mailing group https://groups.google.com/d/forum/machinelearning2019

Instructor

Saptarshi Ghosh (Contact: saptarshi @ cse . iitkgp . ac . in)

Teaching Assistants

  1. Abhisek Dash (assignmentad @ gmail . com)
  2. Paheli Bhattacharya (pahelibhattacharya @ gmail . com)
  3. Shalmoli Ghosh (shalmolighosh94 @ gmail . com)
  4. Ainuddin Khan (ainuddin.india @ gmail . com)
  5. Harish Yadav (harishyadav394 @ gmail . com)
  6. Midatala Surya (surya.midatala @ gmail . com)

Course Timings (3 lectures)

Wednesday 11:00 - 11:55
Thursday 12:00 - 12:55
Friday 08:00 - 08:55

Class venue: NR421 (Nalanda complex)


Course evaluation

Assignments: 40% (There will be 4-5 assignments that will involve programming in C/C++/Java/Python)

Mid-semester exam: 20%

End-semester exam: 40%


Topics (outline)

  1. Introduction: Basic principles, Applications, Challenges
  2. Supervised learning: Linear Regression (with one variable and multiple variables), Gradient Descent, Classification -- Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machines, Artificial Neural Networks (Perceptrons, Multilayer networks, back-propagation)
  3. Unsupervised learning: Clustering (K-means, Hierarchical), Dimensionality reduction
  4. Ensemble learning: Bagging, boosting
  5. Theory of Generalization: In-sample and out-of-sample error, Bias and Variance analysis, Overfitting, Regularization, VC inequality, VC analysis
  6. Advanced topics: Bias and fairness in Machine Learning

Text and Reference Literature

  1. Christopher M. Bishop. Pattern Recognition and Machine Learning (Springer)
  2. David Barber. Bayesian Reasoning and Machine Learning (Cambridge University Press). Online version available here.
  3. Tom Mitchell. Machine Learning (McGraw Hill)
  4. Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (John Wiley & Sons)

Assignments

Assignment 1 (Linear regression):
Question
Deadline: February 15, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Shalmoli | 32-62: Ainuddin | 63-92: Harish | 93-123: Surya
[Serial numbers according to the list of students given below]

Assignment 2 (Decision Trees):
Question
Deadline: March 15, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Surya | 32-62: Shalmoli | 63-92: Ainuddin | 93-123: Harish
[Serial numbers according to the list of students given below]

Assignment 3 (Clustering):
Question
Deadline: March 31, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Harish | 32-62: Surya | 63-92: Shalmoli | 93-123: Ainuddin
[Serial numbers according to the list of students given below]

Assignment 4 (Neural Networks):
Question
Deadline: April 19, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Ainuddin | 32-62: Harish | 63-92: Surya | 93-123: Shalmoli
[Serial numbers according to the list of students given below]

List of 123 students in the course: pdf

Slides

Topic | Slides | References / Comments
Introduction | Slides | Introduction to the course, utility of ML, applications of ML
Demo of ML tools | Material | Demonstration of ML tools (slides, datasets, sample scripts)
Linear Regression | Slides | Linear regression in one variable and multiple variables, concept of cost function, gradient descent, polynomial regression
Logistic Regression | Slides | Binary classification; logistic regression; multi-class classification
Evaluation and Overfitting | Slides | Evaluation and error analysis; Bias and Variance; Overfitting, validation and regularization
Decision Trees | Slides | Classification using Decision Trees, Hunt's algorithm, Impurity measures (Gini index, entropy), overfitting and pruning a Decision Tree
Fairness in Machine Learning | Slides | Fairness and bias, and how to deal with them
Unsupervised Learning: Clustering | Slides | Prototype based clustering, hierarchical clustering, graph clustering, density-based clustering
Dimensionality Reduction | Slides | Supervised and unsupervised ways of dimensionality reduction, Principal Component Analysis
Naive Bayes classifier | Slides | Bayesian classifiers, Naive Bayes
Neural networks | Slides | Perceptrons, Multilayer Perceptrons, Neural networks, backpropagation algorithm, Stochastic Gradient Descent
Support vector machines | Slides | Margin, margin optimization, Kernel methods
Ensemble Learning | Slides | Bagging, Boosting
Introduction to Theory of Generalization | No slides | Bounding the testing error, Breakpoints, Vapnik-Chervonenkis inequality, VC Dimension


Other interesting stuff

  1. 10 Things Everyone Should Know About Machine Learning - Daniel Tunkelang
  2. Ali Rahimi's Test-of-time award presentation at NIPS 2017 (comparing Machine Learning with Alchemy)
  3. Machine Learning resources
  4. Datasets for Machine Learning