Machine Learning (CS60050)

Spring semester 2018-19

Announcements

End-semester syllabus includes all topics taught in the course.

Assignment 4 declared - deadline April 19 - see below.

Assignment 3 declared - deadline March 31 - see below.

Assignment 2 declared - deadline March 15 - see below.

Assignment 1 declared - deadline Feb 15 - see below.

Every student should create an account on the Moodle submission system of the CSE department, which will be used for submission and grading of assignments. Go to this link and follow the "Moodle" link (bottom-left of the page). Create a new account (unless you already have one), giving a username, password, and email ID. After creating the account, log in to the system and follow the link "Spring Semester (2018-19)". Choose the course "Machine Learning" and join it as "Student" using the Student Enrolment Key: STUML.

All registered students should join the mailing group https://groups.google.com/d/forum/machinelearning2019

Instructor

Saptarshi Ghosh (Contact: saptarshi @ cse . iitkgp . ac . in)

Teaching Assistants

Abhisek Dash (assignmentad @ gmail . com)
Paheli Bhattacharya (pahelibhattacharya @ gmail . com)
Shalmoli Ghosh (shalmolighosh94 @ gmail . com)
Ainuddin Khan (ainuddin.india @ gmail . com)
Harish Yadav (harishyadav394 @ gmail . com)
Midatala Surya (surya.midatala @ gmail . com)

Course Timings (3 lectures)

Wednesday 11:00 - 11:55
Thursday 12:00 - 12:55
Friday 08:00 - 08:55

Class venue: NR421 (Nalanda complex)

Course evaluation

Assignments: 40% (There will be 4-5 assignments that will involve programming in C/C++/Java/Python)

Mid-semester exam: 20%

End-semester exam: 40%

Topics (outline)

Introduction: Basic principles, Applications, Challenges
Supervised learning: Linear Regression (with one variable and multiple variables), Gradient Descent, Classification -- Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machines, Artificial Neural Networks (Perceptrons, Multilayer networks, back-propagation)
Unsupervised learning: Clustering (K-means, Hierarchical), Dimensionality reduction
Ensemble learning: Bagging, boosting
Theory of Generalization: In-sample and out-of-sample error, Bias and Variance analysis, Overfitting, Regularization, VC inequality, VC analysis
Advanced topics: Bias and fairness in Machine Learning

Text and Reference Literature

Christopher M. Bishop. Pattern Recognition and Machine Learning (Springer)
David Barber. Bayesian Reasoning and Machine Learning (Cambridge University Press). Online version available here.
Tom Mitchell. Machine Learning (McGraw Hill)
Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification (John Wiley & Sons)

Assignments

Assignment 1 (Linear regression):
Question
Deadline: February 15, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Shalmoli | 32-62: Ainuddin | 63-92: Harish | 93-123: Surya
[Serial numbers according to the list of students given below]

Assignment 2 (Decision Trees):
Question
Deadline: March 15, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Surya | 32-62: Shalmoli | 63-92: Ainuddin | 93-123: Harish
[Serial numbers according to the list of students given below]

Assignment 3 (Clustering):
Question
Deadline: March 31, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Harish | 32-62: Surya | 63-92: Shalmoli | 93-123: Ainuddin
[Serial numbers according to the list of students given below]

Assignment 4 (Neural Networks):
Question
Deadline: April 19, 2019, 23:59 IST
Evaluation by TAs: Sl. no. 01-31: Ainuddin | 32-62: Harish | 63-92: Surya | 93-123: Shalmoli
[Serial numbers according to the list of students given below]

List of 123 students in the course: pdf

Slides


Topic	Slides	References / Comments
Introduction	Slides	Introduction to the course, utility of ML, applications of ML
Demo of ML tools	Material	Demonstration of ML tools (slides, datasets, sample scripts)
Linear Regression	Slides	Linear regression in one variable and multiple variables, concept of cost function, gradient descent, polynomial regression
Logistic Regression	Slides	Binary classification; logistic regression; multi-class classification
Evaluation and Overfitting	Slides	Evaluation and error analysis; bias and variance; overfitting, validation and regularization
Decision Trees	Slides	Classification using Decision Trees, Hunt's algorithm, Impurity measures (Gini index, entropy), overfitting and pruning a Decision Tree
Fairness in Machine Learning	Slides	Fairness and bias, and how to deal with them
Unsupervised Learning: Clustering	Slides	Prototype based clustering, hierarchical clustering, graph clustering, density-based clustering
Dimensionality Reduction	Slides	Supervised and unsupervised ways of dimensionality reduction, Principal Component Analysis
Naive Bayes classifier	Slides	Bayesian classifiers, Naive Bayes
Neural networks	Slides	Perceptrons, Multilayer Perceptrons, Neural networks, backpropagation algorithm, Stochastic Gradient Descent
Support vector machines	Slides	Margin, margin optimization, Kernel methods
Ensemble Learning	Slides	Bagging, Boosting
Introduction to Theory of Generalization	No slides	Bounding the testing error, Breakpoints, Vapnik-Chervonenkis inequality, VC Dimension