Scalable Data Mining (CS60021)

Instructors: Sourangshu Bhattacharya, Pabitra Mitra

Teaching Assistants: Kiran Purohit, Shubhadip Nag

Class Schedule: Monday (10:00 - 10:55), Wednesday (8:00 - 9:55)

Classroom: CSE - 119

Last year's course website: https://panuragreddy.github.io/SDM_2023/

Announcements:

Course Schedule:

Week | Dates | Topic / Activity | Links / Material
Week 1 | 22/7, 24/7 | Introduction to ML and DL, Stochastic Gradient Descent | Slides
Week 2 | 29/7, 31/7 | SGD convergence, Accelerated SGD | Slides - SGD Convergence, Accelerated SGD
Week 3 | 5/8, 7/8 | Convergence rate of SGD, Linear-rate SGD methods, Batch normalization | Slides - SGD Convergence rate, Slides - Batch normalization
Week 4 | 12/8, 14/8 | ADMM for distributed loss minimization | Slides - ADMM
Week 5+6 | 19/8, 21/8, 26/8, 29/8 | Hadoop + Spark | Slides - Hadoop, Spark
Week 7 | 2/9, 4/9 | DL frameworks | Slides - PyTorch
Week 8 | 9/9, 11/9 | Subset Selection | Slides - Submodular Functions, Sparse Approximation, Convex Online

Syllabus:

Optimization and machine learning algorithms:

Software paradigms:

Algorithmic techniques:

References:

  1. Mining of Massive Datasets, 2nd edition. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
  2. TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Sam Abrahams et al. Bleeding Edge Press.
  3. Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
  4. Recent literature.