Scalable Data Mining (CS60021)
Instructors: Sourangshu Bhattacharya, Pabitra Mitra
Teaching Assistants: Kiran Purohit, Shubhadip Nag
Class Schedule: Monday (10:00 - 10:55), Wednesday (8:00 - 9:55)
Classroom: CSE - 119
Last year's course website: https://panuragreddy.github.io/SDM_2023/
Announcements:
Course Schedule:
Week | Dates | Topic / Activity | Links / Material
Week 1 | 22/7, 24/7 | Introduction to ML and DL, Stochastic Gradient Descent | Slides
Week 2 | 29/7, 31/7 | SGD convergence, Accelerated SGD | Slides - SGD Convergence, Accelerated SGD
Week 3 | 5/8, 7/8 | Convergence rate of SGD, Linear-rate SGD methods, Batch normalization | Slides - SGD Convergence rate, Slides - Batch-normalization
Week 4 | 12/8, 14/8 | ADMM for distributed loss minimization | Slides - ADMM
Week 5+6 | 19/8, 21/8, 26/8, 29/8 | Hadoop + Spark | Slides - Hadoop, Spark
Week 7 | 2/9, 4/9 | DL frameworks | Slides - PyTorch
Week 8 | 9/9, 11/9 | Subset Selection | Slides - Submodular Functions, Sparse Approximation, Convex Online
Week 9 | 14/10, 16/11 | Nearest Neighbor Search | Slides - LSH, HNSW
Syllabus:
Optimization and Machine learning algorithms:
- Optimization algorithms: Stochastic gradient descent, Variance reduction, Momentum algorithms, Adam.
- Algorithms for distributed optimization: Distributed stochastic gradient descent and related methods. ADMM and decomposition methods.
- (New) Federated Learning.
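
As a rough illustration of the momentum algorithms listed above, here is a minimal sketch of SGD with heavy-ball momentum on a one-dimensional least-squares fit. The function name `sgd_momentum`, the toy data, and the hyperparameters are illustrative assumptions, not course material.

```python
import random

def sgd_momentum(data, lr=0.02, beta=0.9, epochs=100, seed=0):
    """Fit y ~ w*x + b by SGD with heavy-ball momentum on squared error.

    All names and default values here are assumptions chosen for the sketch.
    """
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0      # model parameters
    vw, vb = 0.0, 0.0    # momentum (velocity) terms
    for _ in range(epochs):
        rng.shuffle(data)              # visit samples in random order
        for x, y in data:
            err = (w * x + b) - y      # residual on a single sample
            gw, gb = err * x, err      # gradient of 0.5 * err**2
            vw = beta * vw + gw        # accumulate past gradients
            vb = beta * vb + gb
            w -= lr * vw               # step along the velocity
            b -= lr * vb
    return w, b

# Noiseless toy data generated from y = 3x + 1 on x in [-1, 1].
points = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_momentum(points)
```

On this noiseless toy data the returned `(w, b)` should approach `(3, 1)`. Setting `beta=0` reduces the update to plain SGD.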
Software paradigms:
- Big Data Processing: Motivation and Fundamentals, Map-reduce framework, Functional programming and Scala, Programming using the map-reduce paradigm, Example programs.
- Deep Learning Frameworks (PyTorch): Motivation, Computation graphs, Tensors, Autograd, Modules, Example programs.
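
To make the map-reduce paradigm named above concrete, the following is a small single-machine sketch of word count split into map, shuffle, and reduce phases. It uses only the Python standard library rather than Hadoop or Spark, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from functools import reduce

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data big models", "data mining"])
```

In a real cluster the map and reduce calls would run in parallel on different machines; here they run sequentially so the data flow is easy to follow.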
Algorithmic techniques:
- Finding similar items: Shingles, Minhashing, Locality Sensitive Hashing families, FAISS.
- Stream processing: Motivation, Sampling, Bloom filtering, Count-based sketches (FM sketch, AMS sketch), Hash-based sketches (count sketch).
- Subset Selection Methods: Submodular Optimization, Sparse Approximation, Convex Optimization.
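
For the "finding similar items" topic above, a minimal sketch of character shingling followed by MinHash estimation of Jaccard similarity is given here. The salted MD5 hash family, the shingle length `k=3`, and the signature length of 128 are assumptions made for the example; they are not prescribed by the course.

```python
import hashlib

def shingles(text, k=3):
    """Return the set of character k-shingles of a string."""
    text = text.lower()
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def _hash(seed, item):
    """One member of a seeded hash family (assumption: salted MD5)."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(items, num_hashes=128):
    """Signature = minimum hash value of the set under each hash function.

    Assumes `items` is non-empty.
    """
    return [min(_hash(seed, item) for item in items)
            for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

a = shingles("scalable data mining")
b = shingles("scalable data mining course")
exact = len(a & b) / len(a | b)
approx = estimate_jaccard(minhash_signature(a), minhash_signature(b))
```

The estimate `approx` should be close to the exact Jaccard similarity `exact`, with the error shrinking as `num_hashes` grows. Locality sensitive hashing then bands these signatures so that only likely-similar pairs are compared.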
References:
- Mining of Massive Datasets. 2nd edition. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
- TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Sam Abrahams et al. Bleeding Edge Press.
- Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
- Recent literature.