Scalable Data Mining (CS60021)
Instructor: Sourangshu Bhattacharya
Teaching Assistants: Soumi Das, Kiran Purohit
Class Schedule: Monday (8:00 - 9:55), Tuesday (12:00 - 12:55)
Classroom: CSE - 119
Last year's course website: http://cse.iitkgp.ac.in/~sourangshu/coursefiles/cs60021-2021a.html
First Meeting: Tuesday, 2 August 2022, 12:00 pm
Content:
Course Schedule:
Week | Dates | Topic / Activity | Links / Material |
Week 1 | 2/8 | Introduction | Slides |
Week 2 | 8/8 | Hadoop, Map-reduce, HDFS and the Hadoop system. Ref: Hadoop: The Definitive Guide, Tom White, O'Reilly. | Slides |
Syllabus:
This course covers algorithmic techniques and software
paradigms for writing scalable algorithms for machine learning
and data mining tasks.
Software paradigms:
Big Data Processing: Motivation and Fundamentals.
Map-reduce framework. Functional programming and Scala.
Programming using the map-reduce paradigm. Example programs.
Deep Learning Frameworks (PyTorch): Motivation,
Computation graphs, Tensors, Autograd, Modules, Example programs.
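As a rough illustration of the map-reduce paradigm listed above, the following sketch counts words with an explicit map, shuffle, and reduce phase. It runs in a single Python process and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster the map and reduce calls would run on different
    # machines; here they run sequentially to show the data flow only.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


docs = ["big data big ideas", "data mining at scale"]
counts = word_count(docs)
print(counts)
```

Because the map and reduce functions touch only their own inputs, each call could in principle be scheduled independently, which is what makes the paradigm scale.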
Optimization and Machine learning algorithms:
Optimization algorithms: Stochastic gradient descent,
Variance reduction, Momentum algorithms, ADAM.
Algorithms for distributed optimization: Distributed
stochastic gradient descent and related methods. ADMM and
decomposition methods.
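To make the optimization topics above concrete, here is a minimal sketch of stochastic gradient descent with heavy-ball momentum, fitting a line to noise-free points. The data, the step size, the momentum coefficient, and the function name sgd_momentum are assumptions made for this example, not values prescribed by the course.

```python
import random


def sgd_momentum(data, lr=0.05, beta=0.9, steps=2000, seed=0):
    """Fit y = w*x + b by SGD with momentum on the squared error."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    w, b = 0.0, 0.0             # model parameters
    vw, vb = 0.0, 0.0           # momentum (velocity) terms
    for _ in range(steps):
        x, y = rng.choice(data)          # sample one training point
        err = w * x + b - y              # residual for that point
        gw, gb = err * x, err            # gradient of 0.5 * err**2
        vw = beta * vw + gw              # accumulate past gradients
        vb = beta * vb + gb
        w -= lr * vw                     # step along the velocity
        b -= lr * vb
    return w, b


# Hypothetical training set: points on the line y = 2x + 1.
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_momentum(data)
print(w, b)
```

Setting beta to 0 recovers plain stochastic gradient descent, which is a convenient way to compare the two on the same data.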
Algorithmic techniques:
Dimensionality reduction: Random
projections, Johnson-Lindenstrauss lemma, JL transforms,
sparse JL-transform.
Finding similar items: Shingles, Minhashing, Locality
Sensitive Hashing families.
Stream processing: Motivation, Sampling, Bloom
filtering. Count-based sketches: FM sketch, AMS
sketch. Hash-based sketches: Count sketch.
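As one example from the stream processing topics above, the sketch below shows a small Bloom filter for approximate set membership. The class name BloomFilter, the bit-array size, the number of hash functions, and the use of salted SHA-256 as the hash family are all assumptions made for this illustration; a production filter would pack bits more tightly and size itself from a target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Approximate set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Present only if every position for this item is set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
seen = [f"user{i}" for i in range(50)]
for name in seen:
    bf.add(name)

print("user7" in bf)  # an inserted item is always reported as present
```

An item that was never added may still be reported as present when other items happen to set all of its positions; that false-positive rate is the price paid for the fixed memory footprint.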
References:
- Mining of Massive Datasets. 2nd edition. - Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
- TensorFlow for Machine Intelligence: A Hands-On
Introduction to Learning Algorithms. Sam Abrahams et al.
Bleeding Edge Press.
- Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
- Recent literature.