Scalable Data Mining (CS60021)
Instructor: Sourangshu Bhattacharya
Teaching Assistants: Suman Bera, Saptarshi Mondal, Vaishnovi Arun
Class Schedule: MON (11:00-11:55), TUE (08:00-08:55), TUE (09:00-09:55)
Classroom: NC - 121
Last year's course website: https://cse.iitkgp.ac.in/~sourangshu/coursefiles/cs60021_2024a.html
Announcements:
- The first class will be on Monday, 28th July.
- The first class test will be held in class on 12th August. It covers the material taught in the first two weeks.
- The first assignment has been uploaded to Moodle. Deadline: 17th August.
Course Schedule:
Syllabus:
Optimization and Machine learning algorithms:
- Optimization algorithms: Stochastic gradient descent, Variance reduction, Momentum algorithms, ADAM.
- Algorithms for distributed optimization: Distributed stochastic gradient descent and related methods. ADMM and decomposition methods.
- (New) Federated Learning.
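
As an illustration of the optimization topics above, here is a minimal sketch of stochastic gradient descent with heavy-ball momentum on a toy least-squares problem. The function name `sgd_momentum`, the hyperparameter values, and the synthetic data are assumptions chosen for illustration and are not taken from the course material.

```python
import random

def sgd_momentum(data, lr=0.01, beta=0.9, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimizing squared error with SGD plus momentum."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0      # model parameters
    vw, vb = 0.0, 0.0    # momentum (velocity) terms
    for _ in range(epochs):
        rng.shuffle(data)            # visit samples in random order
        for x, y in data:
            err = (w * x + b) - y    # residual on one sample
            gw, gb = err * x, err    # gradient of 0.5 * err**2
            vw = beta * vw + gw      # accumulate past gradients
            vb = beta * vb + gb
            w -= lr * vw
            b -= lr * vb
    return w, b

# Hypothetical noise-free data drawn from y = 3x + 1.
points = [(i / 10.0, 3.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
w, b = sgd_momentum(points)
print(w, b)
```

On this noise-free data the estimates should approach the generating values w = 3 and b = 1. Setting `beta=0` recovers plain SGD.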
Software paradigms:
- Big Data Processing: Motivation and Fundamentals, Map-reduce framework, Functional programming and Scala programming using the map-reduce paradigm, Example programs.
- Deep Learning Frameworks (PyTorch): Motivation, Computation graphs, Tensors, Autograd, Modules, Example programs.
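
To give a flavour of the map-reduce paradigm listed above, the following is a single-machine sketch of word count written in a functional style. It is in Python rather than Scala, and the helper names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical; a real framework would run these phases in parallel across machines.

```python
from collections import defaultdict
from functools import reduce

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: combine the values of one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

def word_count(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())

counts = word_count(["big data big ideas", "data mining"])
print(counts)
```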
Algorithmic techniques:
- Finding similar items: Shingles, Minhashing, Locality Sensitive Hashing families, FAISS.
- Stream processing: Motivation, Sampling, Bloom filtering, Count-based sketches (FM sketch, AMS sketch), Hash-based sketches (count sketch).
- Subset Selection Methods: Submodular Optimization, Sparse Approximation, Convex Optimization.
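
As one example of the stream-processing techniques above, here is a minimal sketch of a Bloom filter for approximate set membership. The class name `BloomFilter`, the default sizes, and the use of salted SHA-256 digests as the hash family are assumptions made for illustration; production implementations typically use faster hashes and a packed bit array.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        """Derive num_hashes bit positions by salting the item with an index."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()
for url in ["a.example", "b.example", "c.example"]:
    seen.add(url)
print(seen.might_contain("a.example"))
```

An item that was added is always reported as possibly present, so there are no false negatives. An item that was never added may occasionally be reported as present, and that false-positive rate grows as more bits are set.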
References:
- Mining of Massive Datasets, 2nd edition. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
- TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Sam Abrahams et al. Bleeding Edge Press.
- Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
- Recent literature.