Scalable Data Mining (CS60021)
Instructor: Sourangshu Bhattacharya
Teaching Assistants: Soumi Das, Kiran Purohit
Class Schedule: Monday (8:00 - 9:55), Tuesday (12:00 - 12:55)
Classroom: CSE - 119
Last year's course website: http://cse.iitkgp.ac.in/~sourangshu/coursefiles/cs60021-2021a.html
First Meeting: Tuesday, 2 August 2022, 12:00 pm
Content:
Course Schedule:
| Week | Date (DD/MM) | Topic / Activity | Links / Material |
|---|---|---|---|
| Week 1 | 2/8 | Introduction | Slides |
| Week 2 | 8/8 | Hadoop, Map-reduce, HDFS and the Hadoop system. Ref: Hadoop: The Definitive Guide, Tom White, O'Reilly. | Slides |
Syllabus:
In this course, we discuss algorithmic techniques as well as software paradigms that allow one to write scalable algorithms for machine learning and data mining tasks.
Software paradigms:
- Big Data Processing: motivation and fundamentals.
- Map-reduce framework. Functional programming and Scala.
- Programming using the map-reduce paradigm. Example programs.
- Deep Learning Frameworks (PyTorch): motivation, computation graphs, tensors, autograd, modules, example programs.
Optimization and Machine Learning algorithms:
- Optimization algorithms: stochastic gradient descent, variance reduction, momentum algorithms, Adam.
- Algorithms for distributed optimization: distributed stochastic gradient descent and related methods; ADMM and decomposition methods.
Algorithmic techniques:
- Dimensionality reduction: random projections, Johnson-Lindenstrauss lemma, JL transforms, sparse JL transform.
- Finding similar items: shingles, minhashing, locality-sensitive hashing families.
- Stream processing: motivation, sampling, Bloom filtering; count-based sketches: FM sketch, AMS sketch; hash-based sketches: count sketch.
References:
- Mining of Massive Datasets, 2nd edition. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
- TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Sam Abrahams et al. Bleeding Edge Press.
- Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
- Recent literature.