... somewhere something incredible is waiting to be known.

Scalable Data Mining (CS60021)

Instructor: Sourangshu Bhattacharya

Teaching Assistants: Soumi Das and Bidisha Samanta

Class Schedule: Monday (8:00 – 9:55), Tuesday (12:00 – 12:55)

Classroom: CSE-107

Website: http://cse.iitkgp.ac.in/~sourangshu/coursefiles/cs60021-2019a.html

First Meeting: Tuesday, 16 July 2019, 12:00 pm



In this course, we discuss algorithmic techniques as well as software paradigms that allow one to write scalable algorithms for common data mining tasks.

Software paradigms:
Big Data Processing: Motivation and fundamentals. Map-reduce framework. Functional programming and Scala. Programming using the map-reduce paradigm. Case studies: finding similar items, PageRank, matrix factorization.
TensorFlow: Motivation, computation graphs, operations, tensors, example programs.
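The computation-graph idea underlying TensorFlow can be illustrated without the library itself: build a graph of operation nodes first, then evaluate it on demand. This is a toy sketch of the concept, not TensorFlow's actual API:

```python
class Node:
    """A node in a toy computation graph: an operation plus its input nodes."""
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def eval(self):
        # Constants hold their value; other ops evaluate their inputs first.
        if self.op == "const":
            return self.value
        args = [node.eval() for node in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError(f"unknown op: {self.op}")

def const(v):
    return Node("const", value=v)

# Build the graph for y = (2 + 3) * 4 first, then run it: the
# deferred-execution style of TensorFlow 1.x graphs and sessions.
y = Node("mul", (Node("add", (const(2), const(3))), const(4)))
print(y.eval())  # -> 20
```

Separating graph construction from execution is what lets a framework optimize, parallelize, or differentiate the graph before running it.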

Algorithmic techniques:
Dimensionality reduction: Random projections, Johnson-Lindenstrauss lemma, JL transforms, sparse JL-transform.
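A dense JL transform is easy to sketch: multiply by a k × d matrix of i.i.d. N(0, 1/k) entries, and pairwise distances are preserved up to a (1 ± ε) factor with high probability. A minimal plain-Python illustration (dimensions and vectors chosen arbitrarily):

```python
import math
import random

def jl_matrix(d, k, rng):
    """k x d projection matrix with i.i.d. N(0, 1/k) entries (dense JL)."""
    return [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Matrix-vector product: map x from dimension d down to k."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)
d, k = 1000, 200
R = jl_matrix(d, k, rng)  # the same matrix must be used for all points
x = [rng.random() for _ in range(d)]
y = [rng.random() for _ in range(d)]
# The projected distance should be within (1 +/- eps) of the original.
ratio = dist(project(R, x), project(R, y)) / dist(x, y)
print(round(ratio, 3))  # close to 1
```

The JL lemma makes this quantitative: k = O(log n / ε²) dimensions suffice for n points, independent of the original dimension d.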
Finding similar items: Shingles, minhashing, locality-sensitive hashing families.
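Minhashing in miniature: under a random hash function, the probability that two sets share the same minimum hash equals their Jaccard similarity, so agreement across many hash functions estimates it. A small sketch in plain Python, with seeded built-in hashing standing in for a proper hash family:

```python
import random

def shingles(text, k=3):
    """All k-character shingles of a string, as a set."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(s, seeds):
    # Python's built-in hash, salted with a seed, stands in for a random
    # hash family (values are consistent within one process run).
    return [min(hash((seed, x)) for x in s) for seed in seeds]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions: an unbiased Jaccard estimate."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

rng = random.Random(42)
seeds = [rng.getrandbits(32) for _ in range(200)]
a = shingles("the cat sat on the mat")
b = shingles("the cat sat on a mat")
true_j = len(a & b) / len(a | b)
est_j = estimate_jaccard(minhash_signature(a, seeds), minhash_signature(b, seeds))
print(round(true_j, 2), round(est_j, 2))
```

Banding the 200-entry signatures into short rows is then what turns this estimator into a locality-sensitive hashing index.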
Stream processing: Motivation, sampling, Bloom filtering. Count-based sketches: FM sketch, AMS sketch. Hash-based sketches: count sketch.
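A Bloom filter shows the flavor of these stream summaries: constant-size state, O(k) membership queries, no false negatives, and a tunable false-positive rate. A minimal sketch with hash functions derived from SHA-256 (the parameters m and k below are illustrative):

```python
import hashlib

class BloomFilter:
    """m-bit Bloom filter with k hash functions derived from SHA-256."""
    def __init__(self, m=1024, k=5):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
for word in ["spark", "hadoop", "flink"]:
    bf.add(word)
print("spark" in bf)  # -> True (no false negatives)
```

With n items inserted, the false-positive rate is roughly (1 - e^(-kn/m))^k, so m and k are chosen from the expected stream size and the error budget.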

Optimization and Machine learning algorithms:
Optimization algorithms: Stochastic gradient descent, variance reduction, momentum algorithms, Adam.
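Plain SGD can be shown on a one-dimensional least-squares problem: each step follows the gradient of a single example's loss rather than the full sum. A minimal sketch (the step size and data below are made up):

```python
import random

def sgd_least_squares(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by SGD, one randomly ordered example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)           # fresh random order each epoch
        for x, y in data:
            err = (w * x + b) - y   # residual on this single example
            w -= lr * err * x       # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err           # gradient of 0.5 * err**2 w.r.t. b
    return w, b

# Noisy samples of y = 2x + 1.
rng = random.Random(1)
data = [(i / 10, 2 * (i / 10) + 1 + 0.01 * rng.gauss(0, 1)) for i in range(-10, 11)]
w, b = sgd_least_squares(data)
print(round(w, 2), round(b, 2))  # -> close to 2 and 1
```

The per-step cost is independent of the dataset size, which is the whole point at scale; variance reduction and momentum then speed up the convergence of exactly this loop.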
Algorithms for distributed optimization: Distributed stochastic gradient descent and related methods. ADMM and decomposition methods.
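Synchronous parameter averaging, one of the simplest distributed schemes, can be simulated in-process: each worker runs local SGD on its own data shard starting from the shared model, and a driver averages the returned parameters every round. All names and settings below are illustrative:

```python
import random

def local_sgd(w, shard, lr, steps, rng):
    """A worker runs a few SGD steps on its own shard (least-squares loss)."""
    for _ in range(steps):
        x, y = rng.choice(shard)
        err = (w[0] * x + w[1]) - y
        w = (w[0] - lr * err * x, w[1] - lr * err)
    return w

def distributed_sgd(shards, rounds=100, lr=0.05, steps=20, seed=0):
    rng = random.Random(seed)
    w = (0.0, 0.0)
    for _ in range(rounds):
        # Each worker starts the round from the current shared model ...
        results = [local_sgd(w, shard, lr, steps, rng) for shard in shards]
        # ... and the driver averages the returned parameters.
        w = tuple(sum(c) / len(c) for c in zip(*results))
    return w

# Noisy samples of y = 2x + 1, split across 3 simulated workers.
rng = random.Random(1)
points = [(i / 10, 2 * (i / 10) + 1 + 0.01 * rng.gauss(0, 1)) for i in range(-10, 11)]
shards = [points[i::3] for i in range(3)]
w, b = distributed_sgd(shards)
print(round(w, 2), round(b, 2))  # -> close to 2 and 1
```

Communicating only once per round, instead of once per gradient step, is the trade-off that methods like ADMM and weighted parameter averaging refine.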


Textbooks:

  • Mining of Massive Datasets, 2nd edition. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
  • TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Sam Abrahams et al. Bleeding Edge Press.
  • Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
  • Recent literature.

Course Material:

  1. Hadoop Map-reduce introduction and programming: slides
  2. Hadoop System, HDFS: slides
    1. Reference: Hadoop: The Definitive Guide. Tom White. O'Reilly Media. Chapters 2, 3, 5, and 6.
  3. Spark: Programming, Computation Graphs, System slides
    1. Reference: Spark RDD programming guide: http://spark.apache.org/docs/latest/rdd-programming-guide.html
    2. Practice Problems for Hadoop and Spark: here
  4. Tensorflow: Programming, Computation Graphs: slides
    1. Comparison with PyTorch, static vs. dynamic graphs: slides
  5. Streaming algorithms: Reservoir Sampling, Bloom filters: Slides
    1. Reference: MMDS Book: chapter 4
    2. Practice Problems: here
  6. Streaming algorithms: Count-distinct: Slides
  7. Streaming algorithms: Frequency Estimation: Slides
  8. Streaming algorithms: Count Sketch, Moment estimation, AMS sketch, Join size estimation, Range queries. Slides
    1. References: Some slides from "Data Stream Algorithms" by Andrew McGregor https://people.cs.umass.edu/~mcgregor/CS711S18/index.html
    2. Much of the material is discussed in "Sketch Techniques for Approximate Query Processing" by Graham Cormode: http://dimacs.rutgers.edu/~graham/pubs/papers/sk.pdf
    3. Practice Problems: here
  9. LSH: Shingles, Minhash, locality sensitive hashing. Slides
  10. LSH: Generalization to gap LSH, SimHash, Hamming distance, etc. Slides
  11. LSH: Multi-probe LSH, Learning to hash Slides
  12. LSH Practice Problems: here. More practice questions on Shingles and Minhash can be found in the MMDS book.
  13. Large scale machine learning: Slides
    1. ML as stochastic optimization.
      1. Reference: MMDS book, chapter 12
    2. Convergence proof and Convergence rate for SGD
      1. Reference: Nesterov, Yu, and J-Ph Vial. "Confidence level solutions for stochastic programming." Automatica 44, no. 6 (2008): 1559-1568.
      2. Reference: Shalev-Shwartz, Shai, and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. Chapter 14.
    3. Accelerated SGD variants: Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv preprint arXiv:1609.04747 (2016).
    4. Finite sum SGD: SAGA, SVRG:
      1. Slides from: https://www.cs.ubc.ca/~schmidtm/SVAN16/ lecture 6.
      2. Johnson, Rie, and Tong Zhang. "Accelerating stochastic gradient descent using predictive variance reduction." In Advances in neural information processing systems, pp. 315-323. 2013.
      3. Schmidt, Mark, Nicolas Le Roux, and Francis Bach. "Minimizing finite sums with the stochastic average gradient." Mathematical Programming 162, no. 1-2 (2017): 83-112.
  14. Distributed optimization: Slides
    1. ADMM Reference: Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. "Distributed optimization and statistical learning via the alternating direction method of multipliers." Foundations and Trends® in Machine learning 3, no. 1 (2011): 1-122.
    2. Distributed SVM: Forero, Pedro A., Alfonso Cano, and Georgios B. Giannakis. "Consensus-based distributed support vector machines." Journal of Machine Learning Research 11, no. May (2010): 1663-1707.
    3. Weighted Parameter Averaging: Das, Ayan, Raghuveer Chanda, Smriti Agrawal, and Sourangshu Bhattacharya. "Distributed Weighted Parameter Averaging for SVM Training on Big Data." In Workshops at the Thirty-First AAAI Conference on Artificial Intelligence. 2017.
  15. Optimization Practice Problems: here