Co-taught with Dr Abhishek Somani, Principal Engineer, Mentor Graphics India Private Limited

Course Outline
From mobile phones and laptop computers to high-end server farms, parallel computers are so ubiquitous that it has become difficult to find a computer with a single processor core these days. The advent of Nvidia GPGPUs, and more recently the Intel Xeon Phi co-processor, has brought Cray-like vectorized supercomputing resources to server- and desktop-class computers. In short, the parallel computing age is here and now, and as engineers and scientists we should be well equipped to face it. In this course, we attempt to provide an in-depth discourse on how to think about algorithms in a parallel manner. Essentially, we provide a detailed study of how to design, analyze, and implement parallel algorithms for computers with multiple processors. These steps are not always easy. Some algorithms that suit conventional, single-processor computers are inappropriate for parallel architectures, while many algorithms with inherent parallelism have a higher computational complexity than their best sequential counterparts. In either case, implementing an inappropriate algorithm wastes a parallel computer's resources. In this course, we delve into these matters, essentially to comprehend how to choose a parallel algorithm that makes good use of the target architecture.

Detailed Syllabus:

The Idea of Parallelism: A Parallelised version of the Sieve of Eratosthenes

PRAM Model of Parallel Computation

Pointer Jumping and Divide & Conquer: Useful Techniques for Parallelization

PRAM Algorithms: Parallel Reduction, Prefix Sums, List Ranking, Preorder Tree Traversal, Merging Two Sorted Lists, Graph Coloring

Reducing the Number of Processors and Brent's Theorem
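As a reminder of the result behind this topic, Brent's theorem in its standard form bounds the cost of running an algorithm on fewer processors than it nominally uses:

```latex
% If a PRAM algorithm performs m operations in t parallel steps,
% then p processors can simulate it in time
T_p \;\le\; t + \frac{m - t}{p}.
```

In particular, taking p = m/t keeps the running time within a factor of two of t while using far fewer processors than the original algorithm.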

Dichotomy of Parallel Computing Platforms

Cost of Communication

Programmer's view of modern multi-core processors

The role of compilers and writing efficient serial programs

Parallel Complexity: The P-Complete Class

Mapping and Scheduling

Elementary Parallel Algorithms


Parallel Programming Languages: Shared Memory Parallel Programming using OpenMP

Writing efficient OpenMP programs

Dictionary Operations: Parallel Search

Graph Algorithms

Matrix Multiplication

Industrial Strength Programming 1:

  1. Programming for performance: dense matrix-matrix multiplication taken through successive optimization stages (data-access optimization, loop interchange, blocking and tiling)
  2. Analyze BLAS (Basic Linear Algebra Subprograms) and ATLAS (Automatically Tuned Linear Algebra Software) code

Distributed Algorithms: models and complexity measures.

Safety, liveness, termination, logical time and event ordering

Global state and snapshot algorithms

Mutual exclusion and Clock Synchronization

Distributed Graph algorithms

Distributed Memory Parallel Programming: MPI programming basics with simple programs and the most useful routines; demonstration of parallel Monte Carlo

Industrial Strength Programming 2:

  1. Scalable programming for capacity
  2. Distributed sorting of massive arrays
  3. Distributed Breadth-First Search of huge graphs and finding Connected Components

Text Books:

Michael J. Quinn, Parallel Computing, TMH

Joseph JaJa, An Introduction to Parallel Algorithms, Addison-Wesley

Mukesh Singhal and Niranjan G. Shivaratri, Advanced Concepts in Operating Systems, TMH

Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing, Pearson

Reference Books/Notes:

To be announced

Presentations for the class:

Overview: The Idea of Parallelism

Parallel Thinking: Sieve of Eratosthenes

PRAM Algorithms

Shared Memory Parallel Programming

Pointer Jumping and Preorder Tree Traversal

Merging, Brent's Theorem, and a Few Applications

Optimal Merging

List Ranking and Coloring

Independent Sets for List Ranking and Euler Tour Technique (Introduction)

Evaluating Tree Expressions: Tree Contraction and Raking

Evaluating Tree Expressions (contd.)

Programmer's View of Multi-core Processors and Compiler's role in writing efficient serial programs

Some Example Applications of Prefix-Sums to Solve Recurrences in Parallel

Parallel Search, Faster Merge

Pipelined Merge-Sort Algorithm

Distributed Programming with MPI

Bitonic Sort

Parallel Processor Organization

Mapping Parallel Algorithms to Parallel Platforms

Communications on Hypercube Platforms

Parallel Recursive Programs

Tutorial Questions With Solutions:

Tutorial-1: Questions

Tutorial-2: Questions

Programming Assignments:

Assignment-1: Parallelizing Sparse Matrix-Vector Multiplication (Assignment Statement) (Assignment Code Tar Ball)

Assignment-2: Parallel Merge Sort (Assignment Statement) (Assignment Code Tar Ball)

Access to the Computing Resources:

Connecting to Computing Platform

Question Papers: