CS60026: PARALLEL AND DISTRIBUTED ALGORITHMS (LTP: 3 0 0, Credits 3)

Co-taught with Dr Abhishek Somani, Principal Engineer, Mentor Graphics India Private Limited

From mobile phones and laptop computers to high-end server farms, parallel computers are so ubiquitous that it has become difficult to find a computer with a single processor core these days. The advent of Nvidia GPGPUs, and more recently the Intel Xeon Phi co-processor, has brought Cray-like vectorized supercomputing resources to server- and desktop-level computers. In short, the parallel computing age is here and now, and we should be well equipped to face it as engineers and scientists.
In this course, we attempt to provide an in-depth discourse on how to think about algorithms in a parallel manner. Essentially, we provide a detailed study of how to design, analyze, and implement parallel algorithms for computers that have multiple processors. These steps are not always easy. Some algorithms suitable for conventional, single-processor computers are not appropriate for parallel architectures. Many algorithms with inherent parallelism have a higher computational complexity than their best sequential counterparts. In either case, implementing an inappropriate algorithm wastes a parallel computer's resources. In this course, we delve into these matters, essentially to comprehend how to choose a parallel algorithm that makes good use of the target architecture.

The Idea of Parallelism: A Parallelised version of the Sieve of Eratosthenes
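
As a first taste of the idea, below is a minimal OpenMP sketch in which the composite-marking loop of the sieve is split across threads; the problem size, array layout, and scheduling choice are illustrative assumptions rather than the reference version developed in the lectures.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <omp.h>

    /* Minimal sketch: mark composites up to N in parallel.
       For each prime p found serially up to sqrt(N), the marking
       loop over multiples of p is split across threads. */
    int main(void) {
        const long N = 10000000;            /* illustrative problem size */
        char *is_composite = calloc(N + 1, 1);
        long limit = (long)sqrt((double)N);

        for (long p = 2; p <= limit; p++) {
            if (is_composite[p]) continue;
            /* Iterations are independent: each thread marks a disjoint
               set of multiples, so no synchronization is needed. */
            #pragma omp parallel for schedule(static)
            for (long m = p * p; m <= N; m += p)
                is_composite[m] = 1;
        }

        long count = 0;
        #pragma omp parallel for reduction(+:count)
        for (long i = 2; i <= N; i++)
            if (!is_composite[i]) count++;

        printf("primes up to %ld: %ld\n", N, count);
        free(is_composite);
        return 0;
    }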

PRAM Model of Parallel Computation

Pointer Jumping and Divide & Conquer: Useful Techniques for Parallelization

PRAM Algorithms: Parallel Reduction, Prefix Sums, List Ranking, Preorder Tree Traversal, Merging Two Sorted Lists, Graph Coloring
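
To give a flavour of how such PRAM algorithms translate to a shared-memory machine, here is a hedged sketch of a two-phase parallel prefix sum in C with OpenMP. The chunk partitioning, thread limit, and array size are illustrative assumptions; the PRAM formulations in the lectures assign one processor per element instead.

    #include <stdio.h>
    #include <omp.h>

    #define N 16   /* illustrative size */

    /* Two-phase parallel (inclusive) prefix sum:
       phase 1 - each thread scans its own chunk;
       phase 2 - the per-chunk sums are scanned serially and the
       resulting offsets are added back to each chunk. */
    void prefix_sum(int *a, int n) {
        int nthreads;
        int offsets[64] = {0};              /* assumes at most 63 threads */

        #pragma omp parallel
        {
            int t = omp_get_thread_num();
            #pragma omp single
            nthreads = omp_get_num_threads();

            int lo = (long)n * t / nthreads;
            int hi = (long)n * (t + 1) / nthreads;
            for (int i = lo + 1; i < hi; i++)   /* local inclusive scan */
                a[i] += a[i - 1];
            if (hi > lo) offsets[t + 1] = a[hi - 1];
            #pragma omp barrier

            #pragma omp single
            for (int k = 1; k <= nthreads; k++) /* scan the per-chunk sums */
                offsets[k] += offsets[k - 1];

            for (int i = lo; i < hi; i++)       /* add this chunk's offset */
                a[i] += offsets[t];
        }
    }

    int main(void) {
        int a[N];
        for (int i = 0; i < N; i++) a[i] = 1;
        prefix_sum(a, N);
        printf("a[%d] = %d\n", N - 1, a[N - 1]);   /* prints 16 */
        return 0;
    }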

Reducing the Number of Processors and Brent's Theorem
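
For reference, Brent's theorem (stated here in one common formulation, whose notation may differ from the lecture notes) says that a PRAM computation performing a total of m operations in t parallel steps can be simulated by p processors in time

    T_p <= t + (m - t) / p,

so in particular, with p <= m/t processors the running time stays within a factor of two of the work-optimal bound m/p.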

Dichotomy of Parallel Computing Platforms

Cost of Communication

Programmer's view of modern multi-core processors

The role of compilers and writing efficient serial programs

Parallel Complexity: The P-Complete Class

Mapping and Scheduling

Elementary Parallel Algorithms

Sorting

Parallel Programming Languages: Shared Memory Parallel Programming using OpenMP

Writing efficient OpenMP programs
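
As a concrete, if minimal, illustration of the style of program discussed under this heading, the sketch below uses a reduction clause and static scheduling to avoid shared-counter contention and false sharing; the loop body and problem size are illustrative assumptions.

    #include <stdio.h>
    #include <omp.h>

    #define N 100000000

    /* A common efficiency concern: accumulating into one shared variable
       (or one shared array slot per thread) serializes or false-shares.
       The reduction clause gives each thread a private accumulator. */
    int main(void) {
        double sum = 0.0;

        double t0 = omp_get_wtime();
        #pragma omp parallel for reduction(+:sum) schedule(static)
        for (long i = 1; i <= N; i++)
            sum += 1.0 / ((double)i * i);      /* converges to pi^2/6 */
        double t1 = omp_get_wtime();

        printf("sum = %.12f, time = %.3f s, threads = %d\n",
               sum, t1 - t0, omp_get_max_threads());
        return 0;
    }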

Dictionary Operations: Parallel Search

Graph Algorithms

Matrix Multiplication

Industrial Strength programming 1:

- Programming for performance; dense matrix-matrix multiplication through various stages: data access optimization, loop interchange, blocking and tiling (a blocked-multiplication sketch follows this list)
- Analyze BLAS (Basic Linear Algebra Subprograms) and ATLAS (Automatically Tuned Linear Algebra Software) code
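
The sketch below illustrates only the blocking/tiling stage referenced above, on a square problem of illustrative size; the tile size, loop order, and data layout are assumptions for the sake of example, not the exact sequence of versions developed in the lectures or the BLAS/ATLAS code itself.

    #include <stdio.h>
    #include <string.h>

    #define N  512    /* illustrative matrix dimension */
    #define BS 64     /* illustrative tile size; assumed to divide N */

    /* Blocked C = A * B (row-major).  The three outer loops walk tiles
       that fit in cache; the inner loops reuse each loaded tile of A
       and B many times, and the i-k-j order keeps B and C accesses
       stride-1. */
    static double A[N][N], B[N][N], C[N][N];

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = 1.0;
                B[i][j] = 1.0;
            }
        memset(C, 0, sizeof C);

        for (int ii = 0; ii < N; ii += BS)
            for (int kk = 0; kk < N; kk += BS)
                for (int jj = 0; jj < N; jj += BS)
                    /* multiply one BS x BS tile pair */
                    for (int i = ii; i < ii + BS; i++)
                        for (int k = kk; k < kk + BS; k++) {
                            double a = A[i][k];        /* reused across j */
                            for (int j = jj; j < jj + BS; j++)
                                C[i][j] += a * B[k][j];
                        }

        printf("C[0][0] = %.1f (expected %d)\n", C[0][0], N);
        return 0;
    }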

Distributed Algorithms: models and complexity measures.

Safety, liveness, termination, logical time and event ordering
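
A compact sketch of Lamport's logical-clock rules, which underlie the event-ordering discussion; the process structure and event sequence below are purely illustrative.

    #include <stdio.h>

    /* Lamport logical clocks: each process keeps a counter; local events
       and sends increment it, and a receive takes
       max(local clock, message timestamp) + 1. */
    typedef struct { int clock; } process_t;

    static int local_event(process_t *p)        { return ++p->clock; }
    static int send_event(process_t *p)         { return ++p->clock; }  /* timestamp carried by the message */
    static int recv_event(process_t *p, int ts) {
        p->clock = (ts > p->clock ? ts : p->clock) + 1;
        return p->clock;
    }

    int main(void) {
        process_t p1 = {0}, p2 = {0};

        local_event(&p1);                 /* p1's clock: 1 */
        int ts = send_event(&p1);         /* p1's clock: 2, message carries 2 */
        local_event(&p2);                 /* p2's clock: 1 */
        recv_event(&p2, ts);              /* p2's clock: max(1,2)+1 = 3 */

        printf("p1 = %d, p2 = %d\n", p1.clock, p2.clock);   /* 2, 3 */
        return 0;
    }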

Global state and snapshot algorithms

Mutual exclusion and Clock Synchronization

Distributed Graph algorithms

Distributed Memory Parallel Programming: MPI programming basics with simple programs and the most useful primitives; demonstration of parallel Monte Carlo
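
A minimal sketch of the Monte Carlo demonstration mentioned above, estimating pi with a single MPI_Reduce at the end; the sample count and per-rank seeding are illustrative choices, not the course's reference code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Each rank samples points in the unit square independently;
       the hit counts are combined with one collective reduction. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long samples_per_rank = 1000000;
        unsigned int seed = 12345u + (unsigned int)rank;   /* distinct streams per rank */
        long hits = 0;

        for (long i = 0; i < samples_per_rank; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;   /* rand_r: POSIX re-entrant RNG */
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0) hits++;
        }

        long total_hits = 0;
        MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            double pi = 4.0 * (double)total_hits / ((double)samples_per_rank * size);
            printf("pi ~ %.6f using %d ranks\n", pi, size);
        }

        MPI_Finalize();
        return 0;
    }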

Industrial strength programming 2:

- Scalable programming for capacity
- Distributed sorting of massive arrays (a sample-sort sketch follows this list)
- Distributed Breadth-First Search of huge graphs and finding Connected Components
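
One possible shape for the distributed-sorting exercise referenced above is a sample sort: sort locally, agree on splitters from gathered samples, exchange buckets with MPI_Alltoallv, and sort the received bucket. The sketch below is a simplified, hedged version of that pattern; the local array size, splitter-selection rule, and random input are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, p;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        const int n_local = 100000;
        int *a = malloc(n_local * sizeof(int));
        srand(rank + 1);
        for (int i = 0; i < n_local; i++) a[i] = rand();
        qsort(a, n_local, sizeof(int), cmp_int);          /* local sort */

        /* every rank contributes p regular samples; rank 0 picks p-1 splitters */
        int *samples = malloc(p * sizeof(int));
        for (int i = 0; i < p; i++) samples[i] = a[(long)i * n_local / p];
        int *all = rank == 0 ? malloc(p * p * sizeof(int)) : NULL;
        MPI_Gather(samples, p, MPI_INT, all, p, MPI_INT, 0, MPI_COMM_WORLD);

        int *splitters = malloc((p - 1) * sizeof(int));
        if (rank == 0) {
            qsort(all, p * p, sizeof(int), cmp_int);
            for (int i = 1; i < p; i++) splitters[i - 1] = all[i * p];
        }
        MPI_Bcast(splitters, p - 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* bucket the locally sorted data by splitter (buckets are contiguous) */
        int *scount = calloc(p, sizeof(int));
        for (int i = 0, b = 0; i < n_local; i++) {
            while (b < p - 1 && a[i] >= splitters[b]) b++;
            scount[b]++;
        }
        int *rcount = malloc(p * sizeof(int));
        MPI_Alltoall(scount, 1, MPI_INT, rcount, 1, MPI_INT, MPI_COMM_WORLD);

        int *sdisp = calloc(p, sizeof(int)), *rdisp = calloc(p, sizeof(int));
        for (int i = 1; i < p; i++) {
            sdisp[i] = sdisp[i - 1] + scount[i - 1];
            rdisp[i] = rdisp[i - 1] + rcount[i - 1];
        }
        int n_recv = rdisp[p - 1] + rcount[p - 1];
        int *bucket = malloc(n_recv * sizeof(int));
        MPI_Alltoallv(a, scount, sdisp, MPI_INT,
                      bucket, rcount, rdisp, MPI_INT, MPI_COMM_WORLD);
        qsort(bucket, n_recv, sizeof(int), cmp_int);      /* final local sort */

        printf("rank %d holds %d sorted keys\n", rank, n_recv);
        MPI_Finalize();
        return 0;
    }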

Michael J. Quinn, Parallel Computing, TMH

Joseph JaJa, An Introduction to Parallel Algorithms, Addison-Wesley

Mukesh Singhal and Niranjan G. Shivaratri, Advanced Concepts in Operating Systems, TMH

Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, Introduction to Parallel Computing, Pearson

To be announced

Overview: The Idea of Parallelism

Parallel Thinking: Sieve of Eratosthenes

Shared Memory Parallel Programming

Pointer Jumping and Preorder Tree Traversal

Merging, Brent's Theorem, and a Few Applications

Independent Sets for List Ranking and Euler Tour Technique (Introduction)

Evaluating Tree Expressions: Tree Contraction and Raking

Evaluating Tree Expressions (contd.)

Programmer's View of Multi-core Processors and Compiler's role in writing efficient serial programs

Some Example Applications of Prefix-Sums to Solve Recurrences in Parallel

Pipelined Merge-Sort Algorithm

Distributed Programming with MPI

Parallel Processor Organization

Mapping Parallel Algorithms to Parallel Platforms

Communications on Hypercube Platforms

Tutorial-1: Questions

Tutorial-2: Questions

Assignment-1: Parallelizing Sparse Matrix-Vector Multiplication (Assignment Statement) (Assignment Code Tar Ball)

Assignment-2: Parallel Merge Sort (Assignment Statement) (Assignment Code Tar Ball)

Connecting to the Computing Platform