Course Outline
From mobile phones and laptop computers to high-end server farms, parallel computers are so ubiquitous that it has become difficult to find a computer with a single processor core these days. The advent of Nvidia GPGPUs and, more recently, the Intel Xeon Phi co-processor has brought Cray-like vectorized supercomputing down to server- and desktop-class machines. In short, the parallel computing age is here and now, and as engineers and scientists we should be well equipped to face it.
In this course, we attempt to provide an in-depth discourse on how to think about algorithms in a parallel manner. Essentially, we provide a detailed study of how to design, analyze, and implement parallel algorithms for computers with multiple processors. These steps are not always easy. Some algorithms that are well suited to conventional single-processor computers are inappropriate for parallel architectures, while many algorithms with inherent parallelism have a higher computational complexity than their best sequential counterparts. In either case, implementing an inappropriate algorithm wastes a parallel computer's resources. In this course, we delve into these matters, essentially to understand how to choose a parallel algorithm that makes good use of the target architecture.
Detailed Syllabus:
The Idea of Parallelism: A Parallelized Version of the Sieve of Eratosthenes (a sketch appears after this syllabus list)
PRAM Model of Parallel Computation
Pointer Jumping and Divide & Conquer: Useful Techniques for Parallelization
PRAM Algorithms: Parallel Reduction, Prefix Sums, List Ranking, Preorder Tree Traversal, Merging Two Sorted Lists, Graph Coloring (an OpenMP reduction sketch appears after this syllabus list)
Reducing the Number of Processors and Brent's Theorem
Dichotomy of Parallel Computing Platforms
Cost of Communication
Programmer's view of modern multi-core processors
The role of compilers and writing efficient serial programs
Parallel Complexity: The P-Complete Class
Mapping and Scheduling
Elementary Parallel Algorithms
Sorting
Parallel Programming Languages: Shared Memory Parallel Programming using OpenMP
Writing efficient OpenMP programs
Dictionary Operations: Parallel Search
Graph Algorithms
Matrix Multiplication
Industrial Strength Programming 1:
- Programming for performance; dense matrix-matrix multiplication through various stages: data-access optimization, loop interchange, blocking and tiling (a blocked-multiply sketch appears after this syllabus list)
- Analyze BLAS (Basic Linear Algebra Subprograms) and ATLAS (Automatically Tuned Linear Algebra Software) code
Distributed Algorithms: models and complexity measures
Safety, liveness, termination, logical time and event ordering
Global state and snapshot algorithms
Mutual exclusion and Clock Synchronization
Distributed Graph algorithms
Distributed Memory Parallel Programming: Cover MPI programming basics with simple programs and the most useful routines; demonstrate parallel Monte Carlo (an MPI Monte Carlo sketch appears after this syllabus list)
Industrial Strength Programming 2:
- Scalable programming for capacity
- Distributed sorting of massive arrays
- Distributed Breadth-First Search of huge graphs and finding Connected Components
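As a first taste of parallel thinking, here is a minimal sketch of a parallelized Sieve of Eratosthenes. It is an illustration only, not the course's reference solution; the limit N, the use of OpenMP, and the choice to parallelize the marking loop are all assumptions made for the example.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000  /* illustrative limit, not fixed by the course */

    int main(void) {
        char *composite = calloc(N + 1, 1);
        for (long p = 2; p * p <= N; p++) {
            if (composite[p]) continue;
            /* Marking the multiples of p is naturally parallel:
               each iteration writes to a distinct cell. */
            #pragma omp parallel for
            for (long m = p * p; m <= N; m += p)
                composite[m] = 1;
        }
        for (long i = 2; i <= N; i++)
            if (!composite[i]) printf("%ld ", i);
        printf("\n");
        free(composite);
        return 0;
    }

Compile with a command such as gcc -fopenmp sieve.c; without -fopenmp the pragma is ignored and the program runs serially.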
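Parallel reduction, the first of the PRAM techniques listed above, has a direct shared-memory analogue in OpenMP. The following is a minimal sketch; the array size and contents are chosen arbitrarily for illustration.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        enum { N = 1000000 };  /* illustrative problem size */
        static double a[N];
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        /* reduction(+:sum) gives each thread a private partial sum and
           combines the partials at the end, much like a reduction tree. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f using up to %d threads\n",
               sum, omp_get_max_threads());
        return 0;
    }

The same reduction pattern extends to operators such as min and max.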
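For the matrix-multiplication case study, the sketch below shows the blocking/tiling stage in plain C. The matrix order and tile size are illustrative assumptions; in practice the tile size is tuned to the cache, which is exactly what ATLAS automates.

    #include <stdio.h>

    #define N 256  /* illustrative matrix order */
    #define T 32   /* illustrative tile size; tuned to the cache in practice */

    static double A[N][N], B[N][N], C[N][N];

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = 1.0; B[i][j] = 1.0; C[i][j] = 0.0;
            }
        /* Work on T x T tiles so each tile of A, B, and C is reused
           while still in cache; note the k loop above the j loop
           (loop interchange) for unit-stride access to B and C. */
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++) {
                            double aik = A[i][k];
                            for (int j = jj; j < jj + T; j++)
                                C[i][j] += aik * B[k][j];
                        }
        printf("C[0][0] = %.0f (expected %d)\n", C[0][0], N);
        return 0;
    }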
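Finally, for the distributed-memory part, here is a minimal MPI sketch of parallel Monte Carlo estimation of pi. The per-process sample count and the seeding scheme are assumptions made for the example; rand() is used only to keep the sketch short.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long samples = 1000000;  /* illustrative per-process count */
        srand(rank + 1);               /* crude distinct seed per process */
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }

        long total = 0;
        /* Sum the per-process hit counts on rank 0. */
        MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi is approximately %f\n",
                   4.0 * total / ((double)samples * size));

        MPI_Finalize();
        return 0;
    }

Build with mpicc and run with, for example, mpirun -np 4 ./a.out.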
Text Books:
Michael J. Quinn, Parallel Computing, TMH
Joseph JáJá, An Introduction to Parallel Algorithms, Addison-Wesley
Mukesh Singhal and Niranjan G. Shivaratri, Advanced Concepts in Operating Systems, TMH
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing, Pearson
Reference Books/Notes:
To be announced
Presentations for the class:
Overview: The Idea of Parallelism
Parallel Thinking: Sieve of Eratosthenes
PRAM Algorithms
Programming Assignments:
Access to the Computing Resources:
Connecting to the Computing Platform
Question Papers: