Parallel Programming (3:1)
The objective of this course is to give you some level of
confidence in parallel programming techniques, algorithms and tools. At the end of the course, you would (we
hope) be in a position to deal with reallife parallel problems and to
explore new avenues of research in the area of parallel programming.
Class
 Santanu T
 Anwit Roy
 Abhishek Kolipaka
 Sharath Chandra
 Gowthami Gottipatti
 Rooparam Chaudhury
 M Aravind Krishnan
 Sreekanth Raja
 Priti Parate
 Rajendra Kumar
 Vaibhav Goel
 P Vinay Kumar
 Chandrakant Raj
Instructor
Sathish Vadhiyar
Reading Materials
 Introduction
to Parallel Computing
by
Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar Publisher:
Addison Wesley; (2003) ISBN: 0201648652
 Various publication materials, lecture slides
Additional Reading
 Petascale
Computing: Algorithms and Applications
David A. Bader (Ed.), Chapman & Hall/CRC Computational Science Series. 2007.
 The
Sourcebook of Parallel Computing
by Jack Dongarra (Editor), Ian
Foster, Geoffrey Fox (Editor), Ken Kennedy, Andy White (Editor), Linda
Torczon, Wiliiam Gropp
Publisher: Morgan Kaufmann; (November 2002) ISBN:
1558608710

Numerical
Linear Algebra for High Performance Computers
by
Jack Dongarra, Iain Duff, Danny Sorensen, Henk van der Vorst Publisher:
SIAM; (1998) ISBN: 0898714281
Meeting Hours
11:301:00; Tuesday, Thursday; Room 202, SERC
Grading System
Syllabus
 Introduction
ppt
 Parallel Programming Tools
 MPI  1
ppt
 Collective
communication implementations
ppt
 MPI process groups and
topologies
 MPI2
 Overview
ppt
 I/O Optimizations
ppt
 GPU Programming
 Quick review
ppt
 Obtaining High Performance on GPUs
ppt
 Parallel Algorithms and
Applications
 Parallel Linear algebra

Dense LA  Gaussian
elimination (different distributions), tridiagonal systems (divide
and conquer algorithm)

Sparse LA and relate
graph algorithms  basics and sequential solutions; parallel steps
for ordering, symbolic and numerical factorization, triangular
solve; parallel graph partitioning; iterative methods; parallel
graph coloring
 FFT
ppt
 Molecular Dynamics
ppt
 Population evolution or
Game of Life ppt
 Cosmology
ppt
 Mesh applications (e.g.
finite element computations) ppt
 Parallel Bioinformatics
ppt
 Parallel System
Software
 Scheduling in Parallel
Systems ppt
 Checkpointing and Fault
tolerance for large systems ppt
Reading Material

Introduction: Look at the
concurrency and
parallelization slides of
HPC 2010 course for details on
programming models, metrics, architecture classification, cache coherence,
interconnection networks, synchronization, steps in parallelization
(mapping, orchestration etc.) and the corresponding reading material
suggested in the HPC 2010 web site.

MPI1: Online tutorial "MPI Complete Reference". Google
for it.

Collective Communications: Lecture slides, and Google for
paper "Optimization of Collective Communication Operations in MPICH" by Thakur, Rabenseifner and Gropp,
IJHPCA 2005

MPI2: Online tutorial:
http://wwwunix.mcs.anl.gov/mpi/mpistandard/mpireport2.0/mpi2report.htm

I/O Optimizations: Google for paper Rajeev Thakur, William Gropp, and Ewing
Lusk, “A Case for Using MPI's Derived Datatypes to. Improve I/O
Performance,” in Proceedings of SC98

GPU and CUDA:
 Dense Linear Algebra
 Sparse Linear Algebra
 Sources [You can get these papers from me and take photocopies]
 Parallel Algorithms for sparse linear systems  Heath, Ng and
Peyton
 Reordering sparse matrices for parallel elimination  Liu
 A parallel graph partitioning algorithm for a messagepassing
multiprocessor  Gilbert and Zmijewski
 Task scheduling for parallel sparse Cholesky factorization 
Geist and Ng
 Parallel algorithms for forward and back substitution in direct
solution of sparse linear systems  Anshul Gupta and Vipin Kumar
 Matrix computations  Golub and van Loan
 Lecture slides
 General steps of sparse matrix factorization: Heath, Ng and Peyton 
pages 420429
 Parallel ordering
 Heath, Ng and Peyton  pages 429435 till Kernighan Lin
 Liu  pages 75 and 89 (you can read other pages on reduction of
elimination tree heights if interested),
 for Kernighan Lin ordering  Gilbert and Zmijewski  pages
427433, 437440, your lecture slides
 for mapping: Heath, Ng and Peyton
 437439, figures 9 and 10; Geist and Ng  sections 3 and 4
 for numerical factorization: Heath, Ng and Peyton  442450
 FFT  FFT Chapter in "Introduction to Parallel Computing" book
 Molecular Dynamics  lecture slides, the paper ``A New Parallel Method
for Molecular Dynamics Simulation of Macromolecular Systems'' by Plimpton
and Hendrickson
 Game of Life  Lecture slides
 Cosmology
 Lecture slides
 The paper "Scalable parallel formulations of the BarnesHut method
for nbody simulations" by Grama, Kumar and Sameh. In Parallel Computing
24(1998) 797822.
 The paper "Load balancing and data locality in adaptive hierarchical
Nbody methods: BarnesHut, Fast Multipole, and Dardiosity" by Singh,
Holt, Totsuka, Gupta and Hennessey. In Journal of Parallel and
Distributed Computing, 1994
 Mesh applications
 Lecture slides
 Paper: "Multilevel diffusion schemes for repartitioning of adaptive
meshes" by Schloegel et al.
 Paper: "A unified algorithm for loadbalancing adaptive scientific
simulations" by Schloegel et al.
 Paper: "Dynamic repartitioning of adaptively refined meshes" by
Schloegel et al.
 Bioinformatics
 Paper: "Parallel biological sequence comparison using prefix
computations" by Aluru et al.
 Scheduling
 Paper: "Backfilling with lookahead to optimize the packing of
parallel jobs" by Shmueli and Feitelson. JPDC 2005.
 Paper: "A comparison study of eleven static mapping heuristics for a
class of metatasks on heterogeneous computing systems" by Tracy Braun
et al., HCW 1999.
 Checkpointing:
 Paper: "An overview of checkpointing in uniprocessor and distributed
systems, focusing on implementation and performance" by James Plank
MidTerm Exams
Assignments

Platform for Assignments

Important
Notes

Collective communications and 2D Jacobi

CUDA programming and optimizations

Parallel Subcolumn Cholesky

Load Balancing Nbody Simulations

Parallel Job Scheduling
Links