GPGPU and Real-time Resources



Updates

Upcoming NPTEL Course on GPU Architectures and Programming

Upcoming real-time and embedded conferences

Learning - Books and Online Articles

Course on GPGPU

  1. GPU Architectures and Programming NPTEL Course

CUDA resources

  1. CUDA Introduction and Future
  2. Getting Started with CUDA
  3. CUDA Refresher Articles
  4. GPU Gems
  5. Programming Massively Parallel Processors - David Kirk and Wen-mei Hwu
  6. Professional CUDA C Programming - John Cheng, Max Grossman, and Ty McKercher

OpenCL resources

  1. Khronos OpenCL Overview
  2. Khronos SYCL Overview
  3. Hands On OpenCL
  4. Guide on SYCL
  5. SGEMM Tutorial and Optimizations
  6. Heterogeneous Computing with OpenCL 2.0 - Dana Schaa, Dong Ping Zhang, Perhaad Mistry, David R. Kaeli

News and Trivia

  1. Khronos releases SYCL 2020 Provisional Specification
  2. oneAPI Industry Initiative
  3. GPU Technology Conference Digital Articles
  4. Hub for GPGPU Research
  5. How to build a GPU Accelerated Research Cluster
  6. NVIDIA HPC Articles for CUDA
  7. Khronos Articles for OpenCL

APIs and Libraries

CUDA APIs

  1. CUDA Toolkit Documentation
  2. CUDA Runtime API
  3. CUDA PTX ISA
  4. cuBLAS - BLAS linear algebra library implemented using CUDA
  5. cuSPARSE - Sparse linear algebra routines implemented using CUDA
  6. cuDNN - GPU-accelerated library for deep neural networks
  7. CuPy - NumPy-like library accelerated using CUDA
  8. CUTLASS - A collection of CUDA C++ template abstractions for GEMM operations
  9. Gunrock - CUDA Library for Graph Processing
  10. nvGRAPH - GPU-accelerated library for graph analytics
  11. TensorRT - Optimized deep learning inference
  12. cuFFT - GPU-accelerated library for Fast Fourier Transforms

OpenCL APIs

  1. OpenCL Installation on Linux
  2. OpenCL Specification
  3. PyOpenCL Documentation
  4. CLBlast - Automatically tuned OpenCL BLAS library
  5. Portable Computing Language
  6. A software library for BLAS functions implemented using OpenCL
  7. A software library for FFT functions implemented using OpenCL
  8. A software library for sparse tensor operations implemented using OpenCL
  9. Intercept layer for analyzing Intel OpenCL Applications
  10. ARM Compute Library
  11. DPC++ - Intel's implementation of SYCL
  12. ComputeCpp - Codeplay's implementation of SYCL
  13. triSYCL - Open-source SYCL implementation targeting Xilinx FPGAs
