GPGPU and Real-time Resources
Updates
Upcoming NPTEL Course on GPU Architectures and Programming
Upcoming real-time and embedded conferences
Learning - Books and Online Articles
Course on GPGPU
GPU Architectures and Programming NPTEL Course
CUDA resources
CUDA Introduction and Future
Getting Started with CUDA
CUDA Refresher Articles
GPU Gems
Programming Massively Parallel Processors - David Kirk and Wen-mei Hwu
Professional CUDA C Programming - John Cheng, Max Grossman, and Ty McKercher
OpenCL resources
Khronos OpenCL Overview
Khronos SYCL Overview
Hands On OpenCL
Guide on SYCL
SGEMM Tutorial and Optimizations
Heterogeneous Computing with OpenCL 2.0 - Dana Schaa, Dong Ping Zhang, Perhaad Mistry, David R. Kaeli
News and Trivia
Khronos releases SYCL 2020 Provisional Specification
oneAPI Industry Initiative
GPU Technology Conference Digital Articles
Hub for GPGPU Research
How to build a GPU Accelerated Research Cluster
Nvidia HPC Articles for CUDA
Khronos Articles for OpenCL
APIs and Libraries
CUDA APIs
CUDA Toolkit Documentation
CUDA Runtime API
CUDA PTX ISA
cuBLAS: Linear Algebra library for BLAS functions implemented using CUDA
cuSPARSE - Sparse Linear Algebra routines implemented using CUDA
cuDNN - GPU Accelerated library for Deep Neural Networks
CuPy - NumPy-like library accelerated using CUDA
CUTLASS - A collection of CUDA C++ template abstractions for GEMM operations
Gunrock - CUDA Library for Graph Processing
nvGRAPH - GPU Accelerated library for Graph Analytics
TensorRT - Optimized Deep Learning Inference
cuFFT - GPU Accelerated library for Fast Fourier Transforms
OpenCL APIs
OpenCL Installation on Linux
OpenCL Specification
PyOpenCL Documentation
CLBlast: Automatically tuning OpenCL Linear Algebra BLAS kernels
Portable Computing Language
A software library for BLAS functions implemented using OpenCL
A software library for FFT functions implemented using OpenCL
A software library for sparse tensor operations implemented using OpenCL
Intel's Intercept Layer for analyzing OpenCL applications
ARM Compute Library
DPC++ - Intel's implementation of SYCL
ComputeCpp - Codeplay's implementation of SYCL
triSYCL - Open-source SYCL implementation targeted toward Xilinx FPGAs