In the past few years, OpenCL has emerged as a widely used parallel programming standard for
high-performance computing on heterogeneous platforms comprising processors of widely varying architectures.
The standard offers program portability across a range of
processor architectures, e.g., general-purpose (CPU), data-parallel (GPU), and
task-parallel (Cell/B.E.) processors. The OpenCL API provides the programmer with a
vast array of options to write data-parallel programs efficiently for heterogeneous
architectures. However, mastering the low-level directives of the
OpenCL API involves a steep learning curve. Beyond learning the API, the programmer must also
ascertain the characteristics of each target device and write code that uses the varied
processing elements of the heterogeneous system to their full potential. Our framework is based on
the PyOpenCL Python package and provides user-friendly abstractions over the underlying API for designing
efficient solutions much faster. Before discussing the capabilities of the framework, we
present a brief overview of what application development using OpenCL entails.
OpenCL applications are data-parallel programs that process multidimensional
data. Every OpenCL application comprises two parts: a single-threaded host program and a
data-parallel program, referred to as a kernel, that describes the computation of a single work item.
A simple vector addition kernel is depicted below.
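A minimal sketch of such a kernel follows, assuming the illustrative names vec_add, a, b, and c (none of them taken from PySchedCL). The OpenCL C source is held in a Python string, and a pure-Python function emulates the launch by running the kernel body once per work item. A real host program would instead compile and enqueue the source through PyOpenCL.

```python
# OpenCL C source for a vector-addition kernel (illustrative names).
# Each work item adds one pair of elements, selected by its global id.
VEC_ADD_SRC = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
"""


def vec_add_work_item(gid, a, b, c):
    """Python equivalent of the kernel body for a single work item."""
    c[gid] = a[gid] + b[gid]


def emulate_launch(a, b):
    """Emulate a 1-D launch with one work item per element.

    On a real device the work items run in parallel; this loop only
    reproduces the result, not the parallelism.
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    c = [0.0] * len(a)
    for gid in range(len(a)):  # global work size == len(a)
        vec_add_work_item(gid, a, b, c)
    return c


if __name__ == "__main__":
    print(emulate_launch([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The kernel contains no loop over the data: the iteration is expressed by the number of work items the host launches, which is why the emulation loops over global ids.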
During program execution, a user-specified number of work items (kernel invocations) is launched
by the host program to execute in parallel. These work items are organized in
a multidimensional grid, and subsets of work items are grouped together to form
work groups, which are mapped to an OpenCL device. Each work item can query its
position in the grid by calling built-in OpenCL functions such as get_global_id() from within the kernel code. The
job of the host-side program is to orchestrate the execution of the OpenCL kernel across the
OpenCL-compliant devices of the heterogeneous platform. This includes managing the buffers associated with the kernel
computation and using synchronization mechanisms to ensure its correctness. Our framework aims
to relieve the programmer of the intricacies of the host program so that they may focus
on the underlying high-performance algorithm. The capabilities of the framework are highlighted below.
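The grid organization described in this overview can be made concrete with a small sketch. It assumes a 1-D launch with no global offset and a global size that is a multiple of the work-group size, in which case the OpenCL built-ins get_global_id, get_group_id, and get_local_id are related by global_id = group_id * local_size + local_id. The helper name enumerate_work_items is hypothetical, and the code only emulates the index arithmetic in plain Python.

```python
def enumerate_work_items(global_size, local_size):
    """Yield (global_id, group_id, local_id) for a 1-D launch.

    Mirrors what get_global_id(0), get_group_id(0) and get_local_id(0)
    would return inside a kernel, assuming no global offset.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of a positive local_size")
    for global_id in range(global_size):
        # Work items are numbered consecutively within each work group.
        group_id, local_id = divmod(global_id, local_size)
        yield global_id, group_id, local_id


if __name__ == "__main__":
    # 8 work items split into 2 work groups of 4.
    for global_id, group_id, local_id in enumerate_work_items(8, 4):
        print(global_id, group_id, local_id)
```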
The primary goal of PySchedCL is to provide a platform for rapid prototyping of
high-performance applications as well as a research tool for experimenting with various policies for scheduling and mapping
multiple OpenCL applications on different heterogeneous platforms. The framework relies on a kernel specification scheme that captures
the host-side management information needed to execute an OpenCL kernel. Given the specification
file for a kernel, the framework can distribute the computation across the devices
of the target heterogeneous architecture. The salient features of the framework are shown in the following figure and discussed below.
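To illustrate the idea of distributing one kernel across devices, the sketch below splits a 1-D global work size between a GPU and a CPU. The dictionary layout, its field names, and the partition_range helper are hypothetical and do not reflect PySchedCL's actual specification format. The only assumption is that a specification records a work size, a work-group size, and the fraction of work assigned to the GPU.

```python
def partition_range(global_size, gpu_fraction, local_size=1):
    """Split [0, global_size) into a GPU chunk and a CPU chunk.

    The split point is rounded down to a multiple of local_size so that
    each chunk can still be divided into whole work groups.
    Returns two (offset, size) pairs: (gpu_chunk, cpu_chunk).
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must lie in [0, 1]")
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of a positive local_size")
    groups = global_size // local_size
    gpu_groups = int(groups * gpu_fraction)  # whole work groups for the GPU
    gpu_size = gpu_groups * local_size
    return (0, gpu_size), (gpu_size, global_size - gpu_size)


# Hypothetical specification; the field names are made up for this example.
spec = {
    "kernel": "vec_add",
    "global_size": 1024,
    "local_size": 64,
    "gpu_fraction": 0.75,
}

if __name__ == "__main__":
    gpu_chunk, cpu_chunk = partition_range(
        spec["global_size"], spec["gpu_fraction"], spec["local_size"]
    )
    print("GPU (offset, size):", gpu_chunk)
    print("CPU (offset, size):", cpu_chunk)
```

Each chunk's offset and size would then serve as the global offset and global work size of a separate launch on the corresponding device, with input and output buffers sliced to match.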
OpenCL Overview
PySchedCL Goals
PySchedCL Version 1.0
The current version of PySchedCL provides a scheduling engine that may be leveraged for the following tasks.
In addition to the above tasks, the tool provides a rich API and extensive documentation for
extending the current capabilities of the framework with ease.