Scheduling

PySchedCL offers a scheduling engine (scheduler.py) capable of dispatching a set of independent OpenCL kernels to the available CPU and GPU devices of a heterogeneous platform. The engine manages the OpenCL runtime context, command queues and devices, as well as the resources associated with each OpenCL kernel. Given a set of kernel specification files and a scheduling policy, the engine manages every read, write and computation operation associated with the kernels through the correct synchronization primitives. This is depicted in the following figure.

Depending on the total number of CPU and GPU devices present on the target platform, the corresponding contexts, command queues and devices are set up initially. Buffer and argument information for each OpenCL kernel is maintained by the framework. The dispatch() function dictates the set of devices to which a kernel is mapped at runtime. The criteria for choosing a kernel from a set of kernels competing for resources constitute the scheduling heuristic. The scheduling engine allows users to design and incorporate their own scheduling heuristics by defining their own version of the select() function. The input to this function is a set of user-defined task queues, where each queue may contain OpenCL kernels that share the same characteristics. Depending on device availability and the selection strategy, the relevant set of arguments is passed to the dispatch() function, which carries out the final runtime mapping. The scheduling engine internally maintains ready queues and device counters that reflect runtime device availability. Once a kernel has finished execution on a particular device, it notifies the engine through the notify_callback() function, and the corresponding ready queues and device counters are updated accordingly.
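The interplay between select(), dispatch() and notify_callback() can be sketched as a simple loop over device counters and a pending-task list. This is only an illustrative outline, not the actual engine code: the EngineSketch class and the simplified select/dispatch signatures are hypothetical, while the counter-and-callback pattern mirrors the description above.

```python
import threading


class EngineSketch:
    """Illustrative outline of the scheduling loop (not the actual engine)."""

    def __init__(self, n_cpus, n_gpus, kernels):
        self.free_cpus = n_cpus          # runtime device counters
        self.free_gpus = n_gpus
        self.pending = list(kernels)     # kernels awaiting dispatch
        self.lock = threading.Lock()

    def notify_callback(self, device_type):
        # Called when a kernel finishes: release the device(s) it held.
        with self.lock:
            if device_type in ("cpu", "both"):
                self.free_cpus += 1
            if device_type in ("gpu", "both"):
                self.free_gpus += 1

    def run(self, select, dispatch):
        # Repeatedly pick a kernel via the user-supplied select()
        # and hand it to dispatch() while devices are available.
        while self.pending:
            with self.lock:
                choice = select(self.pending, self.free_cpus, self.free_gpus)
                if choice is None:
                    continue                 # no kernel/device match yet
                kernel, device_type = choice
                if device_type in ("cpu", "both"):
                    self.free_cpus -= 1
                if device_type in ("gpu", "both"):
                    self.free_gpus -= 1
                self.pending.remove(kernel)
            dispatch(kernel, device_type)    # enqueue the OpenCL commands
```

In the real engine, dispatch() enqueues the kernel's OpenCL commands asynchronously and the finished kernel triggers notify_callback() from an event callback, which is why the counters are guarded by a lock here.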

We provide a list of guidelines that end users may follow in devising and experimenting with custom scheduling heuristics using the scheduling engine. We proceed by describing the implementation of the baseline heuristic in detail.

Queue Specification

The user initially needs to provide a task queue specification by setting the number of task queues and specifying the characteristic properties of each queue, for example the partition boundaries of each queue. For the baseline heuristic, the task queue specification maintains three distinct priority queues, cpu_q, gpu_q and mixed_q, containing tasks that should execute only on the CPU, only on the GPU, and in a partitioned fashion between a CPU and a GPU, respectively. Tasks inside each queue are ordered by total number of global work items. The corresponding code for the task queue specification must be written inside the select_main() function of the scheduler.py script. This is depicted below.

    for i in range(len(kernels)):
        kernels[i].build_kernel(gpus, cpus, ctxs)
        kernels[i].random_data()
        p = kernels[i].partition
        num_global_work_items = kernels[i].get_num_global_work_items()

        if p == 0:
            # CPU-only task; larger kernels surface first (negated work items)
            heapq.heappush(cpu_q, (p, -num_global_work_items, i))
            rCPU += 1
        elif p == 10:
            # GPU-only task
            heapq.heappush(gpu_q, (-p, -num_global_work_items, i))
            rGPU += 1
        else:
            # Partitioned task; classes closer to an even split get priority
            heapq.heappush(mixed_q, (abs(p - 5), -num_global_work_items, i))
            rCPU += 1
            rGPU += 1
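Because heapq maintains a min-heap ordered lexicographically on tuples, negating the work-item count makes larger kernels pop first within each priority class. A small standalone sketch illustrates the ordering in mixed_q (the partition classes and sizes here are made up for illustration):

```python
import heapq

# (distance from even split, -global work items, kernel index);
# lexicographically smaller tuples pop first.
mixed_q = []
heapq.heappush(mixed_q, (abs(6 - 5), -512, 0))   # partition class 6
heapq.heappush(mixed_q, (abs(5 - 5), -256, 1))   # partition class 5
heapq.heappush(mixed_q, (abs(4 - 5), -1024, 2))  # partition class 4

# Class 5 (a perfectly even split) pops first; among the remaining
# entries, ties on the first key are broken by larger kernel size.
print(heapq.heappop(mixed_q))  # (0, -256, 1)
```
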

Custom Select Routine

After setting up the task queue specification, the user must specify a scheduling heuristic by designing a select() function in scheduler.py. The arguments of select() comprise the queues in the task queue specification. The selection strategy may be expressed as a set of rules over this specification. For the baseline strategy, the rules are as follows.

  1. If both types of devices are available, a task from mixed_q is dequeued and selected.
  2. If only one type of device is available, a task from cpu_q or gpu_q, respectively, is dequeued and selected.
The primary objective of the baseline strategy is to dispatch every OpenCL kernel to the target heterogeneous platform while respecting its individual partition class. The corresponding code for the selection strategy is depicted below.
def baseline_select(kernels, **kwargs):
    mixed_q = kwargs['M']
    gpu_q = kwargs['G']
    cpu_q = kwargs['C']

    now_kernel = None
    if fw.nCPU > 0 and fw.nGPU > 0 and mixed_q:
        # Both device types free: dequeue a partitioned task
        i_p, i_num_global_work_items, i = heapq.heappop(mixed_q)
        now_kernel = (i, kernels[i].partition)
    elif fw.nCPU > 0 and cpu_q:
        # Only a CPU free: dequeue a CPU-only task
        c_p, c_num_global_work_items, c_i = heapq.heappop(cpu_q)
        now_kernel = (c_i, 0)
    elif fw.nGPU > 0 and gpu_q:
        # Only a GPU free: dequeue a GPU-only task
        g_p, g_num_global_work_items, g_i = heapq.heappop(gpu_q)
        now_kernel = (g_i, 10)
    if now_kernel is not None:
        i, p = now_kernel
        logging.debug("Selecting kernel %s with partition class value %s",
                      kernels[i].name, p)
    return now_kernel, None

Register Select Routine in Engine

Once the task queue specification has been designed and the select function implemented, the method should be registered with the scheduling engine so that it can be invoked from the command line utility. This is done by first adding the name of the selection strategy to the parse_arg() routine, so that the command line utility can display the available scheduling heuristics on the terminal, and then adding a conditional statement inside the main function of scheduler.py. Both changes are depicted below.

def parse_arg(args=None):

    parser.add_argument('-s', '--select',
                        help='Scheduling heuristic (baseline, lookahead, adbias)',
                        default='baseline')

if __name__ == '__main__':

    if args.select == "baseline":
        dump_dev = select_main(kernels, select=baseline_select)

TaskSet File Specification

The scheduler.py script processes a file with a .task extension, which contains a list of kernel specification files along with partition class values and dataset sizes. A sample taskset specification file is shown below.

mean.json 6 512
reduce1.json 10 1024
atax2.json 4 1024
std.json 8 512
reduce5.json 10 2048
2DConvul.json 2 256
reduce0.json 10 1024
reduce4.json 10 1024
DotProduct.json 10 8192
reduce2.json 10 2048
reduce3.json 10 1024
mm21.json 2 512
bicg2.json 9 512
reduce6.json 10 512
VectorAdd.json 10 8192
covar.json 4 256
corr.json 5 512
mm22.json 3 1024

The task files are typically stored in the info directory of the PySchedCL package. A few sample task files are provided.
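Each line of a .task file thus has the form <kernel spec JSON> <partition class> <dataset size>. A minimal sketch of a parser for this format might look as follows; parse_taskset is a hypothetical helper written for illustration, not a function of scheduler.py.

```python
def parse_taskset(path):
    """Parse a .task file into (spec_file, partition, dataset) triples."""
    tasks = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            spec_file, partition, dataset = line.split()
            tasks.append((spec_file, int(partition), int(dataset)))
    return tasks
```
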

Running the Scheduler Script

The scheduler.py script inside the scheduler folder may be invoked on a taskset specification file inside the info folder to generate makespan results, for example:

$ ./scheduler.py -f info/sample.task -ng 4 -nc 4

Detailed usage is documented under Scheduler in the Commands section.