Abstract: Integrated Scheduling and Allocation for Synthesis of Structured Data Paths

Integrated Scheduling and Allocation for Synthesis of Structured Data Paths
C. A. Mandal, R. M. Zimmer

Abstract
	Important goals of data path synthesis techniques are to minimize the cost of the data path and to maximize its performance. At this level of design, the performance is measured in terms of the number of time steps in which the data dependency graph of operations is scheduled. There are two important steps in data path synthesis: scheduling of operations and allocations of hardware resources. Scheduling directly determines the performance of the design and places a lower bound on the cost of the data path. Allocation directly determines the cost of the data path and places an upper bound on the performance of the design. In the earlier stages of development the two steps were done separately. Now they are generally done simultaneously to produce better global results. Ultimately all the components in the synthesized data path need to be placed and their interconnections routed. Some of the current data path synthesis techniques produce data paths where the interconnection between the components is random; some other techniques produce bus based structures. Data paths with random interconnects tend to impose a penalty during placement and routing. Bus based data paths, on the other hand, could have more buses than the maximum number of concurrent transfers in any time step. Since buses are generally relatively long distance interconnects, the space taken up by these is under utilized when there are more buses than transfers generally taking place. To account for these considerations we have developed a genetic algorithm to support the synthesis of structured data paths. The aim of the system is to produce designs with a simple and predictable layout structure. These structures conserve on-chip wiring resources. The data path is organized as architectural blocks (A-block). Each A-block has a local functional unit (FU), local memory elements and internal interconnections. Besides the A-blocks there are one or more global memory units and only a few global buses interconnecting the A-blocks and the global memory units. Our scheduling algorithm delivers a performance and hardware cost optimized schedule for a given partial order of operations. The schedule is such that the architecture required to satisfy it is of relatively low cost and satisfies a minimum performance constraint. The scheduling is guided by user specified architectural parameters such as the number of A-blocks and global buses. The scheduling algorithm is versatile enough to handle multi-cycle operations and multiple implementations of an operation. The advantage of our method over existing methods is that random long-distance interconnects between data path elements are avoided. This feature makes the technique especially attractive for the high-level synthesis of designs which are intended to be implemented on reconfigurable architectures and programmable structures such as an FPGA, where interconnect resources are limited. Our method has been applied to standard examples. The resulting data paths compare favorably with other techniques for data path synthesis, in terms of performance and data path cost, while at the same time they have a structure that is advantageous for placement and routing.

crmandal@geocities.com

[Publications list]