Xputer Lab Kaiserslautern - Reconfigurable Computing with KressArray
Xputer Lab's H/S Co-Design Page

CoDe-X: Two-level Hardware/Software Co-Design Framework for Xputer-based Accelerators

The hardware/software co-design framework CoDe-X (Co-Design for Xputers, see figure 1) [8], [9], [10] is under development in Kaiserslautern. It accepts as input programs written in ANSI C, with an optionally available data-procedural extension. In addition, optional generic library functions for computation-intensive parts of an application can be integrated into the system specification as C function calls. These calls are transformed into efficient sequential and structural Xputer code, which lets the user control the partitioning process at the first level. CoDe-X performs a two-level partitioning process. The first level analyzes the trade-off between the host and a reconfigurable, data-driven hardware accelerator, the so-called Map-oriented Machine (MoM), which corresponds to the current Xputer prototype [1], [2], [6]. First, a pre-processor extracts those program parts that can be executed on the MoM. Dynamic program constructs (such as pointers or recursive functions/procedures) are not executed on the MoM. The extracted program parts are divided into MoM-specific basic blocks (tasks), which define the granularity of the partitioning.
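The pre-processor's filtering step can be pictured with a minimal sketch. It assumes a hypothetical summary record per program part with two flags (`uses_pointers`, `is_recursive`); the names `ProgramPart`, `mom_executable` and `extract_tasks` are illustrative only and are not part of CoDe-X. Parts without dynamic constructs become MoM candidate tasks; the actual host/MoM allocation is still decided later by the partitioner.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProgramPart:
    """Hypothetical summary of one program part, as a pre-processor might record it."""
    name: str
    uses_pointers: bool = False  # dynamic construct: pointers
    is_recursive: bool = False   # dynamic construct: recursive function/procedure


def mom_executable(part: ProgramPart) -> bool:
    """A part is a MoM candidate only if it contains no dynamic constructs."""
    return not (part.uses_pointers or part.is_recursive)


def extract_tasks(parts: list[ProgramPart]) -> tuple[list[str], list[str]]:
    """Split program parts into MoM candidate tasks and host-only parts."""
    mom_tasks = [p.name for p in parts if mom_executable(p)]
    host_only = [p.name for p in parts if not mom_executable(p)]
    return mom_tasks, host_only


# Hypothetical example program
parts = [
    ProgramPart("fir_filter"),
    ProgramPart("list_traversal", uses_pointers=True),
    ProgramPart("quicksort", is_recursive=True),
    ProgramPart("image_convolution"),
]
mom_tasks, host_only = extract_tasks(parts)
print(mom_tasks)  # ['fir_filter', 'image_convolution']
print(host_only)  # ['list_traversal', 'quicksort']
```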

Figure 1. The CoDe-X Hardware/Software Co-Design Framework

A profiling unit estimates the execution time of each task (in optimized and non-optimized versions) for an implementation on the host computer as well as on the MoM. One way to estimate the host execution time of a task is static code analysis for the given processor type. For this, branch prediction techniques are applied, and deterministic behaviour of both the hardware components and the operating system is assumed. Another way is dynamic analysis by a profiler, which analyzes the code for expected input data. The MoM execution time of a task is computed from implementation information provided by the second partitioning level: the Datapath Synthesis System (DPSS) [4] determines the execution time of one rALU activation (structural code), and the X-C compiler [3] determines the number of rALU activations. From this information the profiling unit computes the execution time of the complete task.

The goal of the iterative partitioning process, which is based on simulated annealing, is to minimize the overall execution time of an application. To this end, different task partitions (task allocations) between host and MoM are analyzed. The overhead caused by communication, synchronization, and possible run-time reconfigurations of the MoM is included in the cost function that controls the simulated annealing algorithm.

The second partitioning level performs an optimized, resource-parameter-driven utilization of the available MoM hardware. In this step the X-C compiler (see figure 1) carries out the paradigm shift from the procedurally driven von Neumann paradigm to the data-driven Xputer paradigm. The X-C compiler thus generates sequential code (software part) for programming the generic address generators ('Data Sequencer', DS), structural code (hardware part) for the (re)configuration of the rALU, and a storage scheme for the two-dimensionally organized Xputer memory (see figure 1).
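The first-level estimation and partitioning described above can be illustrated with a small sketch of simulated annealing over host/MoM task allocations. The MoM time follows the rule stated in the text (time per rALU activation times the number of activations). Everything else is an assumption, not the CoDe-X cost function: tasks run strictly one after another in program order, each host/MoM hand-over costs a fixed hypothetical `comm_cost` (communication plus synchronization), and a MoM task whose hypothetical `config_id` differs from the loaded rALU configuration costs a fixed `reconfig_cost`. The names `Task`, `total_time`, `anneal`, the cooling schedule, and all numbers are illustrative.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative, not CoDe-X's."""
    name: str
    host_time: float       # estimated host time (static analysis or profiling)
    ralu_time: float       # time of one rALU activation (the DPSS's role)
    ralu_activations: int  # number of rALU activations (the X-C compiler's role)
    config_id: int         # hypothetical ID of the rALU configuration the task needs

    @property
    def mom_time(self) -> float:
        # MoM time of the complete task = time per activation x number of activations
        return self.ralu_time * self.ralu_activations


def total_time(tasks, on_mom, comm_cost, reconfig_cost):
    """Assumed cost function: tasks run one after another in program order."""
    time = 0.0
    prev_on_mom = False   # assume execution starts on the host
    loaded_config = None  # no rALU configuration loaded initially
    for task, mom in zip(tasks, on_mom):
        if mom != prev_on_mom:
            time += comm_cost  # host/MoM hand-over: communication + synchronization
        if mom:
            if task.config_id != loaded_config:
                time += reconfig_cost  # run-time reconfiguration of the rALU
                loaded_config = task.config_id
            time += task.mom_time
        else:
            time += task.host_time
        prev_on_mom = mom
    if prev_on_mom:
        time += comm_cost  # hand control back to the host at the end
    return time


def anneal(tasks, comm_cost, reconfig_cost, t_start=100.0, t_end=0.01,
           alpha=0.95, moves_per_temp=50, seed=0):
    """Simulated annealing over host/MoM allocations; returns the best one seen."""
    rng = random.Random(seed)
    current = [False] * len(tasks)  # start with every task on the host
    cur_cost = total_time(tasks, current, comm_cost, reconfig_cost)
    best, best_cost = current[:], cur_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            candidate = current[:]
            i = rng.randrange(len(tasks))
            candidate[i] = not candidate[i]  # move one task to the other side
            cand_cost = total_time(tasks, candidate, comm_cost, reconfig_cost)
            delta = cand_cost - cur_cost
            # Always accept improvements; accept worse moves with prob. exp(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_cost = candidate, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = current[:], cur_cost
        temp *= alpha  # geometric cooling
    return best, best_cost


# Hypothetical four-task application
tasks = [
    Task("init", 5.0, 1.0, 20, config_id=0),
    Task("conv", 400.0, 0.5, 100, config_id=1),
    Task("thresh", 120.0, 0.5, 60, config_id=1),
    Task("report", 3.0, 1.0, 10, config_id=2),
]
best, cost = anneal(tasks, comm_cost=10.0, reconfig_cost=15.0)
print([t.name for t, m in zip(tasks, best) if m], cost)  # ['conv', 'thresh'] 123.0
```

In this toy example the two compute-heavy tasks share one rALU configuration, so moving both to the MoM pays the hand-over and reconfiguration overhead only once; the small tasks stay on the host because their overhead would outweigh any gain. The sketch ignores overlap between host and MoM execution, which a realistic cost function would have to model.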
Xputer-based accelerators, which are well suited to the computation-intensive parts of applications, can also be integrated into embedded systems [7].


[1] R. W. Hartenstein, A. G. Hirschbiel, M. Weber: MoM - Map Oriented Machine; in T. Ambler, P. Agraval, W. Moore (eds.): Hardware Accelerators for Electrical CAD, Adam Hilger, 1988, also: International Workshop on Hardware Accelerators, Oxford, September 30 - October 2, 1987
[2] R. W. Hartenstein, A. G. Hirschbiel, M. Riedmüller, K. Schmidt, M. Weber: A Novel ASIC Design Approach Based on a New Machine Paradigm; IEEE Journal of Solid-State Circuits, Vol. 26, No. 7, July 1991
[3] Reiner W. Hartenstein, Karin Schmidt: Parallelizing Compilation for a Novel Data-Parallel Architecture; J. P. Gray, F. Naghdy (Eds.), PCAT-94, Parallel Computing: Technology and Practice, Wollongong, Australia, pp. 126-137, Nov. 1994
[4] R. W. Hartenstein, R. Kress: A Datapath Synthesis System for the Reconfigurable Datapath Architecture; Asia and South Pacific Design Automation Conference, ASP-DAC'95, Nippon Convention Center, Makuhari, Chiba, Japan, Aug. 29 - Sept. 1, 1995
[5] R. W. Hartenstein, R. Kress: A Scalable, Parallel, and Reconfigurable Datapath Architecture; Sixth International Symposium on IC Technology, Systems & Applications, ISIC'95, Singapore, Sept. 6-8, 1995
[6] Reiner W. Hartenstein, Jürgen Becker, Rainer Kress, Helmut Reinig: High-Performance Computing Using a Reconfigurable Accelerator; CPE Journal, Special Issue of Concurrency: Practice and Experience, John Wiley & Sons Ltd., 1996
[7] Reiner W. Hartenstein, Jürgen Becker, Rainer Kress: An Embedded Accelerator for Real Time Image Processing; 8th EUROMICRO Workshop on Real Time Systems, L'Aquila, Italy, June 1996
[8] Reiner W. Hartenstein, Jürgen Becker, Michael Herz, Rainer Kress, Ulrich Nageldinger: A Parallelizing Programming Environment for Embedded Xputer-based Accelerators; High Performance Computing Symposium '96, Ottawa, Canada, June 1996
[9] Reiner W. Hartenstein, Jürgen Becker, Michael Herz, Rainer Kress, Ulrich Nageldinger: A Partitioning Programming Environment for a Novel Parallel Architecture; 10th International Parallel Processing Symposium (IPPS), Honolulu, Hawaii, April 1996
[10] Reiner W. Hartenstein, Jürgen Becker, Rainer Kress: Two-Level Hardware/Software Partitioning Using CoDe-X; Int. IEEE Symp. on Engineering of Computer Based Systems (ECBS), Friedrichshafen, Germany, March 1996

© Copyright 1996, University of Kaiserslautern, Kaiserslautern, Germany