The hardware/software co-design framework CoDe-X (Co-Design for Xputers, see figure 1) [8], [9], [10] is under development in Kaiserslautern. It accepts as input programs written in ANSI C with an optional data-procedural extension. In addition, generic library functions for computation-intensive parts of an application can optionally be integrated into the system specification as C function calls. These calls are transformed into efficient sequential and structural Xputer code, which gives the user a way to control the partitioning process at the first level. CoDe-X performs a two-level partitioning process. The first level analyzes the trade-off between the host and a reconfigurable data-driven hardware accelerator, the so-called Map-oriented Machine (MoM), which corresponds to the current Xputer prototype [1], [2], [6]. First, a pre-processor extracts those program parts that can be executed by the MoM. Dynamic program constructs (such as pointers or recursive functions/procedures) are not executed on the MoM. The extracted program parts are divided into MoM-specific basic blocks (tasks), which define the granularity of the partitioning.
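The pre-processing step described above can be illustrated with a minimal sketch. It assumes that each basic block has already been annotated with the kinds of constructs it contains; the names `Block`, `DYNAMIC_CONSTRUCTS`, and `split_candidates`, as well as the construct labels, are hypothetical and are not part of CoDe-X, whose pre-processor analyzes the C source directly.

```python
from dataclasses import dataclass

# Hypothetical labels for constructs that keep a block on the host.
DYNAMIC_CONSTRUCTS = {"pointer", "recursion"}


@dataclass
class Block:
    name: str
    constructs: set  # construct labels found in this block (assumed given)


def split_candidates(blocks):
    """Separate blocks into MoM candidates and host-only blocks.

    A block is host-only as soon as it contains any dynamic construct.
    """
    mom_candidates, host_only = [], []
    for block in blocks:
        if block.constructs & DYNAMIC_CONSTRUCTS:
            host_only.append(block)
        else:
            mom_candidates.append(block)
    return mom_candidates, host_only


if __name__ == "__main__":
    blocks = [
        Block("loop_nest", {"for", "array"}),
        Block("list_walk", {"pointer", "while"}),
    ]
    mom, host = split_candidates(blocks)
    print([b.name for b in mom], [b.name for b in host])
```

Only the MoM candidates take part in the trade-off analysis of the first partitioning level; host-only blocks have a fixed allocation.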
Figure 1. The CoDe-X Hardware/Software Co-Design Framework
A profiling unit estimates the execution times of these tasks (in optimized and non-optimized versions) for an implementation on the host computer as well as on the MoM. One way to estimate the host execution time of a task is static code analysis for the processor type under consideration. For this purpose, branch prediction techniques are applied, and deterministic behaviour of the hardware components and of the operating system is assumed. Another way is dynamic analysis by a profiler, which analyzes the code for expected input data. The MoM execution time of a task is computed from implementation information provided by the second partitioning level. The Data Path Synthesis System (DPSS) [4] determines the execution time of one rALU activation (structural code). The X-C compiler [3] determines the number of rALU activations. Using this information, the profiling unit can compute the execution time of the complete task.
The goal of the iterative partitioning process, which is based on simulated annealing, is to minimize the overall execution time of an application. To this end, different task partitions (task allocations) between host and MoM are analyzed. The overhead caused by communication, synchronization, and possible reconfigurations of the MoM at run time is considered in the cost function that controls the simulated annealing algorithm.
The second partitioning level performs an optimized, resource-parameter-driven utilization of the available MoM hardware. In this step the X-C compiler (see figure 1) realizes the paradigm shift from the procedurally driven von Neumann paradigm to the data-driven Xputer paradigm. The X-C compiler thus generates sequential code (software part) for programming the generic address generators (Data Sequencer, DS), structural code (hardware part) for the (re-)configuration of the rALU, and a storage scheme for the two-dimensionally organized Xputer memory (see figure 1).
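The first-level partitioning loop described above can be sketched as follows. This is an illustration under simplifying assumptions, not the CoDe-X implementation: tasks are assumed to run strictly one after another, the MoM time of a task is taken as the number of rALU activations multiplied by the time per activation, and communication, synchronization, and reconfiguration are lumped into one fixed penalty (`switch_overhead`) for every transition between host and MoM. All names, parameter values, and the cooling schedule are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    host_time: float           # estimated host execution time (assumed given)
    ralu_activations: int      # count of rALU activations (from the compiler)
    activation_time: float     # time per rALU activation (from the DPSS)
    mom_eligible: bool = True  # False if the task must stay on the host

    @property
    def mom_time(self) -> float:
        # MoM estimate: activations times time per activation.
        return self.ralu_activations * self.activation_time


def cost(tasks, on_mom, switch_overhead=5.0):
    """Total time of a sequential schedule plus a penalty per host/MoM switch.

    The single penalty is an assumption that stands in for communication,
    synchronization, and reconfiguration overhead.
    """
    total = 0.0
    for i, task in enumerate(tasks):
        total += task.mom_time if on_mom[i] else task.host_time
        if i > 0 and on_mom[i] != on_mom[i - 1]:
            total += switch_overhead
    return total


def anneal(tasks, seed=0, t_start=50.0, t_end=0.01, alpha=0.95, moves=50):
    """Simulated annealing over task allocations; returns (allocation, cost)."""
    rng = random.Random(seed)
    movable = [i for i, t in enumerate(tasks) if t.mom_eligible]
    current = [False] * len(tasks)  # start with every task on the host
    current_cost = cost(tasks, current)
    best, best_cost = list(current), current_cost
    if not movable:
        return best, best_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves):
            i = rng.choice(movable)
            current[i] = not current[i]  # move one task to the other side
            new_cost = cost(tasks, current)
            delta = new_cost - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current_cost = new_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
            else:
                current[i] = not current[i]  # reject: undo the move
        temp *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    tasks = [
        Task("t0", 10.0, 8, 0.5),
        Task("t1", 12.0, 4, 0.5),
        Task("t2", 1.0, 64, 0.5),
        Task("t3", 20.0, 2, 0.5, mom_eligible=False),
    ]
    print(anneal(tasks))
```

Because the search starts from the all-host allocation and keeps the best allocation seen, the returned cost is never worse than running everything on the host. A more faithful cost function would model each overhead term separately and could account for host and MoM working concurrently.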
Xputer-based accelerators, which are suitable for computation-intensive parts of applications, can also be integrated into embedded systems [7].