|Notice: The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarity and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.|
This paper illustrates a new high level programming language which is important for a novel class of computational devices called Xputers, which are by up to several orders of magnitude more efficient than the von Neumann paradigm of computers. Xputers are as flexible and as universal as computers. The flexibility of Xputers is achieved by using field-programmable logic (interconnect-reprogrammable media) as the essential technology platform. The paper first briefly illustrates the Xputer paradigm as a prerequisite needed to understand the fundamental issues of this new language.
This section summarizes and illustrates the basic machine principles of the Xputer paradigm . Simple algorithm examples will illustrate MoPL-3, a data-procedural programming language.
The Data Sequencer. The hardwired data sequencer features a rich and flexible repertory of scan patterns for moving scan windows along scan paths within memory space. Address sequences needed are generated by hardwired address generators having a powerful repertory of generic address sequences [2, 11]. After having received a Scan Pattern code a data sequencer runs in parallel to the rest of the hardware without stealing memory cycles. This accelerates xputer operation, since it avoids performance degradation by addressing overhead.
Reconfigurable ALU. Xputers have a reconfigurable ALU (rALU), partly using the technology of field-programmable logic. figure 1a shows an example: the rALU of the MoM4 Xputer architecture. The four smart register files called scan windows are explained later. The MoM-4 rALU has a repertory of hardwired operator subnets. Within the field-programmable part of the rALU additional operators needed for a particular application may be compiled by logic synthesis techniques. A global interconnect-programmable structure is the basis of connecting these operators to form one or more problem-specific compound operators, what will be illustrated later by a simple algorithm implementation example.
rALU Configuration is no Microprogramming. Also microprogrammable von Neumann processors have a kind of reconfigurable ALU which, however, is highly bus-oriented. Buses are a major source of overhead , especially in microprogram execution, where buses reach extremely high switching rates at run time. The intension of rALU use in Xputers, however, is compound operator configuration at compile time (downloaded at loading time) as much as possible, so that path switching activities at are minimized and the underlying organizational overhead is pushed into compile time to save the much more precious run time.
Compound Operators. The MoM Xputer may execute expressions (which we call 'compound operators') within a single machine step, whereas computers can execute only a single operation at a time. The rALU may be configured such a way, that one or more sets of parallel data paths form powerful compound operators which need only a single basic clock cycle to be executed. This rALU uses no fixed instruction set: compound operators are user-defined. Since their combinational machine code is loaded directly into the rALU, Xputers do not have a program store nor an instruction sequencer. Instead a data sequencer is used which steps through the data memory to access the operands via register files called data scan windows. Xputers operate data-driven but unlike data flow machines, they feature deterministic principles of operation called data sequencing.
Summary of Xputer Principles. The fundamental operational principles of Xputers are based on data auto sequencing mechanisms with only sparse control, so that Xputers are deterministically data-driven (in contrast to data flow machines, which are indeterministically data-driven by arbitration and thus are not debuggable). Xputer hardware supports some fine granularity parallelism (parallelism below instruction set level: at data path or gate level) in such a way that internal communication mechanisms are more simple than known from parallel computer systems. (For more details about Xputer see [4, 5])
The main extension issue to other programming languages is the data location or data state such, that we simultaneously have two different kinds of location or state. There is the familiar von-Neumann-type control state (current location of control), which e. g. is handled by goto statements referencing control label locations within the program text, or, by other control statements. During execution of Xputer programs such a control state is coexisting with one or more data location states, what will be illustrated subsequently (The control flow notation does not model the underlying Xputer hardware very well, since it has been adopted from C for compatibility reasons to minimize programmer training efforts). A data state corresponds to the actual position of a data counter, which contains a data address. The number of data states depends on the actual hardware. The data sequencer of the Xputer generates the corresponding sequences of data states. The purpose of this extension is the easy programming of sequences of data addresses (Scan Pattern), including code generation for the data sequencer. The familiar notation of these MoPL-3 constructs is easy to learn by the programmer.
JPEG Zig-Zag scan example. The MoPL-3 program code from figure 1 illustrates programming the JPEG Zig-Zag scan pattern (figure 2, for more details about the JPEG Compression Standard see also , , )named JPEGzigzagScan for scanning the array PixMap declared in line (38)
<name_of_scan_pattern> is <maximum_length_of_loop> STEPs <step_vector>.
These statements are a kind of "Transport-Instructions" (transportation of data), which realize in every step of the Scan Pattern a "read-modify-write cycle". The step vector specifies the next data location relative to the current data location (before executing a step of the scan sequence). In the below MoPL-3 program (see figure 1) the data state manipulations (moving scan windows with data space) in figure 3 are needed:
Before the execution of the first Scan Pattern, you have to specify the starting point in the data map. For this purposes we use another data state manipulation, the moveto statement. With this statement you are able to realize absolute jumps of the scan window inside the data map. E.g. see line (65) in figure 1, where the scan window is moved to the upper left corner of the PixMap.
Hardware-supported Escapes. To avoid overhead for efficiency the until clauses are directly supported by MoM hardware features of escape execution . To support the until @ clauses by off-limits escape the address generator provides for each dimension (x, y) two comparators, an upper limit register and a lower limit register.
|Transformation Function||Corresponding Operation|
|reverse||reversed order sequence|
Constant geometry FFT example. The second example illustrates parallelism by running several windows synchronously. It is the constant geometry Fast Fourier Transform (FFT) illustrated by figure 7, with a data map (CGFFT) size of 9 by 16 words (figure 7a). figure 8 shows the MoPL-3 section, declaring the scan patterns and the rALU configuration. The declaration of the Scan Pattern starts with the keyword ScanPattern. 'HLScan' is the outer loop, whereas 'SP1' and 'SP23' are used for inner loops running in parallel (see figure 7c). 'SP1' is used for scan window 'SW1' and 'SP23' is used for two scan windows 'SW2' and 'SW3'. The configuration of the rALU will be specified in two parts: the adjustment declaration (window size declaration, see figure 5) and the declaration of the rALU subnet (see figure 6).
Weights w are stored in every second column, where each second memory location is empty (for regularity reasons). figure 7b shows the window adjustments: the 2-by-2 window no. 1 is the input window reading the operands a and b, and the weight w. Windows no. 2 and 3 are single-word result windows. figure 7b also shows the compound operator and its interconnect to the three windows. This is an example of fine granularity parallelism, as modeled by figure 7e, where several windows communicate with each other through a common rALU. figure 7c illustrates the nested compound Scan Patterns for this example. Note, that with respect to performance this parallelism of scan windows makes sense only, if interleaving memory access is used, which is supported by the regularity of the storage scheme and the Scan Patterns.
 M. Christ: Texas Instruments TMS 320C25; Signalprozessoren 3; Oldenbourg-Verlag 1988
 R. Freeman: "User-Programmable Gate Arrays"; IEEE Spectrum, Dec.1988.
 R. W. Hartenstein, A. G. Hirschbiel, K. Lemmert, K. Schmidt, M. Weber: "A Novel Paradigm of Parallel Computation and its Use to Implement Simple High Performance Hardware"; Int'l Conf. on Information Technology, Tokyo, Japan, Oct. 1990.
 R. W. Hartenstein, A. G. Hirschbiel, K. Lemmert, K. Schmidt, M. Weber, "The Machine Paradigm of Xputers and its Application to Digital Signal Processing", Proc. of 1990 International Conference on Parallel Processing, St. Charles, Oct. 1990.
 R. W. Hartenstein, A. G. Hirschbiel, K. Schmidt, M. Weber: "A Novel ASIC Design Approach based on a New Machine Paradigm"; 16th European Solid-State Circuits Conference, Grenoble, France, Sept. 19-21, 1990.
 R. Hartenstein, G. Koch: "The universal bus considered harmful"; in: (eds.) R. Hartenstein, R. Zaks: Microarchitecture of Computer Systems, North Holland, 1975.
 A. G. Hirschbiel: "A Novel Processor Architecture based on Auto Data Sequencing and Low Level Parallelism"; Ph. D. dissertation, Universität Kaiserslautern, 1991.
 Josef Hoffmann: "Redundanz raus"; Computer Time, Heft 6, 1991.
 L. Matterne et al.: "A Flexible High-performance 2-D Discrete Cosine Transform IC"; Proc.. Int'l Symp on Circuits and Systems, Vol. 2, IEEE New York, 1989
 N. N. (Motorola): DSP 56000 / 56001 Digital Signal Processor User's Manual; Motorola Corp., 1989.
 N. N. (Plessey): Quickgate (Product Overview); Plessey Semiconductors, Swindon, U.K., May, 1990.
 Gregory K. Wallace: "The JPEG Still Picture Compression Standard"; Communications of the ACM, Vol. 34, Nr. 4, April 1991.
 M. Weber: "An Application Development Method for Xputers"; Ph. D. dissertation, Kaiserslautern University, 1990.