FAQ on Xputers - Xputer Lab Kaiserslautern - Reconfigurable Computing with KressArray
[ > | Xputer Lab | Xputer literature | directory | FAQ&FQA on Xputers | FAQ1 on Xputers | FAQ2 on Xputers | FAQ3 on Xputers | Kress Array | FQA on Xputers | History of Xputer Lab| What's new? | Key words on Xputers ]

All about Data Sequencers:


FAQ on Xputers (1)
FAQ on Xputers (2)
FAQ on Xputers (3)

Data Sequencers and Scan Patterns

Xputer operations are driven by data sequencers (instead of an instruction sequencer). Instead of control flow, i. e. a sequence of instruction addresses, Xputers are driven by sequences of data addresses. Since data flow is different from address flow, we do not want to use the term "driven by data flow". So we call a sequence of data locations (addresses) a "scan pattern". Not only for better visualization we prefer a two-dimensional memory organization, where we obtain scan patterns linke the examples shown in fig. 21. To obtain a very rich repertory of scan patterns novel architectural principles of sequencers have been developed at Kaiserslautern [1] [2] (also see fig. 25)..

Fig. 21. A few scan pattern examples:

 < >

Systematic Programming of Scan Patterns

Figure 22 shows a scan pattern in detail, which can be generated by the GAG (generic address generator) having been invented and developed at Kaiserslautern [1] [2]. The figure shows the well known JPEG zigzag scan pattern used for still image (de)compression. This pattern scans a 10-by-10 pixel segment of the image by a kind of 45 degree video scan. The example illustrates the systematic strcture [3], which a surprisingly large naumber of other real world scan patterns have.

A small and simple language called MoPL (Map-oriented Programming Language) [4] has been developed at Kaiserslautern and implemented as an extension of the language C [5]. The JPEG zig zag has been programmed by using three subprograms, one for the blue subpattern in fig. 22, one for the green scan pattern, and a third one for the yellow pattern. The third subsoutine is just a version of the first one, obtained from this by reversal of sequence and by 180 degree rotation. The second pattern is very simple. It is a linear sequence of smallest steps.

< >Fig. 22. Multi media on Xputers: the JPEG zig zag scan pattern:

The Smart Memory Interface (SMIF)

The example scan pattern in fig 22 just scans single word locations. However, the MoM Xputer architecture (Map-oriented Machine) allows by just one address the access to an entire little memory segment, which we call ScanWindow (see fig. 23). Such a scan window, a kind of smart register file, can be adjusted at runtime up to a hardware-dependent maximum size, such as e. g. 15 by 15 words. The pink grid in the background represents a part of the 2-dimensional memory space. Fig. 23 shows several adjustment examples: 1 by 1 word, 3 by 3, 5 by 3, and 3 by 2 words. This figure also illustrates, how a 3-by-2 window is moved by a simple scan pattern.

< >Fig. 23. MoM Xputer Principles: Illustration of scan windows

Instruction Level Parallelism by Compound Scan Patterns

Fig. 24 a illustrates the Xputer implementation of a constant gemoetry Fast Fourier Transform (cgFFT), where the data path reconfigured into the rALU, such as e. e. a Kress Array, reads three words from a 2-by-2 scan window, and writes a single word into a second scan window of size 1 by 1, and into Scan window number 3 of the same size. These three scan windows moving simultaneaously permit data path level parallelism, such that a complex data path communicates with three register files hidden behind three scan windows. So we have shown, that within an Xputer multiple sequencers make sense - incontrast to von Neumann processors. The MoM-3 architecture provides 8 data sequencers, so that even higher parallelism may be implemented, depending on the application.

Fig. 24 b illustrates the total compound scan pattern for moving these three windows in parallel. The green arrows show the triple parallel inner loop. Wenever such a parallel inner loop is finished (after 8 time steps), the outer loop (light blue arraws) moves this inner scan pattern two address units to the right, where it is started again. Finally the three windows end up at the locations marked by red scan windows. This examples shows both, loop nesting and loop parallelism. Such compound scan patterns can be concisely described by the MoPL mini language [4].

< >Fig. 24. Operation illustration of nested and parallel scann patterns

The Generic Address Generator (GAG)

Fig. 25 shows the basic block diagram of the GAG having been developed at Kaiserslautern [3]. For two-diemensional addressing two of them are used, one for the x address, and the other for the y address. The Address Stepper with output A generates simple linear steps, such as shown in fig. 23 (light blue arrows). The Limit Stepper (upper limit) and the Base Stepper (lower limit) is using as a fixed or sliding threshold to limit the range of patterns generated by the Address Stepper. In using fixed limits we obtain scan patterns with orthogonal boundaries. The use of sliding limits, however, permits the generation of a wide variety of partially of fully slanted scan patterns. By direct manipulation of least significant bits by decision data coming from the rALU the MoM also supports very fast data-dependent scan patterns, such as e. g. curve following.

< > Fig. 25. MoM Generic Address Generator (GAG) Basic Structure

Acceleration by GAG Data Sequencer use

Due to continuous rapid progress in von Neumann processor technology the processor to memory communication gap widens more and more, and has reached two orders of magnitude. But GAGs can generate very long address sequences without needing any memory cycles other than for initialization. This means, that the processor to memory communication gap has disppeared for such scan pattern loops. During various performance xeperiments we have found applications, where on a von Neumann basis an address computation loop siphons off up to 94% of total computation time (see section FQA on Xputers). In such a case switching to the Xputer paradigm this yields a speed-up by one and a half order of magnitude, since these 94% do not eat up memory cycles any more. But avoiding addressing overhead is not the only source of acceleration in using Xputers. For other effects see section FQA on Xputers.


[1] R. W. Hartenstein, A. G. Hirschbiel, M. Weber: A Novel Paradigm of Parallel Computation and its Use to Implement Simple High-Performance-Hardware; InfoJapan'90 - International Conference memorating the 30th Annivesary of the Computer Society of Japan, Tokio, Japan, 1990.
[2] R. Hartenstein, A. Hirschbiel, K. Schmidt, M. Weber: A Novel Paradigm of Parallel Computation and its Use to Implement Simple High-Performance-HW, Future Generation Computer Systems 7 91/92, p.181-198, North Holland.
[3] Alexander G. Hirschbiel: A Novel Prcessor Architecture Based on Auto Data Sequencing and Low Level Parallelism; Ph. D. Thesis, University of Kaiserslautern, 1991.
[4] A. Ast, J. Becker, R. W. Hartenstein, R. Kress, H. Reinig, K. Schmidt: Data-procedural Languages for FPL-based Machines; 4rd Int. Workshop In Field Programmable Logic And Applications, FPL'94, Prague, September 7-10, 1994, Lecture Notes in Computer Science, Springer, 1994.
[5] Reiner W. Hartenstein, Juergen Becker: A Two-level Co-Design Framework for data-driven Xputer-based Accelerators; to be published in Proc. of 30th Annual Hawaii Int. Conf. on System Science (HICSS-30), January 7-10, Wailea, Maui, Hawaii, USA, 1997.

You have more questions on Xputers? You have better questions?

If yes, please, inform our webmaster. Our goal is the steady improvement of this list of questions.

[ Xputer Lab | Xputer literature | directory | FAQ&FQA on Xputers | FAQ1 on Xputers | FAQ2 on Xputers | FAQ3 on Xputers | Kress Array | FQA on Xputers | History of Xputer Lab | What's new? | Key words on Xputers ]

© copyright 1996, Universitaet Kaiserslautern, Kaiserslautern, Germany ----- Webmaster