FAQ on Xputers - Xputer Lab Kaiserslautern - Reconfigurable Computing with KressArray
 

The Kress ALU Array



FPAA - field-programmable ALU Array

Currently, most FPGAs (field-programmable gate arrays) offered by commercial vendors such as Xilinx are used for prototyping glue logic. Researchers have also used FPGAs as accelerators connected to a von Neumann host. These researchers meet at FCCM (FPGAs for Custom Computing Machines), held annually by the IEEE in Napa, CA, which attracts more than a hundred people. A few start-ups try to sell FPGA-based accelerator boards with interfaces for SPARC, PC, and other hosts.

However, the wide variety of architectures shows that FPGAs are not a suitable basis for universal reconfigurable accelerator platforms. Remarkable performance results have been obtained only with more or less application-specific FPGA-based architectures. FPGAs are area-inefficient and slow, and they do not support computation-intensive applications that rely heavily on arithmetic.

Fig. 11. From FPGA to FPAA: we need reconfigurable ALU Arrays


FPGA drawbacks

  • far too area-inefficient
  • very poor cost/performance ratio for highly computation-intensive applications
  • used for random logic only
  • far too fine-grained
  • much too slow

The Kress Array

For accelerating computation-intensive applications, FPGAs are not sufficient. Word-level parallelism is needed (fig. 11) instead of the bit-level parallelism of commercially available FPGAs. Rainer Kress has implemented such a word-level FPAA (field-programmable ALU Array) [1] [2]. His first-generation FPAA, which he called rDPA (reconfigurable Data Path Array), is an array of 32-bit ALUs. Fig. 13 illustrates its programmability. To obtain area efficiency, only horizontal or vertical direct-neighbor ALUs can be connected directly (fig. 12). Two links are available for non-multiplexed bidirectional communication (fig. 12). A single bus reaching all ALUs is needed for configuration programming. This bus may also add some flexibility when used at run time.
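The nearest-neighbor restriction described above can be sketched in a few lines of Python. This is only an illustration: the 4 x 8 arrangement in the usage note and the function names `neighbors` and `directly_connectable` are assumptions, not part of the rDPA design.

```python
def neighbors(row, col, rows, cols):
    """Return the horizontal and vertical direct neighbors of the ALU at (row, col)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only positions that lie inside the array.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def directly_connectable(a, b):
    """True if ALU positions a and b are direct horizontal or vertical neighbors."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
```

For a hypothetical 4 x 8 array, `neighbors(0, 0, 4, 8)` yields the two positions below and to the right of the corner ALU, while an inner ALU has four neighbors. Diagonal positions are never directly connectable; such a connection would have to pass through an intermediate ALU or use the bus.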

Fig. 12. Principles of 2nd Generation Kress Array:





  • instruction level parallelism
  • transparently scalable
  • fast routing and placement (seconds only)
  • dynamically and partially reconfigurable (microseconds)
  • suitable for full custom design
  • on microprocessor chip: much higher acceleration than by caches
  • on microprocessor chip: fast and low power by full custom design
  • acceleration by massive run time to compile time migration

Kress Array Technology Mapping

Fig. 14 illustrates Kress Array technology mapping. The left side shows an example problem of eight equations. The right side shows the routing and placement result obtained for the 1st generation Kress Array (only a single unidirectional link between nearest neighbors). Routing and placement are done by DPSS (data path synthesis system), a simple simulated annealing optimizer implemented by Rainer Kress and his students [1]. DPSS is fast and needs only a few seconds on a SPARCstation.
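To give a feeling for placement by simulated annealing, here is a minimal generic sketch in Python. It is not the DPSS algorithm: the cost function (total Manhattan distance between communicating operators), the cooling schedule, and all names (`wire_length`, `anneal_placement`, the parameters) are assumptions chosen for illustration only.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all producer/consumer operator pairs."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal_placement(nodes, edges, rows, cols,
                     steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Place operators on a rows x cols ALU array by simulated annealing."""
    if len(nodes) > rows * cols:
        raise ValueError("more operators than ALUs")
    rng = random.Random(seed)
    # One slot per ALU; None marks an unused ALU.
    grid = list(nodes) + [None] * (rows * cols - len(nodes))
    rng.shuffle(grid)

    def placement_of(g):
        return {n: divmod(i, cols) for i, n in enumerate(g) if n is not None}

    cost = wire_length(placement_of(grid), edges)
    best, best_cost = list(grid), cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(grid)), 2)
        grid[i], grid[j] = grid[j], grid[i]          # propose a swap
        new_cost = wire_length(placement_of(grid), edges)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost                          # accept the move
            if cost < best_cost:
                best, best_cost = list(grid), cost
        else:
            grid[i], grid[j] = grid[j], grid[i]      # reject: undo the swap
    return placement_of(best), best_cost
```

Calling `anneal_placement(["m1", "a1", "m2"], [("m1", "a1"), ("a1", "m2")], 2, 2)` returns a dictionary from operator name to (row, column) and the wire length of that placement. A real mapper for the 1st generation array would additionally have to respect the single unidirectional link between neighbors, which this sketch ignores.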

DPSS has been integrated into Co-De-X, an application development support environment implemented for the Kress Machine (a MoM Xputer architecture). Co-De-X [4] accepts problem descriptions as C-X language source text. C-X is an extension of the programming language C. The Co-De-X environment includes two levels of automatic partitioning: (1) host/accelerator partitioning based on resource parameters, and (2) (data-)procedural/structural partitioning for FPAA configuration and data sequencer programming.

Fig. 13. 2nd Generation Kress Array Programmability (Bus not shown):




  • PE (reconfigurable Processing Element) contains: rALU, small register file, routing resources
  • rALU operations: arithmetic, relational, logic, special, xfer-only
  • PE use: operations, routing only, routing and operations
  • routing-only use adds flexibility
  • pipe-like asynchronous inter-PE communication
  • smart interface for data scheduling (data streams entering and leaving the FPAA)
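The three kinds of PE use listed above (operations, routing only, routing and operations) can be modelled roughly as follows. This is a toy model under stated assumptions: the port names `west` and `north`, the operation table `OPS`, and the class layout are hypothetical and do not describe the actual PE hardware.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# A small, hypothetical subset of rALU operations (arithmetic, relational, logic).
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "lt": operator.lt,
    "and": operator.and_,
}


@dataclass
class PE:
    op: Optional[str] = None      # rALU operation, or None for routing-only use
    route_through: bool = False   # forward the west input unchanged

    def step(self, west, north):
        """Return (result, forwarded) for one pair of input words."""
        result = OPS[self.op](west, north) if self.op is not None else None
        forwarded = west if self.route_through else None
        return result, forwarded
```

With this model, `PE(op="add")` only computes, `PE(route_through=True)` only passes a word along, and `PE(op="mul", route_through=True)` does both at once. The routing-only case is what lets a value travel between ALUs that are not direct neighbors.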

Simplicity of Software-only Accelerator Implementation

A number of very promising acceleration factors have been obtained from Xputer research. Will workstations become obsolete? The same or even much higher performance may be obtained by a PC with a universal FPAA-based programmable accelerator on board the processor chip. For a wide variety of applications, such as DSP, image processing, scientific computing, and multimedia, a Kress-Array-based accelerator yields substantially to drastically higher speed-up than caches do. The highly regular Kress Array is suitable for full custom design, where short wires permit high performance with low power. Such a design is better done by microprocessor designers, since FPGA designers are not well qualified for such a project.

Fig. 14. Technology Mapping Example of 1st Generation Kress Array

Left: Eight Equations Example (8eq). Right: 8eq example mapped onto the rDPA (reconfigurable Data Path Array).

y10 := a0 * (b0 + 2 * c0);
y20 := 5 * d0 + e0 + (f0 + b0);
y30 := g0 * (h0 + 2 * e0);
y40 := (5 * d0 + e0) * f0;
y11 := a1 * (y10 + 2 * c1);
y21 := 5 * y20 + e1 + (f1 + y10);
y31 := y30 * (y40 + 2 * e1);
y41 := (5 * y20 + e1) * f1;
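The eight equations form a two-stage data flow: the second group of four consumes the results of the first group. The Python sketch below evaluates them in that order. Computing the repeated subexpressions `5 * d0 + e0` and `5 * y20 + e1` only once is an assumption made here for illustration; the source does not state whether the mapping shares them. The function name `eight_equations` and the temporaries `t0` and `t1` are likewise hypothetical.

```python
def eight_equations(a0, b0, c0, d0, e0, f0, g0, h0, a1, c1, e1, f1):
    """Evaluate the eight-equation example and return all results by name."""
    # First group: depends only on external inputs.
    t0 = 5 * d0 + e0              # subexpression used by both y20 and y40
    y10 = a0 * (b0 + 2 * c0)
    y20 = t0 + (f0 + b0)
    y30 = g0 * (h0 + 2 * e0)
    y40 = t0 * f0

    # Second group: consumes the results of the first group.
    t1 = 5 * y20 + e1             # subexpression used by both y21 and y41
    y11 = a1 * (y10 + 2 * c1)
    y21 = t1 + (f1 + y10)
    y31 = y30 * (y40 + 2 * e1)
    y41 = t1 * f1

    return {"y10": y10, "y20": y20, "y30": y30, "y40": y40,
            "y11": y11, "y21": y21, "y31": y31, "y41": y41}
```

On an ALU array, each multiplication or addition in this function would occupy one ALU, and each intermediate value would travel over a link to the ALU that consumes it, so all operations of a stage can proceed in parallel.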

Conclusions

In this example the utilization is about 72%, since 23 of the 32 rALUs are used (fig. 14). This is quite a good result compared to commercially available FPGAs, which often show a routability hardly higher than 50%, or even below. Because of the massive speed-up obtained by using the Kress Array, even much lower routability would be tolerable.

A highly promising and powerful novel technology platform for software-only accelerator implementation has been introduced [3], along with a new compilation method accepting C sources [4].

FPAA Literature

[1] R. Kress: A fast reconfigurable ALU for Xputers; Ph.D. dissertation, Kaiserslautern University, 1996
[2] R. Hartenstein, R. Kress: A Datapath Synthesis System for the Reconfigurable Datapath Architecture; Proc. Asia and South Pacific Design Automation Conference (ASP-DAC'95), Makuhari, Chiba, Japan, Aug. 29 - Sept. 1, 1995
[3] R. Hartenstein, J. Becker, M. Herz, U. Nageldinger: A General Approach in System Design Integrating Reconfigurable Accelerators; IEEE Int'l Symp. on Innovative Systems (ISIS'96), Austin, Texas, Oct. 9 - 11, 1996
[4] R. Hartenstein, J. Becker: A Two-level Co-Design Framework for data-driven Xputer-based Accelerators; Proc. 30th Hawaii Int'l Conf. on Systems Sciences (HICSS-30), Wailea, Maui, Hawaii, Jan. 7 - 10, 1997


Do you have more questions on Xputers? Do you have better questions?

If so, please inform our webmaster. Our goal is the steady improvement of this list of questions.




© Copyright 1996, Universitaet Kaiserslautern, Kaiserslautern, Germany