FAQ on Xputers - Xputer Lab Kaiserslautern - Reconfigurable Computing with KressArray
The Kress ALU Array
FPAA - field-programmable ALU Array
Currently, most FPGAs (field-programmable gate arrays) offered by commercial vendors such as Xilinx are used for prototyping glue logic. Researchers have also used FPGAs as accelerators connected to a von Neumann host. These researchers meet at FCCM (FPGAs in Custom Computing Machines), which is held annually by IEEE in Napa, CA, and attracts more than a hundred people. A few start-ups try to sell FPGA-based accelerator boards with interfaces to hosts such as SPARC workstations, PCs, and others.
However, the wide variety of architectures shows that FPGAs are not a suitable basis for universal reconfigurable accelerator platforms. Remarkable performance results have been obtained only with more or less application-specific FPGA-based architectures. FPGAs are area-inefficient and slow, and they do not support computation-intensive applications that rely heavily on arithmetic.
For accelerating computation-intensive applications, FPGAs are not sufficient.
Word-level parallelism is needed (fig. 1) instead
of the bit-level parallelism offered by commercially available FPGAs. Rainer
Kress has implemented such a word-level FPAA (field-programmable ALU Array).
His first-generation FPAA, which he called rDPA (reconfigurable Data Path
Array), is an array of 32-bit ALUs. Fig. 3 illustrates its
programmability. For area efficiency, an ALU can be connected directly only
to its horizontal or vertical direct neighbors (fig. 2). Two links are
available for non-multiplexed bidirectional connections (fig. 2).
A single bus reaching all ALUs is needed for configuration programming.
This bus may also provide some flexibility when used at run time.
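The nearest-neighbor connection rule described above can be sketched in a few lines of Python. This is only an illustration, not code from the rDPA project: the function names and the 4 x 8 array shape used in the usage note are assumptions (the text mentions 32 rALUs but does not state how they are arranged).

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, column) of an ALU in the array


def direct_neighbors(row: int, col: int, rows: int, cols: int) -> List[Coord]:
    """Return the ALU positions that can be linked directly to (row, col).

    Only horizontal and vertical direct neighbors qualify; diagonal
    positions and positions outside the array are excluded.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def is_direct_link(a: Coord, b: Coord) -> bool:
    """True when a and b are horizontal or vertical direct neighbors."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
```

For example, in a hypothetical 4 x 8 array a corner ALU at (0, 0) has two direct neighbors, while an inner ALU has four. Any connection between ALUs that are not direct neighbors would have to be routed through intermediate ALUs or over the bus.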
Fig. 4 illustrates Kress Array technology
mapping. The left side shows an example problem of 8 equations. The right side
of the figure shows the placement and routing result obtained for the first-generation
Kress array (only a single unidirectional link between nearest
neighbors). Placement and routing are done by DPSS (data path synthesis
system), a simple simulated annealing optimizer implemented by Rainer Kress
and his students. DPSS is
fast and needs only a few seconds on a SPARCstation.
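To make the idea of placement by simulated annealing concrete, here is a minimal sketch. It is not DPSS and makes no claim about how DPSS works internally: the cost function (total Manhattan distance between connected operators), the geometric cooling schedule, and all names and parameter values are assumptions chosen for illustration. Routing through intermediate ALUs is not modeled.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


def wire_length(placement: Dict[str, Coord], edges: List[Tuple[str, str]]) -> int:
    """Total Manhattan distance over all operator-to-operator edges."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal_placement(
    nodes: List[str],
    edges: List[Tuple[str, str]],
    rows: int,
    cols: int,
    steps: int = 20000,
    t_start: float = 5.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[Dict[str, Coord], int]:
    """Place each node on its own grid cell, trying to shorten the wiring."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(cells):
        raise ValueError("more operators than ALUs")
    rng.shuffle(cells)

    # Random initial placement; unused cells stay empty (None).
    occupant: Dict[Coord, Optional[str]] = {cell: None for cell in cells}
    placement: Dict[str, Coord] = {}
    for node, cell in zip(nodes, cells):
        placement[node] = cell
        occupant[cell] = node

    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost

    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        node = rng.choice(nodes)
        target = rng.choice(cells)
        source = placement[node]
        if target == source:
            continue
        other = occupant[target]

        # Tentatively move the node, swapping with any occupant of the target.
        placement[node], occupant[target], occupant[source] = target, node, other
        if other is not None:
            placement[other] = source

        delta = wire_length(placement, edges) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost += delta
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            # Rejected: restore the previous placement.
            placement[node], occupant[source], occupant[target] = source, node, other
            if other is not None:
                placement[other] = target

    return best, best_cost
```

A call such as `anneal_placement(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")], rows=2, cols=2)` returns a mapping from operator names to grid cells together with its wire length. Accepting some cost-increasing moves at high temperature is what lets the search escape poor local arrangements before the schedule cools.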
Simplicity of Software-only Accelerator Implementation
A number of very promising acceleration
factors have been obtained from Xputer research. Will workstations
become obsolete? The same and even much higher performance may be obtained
by a PC with a universal FPAA-based programmable accelerator on board
the processor chip. For a wide variety of applications, such as DSP,
image processing, scientific computing, and multimedia, a Kress-Array-based
accelerator yields substantially to drastically higher speed-up
than caches do. The highly regular Kress array is suitable for full
custom design, where short wires permit high performance at low power.
Such a design is better done by microprocessor designers, since
FPGA designers are not well qualified for such a project.
In this example the utilization is about 70%, since 23 of the 32 rALUs
are used (fig. 4). This is quite a
good result compared to commercially available FPGAs, which often show
a routability hardly higher than 50%, or even lower. Because of the massive
obtained by Kress Array use even much lower routability results would be
R. Kress: A fast reconfigurable ALU for Xputers; Ph.D. dissertation, Kaiserslautern University, 1996