HICSS 30 Architecture Track
Wednesday January 8, 1997
University of California at Los Angeles
Configurable computing systems combine high-density
FPGAs with processors to achieve the best of both worlds:
customized digital circuit accelerators that are responsive
to dynamic events. A number of different models have been proposed
for configurable computing, including off-board data pumps, co-processors,
configurable function units on the processor datapath and configurable
datapaths. Each approach provides a different set of strengths
and weaknesses, along with a different model of computation. The
task force on Configurable Computing will be discussing the critical
impediments which are currently limiting the use of these systems:
computing models (architectural abstractions), runtime support
for optimization and reconfiguration, driving applications and
FPGA technology (density, configuration time and clock rates).
Further information is available from Bill Mangione-Smith (email@example.com)
or at http://www.icsl.ucla.edu/~billms/hicss97.
Configurable computing is an area of active research
that has sprung up over the last several years. By combining aspects
of traditional computing, such as high-performance microprocessors
and commodity memory devices, with programmable hardware devices,
configurable computing attempts to gain the benefits of both adaptive
software and optimized hardware. Because the field is young it
remains ill-defined and often either misunderstood or misrepresented.
Fortunately, there currently exist a small number of applications
which show significant improvement through the use of configurable
computing technology, and have sparked much of the interest in the field.
This brief paper will touch on some of the different types of configurable computing systems that have been either proposed or developed, consider applications that seem to be well suited to this approach, and discuss the open issues in tool design.
Thus far the research community seems to agree on
exactly one characteristic which defines configurable computing:
programmable hardware. In most cases field programmable gate arrays
(FPGAs) are used to provide this capability, though in some cases
programmable switches are used to configure interconnect hardware.
A number of different application scenarios have been proposed for configurable computing, and one way to characterize the differences is to consider the rate of configuration. The key characteristic is that some amount of information is semi-static, i.e., changing frequently enough to require programmable hardware but slowly enough to provide the opportunity to improve performance through customized hardware.
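One way to make "semi-static" concrete is a break-even count. The sketch below is a hypothetical model, not drawn from any system in this paper: it assumes a one-time configuration cost and fixed per-item processing times for a software path and a customized-hardware path (all in integer microseconds), and returns how many items must share one configuration before reconfiguring pays off.

```python
def break_even_items(t_config_us, t_sw_us, t_hw_us):
    """Smallest item count n for which reconfiguring is faster.

    Customized hardware wins when
        t_config_us + n * t_hw_us < n * t_sw_us,
    i.e. when n > t_config_us / (t_sw_us - t_hw_us).
    Times are integer microseconds; returns None if the hardware
    path is not faster per item (reconfiguring never pays off).
    """
    saving = t_sw_us - t_hw_us
    if saving <= 0:
        return None
    # Smallest integer strictly greater than t_config_us / saving.
    return t_config_us // saving + 1


if __name__ == "__main__":
    # Hypothetical figures: 10 ms reconfiguration, 50 us vs 5 us per item.
    print(break_even_items(10_000, 50, 5))
```

With these illustrative figures the customized circuit wins once at least 223 items are processed per configuration; information that changes more often than that is better left to software.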
Situation-based configuration involves hardware changes
at a relatively slow rate, for example on the scale of hours,
days or weeks. This classification trivially subsumes the case
where the new configuration provides a bug fix. The examples below
illustrate some of the specific applications of situational configuration
that have been proposed by the research community:
While the first scenario involves algorithmic specialization, the second deals with data-specific optimization. The author has been involved with an effort to produce a high-performance system for template-based Automatic Target Recognition (ATR). This effort involves generating highly customized adder trees which are used for executing two-dimensional image correlation. The images are generated by a Synthetic Aperture Radar (SAR) and represent radar reflection and absorption rather than a direct optical image. The SAR templates are approximately 25% populated, so the customized adder trees are better than a general-purpose correlator: they require less hardware and have shorter critical paths. By using configurable hardware to implement multiple data-specific adder trees in sequence, the system is faster and consumes less power than either a general-purpose programmable part or a general-purpose correlator. While the same technique could be used to produce a sequence of ASICs, each customized to a set of templates, the resulting system would require far too many ASICs to be practical.
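The saving from data-specific adder trees can be sketched in a few lines. This is a hypothetical illustration, not the ATR system's code: it assumes binary templates, counts only two-input adders (ignoring word widths), and uses an invented 8x8 template with 16 of 64 pixels set to mirror the roughly 25% population mentioned above.

```python
import math
import random


def populated_offsets(template):
    """(row, col) offsets of the nonzero pixels in a binary template.

    These are the only inputs a template-specific adder tree must sum.
    """
    return [(r, c)
            for r, row in enumerate(template)
            for c, bit in enumerate(row) if bit]


def adder_tree_cost(num_inputs):
    """Two-input adders and tree depth (critical path) to sum num_inputs values."""
    if num_inputs <= 1:
        return 0, 0
    return num_inputs - 1, math.ceil(math.log2(num_inputs))


def correlate_general(image, template, y, x):
    """General correlator: provides an input for every template position."""
    return sum(image[y + r][x + c] * template[r][c]
               for r in range(len(template))
               for c in range(len(template[0])))


def correlate_specialized(image, offsets, y, x):
    """Template-specific correlator: sums only the populated positions."""
    return sum(image[y + r][x + c] for r, c in offsets)


if __name__ == "__main__":
    rng = random.Random(0)
    size = 8
    # Hypothetical 8x8 binary template with 16 of 64 pixels set (25%).
    cells = set(rng.sample(range(size * size), 16))
    template = [[1 if r * size + c in cells else 0 for c in range(size)]
                for r in range(size)]
    image = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
    offsets = populated_offsets(template)
    assert correlate_general(image, template, 3, 5) == \
        correlate_specialized(image, offsets, 3, 5)
    print("general tree (adders, depth):", adder_tree_cost(size * size))
    print("specialized tree (adders, depth):", adder_tree_cost(len(offsets)))
```

Both correlators give the same result, but under this simple cost model the specialized tree needs 15 adders of depth 4 instead of 63 adders of depth 6, which is the hardware and critical-path saving the paragraph describes.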
A second use for configurable computing is straightforward time sharing of the programmable hardware. If a particular application lends itself to a pipelined hardware implementation, it may be possible to implement each phase on the programmable part in sequence. An FPGA from National Semiconductor was used at UCLA to achieve a four-fold reduction in hardware through round-robin reconfiguration for a video image transmission system. The device managed data acquisition, image quantization, wavelet coding and finally modem transfer. A more ambitious use of the same part has appeared in the DISC work at BYU. Hutchings and his students compile C code fragments to assembly language for the Sun SPARC, together with activations of circuits on an FPGA accelerator. The FPGA circuitry can tell whether the desired circuit is currently loaded, and executes a demand-driven fetch and reload in the case of a fault.
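The demand-driven behaviour described for DISC resembles a cache of circuits. The following is a minimal sketch under stated assumptions, not the DISC implementation: the class name `ConfigurationCache`, the fixed number of circuit slots, and the least-recently-used replacement policy are hypothetical choices, and ordinary Python functions stand in for hardware circuits. With a single slot it also models round-robin time sharing, where every phase change forces a reconfiguration.

```python
from collections import OrderedDict


class ConfigurationCache:
    """Hypothetical model of demand-driven circuit loading.

    The device holds at most `slots` circuits at once. Invoking a circuit
    that is not resident is a fault: its configuration is fetched and
    loaded, evicting the least recently used circuit if the device is full
    (LRU is an assumption made for this sketch).
    """

    def __init__(self, slots, library):
        self.slots = slots
        self.library = library            # name -> callable standing in for a circuit
        self.resident = OrderedDict()     # resident circuit names, in LRU order
        self.faults = 0

    def invoke(self, name, *args):
        if name in self.resident:
            self.resident.move_to_end(name)          # hit: mark most recently used
        else:
            self.faults += 1                         # fault: fetch and reconfigure
            if len(self.resident) >= self.slots:
                self.resident.popitem(last=False)    # evict least recently used
            self.resident[name] = None
        return self.library[name](*args)


if __name__ == "__main__":
    # Placeholder stand-ins for the four pipeline phases named in the text.
    stages = ["acquire", "quantize", "wavelet", "modem"]
    library = {name: (lambda x: x) for name in stages}
    for slots in (1, 4):
        cache = ConfigurationCache(slots, library)
        for frame in range(10):
            for name in stages:
                cache.invoke(name, frame)
        print(slots, "slot(s):", cache.faults, "reconfigurations")
```

Running ten frames through the four placeholder phases costs 40 reconfigurations with one slot and only 4 with four slots, which is the trade between hardware size and reconfiguration overhead that time sharing exploits.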
The most ambitious approach suggested thus far involves dynamic generation of FPGA circuits. This technique has been proposed for both evolutionary systems that exhibit emergent behavior, as well as more traditional applications involving parameterized macro libraries and some dynamic placement. It seems likely that dynamic circuit generation will provide one of the most compelling uses of the full set of properties available in FPGAs, simply because it inherently excludes the use of even a large number of ASICs as an alternative. However, it has not yet been shown just how broadly this approach can be applied.
Thus far, the basic unit of configuration typically
used for configurable computing systems has been the commercially
available FPGA. These devices provide simple processing elements,
each of which can implement the equivalent of tens of logic gates.
Configurable computing with these devices has been focused on
CAD development tools, including schematic capture, VHDL and Verilog,
and netlist generation and simulation.
A number of researchers have proposed using more powerful elements in the FPGA array. In particular, the introduction of multipliers has attracted a great deal of attention, with the hope of developing high performance configurable computers for more traditional signal processing applications. Other approaches suggested include ALUs with wider data paths as well as larger memories.
Researchers at MIT, Virginia Tech and the University of Washington have suggested placing some of the control flow circuitry in the embedded processing elements. This paradigm represents a fundamental shift away from the circuit view of traditional FPGAs, and moves the devices closer to previous work in systolic array processors. One fundamental problem with this approach is that it cannot completely leverage either the compilation work found in traditional processors or the synthesis tools used for ASIC and FPGA development. On the other hand, it holds out the hope of being more efficient than either approach once sufficient models of computing and tools are developed.
The final class of resource which can be configured
in a configurable computer involves the routing resources. At
an important level the existing FPGA architectures already provide
this capability: the interconnect between arrays of processing
elements is switchable, typically under the control of SRAM cells.
However, some researchers have proposed building highly capable
reconfigurable interconnect which can adaptively route between
fixed (though high performance) logic blocks. This reasoning fits
well with the effort to increase support for aggressive DSP applications
on these devices. One could imagine a linear array of processing
elements which could be subdivided or rerouted according to the
time-varying system requirements.
Research projects are currently underway to develop effective device architectures supporting each of the forms of configuration mentioned above. Thus far, the vast majority of commercial effort has been invested towards improving traditional FPGA devices along one of the standard paths: density, speed, power and cost.
Configurable computing architectures will never prove good targets for general purpose computing, in part because of the huge investment in special purpose hardware which is designed to support general software structures. It is worthwhile considering what sort of applications do in fact map well to the existing models of configurable computing.
Because of the regular structure of simple processing elements, configurable computers lend themselves to applications that exhibit embarrassingly large amounts of parallelism. Examples include signal processing and matrix operations. FPGA-based systems are particularly amenable to narrow, bit-level operations, though bit-serial datapaths have also proven effective for wider data. Coarser-grained structures, such as some of the enhanced processing elements discussed above, lend themselves to wider datapaths. These applications also map well to ASIC technology, so other application characteristics are needed to justify the use of less efficient and more expensive configurable computing devices.
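A bit-serial datapath trades width for time: one full adder and a carry flip-flop handle a word of any width, one bit per clock cycle. The function below is only a functional model of that structure, a sketch assuming unsigned operands fed least significant bit first and a result that wraps at the chosen width.

```python
def bit_serial_add(a, b, width):
    """Add two unsigned integers one bit per simulated clock cycle.

    Models a single full adder plus a carry flip-flop. The result wraps
    modulo 2**width, as a fixed-width hardware register would.
    """
    carry = 0                                        # the carry flip-flop
    result = 0
    for cycle in range(width):                       # one cycle per bit, LSB first
        x = (a >> cycle) & 1
        y = (b >> cycle) & 1
        s = x ^ y ^ carry                            # full-adder sum bit
        carry = (x & y) | (x & carry) | (y & carry)  # full-adder carry out
        result |= s << cycle
    return result


if __name__ == "__main__":
    print(bit_serial_add(13, 29, 8))
```

The hardware cost is constant in the word width, while the latency grows linearly with it; many such adders can then run side by side across the array.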
On the other hand, some applications exhibit large
opportunities for data-dependent optimization. Examples already
mentioned include automatic target recognition for templates which
are sparsely populated, and data encryption with pseudo-static
encryption keys. These systems do exhibit large amounts of parallelism,
but they are mostly interesting because of the work that can be
avoided by leveraging runtime data information. This feature makes
it impossible for an ASIC-based approach to ever provide comparable
performance with comparable resources, even though the underlying
problem formulation involves circuit design and a similar set
of CAD tools are used for development.
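A small, hypothetical illustration of data-dependent optimization: when one multiplicand is a pseudo-static value (a filter coefficient, or a constant derived from a key), the circuit can be specialized to it. The sketch below builds a plain shift-and-add plan from the binary digits of the constant; it is a standard technique offered for illustration, not the method used by the ATR or encryption systems mentioned above. A circuit following the plan needs one adder fewer than the number of set bits in the constant, and no general-purpose multiplier.

```python
def shift_add_plan(k):
    """Shift amounts for a multiplier specialized to the constant k >= 1.

    k * x == sum(x << s for s in plan), so a circuit built from the plan
    needs len(plan) - 1 adders and only fixed wiring for the shifts.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [s for s in range(k.bit_length()) if (k >> s) & 1]


def multiply_specialized(x, plan):
    """Evaluate the specialized multiplier: add the shifted copies of x."""
    return sum(x << s for s in plan)


if __name__ == "__main__":
    plan = shift_add_plan(10)          # 10 = 0b1010
    print(plan, multiply_specialized(7, plan))
```

Recoding the constant in canonical signed-digit form would often reduce the adder count further; the plain binary plan is kept here for brevity.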
Based on experience, I am comfortable drawing the
Configurable computing has captured the imagination
of many architects who want the performance of application-specific
hardware combined with the flexibility of general-purpose computers.
Despite the efforts of many research groups over the past decade,
successes have been rare: configurable computers so far exhibit
poor cost performance for most common applications. To make things
worse, configurable computers are notoriously hard to program.
Commercial FPGAs are not well-suited to most applications.
These FPGAs are necessarily very fine-grained so they can be used
to implement arbitrary circuits, but the overhead of this generality
exacts a very high price in density and performance. Compared
to general purpose processors (including DSPs), which use very
optimized function units that operate in bit-parallel fashion
on long data words, FPGAs are very inefficient for performing
ordinary arithmetic and logical operations. FPGA-based computing
has the advantage only when it comes to complex bit-oriented computations
like count-ones, find-first-one or complicated masking and filtering.
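The bit-oriented operations named above are awkward on a word-oriented processor but map directly onto FPGA logic: count-ones becomes a tree of progressively wider adders, and find-first-one becomes a priority encoder. The Python below is only a functional model of those structures, assuming 32-bit non-negative words and taking "first" to mean the least significant set bit.

```python
def count_ones_32(w):
    """Population count of a 32-bit word via a log-depth adder tree.

    Each step adds neighbouring fields in parallel and doubles the field
    width, mirroring the levels of a hardware count-ones adder tree.
    """
    w &= 0xFFFFFFFF
    w = (w & 0x55555555) + ((w >> 1) & 0x55555555)    # 16 sums of 2 bits
    w = (w & 0x33333333) + ((w >> 2) & 0x33333333)    # 8 sums of 4 bits
    w = (w & 0x0F0F0F0F) + ((w >> 4) & 0x0F0F0F0F)    # 4 sums of 8 bits
    w = (w & 0x00FF00FF) + ((w >> 8) & 0x00FF00FF)    # 2 sums of 16 bits
    w = (w & 0x0000FFFF) + ((w >> 16) & 0x0000FFFF)   # 1 sum of 32 bits
    return w


def find_first_one(w):
    """Index of the least significant set bit, or -1 if w == 0."""
    if w == 0:
        return -1
    return (w & -w).bit_length() - 1   # w & -w isolates the lowest set bit


if __name__ == "__main__":
    print(count_ones_32(0xF0F0), find_first_one(0b101000))
```

Five levels of additions suffice for 32 bits; in an FPGA each level is a row of small adders, whereas a processor must issue each masking, shifting and adding step as a separate word-wide instruction.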
Because FPGAs are so fine-grained and general purpose,
programming an FPGA-based configurable computer is akin to designing
an ASIC. The programmer either uses synthesis tools that deliver
poor density and performance, or designs the circuit manually
which requires both intimate knowledge of the configurable architecture
and substantial design time. Neither alternative is attractive,
especially if the computation itself is relatively uncomplicated
and can be described in a few lines of C code.
We are certainly willing to pay some price for configurability,
but the question is how large a price we will put up with. The
current cost-performance price of configurable computers is a
factor of about 100x, and the programming price in terms of expertise
and time is many orders of magnitude greater. This combined price
is much too high for most applications and most users.
Conclusion 1 - FPGA-based configurable computers
will be used only in niche applications where cost is of little
concern or that require substantial bit-level data computation.
New architectures like the Xilinx 6200 will provide some improvement
for some applications, but expecting dramatic improvements is
unrealistic. Progress in making configurable computers easier
to program will be disappointing.
Conclusion 2 - New configurable computers will appear
that are based on new coarse-grained architectures more suitable
for conventional arithmetic-intensive tasks. Research examples
include the Matrix (MIT) and RaPiD (University of Washington).
Conclusion 3 - Progress in programming configurable
computers will require coordination among the architecture,
the model of computation, the application domain, and the programming
language. One need only look to the successful silicon compilers
that have been developed for DSP applications such as Cathedral
(IMEC) and Lager (Berkeley) to see the advantage of this approach.
Although traditional FPGA-based architectures can benefit from
this methodology, the newer coarse-grained architectures can take
full advantage of it from the ground up.
Conclusion 4 - Systems will appear that incorporate
dynamically programmable components in new and interesting ways
that allow conventional computing to be blended with application-specific
computing at a fine-grained level. Initial attempts include PRISC
(Harvard), DISC (BYU) and Brass (Berkeley).
In summary, future progress in configurable computers
will result not from continued research along the same FPGA-based
path, but from a diversity of approaches including more coarse-grained
configurable architectures and constrained programming models
that allow more powerful compilation techniques.
Substantial effort has gone into the research and
development of hardware "media" that permit varying
hardware images to the user, from fine-grained to coarse-grained
levels. At their finest granularity, these media permit reconfiguration
at the levels of single bits, and can span the entire spectrum
to systems that reconfigure at the levels of individual physical
or virtual processors. While the term itself is used pervasively
and offers exciting opportunities in all of the above contexts,
our focus in this discussion is on the devices at the finer level
of granularity. In particular, we are concerned with devices that
form a basis for
The hardware media that constitute the clay from
which different configurations can be molded easily and dynamically
range from DPGA technologies at the device level, to reconfigurable
meshes at the system level.
Reconfigurable hardware of this sort has found a
variety of interesting applications, typically in the context of
applications requiring demanding throughput and, in particular, well-defined
timing behavior. Several applications, such as ATR and
multimedia applications using the MPEG standard, use reconfigurable
"glue" at critical points in the computational path.
While reconfigurable hardware remains a very desirable choice
in the context of all these application domains, the potential
rapid evolution (revolution) is yet to come. A primary barrier
in this regard is the absence of programming tools and software
support to eventually compile algorithms implemented in standard
and widely-used languages such as C onto the hardware platforms.
Current support through VHDL based synthesis does
not come close to providing the level of support that is eventually
desirable, i.e., the level of the specification is too close to
concerns of hardware. In contrast, application development tends
to be much more centered around algorithmic specifications at
much higher levels. In fact, currently, even acceptable "models"
of a range of reconfigurable media that a compiler can target
are lacking. The problem is further compounded by the fact that
the compiler must also optimize to take advantage of the potential
for reconfiguration, as well as the parallelism that these platforms
have to offer.
There are some proposals aimed at targeting public
domain compilers such as GCC at restricted forms of reconfigurable
architectures such as the transport-triggered approach. While
transport-triggering raises some interesting architectural opportunities,
approaches such as these raise two important concerns. First,
transport-triggering offers a limited view of reconfigurability,
almost constrained by the type of machines that canonical optimizing
compiler technology such as that embodied in GCC can try to exploit;
it is not even clear that conventional compiler technologies and
intermediate representations are adequate for dealing with
this situation. Consequently, if we were to consider more ambitious
forms of reconfigurability, the nature and concerns of optimizing
compiler technology would need substantial research and innovation.
This entire challenge is compounded substantially if we add the
need for preserving timing expectations in the application, motivated
by the rich range of embedded applications in the context
of which reconfigurable hardware can play an important role. The
depth, breadth and need for possible innovations is very great.
In this short position paper, we will highlight some of the more
crucial aspects and issues while deferring a more detailed discussion
to the workshop.
Optimizing Programming Tools and Compilation Support:
Reconfigurable platforms offer several novel opportunities in
terms of target hardware including varying word size, degree and
type of instruction-level parallelism and communication
topology. To design efficient optimizing compilers to target the
"processors" with these properties, we need to revisit
and quite possibly rethink
It will be very useful to try and leverage the wealth
of knowledge available in the context of program development for
conventional platforms, but it will be crucial to try and understand
the different needs of reconfigurable hardware and their impact
on eventual performance.
Real-time and Embedded System Support:
This area is rapidly growing and, while it has substantial potential
for using reconfigurable hardware, it also needs programming
support in conventional settings. This situation is especially
true in the context of targeting processors with ILP. Conventional
compile-time optimizations restructure the program quite dramatically
and are not geared to cope with timing constraints in the applications
being compiled. Some crucial needs that are also true in the context
of developing embedded applications using reconfigurable targets
The real-time compilation technologies and ILP
(ReaCT-ILP) project at NYU, which this author directs, is addressing
several of these issues. Algorithm-specific methods for mapping
key applications onto reconfigurable platforms such as meshes
have been developed by the USC group with which we are interacting
on the issue of developing models and IRs at which our compilation
technologies can be targeted. They are presenting an independent
position paper at this workshop, describing these modeling aspects
and related innovations.
Configurable computing ideas are being explored to
design high performance systems for many applications. Devices
which provide partial reconfigurability of combinational logic
are now in the market. Future devices which provide dynamic reconfigurability
of both combinational logic and interconnection network based
on intermediate results promise enormous computational power.
To realize the inherent potential of this technology
we need algorithmic techniques and tools which exploit the hardware
in a non-trivial manner. Characteristics of future devices also
need to be explored. Current approaches to design configurable
solutions are largely based on "Logic synthesis" in
which an HDL description is statically compiled onto hardware.
Such an automated synthesis approach is not amenable to
designing solutions which analyze the run-time behavior of applications
and exploit dynamic reconfiguration.
Collapsing the numerous levels of abstraction in
the automated synthesis approach will provide a new paradigm for
designing configurable computing solutions. We propose to achieve
this by using a computational model of configurable computing
devices which facilitates an algorithm synthesis approach as opposed
to the logic synthesis approach. In our approach the user is exposed
to the underlying device characteristics which will allow the
user to make use of the dynamic reconfiguration features. The
computational model not only allows the user to implement algorithms
in a natural manner but also permits analysis of the runtime behavior.
We will first illustrate some earlier models proposed
by our group. These models of parallel computation which permit
dynamic reconfiguration of the interconnection network on a per-instruction
basis provide distributed control using local intermediate computational
results. These models provide interesting ideas as to the directions
in which devices should evolve. Currently, we are also developing
practical models which will consider the cost of reconfiguration,
partial reconfigurability and performance in light of these issues.
These will be discussed in the presentation. Variants of these
models are also used for compilation by NYU researchers. These
issues are discussed in a separate presentation at the workshop.
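The flavour of such models can be shown with a one-dimensional reconfigurable bus, a common building block in the reconfigurable-mesh literature. In the hypothetical sketch below, each processing element sets its bus switch from a purely local value, after which a single broadcast gives every element the index of the nearest 1 at or to its left. The Python loop merely simulates what the bus does in one step; the function name and the choice of problem are illustrative assumptions, not the group's published models.

```python
def nearest_one_to_left(bits):
    """Simulate one bus cycle on a hypothetical 1-D reconfigurable bus.

    Configuration: each processing element (PE) decides locally; a PE
    holding 1 opens its switch to the left (cutting the bus) and drives
    its own index onto the segment to its right, while a PE holding 0
    keeps its switch closed. Broadcast: every PE reads its segment and
    learns the nearest 1 at or to its left (None if there is none).
    """
    cuts = [bool(b) for b in bits]      # local switch settings
    result = []
    driver = None                       # index currently driving this segment
    for i, cut in enumerate(cuts):      # sequential only to simulate the bus
        if cut:
            driver = i                  # PE i starts and drives a new segment
        result.append(driver)
    return result


if __name__ == "__main__":
    print(nearest_one_to_left([0, 1, 0, 0, 1, 0]))
```

The interconnect pattern itself is computed from intermediate data, which is the kind of per-instruction, data-driven reconfiguration the models above are meant to capture.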
This work is supported by DARPA under contract DABT63-96-C-0049.
General-purpose computing devices and systems are
commodity building blocks which can be adapted to solve any number
of computational tasks. We adapt these general-purpose devices
by feeding them a series of control bits according to our computational
needs. We have traditionally called these bits instructions, as
they instruct the programmable silicon on how to function.
While all general-purpose computing devices have
instructions, distinct architectures treat them differently --
and it is precisely the management of device instructions which
differentiates various general-purpose computer architectures.
When architecting a general-purpose device, we must make decisions
on issues such as:
Conventional programmable processors, such as microprocessors,
As a consequence these devices are efficient on wide
word data and irregular tasks -- i.e. tasks which need to perform
a large number of distinct operations on each datapath processing
element. On tasks with small data, the active computing resources
are underutilized, wasting computing potential. On very regular
computational tasks, the on-chip space to hold a large sequence
of instructions goes largely unused.
In contrast, conventional configurable devices, such
as FPGAs, have
As a consequence these devices are efficient on bit-level
data and regular tasks -- i.e. tasks which need to repeatedly
perform the same collection of operations on data from cycle to
cycle. On tasks with large data elements, these fine-grain devices
pay excessive area for interconnect and instruction storage versus
a coarser-grain device. On very irregular computational tasks,
active computing elements are underutilized -- either the array
holds all sub-computations required by a task, but only a small
subset of the array elements are used at any point in time, or
the array holds only the sub-computation needed at each point
in time, but must sit idle for long periods of time between computational
sub-tasks while the next subtask's array instructions are being loaded.
Unfortunately, most real computations are neither
purely regular nor irregular, and real computations do not work
on data elements of a single data size. Typical computer programs
spend most of their time in a very small portion of the code.
In the kernel where most of the computational time is spent, the
same computation is heavily repeated making it very regular. The
rest of the code is used infrequently making it irregular. Further,
in systems, a general-purpose computational device is typically
called upon to run many applications with differing requirements
for datapath size, regularity, and control streams. This broad
range of requirements makes it difficult, if not impossible, to
achieve robust and efficient performance across entire applications
or application sets by selecting a single computational device
with the extremes of today's conventional architectures.
Potential solutions to this dilemma reside in architectures
which tightly couple elements of both extremes and which draw
from the broad architectural space left open in the middle.
Multiple context FPGAs, such as MIT's DPGA, provide
one such intermediate in this architectural space. The DPGA retains
the bit-level granularity of FPGAs, but instead of holding a single
instruction per active array element, the DPGA stores several
instructions per array element. The memory necessary to hold each
instruction is small compared to the area comprising the array
element and interconnect which the instruction controls. Consequently,
adding a small number of on-chip instructions does not substantially
increase die size. While the instructions are small, their size
is not trivial -- supporting a large number of instructions per
array element (e.g. tens to hundreds) would cause a substantial
increase in die area, decreasing the device efficiency on regular
tasks.
Multiple context components with moderate datapaths
also fall in the intermediate architectural space. Pilkington's
VDSP has an 8-bit datapath and space for 4 instructions per datapath
element. UC Berkeley's PADDI and PADDI-II have a 16-bit datapath
and 8 instructions per datapath element. Both of these architectures
were originally developed for signal processing applications and
can handle semi-regular tasks on small datapaths very efficiently.
Here, too, the instructions are small compared to the active datapath
computing elements so including 4-8 instructions per datapath
substantially increases device efficiency on irregular applications
with minimal impact on die area.
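The multiple-context idea can be modelled in a few lines. This is a hypothetical sketch, not the actual element design of the DPGA, VDSP or PADDI: a 2-input lookup table is used for brevity, and the class and method names are invented for illustration. Its point is that extra contexts are just a few more small truth tables per element, while the element's logic and interconnect are unchanged.

```python
class MultiContextLUT:
    """Hypothetical multiple-context array element (2-input LUT for brevity).

    Each context is a 4-entry truth table indexed by (a, b). All contexts
    stay resident on chip; a broadcast context ID selects the active one,
    so changing function needs only a new context ID, not a reload of
    configuration data.
    """

    def __init__(self, contexts):
        if any(len(tt) != 4 for tt in contexts):
            raise ValueError("each context must be a 4-entry truth table")
        self.contexts = [tuple(tt) for tt in contexts]

    def evaluate(self, context_id, a, b):
        return self.contexts[context_id][(a << 1) | b]


if __name__ == "__main__":
    AND, XOR, OR = (0, 0, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1)
    element = MultiContextLUT([AND, XOR, OR])
    # The same physical element computes a different function each cycle.
    for cycle, ctx in enumerate([0, 1, 2, 1]):
        print(cycle, element.evaluate(ctx, 1, 1))
```

Storing four 4-entry tables costs sixteen configuration bits per element in this toy model, which is the sense in which a few extra instructions are cheap relative to the element they control.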
While intermediate architectures such as these are
often superior to the conventional extremes of processor and FPGAs,
any architecture with a fixed datapath width, on-chip instruction
depth, and instruction distribution area will always be less efficient
than the architecture whose datapath width, local instruction
depth, and instruction distribution bandwidth exactly match
the needs of a particular application. Unfortunately, since the
space of allocations is large and the requirements change from
application to application, it will never make sense to produce
every such architecture. Flexible post-fabrication assembly
of datapaths and assignment of routing channels and memories to
instruction distribution enables a single component to deploy
its resources efficiently, allowing the device to realize the
architecture best suited for each application. This is the approach
taken by MIT's MATRIX component.
Since many tasks have a mix of irregular and regular
computing tasks, a hybrid architecture which tightly couples arrays
of mixed datapath sizes and instruction depths along with flexible
control can often provide the most robust performance across
the entire application. In the simplest case, such an architecture
might couple an FPGA array into a conventional processor, allocating
the regular, fine-grained tasks to the array, and the irregular,
coarse-grained tasks to the conventional processor. Such coupled
architectures are now being studied by several groups.
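How much a tightly coupled hybrid can gain depends on how much of the run time the regular kernel accounts for. Amdahl's law, a standard result rather than a claim of this paper, gives the estimate; the fractions and speedups used below are hypothetical.

```python
def hybrid_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when only the regular kernel moves to the array.

    Amdahl's law: the irregular remainder (1 - f) still runs on the
    processor at its original speed, while the kernel fraction f runs
    s times faster on the configurable array.
    """
    f, s = kernel_fraction, kernel_speedup
    if not 0.0 <= f <= 1.0 or s <= 0:
        raise ValueError("need 0 <= f <= 1 and s > 0")
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    # Hypothetical: kernel is 90% of run time, array runs it 100x faster.
    print(round(hybrid_speedup(0.9, 100), 2))
```

With these figures the application speeds up only about 9.2 times, and never beyond 10 times however fast the array is, which is why the irregular remainder still needs an efficient processor alongside the array.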
In summary, we see that conventional, general-purpose
device architectures, both microprocessors and FPGAs, live at
extreme ends of a rich architectural space. As feature sizes shrink
and the available computing die real-estate grows, microprocessors
have traditionally gone to wider datapaths and deeper instruction
and data caches, while FPGAs have maintained single-bit granularity
and a single instruction per array element. This trend has widened
the space between the two architectural extremes, and accentuated
the realm where each is efficient. A more effective use of the
silicon area now becoming available for the construction of general-purpose
computing components lies in the space between these extremes.
In this space, we see the emergence of intermediate architectures,
architectures with flexible resource allocation, and architectures
which mix components from multiple points in the space. Both processors
and FPGAs stand to learn from each other's strengths. In processor
design, we will learn that not all instructions need to change
on every cycle, allowing us to increase the computational work
done per cycle without correspondingly increasing on-chip instruction
memory area or instruction distribution bandwidth. In reconfigurable
device design, we will learn that a single instruction per datapath
is limiting and that a few additional instructions are inexpensive,
allowing the devices to cope with a wider range of computational tasks.
It is often suggested that configurable computing
represents a new computational middle ground that fills the existing
void between conventional microprocessors and ASICs. This point
of view is based upon the observation that FPGAs share some similarities
with both processors and ASICs. FPGAs are seen as similar to processors
because they are customized in the field by the end-user by downloading
configuration data into the device. They can also be seen as similar
to ASICs because they can implement high-performance, application-specific
circuits. It is hoped that if configurable computing can be shown
to be similar to conventional processors, it will be possible
to borrow microprocessor architecture and compilation techniques
for use in the configurable-computing community.
However, configurable computing, as defined by current
FPGA technology, does not fill the void between ASICs and processors.
FPGAs hold much more in common with ASICs than they do processors.
Indeed, if the spectrum of computing approaches were to be viewed
as a family, ASICs and configurable computing would be siblings
and processors would be distant relatives, at best. The distant
relationship between FPGAs and processors can be seen best by
studying the organization of successful FPGA applications,
i.e., those applications that achieve at least order-of-magnitude
performance gains over other processor-based approaches. A quick
review of these applications shows that they are highly concurrent,
deeply pipelined and achieve performance gains primarily by exploiting
massive amounts of data-level parallelism -- typically 100-1000
times that of a general-purpose microprocessor. Contrast this
with typical microprocessor applications that are described using
sequential languages, implemented as sequential instructions,
and executed on machines optimized for sequential execution.
The relationships between configurable computing,
ASICs, and microprocessors have several important implications.
First, sequential programming languages and related compilation
approaches are not likely to be a good match for highly parallel
configurable-computing applications. While it may be possible
to achieve moderate speedup, significant speedup will only be
achieved by directly exploiting massive amounts of parallelism.
This is currently done using low-level circuit design tools. Second,
the architectural organization (both at the device and system
level) will be much more distributed than is commonly found in
existing computer systems. For example, whereas typical computing
systems consist of large global memories, configurable-computing
platforms will be much better served by many, smaller distributed
memories. Finally, because of the fundamental mismatch between
the datapaths in processors and configurable-computing systems,
hybrid systems of microprocessors and FPGAs are best coupled
flexibly so that the best features of each device can be fully
exploited.
Metrics determine what kind of conclusions may be
drawn from benchmark results, and also affect how benchmarks must
be performed. The metrics in use in scientific computing benchmarks
address mainly four questions:
When evaluating architectural and packaging options
for an architecture, one commonly encounters the problem of meeting
performance requirements within the constraints of weight, volume
and power envelope, as well as the amount of computational performance
that can be realized with a given physical envelope. This assessment
process can be guided by a metric that we have found to be relatively
consistent in past applications. It incorporates throughput in
million operations per second (MOPS), weight (and implicitly volume)
in kilograms, and power in watts. The MOPS/(kg.watt) ratio has
been used to evaluate technology and packaging tradeoffs.
The claim is that given a particular technology (pre-VHSIC,
VHSIC phase 1 and 2) and a particular packaging approach (representing
various die size per real estate area ratios), the selected combination
will produce a system where the MOPS/(kg.watt) is known to be
within a certain order of magnitude. A commercial supercomputer,
such as the Intel Paragon (Gamma), assuming 7.7 GFLOPS, 3000 lbs
and 116 kW of power, would represent 0.0005 MOPS/(kg.watt).
A Honeywell militarized Touchstone (Sigma) avionics supercomputer,
assuming 7.7 GFLOPS, 82 lbs and 2.9 kW power, would represent
0.71 MOPS/(kg.watt). For example, future avionics systems need
to have a MOPS/(kg.watt) on the order of several hundred (209)
for a 1.8 GFLOPS/20 GOPS Touchstone enhanced radar preprocessor
realized on one double sided (2 lbs), liquid cooled (200 W) SEM-E
form factor board (1 GFLOPS ~ 10 GOPS).
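The three figures above can be reproduced from the quoted throughput, weight and power. In the sketch below, the conversion 1 GFLOPS ~ 10 GOPS comes from the text; treating the radar preprocessor's 20 GOPS as additional to its 1.8 GFLOPS is this sketch's reading, chosen because it reproduces the quoted value of 209. The function name is illustrative.

```python
LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def mops_per_kg_watt(gflops, gops, weight_lb, power_w, ops_per_flop=10):
    """Throughput density in MOPS/(kg.watt).

    Floating-point throughput is converted with the text's rule of thumb
    1 GFLOPS ~ 10 GOPS; any separately quoted GOPS are added on top.
    """
    mops = (gflops * ops_per_flop + gops) * 1000.0
    weight_kg = weight_lb * LB_TO_KG
    return mops / (weight_kg * power_w)


if __name__ == "__main__":
    # (GFLOPS, extra GOPS, weight in lb, power in W), as quoted above.
    systems = {
        "Intel Paragon (Gamma)": (7.7, 0.0, 3000, 116_000),
        "Touchstone (Sigma) avionics": (7.7, 0.0, 82, 2_900),
        "Enhanced radar preprocessor": (1.8, 20.0, 2, 200),
    }
    for name, spec in systems.items():
        print(f"{name}: {mops_per_kg_watt(*spec):.3g} MOPS/(kg.watt)")
```

Run as a script, it gives roughly 0.00049, 0.71 and 209, matching the rounded values in the text; the spread of more than five orders of magnitude between them is what makes the metric useful for comparing technology and packaging choices.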
In our experience, FPGA-based systems for any function
perform at an order of magnitude better MOPS/(kg.watt) than a
similar DSP-based implementation but still at an order of magnitude
less than a full ASIC implementation. Our evaluation will confirm
that observation, but in addition, will provide insight into the
mechanism of reconfiguration and its related timing expense.
In addition to the above metric, others have been defined to measure the effectiveness of configurable computing devices, especially their reconfiguration aspect. A direct function-for-function evaluation, especially relative to ASICs, is not the proper way to evaluate configurable computing. Of interest are some of the metrics proposed by DeHon at MIT as part of the Reinventing Computing program.