Essential papers - Reconfigurable Computing with KressArray
(1) Xputer related to Hardware/Software Co-Design | (2) History of Xputers

References of (1) and (2)

This paper presents a reconfigurable machine for applications in image or video compression. The machine can be used stand alone or as a universal accelerator co-processor for desktop computers for image processing. It is well suited for image compression algorithms such as JPEG for still pictures or for encoding MPEG movies. It provides a much cheaper and more flexible hardware platform than special image compression ASICs and it can substantially accelerate desktop computing.    ---- paper 062 The paper presents a new architectural class of high performance data-parallel machines, called Xputers. Xputers combine structural programming with traditional von Neumann control flow programming. From this combination a new programming paradigm arises which is not familiar to the usual software developer. To counteract this drawback, a program partitioning, restructuring, and mapping method for Xputers has been developed for the input language C. Sources are restructured and partitioned into an Xputer-suitable execution sequence providing parallelism at expression and at statement level. Data is mapped in a regular form onto the Xputer memory space to be accessible by the Xputer's data sequencer hardware, which provides a generic set of fast address sequences. The data operations within each part of the derived execution sequence are coded as a structural description for further synthesis towards the reconfigurable ALU, which is based on field-programmable logic. Additionally, assembly code is produced in order to control program execution through the data sequencer hardware. The entire method performing the paradigm shift works without further user interaction and all steps are driven by parameters describing the actual target hardware configuration.    ----  paper 058 Xputers are a new architectural class of data-parallel machines, well suited for algorithms with regular or semi-regular data dependencies. 
They combine structural programming with traditional von Neumann control flow programming. From this combination a new programming paradigm arises which is not familiar to the usual software developer. Obviously this fact diminishes acceptance. To counteract this lack a program restructuring and mapping method for Xputers has been developed, and is presented in this thesis. To accommodate the majority of programmers C has been chosen as input language. Sources are restructured using techniques known from supercompilers followed by partitioning the program source into an Xputer-suitable execution sequence providing parallelism at expression and at statement level. As in all other phases these steps are driven by parameters describing the actual target hardware configuration making the method more flexible to be adjustable to the whole family of Xputers. Data is mapped in a regular fashion onto the Xputer memory space to be accessible by the Xputers data sequencer hardware providing a generic set of fast address sequences. The data operations within each part of the derived execution sequence are coded as a structural description for further synthesis towards the reconfigurable ALU. Additionally assembly code is produced in order to control program execution through the data sequencer hardware. The entire method performing the paradigm shift works without further user interaction. ---------- This paper introduces a new high level programming language for a novel class of computational devices namely data-procedural machines. These machines are by up to several orders of magnitude more efficient than the von Neumann paradigm of computers and are as flexible and as universal as computers. Their efficiency and flexibility is achieved by using field-programmable logic as the essential technology platform. The paper briefly summarizes and illustrates the essential new features of this language by means of two example programs.     
----  paper 056 The paper gives some highlights on a new R&D area called Hardware/Software Co-Design. It tries to give an answer to several questions. What are the goals and unsolved problems? What are the hardware platforms? What are the relations between field-programmable hardware and this new area?   ----  paper 053 A reconfigurable data-driven datapath architecture for ALUs is presented which may be used for custom computing machines (CCMs), Xputers (a class of CCMs) and other adaptable computer systems as well as for rapid prototyping of high speed datapaths. Fine grained parallelism is achieved by using simple reconfigurable processing elements which are called datapath units (DPUs). The word-oriented datapath simplifies the mapping of applications onto the architecture. Pipelining is supported by the architecture. The programming environment allows automatic mapping of the operators from high level descriptions. Two implementations, one using FPGAs and one using standard cells, are shown.     ----  paper 052 This paper describes our experiences with the hardware description language Verilog during the development of the Xputer prototype. It first introduces the novel non-von Neumann architecture of the Xputer, its need for efficient address generation and the basic structure of the Generic Address Generator. After a short introduction to Verilog, we discuss the problems with this hardware description language and show how to get around them by means of some design restrictions. Finally, an outlook on testing and simulation possibilities is given.     ----  paper 047 The term Xputer stands for a new computational paradigm opening up a wide design space for many Xputer architectures. One possible implementation, the Map-oriented Machine (MoM-2), is used as an example to explain basic operational principles of Xputers. These are not based on the von Neumann principles which dominate in contemporary computer systems. 
A discussion of these is given showing their throughput problems. In addition, several speed-up techniques being used in modern von Neumann architectures and beyond these are shortly discussed. Many Xputer architectures are feasible and some architectural design alternatives are presented. The performance derives from the inherent fine grained parallelism and the auto data sequencing mechanism. The fine grained parallelism results from the reconfigurable ALU (r-ALU) where programmable compound operators are configured during compile-time. The auto data sequencing mechanism provides systematic and optimized methods for data accesses. In Xputers, the data sequencing plays a central role. On the one hand, the universal address generator enables the generation features of other systems. On the other hand, the predominant combinatorial programming requires a new kind of sequential mechanism which is realized by the address generator. Therefore, it must provide a rich and balanced repertoire of scan features. Instead of control-flow, data-flow is the primary activator in Xputers. Application specific compound operators are configured within the flexible wide-bandwidth r-ALU at very low level yielding parallelism at very fine granularity, or ultra micro parallelism. It is based on a new generation of programmable hardware: the field programmable media (FPM). The r-ALU replaced the hardwired ALU used in most computer systems. A new kind of register file organization, the data scan cache, serves as a window to the data memory. It supports fine granularity data scheduling strategies which minimize processor/memory traffic. Due to the compiler-friendly hardware features of Xputers new effective optimization methods in compilers are possible, which cannot be applied to other computer systems. ------------------ This paper introduces a new design methodology for rapid implementation of cheap high performance ASICs. 
The method described here derives from high level algorithm specifications or from high level source programs not only the target hardware, but - in contrast to silicon compilers - at the same time also the machine code to run it. The new method is based on a novel sequential machine paradigm where execution is used (being by orders of magnitude more efficient) instead of simulation and where programmers may do the design job, rather than real hardware designers. The paper illustrates that for a very large class of commercially important algorithms (DSP, graphics, image processing and many others) this paradigm is by orders of magnitude more efficient than the von Neumann paradigm. Compared to von-Neumann-based implementations acceleration factors of up to more than 2000 have been obtained experimentally. The performance of ASICs obtained by this new methodology mostly is competitive to ASICs designs obtained on the much slower and much more expensive "traditional" way. As a by-product the new methodology also supports the automatic generation of custom computing machines as accelerators for co-processor use in work stations etc., such as e. g. to accelerate EDA tools. It is the goal of this paper to explain the highly efficient application of the xputer paradigm, rather than to introduce its hardware implementation. It is the goal of this paper to illustrate the innovative power of this paradigm, and its potential for a major step of progress toward systematically deriving ASIC designs from algorithm specifications.     ----  paper 040 An application development environment for xputers is introduced in this thesis. Xputers are based on a new machine paradigm. Their performance derives from the inherent ultra micro grained parallelism and the auto data sequencing mechanism. The ultra micro parallelism results from the reconfigurable ALU (r-ALU) where programmable compound operators are configured during compile-time. 
The auto data sequencing mechanism provides systematic and optimized methods for data accesses. The Xputer's data sequencing mechanism is the primary activator for all actions of the system. It reduces control flow to sparse residual control. The proposed application development environment comprises a special xputer language and a compilation method for ordinary programs. The compilation technique is completely based on and driven by data dependence analysis, adapting the theory of systolic array generation. But here the extracted parallelism is not laid down in a fixed hardware structure which cannot be changed any more. Since xputers offer a flexible reprogrammable hardware platform, its compiler-defined structure can be changed arbitrarily. Thus for the first time the parallelization strategies of systolizing compilation can be freed from the restrictive control-driven von Neumann principles. Most of the von Neumann bottlenecks are avoided, resulting in performance figures which can be compared even with ASIC solutions, even though xputers are uni-processors.     --------------
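The data-driven execution described in the abstracts above (a data sequencer moving a window over the data memory, with a compound operator configured once and applied at every window position) can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the Xputer hardware or its compiler output: the function names `video_scan`, `run_scan` and `window_sum` are hypothetical, the scan pattern shown is a plain row-by-row scan, and the "compound operator" is an ordinary Python function standing in for a configured r-ALU.

```python
from typing import Callable, Iterator, List, Tuple

Grid = List[List[int]]


def video_scan(rows: int, cols: int, win: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical address generator: yields window origins row by row.

    The loop nest lives here, in the 'data sequencer', not in the operator.
    """
    for r in range(rows - win + 1):
        for c in range(cols - win + 1):
            yield r, c


def run_scan(memory: Grid, win: int, operator: Callable[[Grid], int]) -> Grid:
    """Move a win x win window over the 2-D memory and apply the operator
    at every position; each window position triggers one operator evaluation."""
    rows, cols = len(memory), len(memory[0])
    out = [[0] * (cols - win + 1) for _ in range(rows - win + 1)]
    for r, c in video_scan(rows, cols, win):
        # The window plays the role of the scan cache: a small view of memory.
        window = [row[c:c + win] for row in memory[r:r + win]]
        out[r][c] = operator(window)
    return out


def window_sum(window: Grid) -> int:
    """Stand-in for a compound operator: sum of all words in the window."""
    return sum(sum(row) for row in window)


memory = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
result = run_scan(memory, 2, window_sum)
print(result)  # [[12, 16], [24, 28]]
```

The point of the sketch is the division of labour: the operator contains no loops or addresses, and all sequencing is confined to the scan generator, which is the software analogue of control flow being reduced to data sequencing.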
  1. [11]  A. Ast, R. W. Hartenstein, A. G. Hirschbiel, M. Riedmüller, K. Schmidt, M. Weber: Using Xputers as Inexpensive Universal Accelerators in Digital Signal Processing; Bilkent'90 Int. Conference on New Trends in Communication, Control and Signal Processing; Ankara, Turkey, 1990; also in: Prepr. Int'l Workshop on Algorithms and Parallel VLSI Architectures, Pont-à-Mousson, France, 1990.
The paper introduces the use of xputers to accelerate digital signal processing algorithms and other parallel algorithms within a wide variety of application areas. (Xputers are a novel class of high performance processors.) The programming paradigm, which stems from the deterministically data-driven xputer machine paradigm, is illustrated by introducing a few xputer application examples in digital signal processing. The paper first briefly introduces the novel high performance machine principles. Finally the paper discusses how the novel method may also be used for fast and cheap design of ASICs and highly flexible accelerators, and gives some throughput and hardware cost figures obtained experimentally.    ---- paper 025 The paper introduces the principles of xputers - in contrast to the principles of von Neumann type computers. The paper characterizes a class of algorithms which run by orders of magnitude faster on xputers than on computers and explains the novel execution mechanisms of xputers as well as novel compilation techniques to generate high performance xputer machine code. The paper proves that xputers are as universal as computers. Based on a capacity analysis of communication mechanisms within the hardware the paper also shows the competitiveness of xputers against MIMD concurrent computers, VLIW computers, and data flow machines, and illustrates that the design space of xputer architectures opens up a promising new area of research and development in processor architecture.   ---- paper 016 In this paper we describe a new architecture, called 'Map Oriented Machine' (MOM), which fills the gap between the totally flexible but slow von Neumann computer and the very fast but expensive and inflexible fully parallelized solution directly implemented on customized silicon. Some applications of MOM, presented here, show that algorithms which have a map-oriented organisation, such as image processing, can be implemented in a very efficient way on MOM. 
The basic idea of speeding up the algorithms is to parallelize the program access by combinational hardware, whose development is supported by some CAD tools.     ----  paper 079 The DPLA is an SRAM-based pattern matching circuit providing 24 patterns with 12 bits each. The array may be expanded across chip boundaries by linking several chips together. The circuit was fabricated at IMS in Duisburg with a 3 micron NMOS process. The circuit area is 3.6 mm by 3.5 mm for 7000 transistors in a 40 pin DIL package. ------------ The paper introduces the MOM (Map-oriented Machine), a reconfigurable procedurally data-driven machine architecture. A wide variety of problem-oriented data paths (POLUs) for pattern matching applications can be generated with a special CAD tool for the MOM - without any need to change the rest of the machine hardware. The MoM has a two-dimensional memory organization using a memory buffer, called "scan cache", which operates like a peep hole to view the memory map. A data sequencer can be programmed to move this scan cache window along a variety of generic scan paths through the memory map. The paper also illustrates MOM use for acceleration in image processing applications and integrated circuit layout rule checking.   ----  paper 011 The paper describes an innovative computation resource concept which for a class of data processing problems is an alternative to the von Neumann machine. The 'processor', called 'Map Oriented Machine' (MOM), used for this concept is faster than a von Neumann-type computer; however, it is substantially less expensive than a fully parallel hardwired implementation using full custom or semi custom circuits. Instead of a program store with a program sequencer, personalized hardware is used, and to 'program' this machine, CAD tools are used instead of conventional compilers. 
The MOM concept is a compromise between the purely sequential von Neumann concept (sequential control part and sequential data part) and fully parallel solutions (parallelized control part and parallelized data manipulation side) insofar, as the control part has been parallelized, the data manipulation side, however, still uses a universal sequential access organisation.   ----  paper 010 The paper describes a system for pixel-oriented layout analysis. It may be used as a design rule checker, or to support other types of layout analysis, such as circuit extraction, electrical rules checking, and others. Two versions of the system are described: a special hardware version, and a software version, which may be also used as a CAD tool to personalize the hardware version of it.     ------------------- A new architectural class of high performance data-parallel machines, called Xputers, is presented which combine structural programming with traditional von Neumann control flow (procedural) programming. From this combination a new programming paradigm arises which is not familiar to the usual software developer. To counteract this deficiency an automatic parallelization and compilation method for Xputers has been developed for the input language C. Sources are restructured and partitioned into an Xputer-suitable execution sequence providing parallelism at expression and at statement level. Data is mapped in a regular form onto the Xputer memory space to be accessible by the Xputers data sequencer hardware which provides a generic set of fast address sequences. The data operations within each part of the derived execution sequence are coded as a structural description for further synthesis towards the reconfigurable ALU which is based on field-programmable logic. Additionally, assembly code is produced in order to control program execution through the data sequencer hardware. 
The entire method performing the paradigm shift works without further user interaction and all steps are driven by parameters describing the actual target hardware configuration.     ----  paper 061 A new architectural class of high performance data-parallel machines, called Xputers, is presented which combines structural programming with traditional von Neumann control flow programming. From this combination a new programming paradigm arises which is not familiar to the usual software developer. To counteract this drawback a program partitioning, restructuring, and mapping method for Xputers has been developed for the input language C. Sources are restructured and partitioned into an Xputer-suitable execution sequence providing parallelism at expression and at statement level. Data is mapped in a regular form onto the Xputer memory space to be accessible by the Xputer's data sequencer hardware which provides a generic set of fast address sequences. The data operations within each part of the derived execution sequence are coded as a structural description for further synthesis towards the reconfigurable ALU which is based on field-programmable logic. Additionally, assembly code is produced in order to control program execution through the data sequencer hardware. The entire method performing the paradigm shift works without further user interaction and all steps are driven by parameters describing the actual target hardware configuration.   ----  paper 058 An FPGA architecture (reconfigurable datapath architecture, rDPA) for word-oriented datapaths is presented, which has been developed to support a variety of Xputer architectures. In contrast to von Neumann machines an Xputer architecture strongly supports the concept of the "soft ALU" (rALU). Fine grained parallelism is achieved by using simple reconfigurable processing elements which are called datapath units (DPUs). The word-oriented datapath simplifies the mapping of applications onto the architecture. 
Pipelining is supported by the architecture. It is extendable to almost arbitrarily large arrays and is dynamically reconfigurable in-circuit. The programming environment allows automatic mapping of the operators from high level descriptions. The corresponding scheduling techniques for I/O operations are explained. The rDPA can be used as a reconfigurable ALU for bus-oriented host based systems as well as for rapid prototyping of high speed datapaths.     ----  paper 054 A reconfigurable wavefront array rDPA (reconfigurable datapath architecture) for evaluation of any arithmetic and logic expression is presented. Introducing a global I/O bus to the array simplifies the use as a coprocessor in a single bus oriented processor system. Fine grained parallelism is achieved using simple reconfigurable processing elements which are called datapath units (DPUs). The word-oriented datapath simplifies the mapping of applications onto the architecture. Pipelining is supported by the architecture. It is extendible to arbitrarily large arrays and dynamically in-circuit reconfigurable. The programming environment allows automatic mapping of the operators from high level descriptions. The corresponding scheduling techniques for I/O operations are explained. The rDPA can be used as reconfigurable ALU for bus oriented host based systems as well as for rapid prototyping of high speed datapaths.     ----  paper 055 This paper illustrates a novel class of computational devices called Xputers, which are by up to several orders of magnitude more efficient than the von Neumann paradigm of computers. The paradigm is partially based on using field-programmable logic. The paper shows how the new paradigm is partly derived from accelerating features of image processors and digital signal processors, and it illustrates Xputer execution mechanisms and associated programming techniques by means of simple algorithm examples.   
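The rDPA abstracts above describe an array of simple datapath units (DPUs), each configured with one word-level operator and evaluated in a data-driven manner. A minimal software sketch of that idea follows, assuming a much simplified model: each DPU has exactly two named inputs and fires as soon as both operands are available. The `DPU` class, the `evaluate` function and the example expression are hypothetical illustrations and do not reproduce the actual rDPA configuration format or its scheduling of I/O operations.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DPU:
    """One datapath unit: a named output, a binary word operator, two inputs."""
    name: str
    op: Callable[[int, int], int]
    inputs: Tuple[str, str]


def evaluate(dpus: List[DPU], inputs: Dict[str, int]) -> Dict[str, int]:
    """Data-driven evaluation: in each step, every DPU whose operands are
    already available fires; there is no program counter ordering the DPUs."""
    values = dict(inputs)
    pending = list(dpus)
    while pending:
        ready = [d for d in pending if all(i in values for i in d.inputs)]
        if not ready:
            raise ValueError("some DPU operands can never become available")
        for d in ready:
            values[d.name] = d.op(values[d.inputs[0]], values[d.inputs[1]])
        pending = [d for d in pending if d.name not in values]
    return values


# Configuration for y = (a + b) * (c - d), one operator per DPU.
# The list order is deliberately not the evaluation order.
config = [
    DPU("y", operator.mul, ("s", "t")),
    DPU("s", operator.add, ("a", "b")),
    DPU("t", operator.sub, ("c", "d")),
]
out = evaluate(config, {"a": 2, "b": 3, "c": 7, "d": 4})
print(out["y"])  # 15
```

In this model the two independent DPUs (`s` and `t`) fire in the same step, which is the fine grained parallelism the abstracts refer to; the multiplier fires one step later, once its operands have arrived.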
------------------ New high performance computational paradigms have been introduced, such as Xputers. Xputers have a reconfigurable ALU using FPGA-like technology. This results in an efficient novel machine paradigm, competitive to many ASIC solutions. It permits systematic derivation of machine code from high level algorithm specs or programs. After testing and debugging, real gate array specs may be derived by retargeting. This is a shortcut on the way from algorithm to silicon: less effort and shorter time to market. Compared to conventional ASIC design this means: a) real execution instead of simulation, b) higher source language level and thus more concise specification.   --------- The presentation shows a new machine paradigm based on field-programmable logic for computer aided SW/HW engineering (CASHE). For accelerating bottlenecks in algorithms, a new procedural machine paradigm is needed, the Xputer paradigm. This paradigm supports the use of a "soft ALU" (reconfigurable ALU). It has a data-procedural execution mechanism and it is deterministic in contrast to dataflow machines. High performance improvements have been achieved for a class of regular, scientific computations. The Xputer serves as a universal accelerator co-processor platform or as a stand alone platform for embedded systems. It offers new ways to quick ASIC implementation and new ways to supercomputing.  ------------ This paper illustrates an innovative compilation technique which is important for a novel class of computational devices called Xputers, which are by up to several orders of magnitude more efficient than the von Neumann paradigm of computers. Xputers are as flexible and as universal as computers. The flexibility of Xputers is achieved by using field-programmable logic (interconnect-reprogrammable media) as the essential technology platform (whereas the universality of computers stems from using the RAM). 
The paper first briefly illustrates the Xputer paradigm as a prerequisite needed to understand the fundamental issues of this new compilation technology. ---- paper 048 New high performance computational paradigms have been introduced, such as Xputers. Xputers have a reconfigurable ALU using FPGA-like technology. This results in an efficient novel machine paradigm, competitive to many ASIC solutions. It permits systematic derivation of machine code from high level algorithm specs or programs. After testing and debugging real gate array specs may be derived by retargeting. This is a shortcut on the way from algorithm to silicon: less effort and shorter time to market. Compared to conventional ASIC design this means: a) real execution instead of simulation, b) higher source language level and thus more concise specification. ---- paper 044 This paper illustrates an innovative compilation technique which is important for a novel class of computational devices called Xputers, which are by up to several orders of magnitude more efficient than the von Neumann paradigm of computers. Xputers are as flexible and as universal as computers. The flexibility of Xputers is achieved by using field-programmable logic (interconnect-reprogrammable media) as the essential technology platform (whereas the universality of computers stems from using the RAM). The paper first briefly illustrates the Xputer paradigm as a prerequisite needed to understand the fundamental issues of this new compilation technology. ---- paper 043 This paper introduces a novel (non-von Neumann) paradigm of parallel computation supporting a much more efficient implementation of parallel algorithms. Acceleration factors of up to more than 2000 have been obtained experimentally on the MoM architecture for a number of important applications - although using a hardware being more simple than that of a single RISC microprocessor. 
The machine organization and the most important hardware features of xputers are briefly introduced. The programming paradigm and its flexibility are illustrated by simple DSP and image processing examples. ---- paper 042 This paper introduces an innovative compilation technique which is essential to a novel class of computational devices called Xputers, being by up to several orders of magnitude more efficient than the von Neumann paradigm of computers. Xputers are as flexible and as universal as computers. But the central technology platform of flexibility is field-programmable logic (we would prefer the term interconnect-reprogrammable media), rather than the RAM which gives the flexibility of computers. The paper first briefly summarizes the Xputer paradigm as a prerequisite needed to understand the fundamental issues of this new compilation technology.  ---- paper 039 Computers (based on von Neumann principles) are extremely inefficient. That's why this paper introduces a novel computational paradigm based on new hardware machine principles. Such machines, called "xputers", avoid most of the bottlenecks known from (von Neumann) computers, so that a hardware efficiency is obtained which is higher by several orders of magnitude. By means of a few algorithm examples the new paradigm will be introduced as a new programming paradigm, which is data-procedural (which is more direct than the control-procedural von Neumann paradigm). Finally the paper gives a survey on the novel application development environments needed for xputers and their advantages over such tools for computers. Such application support for xputers includes two alternative source levels: high level programs, or very high level algorithm specifications.  
---- paper 037 This paper introduces a novel (non-von Neumann) programming paradigm of parallel computation featuring a much more efficient implementation of parallel algorithms, as well as a novel (hardware) machine paradigm efficiently supporting such implementations. Acceleration factors of up to more than 2000 have been obtained experimentally on an example architecture for a number of important applications - although using a hardware being more simple than that of a single RISC microprocessor. Due to its auto-sequencing data memory the machine principles are partly related to the organization of associative memories or systems. The machine organization and its most important hardware features are briefly introduced. The programming paradigm and its flexibility based on field-programmable logic is illustrated by a few application examples.  ----  paper 033 The paper first introduces the novel machine organization of xputers - in contrast to von Neumann type computer principles. Then the paper introduces the novel xputer paradigm as a model to implement parallel algorithms (important e. g. in image processing, digital signal processing, computer graphics, VLSI layout verification), to run by orders of magnitude faster on xputers than on computers. The paper illustrates this model and the novel execution mechanisms of xputers by a few simple application examples. Xputer principles are sufficiently simple to open up a large new R&D area to define a wide variety of innovative architectures. The paper gives some throughput figures and hardware cost figures having been obtained experimentally from application examples running an xputer architecture and from code having been generated by a compiler, both having been implemented. Finally it discusses technology issues and the use of the xputer paradigm as a novel method for very fast and cheap design of ASICs.  
----  paper 033 This paper introduces a novel (non-von Neumann) paradigm of parallel computation supporting a much more efficient implementation of parallel algorithms. Acceleration factors of up to more than 2000 have been obtained experimentally on the MoM architecture for a number of important applications - although using processor hardware simpler than that of a single RISC microprocessor. The most important hardware features of Xputers will be briefly introduced. By simple DSP and image processing algorithm examples the programming paradigm and its flexibility will be illustrated. ---- paper 032 This paper first introduces a novel machine paradigm as a model for very high performance implementation of parallel algorithms in important application areas such as image processing, digital signal processing, computer graphics, VLSI layout verification, routing and others. The paper illustrates this model by means of a simple application example. Then the paper introduces a novel method for fast and cheap design of ASICs and highly flexible accelerators, which is based on this paradigm. The paper gives some hardware throughput and hardware cost figures obtained experimentally.  ----  paper 030 The paper introduces a novel (non-von Neumann) paradigm of parallel computation supporting a much more efficient implementation of parallel algorithms. Acceleration factors of up to more than 2000 have been obtained experimentally on the MoM architecture for a number of important applications - although using processor hardware simpler than that of a single RISC microprocessor. The most important hardware features of Xputers will be briefly introduced. By simple DSP and image preprocessing algorithm examples the paradigm and its flexibility will be illustrated.  ---- paper 029 This paper introduces a novel (non-von Neumann) paradigm of parallel computation supporting a much more efficient implementation of parallel algorithms. 
Acceleration factors of up to more than 2000 have been obtained experimentally on the MoM architecture for a number of important applications - although using processor hardware simpler than that of a single RISC microprocessor. The most important hardware features of Xputers will be briefly introduced. By simple DSP and image processing algorithm examples the programming paradigm and its flexibility will be illustrated. ---- paper 028 The paper gives an introduction to using xputers (a novel class of high performance processors - based on one of the most important machine concepts since von Neumann) for acceleration of digital signal processing. Its novel programming paradigm of data sequencing is illustrated by an FFT digital signal processing example.  ----  paper 027 The paper introduces the principles of xputers - in contrast to the principles of von Neumann type computers. The paper characterizes a class of algorithms which run by orders of magnitude faster on xputers than on computers and explains the novel execution mechanisms of xputers as well as novel compilation techniques to generate high performance xputer machine code. The paper proves that xputers are as universal as computers. Based on a capacity analysis of communication mechanisms within the hardware the paper also shows the competitiveness of xputers against MIMD concurrent computers, VLIW computers, and data flow machines, and illustrates that the design space of xputer architectures opens up a promising new area of research and development in processor architecture.  ---- paper 023 A method SYS2 to MoM to map systolic systems onto the MoM (map-oriented machine) is introduced in this paper. This mapping method is needed to derive a methodology of MoM application development support from the theory of systolic array synthesis. The MoM is a flexible non-von-Neumann computer principle developed at Kaiserslautern. 
MoM "programming" uses combinational code (for path programming) instead of sequential code (for sequencing). This is why the MoM provides substantial acceleration factors over von Neumann machines for a wide variety of computation problems. The MoM can also be used as an inexpensive and highly flexible programmable pseudo-systolic processor for emulation of systolic arrays, e.g. for experimenting with alternative systolic architectures at very early phases of the systolic array design process.  ----  paper 019 This paper introduces a family of innovative non-von-Neumann computing devices, called Xputers. The map-oriented machine (MoM), an example Xputer architecture, is a flexible non-von-Neumann accelerator machine developed at Kaiserslautern University. This machine uses a two-dimensional map-oriented data memory. Over this memory a variable-sized window cache can be moved in arbitrary move schemes to analyse and change the data via the problem-oriented logic unit, which delivers powerful, programmable pattern matching mechanisms. The MoM can be used to speed up signal processing, image processing, VLSI layout processing and many other applications, but it may also serve as a systolic array simulator and evaluator. Moreover, it can also be used as a low-cost, simple, flexible, and programmable array emulation computer. In contrast to a systolic array, where data streams move through an array of PEs, the MoM keeps data at fixed locations in its memory and moves its scan cache window in an application-specific manner across this memory space.  ----  paper 018 The map-oriented machine (MoM) is a flexible non-von-Neumann accelerator developed at Kaiserslautern University. 
This machine uses a 2-dimensional map-oriented data memory, over which a variable-sized window cache can be moved in arbitrary schemes to analyse and change data via the problem-oriented logic unit, which delivers powerful, programmable pattern matching mechanisms. The MoM can be used to speed up signal processing, image processing, VLSI layout processing and many other applications, and it may serve as a systolic array simulator and evaluator. Moreover, it can be used as a low-cost, simple, flexible and programmable array emulation computer. In contrast to systolic arrays, where data streams move through an array of PEs, the MoM keeps data at fixed locations in its memory and moves its window cache in an application-specific manner across this memory space.  ----  paper 017 The von Neumann principle has 2 bottlenecks: program accessing and data accessing. An innovative non-von-Neumann principle, introduced at Kaiserslautern, eliminates one of them. Its new processor, the Map-oriented Machine (MoM), is compared with the von Neumann concept. The MoM is the key resource of a completely new philosophy of data processing which we call "map-oriented processing". It does not use sequential programs, since it has no program sequencer. We call the way it is programmed "combinational programming". For surprisingly many applications it provides acceleration factors of up to several orders of magnitude, compared to von-Neumann-type processing. Existing computer application support tools (assemblers, compilers, operating systems, etc.) cannot be used for the MoM since they produce sequential code. That's why a new programming theory and new application support tools are introduced, such that the MoM is based on a marriage between standard IC use and ASIC techniques.  ----  paper 010 In this paper we describe an innovative computing architecture, called Map Oriented Machine (MOM). 
Concerning speed, cost and flexibility, two extreme solutions mainly exist today: the totally flexible but slow von Neumann computer, and the very fast but expensive and inflexible fully parallelized solution directly implemented on customized silicon. With some applications running on the MOM we show that it not only fills this gap, but is also a very good instrument for implementing algorithms which have a map-oriented organization, such as image processing. The basic idea of speeding up the algorithms is to parallelize the program access by combinational hardware, whose development is supported by some CAD tools. ---- paper 012 A datapath synthesis system (DPSS) for the reconfigurable datapath architecture (rDPA) is presented. The DPSS allows automatic mapping of high level descriptions onto the rDPA without manual interaction. The required algorithms of this synthesis system are described in detail. Optimization techniques like loop folding or loop unrolling are sketched. The rDPA is scalable to arbitrarily large arrays and reconfigurable to be adaptable to the computational problem. Fine grained parallelism is achieved by using simple reconfigurable processing elements which are called datapath units (DPUs). The rDPA can be used as a reconfigurable ALU for bus oriented systems as well as for rapid prototyping of high speed datapaths. ----  paper 066  -  58 This paper illustrates a new high level programming language which is important for a novel class of computational devices called Xputers, which are up to several orders of magnitude more efficient than the von Neumann paradigm of computers. Xputers are as flexible and as universal as computers. The flexibility of Xputers is achieved by using field-programmable logic (interconnect-reprogrammable media) as the essential technology platform. The paper first briefly illustrates the Xputer paradigm as a prerequisite needed to understand the fundamental issues of this new language.  
----  paper 050 This paper presents a novel hardware/software Co-Design framework CoDe-X for automatic generation of Xputer based accelerators. CoDe-X accepts C-programs and carries out both the host/accelerator partitioning for performance optimization and (at the second level) the parameter-driven sequential/structural partitioning of the accelerator source code to optimize the utilization of its reconfigurable datapath resources.   ---  paper 069 General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. -------- The paper is a plea for a radical methodological change in R&D of dynamically reconfigurable circuits. The paper illustrates that the current mainstream approach based on placement and routing is not very likely to obtain the area-efficiency and throughput needed to cope with the emerging cost crisis of future silicon technology generations. The proposed changes include both architectural principles and fundamental issues in application development support environments. The paper illustrates the feasibility of general purpose programmable accelerators and their commercialization.
The paper highlights computer systems’ increasing dependency on add-on accelerators. It shows why only a new methodology will allow reconfigurable hardware to overcome its role as a niche technology and become competitive with ASICs and other hardwired accelerators. It illustrates the possible coming crisis of ASIC design based on wasting chip area by placement and routing, and discusses the vision of software-only implementation of accelerators. ----  paper 097 - 55 This paper introduces a powerful novel sequencer hardware for controlling computational machines and for structured DMA (direct memory access) applications. The paper introduces the principles and the design of a novel class of this sequencer hardware which supports a two-dimensional memory address space, or at least a two-dimensional visualization of the traditional one-dimensional address space. From these concepts it derives a classification scheme of computational sequencing patterns and storage schemes. ----- This paper discusses the memory interface of custom computing machines. We present a high speed parallel memory for the MoM-PDA machine, which is based on the Xputer paradigm. The memory employs DRAMs instead of the more expensive SRAMs. To enhance the memory bandwidth, we use a threefold approach: modern memory devices featuring burst mode, an efficient memory architecture with multiple parallel modules, and memory access optimization for single applications. To exploit the features of the memory architecture, we introduce a strategy to determine optimized storage schemes for a class of applications. ------ Proceedings: Jose Rolim (Ed.): Parallel and Distributed Processing, Lecture Notes in Computer Science 1388, Springer-Verlag, Germany, 1998
In recent years reconfigurable computing has grown from a niche application into an important R&D field. Even today, however, most architectures lack essential features for convenient use as a co-processing unit. For example, embedded accelerator design with traditional FPGAs is very similar to sophisticated ASIC design due to the bit-level granularity of FPGAs. In this paper important topics for reconfigurable platforms in multitasking systems are discussed. Run-time programmability as well as rapid application implementation using high-level languages are illustrated. Besides the underlying concepts, the hardware implementation of a field-programmable ALU array (FPAA), the KrAA-III, is explained.  ---  paper 099 The paper introduces a novel co-compiler and its “vertical” parallelization method, including a general model for co-operating host/accelerator platforms and a new parallelizing compilation technique derived from it. Small examples are used for illustration. It explains the exploitation of different levels of parallelism to achieve optimized speed-ups and hardware resource utilization. Section II introduces novel vertical parallelization techniques for configurable accelerators, which exploit parallelism at four different levels (task, loop, statement, and operation level). Finally the results are illustrated by a simple application example. But first the paper summarizes the fundamentally new dynamically reconfigurable hardware platform underlying the co-compilation method.  ----- The paper is a plea for a radical methodological change in R&D of dynamically reconfigurable circuits. The paper illustrates that the current mainstream approach based on placement and routing is not very likely to obtain the area-efficiency and throughput needed to cope with the emerging cost crisis of future silicon technology generations. 
The proposed changes include both architectural principles and fundamental issues in application development support environments. The paper illustrates the feasibility of general purpose programmable accelerators and their commercialization.
The paper highlights computer systems’ increasing dependency on add-on accelerators. It shows why only a new methodology will allow reconfigurable hardware to overcome its role as a niche technology and become competitive with ASICs and other hardwired accelerators. It illustrates the possible coming crisis of ASIC design based on wasting chip area by placement and routing, and discusses the vision of software-only implementation of accelerators. ----  paper 097 This paper introduces a powerful novel sequencer hardware for controlling computational machines and for structured DMA (direct memory access) applications. The paper introduces the principles and the design of a novel class of this sequencer hardware which supports a two-dimensional memory address space, or at least a two-dimensional visualization of the traditional one-dimensional address space. From these concepts it derives a classification scheme of computational sequencing patterns and storage schemes. ----- This paper introduces a powerful novel sequencer for controlling computational machines and for structured DMA (direct memory access) applications. It is mainly focused on applications using 2-dimensional memory organization, from which most of the inherent speed-up is obtained. A classification scheme of computational sequencing patterns and storage schemes is derived. In the context of application specific computing the paper illustrates its usefulness especially for data sequencing, recalling examples published earlier as far as needed for completeness. The paper also discusses how the new sequencer hardware provides substantial speed-up compared to the use of traditional sequencing hardware. ---- A datapath synthesis system (DPSS) for the reconfigurable datapath architecture (rDPA) is presented. The DPSS allows automatic mapping of high level descriptions onto the rDPA without manual interaction. The required algorithms of this synthesis system are described in detail. 
Optimization techniques like loop folding or loop unrolling are sketched. The rDPA is scalable to arbitrarily large arrays and reconfigurable to be adaptable to the computational problem. Fine grained parallelism is achieved by using simple reconfigurable processing elements which are called datapath units (DPUs). The rDPA can be used as a reconfigurable ALU for bus oriented systems as well as for rapid prototyping of high speed datapaths. ----  paper 066 This dissertation describes the theory and implementation of a systolic array synthesis program SYS3 (Systolic Synthesis System), which accepts a KARL description of an equation system to be implemented and generates KARL descriptions of a set of alternative systolic array architectures. Like compilers for xputers, its front end is based on data dependency analysis, i.e. the source program is not interpreted as a sequence spec.

post scriptum (DRAFT)

  • R. Hartenstein (invited embedded tutorial): Coarse Grain Reconfigurable Architectures; 6th Asia and South Pacific Design Automation Conference 2001 (ASP-DAC 2001), January 30 - February 2, 2001, Pacifico Yokohama, Yokohama, Japan, ----  paper 110
  • R. Hartenstein, Th. Hoffmann, U. Nageldinger: Design-Space Exploration of Low Power Coarse Grained Reconfigurable Datapath Array Architectures; PATMOS 2000 International Workshop - Power and Timing Modeling, Optimization and Simulation, Göttingen, Germany - September 13-15, 2000 ----  paper 109
  • R. Hartenstein, M. Herz, Th. Hoffmann, U. Nageldinger: Generation of Design Suggestions for Coarse-Grain Reconfigurable Architectures; 10th International Workshop on Field Programmable Logic and Applications, FPL '2000, Villach, Austria, Aug.27-30, 2000. ---- paper 108
  • (further milestone papers?)
     
     





     

    © Copyright 1998, 2001, University of Kaiserslautern, Kaiserslautern, Germany