The PISA machine. The history of the Xputer in fact starts with the so-called PISA machine (PIxel-oriented System for image Analysis). Although the term 'Xputer' had not been coined at that time, the PISA machine already obeyed some of the most important Xputer principles (procedurally data-driven machine paradigm, 'soft' ALU).
The multi-university E.I.S. Project. Work on the PISA machine was funded by the E.I.S. project, the Mead-&-Conway-type German multi-university project, which included a multi-project chip infrastructure. E.I.S. stands for "Entwurf Integrierter Schaltungen" (design of integrated circuits). The E.I.S. project started in 1983 and was co-founded by Dr. Klaus Woelcken (GMD Birlinghoven at that time) and Reiner Hartenstein (Kaiserslautern University). Early support in forming a pressure group soon also came from Prof. Seitzer (Univ. Erlangen) and from Prof. Waldschmidt (Univ. Frankfurt/Main).
Reconfigurable problem-oriented logic unit (POLU). A customized operator was configured in a problem-oriented logic unit (POLU) of the PISA machine. It was activated each time a new set of input data was available in the so-called 'window cache', which was later called 'scan window'. It later turned out that the Xputer is the data-procedural counterpart of the image processing computers sometimes called 'cellular machines'. At the time of the PISA machine, the POLU was restricted to pattern matching applications with patterns stored in an EPROM.
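The activation scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3x3 window size, the binary image, the dictionary standing in for the EPROM, and the names `windows` and `match_all` are all hypothetical and not taken from the PISA hardware.

```python
from typing import Dict, Iterator, List, Tuple

Window = Tuple[Tuple[int, ...], ...]


def windows(image: List[List[int]], size: int = 3) -> Iterator[Tuple[int, int, Window]]:
    """Yield (row, col, window) for every window position, row by row."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            win = tuple(tuple(image[r + i][c:c + size]) for i in range(size))
            yield r, c, win


def match_all(image: List[List[int]],
              patterns: Dict[Window, str],
              size: int = 3) -> List[Tuple[int, int, str]]:
    """Activate the operator once per window position.

    The operator here is a plain lookup: a hit is reported whenever the
    window contents equal one of the stored reference patterns
    (the dictionary plays the role of the pattern store, by assumption).
    """
    hits = []
    for r, c, win in windows(image, size):
        label = patterns.get(win)
        if label is not None:
            hits.append((r, c, label))
    return hits
```

Note that nothing in `match_all` decides when to compute: the arrival of a new window is what triggers the operator, which is the data-driven aspect the paragraph refers to.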
The Data Sequencer. The address-generating Move Control Unit (MCU) of the PISA machine supported only one form of video scan and absolute addressing of single pixels. The software support was aimed at grid-based design rule checking, with a compiler generating reference patterns and JEDEC code from design rule descriptions.
The MoM-2 (Map-oriented Machine 2). The next-generation prototype was called MoM-2 (Map-oriented Machine 2), and the PISA machine was renamed MoM-1. Initially it was still aimed at applications based on pattern matching, such as image pre-processing, design rule check, Lee routing, etc. During the development process, however, we found that the architectural principles also supported parallel arithmetic operations and a far wider range of applications.
The first Generic Address Generator (GAG). The hardware of the MoM-2 still provided only a single address generator to update the so-called 'scan cache', but the repertory of 'scan patterns', as they were called from this time on, included variations of video scans, shuffle scans, linear scans and relative jumps of the scan cache. The address generator of the MoM-2 was called MCU (Move Control Unit). The scan cache had a variable size and shape, bounded by a rectangle, so that it could be adapted to the data distribution required by the application.
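To make the notion of a scan pattern more concrete, the following Python sketch generates two of the pattern types named above, a video scan and a linear scan, as sequences of scan cache positions. It is an illustration only: the function names, the step parameters and the row-major mapping to flat memory addresses are assumptions, and shuffle scans and relative jumps are left out.

```python
from typing import Iterator, Tuple

Addr = Tuple[int, int]  # (x, y) position of the scan cache in 2-D data memory


def video_scan(width: int, height: int,
               step_x: int = 1, step_y: int = 1) -> Iterator[Addr]:
    """Row by row, left to right, like a video raster."""
    for y in range(0, height, step_y):
        for x in range(0, width, step_x):
            yield (x, y)


def linear_scan(start: Addr, step: Addr, count: int) -> Iterator[Addr]:
    """`count` positions along a straight line, advancing by `step` each time."""
    x, y = start
    dx, dy = step
    for _ in range(count):
        yield (x, y)
        x, y = x + dx, y + dy


def to_linear_address(addr: Addr, row_length: int, base: int = 0) -> int:
    """Map a 2-D position to a flat memory address (row-major layout, assumed)."""
    x, y = addr
    return base + y * row_length + x
```

For example, `list(linear_scan((0, 0), (1, 1), 3))` walks along a diagonal. The point of the sketch is that the whole address sequence follows from a few parameters, so no instruction stream is needed to compute individual addresses.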
The POLU: an early SRAM-based FPGA. The POLU supported pattern matching and hardware-controlled modification of the input data, depending on the matching pattern. A custom-designed NMOS circuit supported SRAM-based pattern matching, which allowed patterns to be exchanged at run time. Although the term FPGA was coined only later, this circuit can be counted as one of the early SRAM-based FPGAs, which turned out to be an important technology platform for Xputers. In the MoM-2, the reconfigurable ALU (rALU) had to be combinational hardware without any registers. The timing of the rALU directly influenced the clock speed of the address generator. To achieve von-Neumann-like universality, a Task Sequencer was introduced into the architecture to combine the address generator's scan patterns into arbitrary address sequences.
Xputer: a novel fundamental machine paradigm. At that time (about 1987) we recognized that we had developed a new machine paradigm, which we called 'Xputer' from then on. Reiner Hartenstein said: "When in July/August 1989, sitting in my garden at Bruchsal, we (Antonio Nuñez-Ordinez, Francis Jutand, Michel Dana) discussed first drafts of the PATMOS proposal to the Commission of the EU, we made jokes about the prefix 'X' in 'Xputer': X stands for 'yet unknown: proposals welcome'. We thought that for the non-von-Neumann Xputer it should be something contrasting with the (von Neumann) 'Com' in 'Computer', and it should sound crisp."
Reconfiguration CAD tool for the MoM-2. On the software side, a graphical CAD tool supported an easy way of programming the reference and result patterns for the POLU, as well as arithmetic operations (although these were not included in the prototype hardware due to a lack of suitable FPGAs at that time). This CAD tool generated the structural code for the MoM-2, whereas the sequential code was compiled from MoPL-1 (MoM Programming Language 1), the first data-procedural language for Xputers. The term MoPL was first introduced in .
Xputers: a Generalization of Systolic Arrays? Although the strong relationship between Xputers and the systolic array scene had been detected quite early, it was fully elaborated only in a subsequent dissertation. This dissertation presents a compilation technique for Xputers, which derives the combinational and the sequential code for the MoM-2 from a systolic array specification. With the advent of FPGAs that are retargetable to semi-custom gate arrays, this compilation technique provided the basis of a new ASIC design method more powerful than silicon compilation, because it produced not only the hardware description but also the machine code to run an algorithm on that hardware.
MoM-2 prototype. A complete description of the MoM-2 prototype can be found in , which also shows the relationship to cellular machines and explains the possibility of further speed-up by interleaved memory techniques. These can be fully utilized, because the deterministic data-driven concept allows an optimal data distribution.
The MoM-3. After the MoM-2, the next Xputer architecture prototype built at Kaiserslautern University is the MoM-3. It introduces multiple Scan Windows, whose registers are integrated into the rALU for optimized packaging. The shape of the Scan Windows can be completely arbitrary, and their maximum size is increased to 64 memory words of 32 bits. Each Scan Window is controlled by a so-called Generic Address Generator (GAG), featuring an even richer set of scan patterns than in the MoM-2 version. Instead of an inflexible FSM-based Task Sequencer, the GAGs are integrated into a coordinating micro-programmable Data Sequencer, which also simplifies code generation and programming.
Standard rALU Interface. A standard interface between the rALU and the Data Sequencer allows both to run at their full speed, so that virtually any FPGA board for custom computing machines can be adapted as an rALU for the MoM-3. The rALU no longer has to be a combinational net, which allows the use of pipelining to speed up large operators.
The rDPA (reconfigurable Datapath Architecture). Pipelining and 32-bit-wide arithmetic are supported by our own custom FPGA circuit, called rDPA (reconfigurable Datapath Architecture) or KressArray-I. The main difference between the KressArray and a conventional FPGA is the coarse-grained structure, featuring 32-bit-wide datapaths and all operators of the C programming language. Furthermore, the KressArray features (partial) in-system reconfiguration at run time, fully transparent expansion across chip boundaries, fully parallel internal communications and a global bus for quick distribution of input (and output) data into the array.
MoM-3 Memory Interfaces. The MoM-3 system architecture allows both direct memory access to the host computer's main memory and multiple local memory modules that can be accessed in parallel. Each memory module may itself make use of interleaving techniques if the memory access time does not match the speed of the computational devices. Considerable effort has been spent on making the programming environment of the MoM-3 more familiar to most programmers.
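The benefit of interleaving within a memory module, mentioned above, can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not a description of the MoM-3 memory: low-order interleaving is assumed, at most one access is issued per cycle, and a bank stays busy for a fixed number of cycles after each access. The names `bank_and_offset` and `cycles_for_sequence` are hypothetical.

```python
from typing import Iterable, Tuple


def bank_and_offset(address: int, num_banks: int) -> Tuple[int, int]:
    """Low-order interleaving: consecutive addresses fall into consecutive banks."""
    return address % num_banks, address // num_banks


def cycles_for_sequence(addresses: Iterable[int],
                        num_banks: int,
                        bank_busy_cycles: int) -> int:
    """Return the number of cycles until the last access has been issued.

    Accesses are issued in order, at most one per cycle. An access stalls
    while its bank is still busy with an earlier access.
    """
    ready_at = [0] * num_banks  # first cycle at which each bank is free again
    cycle = 0
    for address in addresses:
        bank, _ = bank_and_offset(address, num_banks)
        cycle = max(cycle, ready_at[bank])      # stall until the bank is free
        ready_at[bank] = cycle + bank_busy_cycles
        cycle += 1                              # issue slot consumed
    return cycle
```

In this model, eight accesses to consecutive addresses on four banks with a bank busy time of four cycles are issued without stalls, whereas eight accesses with a stride of four all hit the same bank and stall repeatedly. Because the address sequence of a scan pattern is known in advance, data can be distributed so that the first case is the common one.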
XC compiler for High-Level Programming. The XC compiler allows high-level programming starting from a subset of the C language that lacks only support for system calls and recursion, since both require an underlying operating system. Methods from parallelizing compilers are extended to suit the non-von-Neumann Xputer paradigm. They provide a direct path from C source code to binaries for the Data Sequencer and the rALU, as well as a data distribution for optimized memory access.
ALE-X - a Language for Programming the rALU. To allow an exchange of the rALU hardware without having to redesign the complete software environment, the mapping process is driven by parameters, and a unified programming language for the rALU has been designed, which serves as an interface to the software environment. This language is called ALE-X (Arithmetic, Logic Expressions for Xputers) because it describes the operations to be performed in the rALU as arithmetic and logical expressions on the scan windows' contents. When the rALU hardware is exchanged, only the ALE-X compiler module for the new rALU has to be developed to make full use of the power of the hardware and software environment of the MoM-3.
KressArray-I. Until then, the Xputer as a generalization of the systolic array had been just a relatively vague idea. Michael Weber used a modified version of parts of the systolic array synthesis system by Karin Lemmert as a backend of his experimental "Xpiler" (Compiler for Xputers). But the KressArray is really a direct generalization of the systolic array. The layout concept remained the same (wiring by abutment, mesh-based: NEWS network), but for synthesis, simulated annealing (by DPSS) is used instead of the traditional linear projection methods. This way, reconfigurability of the PE (processing element) cells makes sense.
The Datapath Synthesis System (DPSS). For the KressArray-I, the Datapath Synthesis System (DPSS) has been implemented, which compiles application datapaths into structural code for the KressArray hardware. Due to the coarse-grained design, the configuration code generation is based on a simple simulated annealing algorithm rather than on a complex design process involving placement and routing, as is needed for conventional FPGAs.
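To illustrate the kind of simulated annealing meant here, the following Python sketch places a handful of operators on a small mesh so that connected operators end up close together. It is not the DPSS implementation: the cost function (total Manhattan distance), the swap move, the geometric cooling schedule and all names and parameters are assumptions chosen for brevity.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]
Net = Tuple[str, str]


def wire_cost(placement: Dict[str, Pos], nets: List[Net]) -> int:
    """Total Manhattan distance over all operator-to-operator connections."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)


def anneal(ops: List[str], nets: List[Net], rows: int, cols: int,
           steps: int = 2000, t_start: float = 2.0, t_end: float = 0.01,
           seed: int = 0) -> Tuple[Dict[str, Pos], int, int]:
    """Return (best placement, its cost, cost of the random initial placement)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(ops) > len(cells) or len(cells) < 2:
        raise ValueError("mesh too small")
    rng.shuffle(cells)
    # occupant[i] is the operator sitting on cells[i], or None for a free cell
    occupant: List[Optional[str]] = list(ops) + [None] * (len(cells) - len(ops))

    def placement_of(occ: List[Optional[str]]) -> Dict[str, Pos]:
        return {op: cells[i] for i, op in enumerate(occ) if op is not None}

    current = wire_cost(placement_of(occupant), nets)
    initial = current
    best, best_occ = current, occupant[:]
    for k in range(steps):
        # geometric cooling from t_start down to t_end
        t = t_start * (t_end / t_start) ** (k / max(1, steps - 1))
        i, j = rng.sample(range(len(cells)), 2)
        occupant[i], occupant[j] = occupant[j], occupant[i]   # trial swap
        new = wire_cost(placement_of(occupant), nets)
        if new <= current or rng.random() < math.exp((current - new) / t):
            current = new                                     # accept the move
            if new < best:
                best, best_occ = new, occupant[:]
        else:
            occupant[i], occupant[j] = occupant[j], occupant[i]  # undo the swap
    return placement_of(best_occ), best, initial
```

With coarse-grained cells, each operator occupies exactly one cell and neighbours connect by abutment, so a placement of this kind already determines most of the configuration. That is why a simple annealing loop can stand in for the separate placement and routing steps of a fine-grained FPGA flow.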
The KressArray: dramatically more area-efficient. Furthermore, it has been shown that the KressArray is much more area-efficient than an FPGA. Because of the large overhead for reconfiguration and routing facilities, and because of their fine-grained architecture, conventional FPGAs use only about one percent of the chip area for the implementation of the desired function. Compared to FPGAs, the KressArray achieves a degree of logic integration that is three orders of magnitude better. The essential problem of conventional FPGAs is the very expensive long-range connections of mesh-based fine-grained architectures.
KressArray-III. Based on the KressArray-I, other prototypes have been developed, the current model being the KressArray-III.
Compilation from C sources: the CoDe-X Application Development Framework. On the software side, the XC compiler, which is capable of deriving scan patterns from C programs, has been incorporated in the application development framework CoDe-X, which allows a C program (without the restrictions of the subset for the XC compiler) to be mapped onto a system consisting of a host with an Xputer as accelerator.
MoPL language for programming Scan Patterns (data address sequences). Besides the availability of standard C for the programming of an Xputer-based system, CoDe-X also features the language MoPL-3, which allows experienced users to hand-hone Xputer programs. MoPL-3 is the successor of MoPL-2. By way of explanation: scan patterns in Xputer programming (with MoPL) are the counterpart of control flow in (von Neumann) computer programming. Instead of the program counter, the data address registers are the machine state registers. MoPL also supports memory spaces of two or more dimensions. This is also the basis for an excellent visualization of the very rich repertory of generic scan patterns provided by the GAG concept.
The MoM-PDA architecture prototype. Currently, a new Xputer prototype called MoM-PDA (Map-oriented Machine with Parallel Data Access) is under development. The MoM-PDA features a new data sequencer and a new memory architecture with parallel memory banks. For this memory, special DRAM devices are being used, which provide a burst mode for high bandwidth. Additionally, software is being developed which optimizes applications to use the features of the memory architecture.