[Next] [Contents] [Index]


Design and Implementation
of a Compiler
for the Data Procedual Language MoPL-3
as an Optional Language Extension of CoDe-X Specifications


Wolfgang Hennig

Universität Kaiserslautern

Januar 1997

vorgelegt von: Wolfgang Hennig
Parkstraße 9
66482 Zweibrücken

betreut von: Prof. Dr.-Ing. Reiner W. Hartenstein
Lehrstuhl für Rechnerorganisation und Techn. Informatik
Universität Kaiserslautern
Erwin-Schrödinger-Straße
D-67663 Kaiserslautern

Dipl.-Informatiker Jürgen Becker
Lehrstuhl für Rechnerorganisation und Techn. Informatik
Universität Kaiserslautern
Erwin-Schrödinger-Straße
D-67663 Kaiserslautern

ausgegeben von: Prof. Dr.-Ing. Reiner W. Hartenstein

ausgegeben am: 12.07.1996

abgegeben am: 07.01.1997

1. Introduction 3

2. Specification of this Master-Thesis 6

3. Xputer - Target Hardware Underlying MoPL-3 8

3.1 Data Sequencer 9

3.1.1 Instruction Sequencer 10

3.1.2 Generic Address Generator 10

3.2 Reconfigurable ALU 12

4. Language Definition of MoPL-3 14

4.1 MoPL-3 Subroutine 14

4.2 Compound Window Declaration 14

4.3 rALU Configuration Grammar 16

4.4 Scan Pattern 20

4.5 Array Declaration 25

4.6 Activation Part 25

5. Examples for MoPL-3 Applications 29

5.1 Vector-Vector Scalar Multiplication 29

5.2 Sharpen Algorithm 32

5.3 Shuffle Exchange 34

5.4 Fast Fourier Transformation 38

5.5 JPEG Algorithm 40

5.6 The n-Body Problem 43

6. Code Generation 47

6.1 MoM Assembler 47

6.1.1 Configuration of the MAG 47

6.1.2 Configuration of the GAG 47

6.1.3 Instruction Sequencer 49

6.2 The ALE-X rALU Assembler 49

6.3 Hardware Configuration File 50

6.4 MoPL-3 Library Concept 50

6.5 Scan Pattern Library 52

7. Design of the Compiler 54

7.1 Overview 54

7.2 Basic Design 55

7.3 Implementation Issues 66

7.3.1 Files for the Compiler 66

7.3.2 Makefile for the MoPL-3 Compiler 69

7.3.3 Design of the MoPL-3 Library Preprocessor 70

8. Future Improvements 73

8.1 GAG Organisation 73

8.2 rALU Configuration 73

8.3 Scanpattern Declaration in MoPL-3 73

Appendix A: MoPL-3 Library 75

A.1 Vector Vector Multiplication 75

A.2 Vector Matrix Multiplication 76

A.3 Matrix Matrix Multiplication 77

A.4 Gauß Algorithm 78

A.5 Spline Interpolation 80

A.6 Sharpen Algorithm 82

A.7 Shuffle Exchange 83

A.8 Fast Fourier Transformation 84

A.9 n-Body Algorithm 85

Appendix B: Grammar summary of MoPL-3 87

9. Bibliography 94

10. Index 96

1. Introduction

From the very beginning of computer science the architecture of computers is based on the von-Neumann principle. Due to the increasing requirements in computer performance the processors have been improved over and over again. Internal cache has been integrated in the processors, pipelining has been implemented, dynamic and speculative execution has been integrated. Furthermore the clock frequencies double every two years. All that improvements on the von-Neumann like computers end in a performance increase of about 50-100% from one generation to another.

This performance increase itself is spectacular, but for a lot of algorithms more performance is needed. To achieve an acceleration of 2000 -5000 %, it is important to make elementary changes in the architecture of the computer.

The most important advantage of the von-Neumann architecture is the availability of general purpose computer languages. The whole software development is based upon that kind of languages. Even most of the software developers are used to program in that way. If a new computer architecture wants to become successful, the developers have to keep the software side in mind. Software developers have to get the chance to learn about the new architecture in an easy way. They do not want to throw their knowledge away and start from the very beginning. On the other side it is important to have a language available that is specialised on the new architecture and that takes use of most of the improvements of that new architecture.

A major point on the success of a new architecture is the need to work independently on that computer. It should be possible not to solve only special tasks, but to have standard programs available as well as an operating system.

This requirements in mind the developers did not invent a completely new computer system but extend an existing one. The resulting machine has been the so called Map oriented Machine (MoM). In the current prototype the MoM-3 is plugged in an ELTEC host, but the successor, the MoM-4, should be added to a SUN SPARC computer.

For the previous prototype MoM-2 the language MoPL-2 (Map oriented Programming Language 2) has been developed and implemented. This language reflected closely the MoM-2 hardware structure. By using MoPL-2 experienced users have achieved good acceleration factors with the help of the MoM-2 hardware. But the flexibility and potential number of applications were limited. Thus the more flexible language version MoPL-3 for the MoM-3 prototype has been developed and implemented by this thesis.

Figure 1.1 shows the basic components used by the Xputer. The data sequencer generates addresses that urge data memory to deliver the required data via the smart interface to the scan window in the reconfigurable ALU (rALU). The rALU does the actual calculations and the smart interface writes the results of that operations back into the data memory. The single components are described later on in chapter 3. The structure of the rALU is specified in figure 1.2. Figure 1.3 describes the communication between the host computer and the single Xputer modules.

2. Specification of this Master-Thesis

There are several abstraction levels within software development for Xputers. On the upper end of abstraction, a normal C program is compiled by a special two level partitioning and parallelism compiler that analyses the program and divides it into two parts. One part is the code that can not be executed efficiently on an Xputer like I/O or dynamic structures. The other part consists of code segments that run faster on the Xputer than on the host system. This includes algorithms like routing, MPEG-decoding, and others. This software part is compiled for the Xputer by the X-C compiler. On the lower end of abstraction, it is possible to program the Xputer in MoM-assembler. In between of that two abstraction levels is the MoPL-3 language that is introduced and implemented by this master thesis.

MoPL-3 has been designed for several reasons:

1. To achieve best performance of the MoM by experienced users.

2. To increase the flexibility and number of potential applications

compared to MoPL-2.

3. To access the hardware directly out of C programs.

The third point can be done by embedding the MoPL-3 code into the C-code within the compiler directive #ifdef _MOPL_ or by using special generic library functions. The MoPL code is then directly compiled for the MoM, whereas the rest of the C-program is rated weather to run on the host or on the Xputer.

With this opportunity in mind, a complete library of fast and efficient algorithms can be developed to be used in other programs.

The generic library has been implemented to offer software developers a set of algorithms that can be used without a special knowledge of the hardware. They can be adapted to their requirements rather easy and nevertheless achieve the best performance increase. Some algorithms have been implemented in this library yet and more will follow in future works.

The complete framework for CoDe-X is given in figure 2.1. The parts handled by this paper are highlighted.

This script includes an overview of the most important hardware specific topics of the Xputer in the third chapter. The fourth chapter specifies the language MoPL-3 with the help of syntax diagrams. The following chapter introduces the language by means of some examples. Chapter six specifies the background for the configuration and assembler file that have to be generated for the Xputer. References to the code generation of the compiler are also given. An overview of the compiler design is given in chapter seven where the object design is specified as well as implementation issues. The last chapter gives some ideas how MoPL-3 can be improved in the future, dependent on the data sequencing capabilities in future hardware prototypes.

The first appendix describes the library that is implemented by now, whereas in the second appendix the complete grammar of the language is included.

3. Xputer - Target Hardware Underlying MoPL-3

The MoM-3 has been designed in the AG Hartenstein at the University Kaiserslautern. The idea behind this concept is the Xputer paradigm. The Xputer consists first of all of the reconfigurable arithmetic logical unit (rALU). The rALU does the actual calculations in the Xputer. The next part is the data sequencing unit, that contains several generic address generators (GAG). This part is responsible for the stream of data that has to be transferred to the rALU. The data sequencing unit also controls the rALU and configures it.

With Xputers the highest speed-up is obtained for algorithms, which iterate the same set of operations over a large amount of data. Such operations are mapped onto the reconfigurable arithmetic logical unit. All input and output data to the complex rALU operations are buffered in a scan window (SW), which is a kind of smart register file. The sequence of the scan window in the memory area is determined by a data sequencer. All data in the scan windows can be accessed in parallel by the rALU.

Each of up to seven Xputer modules run independently from each other on the host, until a task is completed. Reconfigurations can be performed independently from other running tasks. Each module generates an interrupt to the host when the task is finished. The host may run concurrently to such tasks. The basic structure of an Xputer module consists of three major parts (figure 3.1):

1. two-dimensionally organized data memory

2. reconfigurable arithmetic logical unit (rALU)

3. reconfigurable data sequencer (DS)

Within iterations the access sequence to data, typically organized in arrays, often shows regularity which allows to describe address sequences generically from a few parameters. The sequence of memory locations reached by such generic address sequences we call scan patterns. By this basic sequencing mechanism Xputers are deterministically data-driven (non-von Neumann): i.e. one or multiple generic address generators (GAGs) are used instead of an instruction sequencer, generating addresses only by hardware without eating memory cycles. During a scan pattern operation execution is transport-triggered. The GAG operates in a two-stage pipeline. The first stage computes the position of the scan window. The second stage computes out of this position the relative addresses of the variables in the scan window.

The target machine for MoPL-3 is the map oriented machine (MoM-3). It is hosted in an ELTEC computer, that itself uses the VMEbus system. The MoM-3 has a VMEbus interface to handle the data stream from and to the host system. The MoM has an own bus structure, the MoMbus. It is optimized for the requirements of the MoM.

3.1 Data Sequencer

The data sequencer is responsible to create the scan pattern that moves the scan windows along their way. This results in consecutive addresses that are placed on the data memory to deliver the required data to the rALU. Besides this the data sequencer controls the rALU.

It includes seven GAGs and an instruction sequencer (IS).

3.1.1 Instruction Sequencer

This instruction sequencer is based upon an ordinary von Neumann architecture. There are several 24 bit registers available in the IS. Furthermore common assembler instructions as well as special commands to configure the GAGs and rALU are applicable. The IS has its own memory, that contains the program and data to be run.

The register set includes registers for the program counter (PC), two call registers (Call0 and Call1) that can be used for interrupt handling and for simple subroutine calls. For nested subroutine calls with a depth of more than two, a stack has to be used that is easy to implement with the help of assembler macros.

A register called IntPtr can be used to specify a subroutine that has to be called, if an external interrupt occurs. Furthermore a flag register exists that contains currently four different data bits. One bit signals what call register has to be used next (toggeling between Call0 and Call1). The zero flag as well as the negative flag is also implemented for conditional jumps. The fourth bit is always true to realise unconditional jumps. Finally ten general purpose 24 bit registers (R0 to R9) are available.

The operations on the IS include five types of instructions. Load/store instructions from memory to register (loadi, storei), register/register instructions (add, sub, move), control operations (bra, beq, bne, bmi, bpl, brc, bnc, nop, call, rts, rti, int), instructions with direct operands (loadc, addc, testc), and instructions for the GAG and rALU (actg, putg, getg, writer, readr, writeg, readg).

The instruction sequencer is implemented in a Xilinx XC4013 FPGA. The control logic is implemented as hard wired logic. For more informations about the implementation of the IS, like VHDL files and timing diagrams, see [12].

3.1.2 Generic Address Generator

The main task for the instruction sequencer is to configure the GAGs. A GAG contains special registers that have to be loaded to generate the desired address sequences. This GAG includes special steppers like address stepper and limit stepper. With this steppers the relative addresses are generated and a special segment checker examines whether the address is valid or out of range.

The GAG is divided into three parts: A control logic, the reference address generator and the memory address generator as shown in figure 3.3.

The reference address generator (RAG, see figure 3.4) produces the two components for the x and y direction each. Therefore two steppers are needed that can be synchronized in different ways.

An illustration of the x and y address stepper configuration is given in figure 3.5.

The address generation is done with the help of two loops. The inner loop counts the register Address from Base to Limit in steps of dA. The outer loop counts the register Base from B0 to F in step width of dB. The same time the register Limit runs from L0 to C in steps dL.

The address stepper can be configured in multiple ways. For example the exit condition can be set in different kinds. The synchronization for x and y loops can be configured (x inner loop and y outer loop, x outer loop and y inner loop or finally x and y in parallel).

See Configuration of the GAG on page 47 for an example how to get the parameters of the GAG from a scan pattern.

The memory address generator calculates the actual memory address as shown in figure 3.6. This includes the mapping from two dimensional memory to the linear address space as well as the data transfer from the MoM memory to the rALU. For this purpose the address generator program has a set of instructions, that allow these operations. The instructions include Read, Write, ReadA, WaitrALU, WaitAddr, Shift, Goto, IfEnd, PushA, PopA, ReadC and IfNoMove. The reference address is delivered by the reference address generator of figure 3.4.

For more informations about the GAG like specification and implementation see [9].

3.2 Reconfigurable ALU

The rALU does the actual calculations within the loop. This rALU consists of an array of operation elements, the so called DPUs (datapath units), and a control logic as shown in figure 3.7.

The current rALU implementation is called Kress Array. It is described comprehensively in [7].

Every DPU has two inputs, one from the west and one from the north. Furthermore it has two outputs (south and east). DPUs that are next to each other are joined together either via a north-south connection or west-east connection. The borders of the rDPA (reconfigurable datapath array) can be connected in a flexible way.

The DPUs operate data driven, so the operations are executed as soon as all data on the inputs of an operation node are available.

All DPUs are connected through a common bus. With the help of that bus, data can be transferred to the inputs of the DPU and can be fetched up from the outputs. This mechanism is used both for the transfer from the main memory to the DPUs as well as for transferring data from one DPU to another.

The operation unit of every DPU is loaded at runtime with a special microcode that is responsible for that operation. For this purpose a special library of microcoded operations is at hand. The available operations are stated in a special hardware configuration file that is described in Hardware Configuration File on page 50.

The number of rALU subnets depends upon the amount of cells that have to be allocated for the individual specifications. In every rALU subnet several scan caches can be stated. These scan caches are the interface between the data memory of the host and the rALU. The data is transferred from the host memory into the scan cache in a way the GAG describes. For more information about the rALU see [17, 7, 15].

4. Language Definition of MoPL-3

In the CoDeX framework each MoPL-3 program is placed in its own #ifdef _MOPL_ / #endif or #ifdef _MOPL / #elsif / #endif part in a host C-Program. The X-C compiler handles the appropriate MoPL-3 code and passes it on to the MoPL-3 compiler. Furthermore the X-C compiler is responsible for the correct embedment between the MoM and the host computer. That includes memory management, data transmission and the supervision over the program execution.

4.1 MoPL-3 Subroutine

A valid MoPL-3 subroutine is composed of a declaration part and a statement part.

The scan window declaration, the rALU configuration, the scan pattern declaration and the configuration of the memory access are to be specified in the declaration part.

4.2 Compound Window Declaration

The interface between rALU and memory is the scan window that has to be declared in the compound window declaration part. Thereby a rectangular area of the memory is made available to the rALU.

A scan window declaration consists of one or more window descriptions that are grouped up. This group is bound in the statement part to the scan pattern. The scan pattern itself moves the scan windows over the real memory to deliver the data to the rALU.

An individual scan window is specified through two areas. The first describes the shaping on the x direction of the memory, the second is responsible for the y direction. All memory references in MoPL-3 are two-dimensional, so the point after the keyword handle which specifies the reference point of the scan pattern. This can be important, if the scan window does not have a rectangular shape (it must have this kind of shape, but there can be don't cares within the rectangle). If no handle point is specified, the point [1,1] is set as handle position, what is in most cases.

Another important thing is the type of the scan window. This type corresponds to the types available in ALE-X [15], which depend on the hardware configuration.

If the same shape of scan window has to be used more than once, it is possible to specify more than one identifier for the scan window.

In the following example a scan window group is declared that consists of a two by two array and two single elements that measure one by one. The handle position is both times [1,1] and the elements are of the type float. The default value for the handle position is [1,1], so no specification is necessary in both declarations, but in the first one it is shown as an example.

/* scan window group ThreeW consisting of one 2x2 window and */
/* two 1x1 windows of floating numbers */
window			ThreeW is
			SW1		[1:2, 1:2] handle [1,1] of float;
			SW2, SW3		[1:1, 1:1] of float;

4.3 rALU Configuration Grammar

The rALU (reconfigurable arithmetic logical unit) is the part in the Xputer where actual calculations take place. The data needed for the calculation come from the scan windows that move over the memory in a way the scan pattern describes. The configuration of the rALU is specified once and during the whole computation the same operations are executed (of course with different data).

The referred scan window group name must be specified to achieve a relationship between the scan window group and the rALU registers. All accessed variables have to be defined in the scan window group.

The amount of available operations depends on the state of implementation of the MoM. Currently basic operations are implemented, but special libraries for trigonomical operations or others are possible. To get a flexible system, a special hardware description file is used to describe the available operations of the MoM. See Hardware Configuration File on page 50. These operations have to be available in microcoded form. The syntax of the operations are equal to that of ALE-X an assembler for the rALU. For further information about ALE-X see [15].

The declaration of the rALU subnets corresponds to most extents to ALE-X. It has to be considered that the operations are mapped into the rALU. The MoPL-3 compiler does not care whether the operations fit into the rALU nor does it optimize the rALU code. Nevertheless a good optimizer is implemented in ALE-X. That is the reason, why the MoPL-3 compiler has not the ability to check the size of the rALU memory, and maybe to split large computations into several parts that are calculated in different rALUs.

In the following the rALU subnet configuration for the FFT is described, at which only simple operations are executed on the scan window data.


/* Set Data in SW2 and SW3 to appropriate values for FFT */
/* SW1, SW2 and SW3 are grouped together in scan window group ThreeW */

rALUsubnet FFT of ThreeW is
			SW2 [1,1] = SW1[1,1] + SW1[2,1] * SW1[1,2];
			SW3 [1,1] = SW1[1,1] - SW1[2,1] * SW1[1,2];

The next example is a part of the run length coding of the JPEG algorithm. It is shown, how to make use of data dependent scans. The scan pattern ListScan is data dependent in this case. The rALU determines whether a step of the scan pattern is executed or not. If the set localBranchFlag statement is executed, the scan pattern moves one step further, otherwise it remains on its position. Please note that the set localBranchFlag has to be within an if - then - else statement, because a special feature of the MoM-4 is used here, the so called conditionFlag. This flag is set, when the expression in the if part becomes true.

If the set localBranchFlag is not placed within an if statement but within normal code, the command is always executed and the dependent scan pattern moves every step. Thus the set localBranchFlag would not make any sense in that case.


/* Set ListScan as a data dependent one */

scanPattern
		 ListScan is 63 steps [2,0] dependent on localBranchFlag;

/* IF (Src[1,1] != 0) THEN */
/*  store value from Src[1,1] in Dst [2,1] */
/*  make step to the right */
/* ELSE */
/*  increase number of zeros in Dst[1,1] */
/*  and remain on current position */

rALUsubnet 			Move of MoveW is
			if (Src[1,1] != 0) then begin
				Dst[2,1] = Src[1,1];
				set localBranchFlag;
			end else
				Dst[1,1] = Dst[1,1] + 1;

4.4 Scan Pattern

The scan pattern describes the way the scan window moves in the data memory during execution. These scan patterns can be of a simple (linear) kind or of a complex kind with procedural description. Thereby procedural scan pattern can base upon each other, to build complex scan pattern. For loops the while-construct is suit, which examines the X/Y-position of the scan pattern and leaves the loop, if necessary. Linear scans can be put together into an universal class with its own end-condition with the help of the until-statement.

The scan patterns used in the FFT-algorithm are executed in the statement part in a nested kind. Here SP1 and SP23 move in the inner scan pattern down one and two positions respectively. The outer scan pattern HLScan moves two positions to the right for three steps. The syntax of the execution and the example for the FFT is placed in Activation Part on page 25.


/* Specify scan pattern for scan windows SW1 (HLScan, SP1), */
/* SW2 and SW3 (HLScan, SP23) according to figure 4.1 */

scanPattern			SP1		is 7 steps [0,2];
			SP23		is 7 steps [0,1];
			HLScan		is 3 steps [2,0];

It is also possible to make transformations on scan pattern, but these transformations can be done only on simple scans that contain no while or until statements. If that should be possible, too, the scan pattern would have to be analysed and simulated at compile time, with the appropriate move statement in mind. If this work has been done, the transformed scan pattern had to be analysed and the GAG register set would have do be generated by means of that transformed scan patterns. This is hard to achieve, because there are more than four million different scan pattern possible, so that the analysis is too much effort.

These transformations are nevertheless useful, as the following example of a video scan shows. A picture of the scan pattern is given in figure 4.2.

/* Set scan patterns for a single line (LineScan) */
/* and for one position down (DownScan) */

scanPattern			LineScan 			is 7 steps [ 1, 0];
			DownScan			is 1 step  [ 0, 1];

/* Set ForwardScan to the composition of LineScan and DownScan */

			ForwardScan is begin
				LineScan;
				DownScan;
			end;
/* Set VideoScan to ForwardScan and mirrored ForwardScan that runs in a */
/* while loop to achieve the complete scan */

			VideoScan is begin
				while (@[,<=8]) begin
					ForwardScan;
					miry(ForwardScan);
				end;
			end;

The scan pattern declaration of the run length coding of the JPEG algorithm consists of different kinds of pattern. First of all, the four simple scan patterns (EastScan, WestScan, SouthWestScan and NorthEastScan) are declared. After that the UpLzigzagScan is described, which is composed of the simple scan patterns mentioned above. Furthermore both the while-loop and the until-statement are used. See figure 4.3 for the illustration of the complete scan pattern. In figure 5.7 and figure 5.8 the illustration of the complete scan pattern is given.

At the end of the example, a data dependent scan pattern is used for the ListScan. Here, the flags of the rALU are examined after the calculation, and dependent on what flags are set, the appropriate step of the scan pattern is executed. All flags not explicitly set in a step are reset.

Currently only one localBranchFlag is supported, but in further revisions of the MoM, it is possible to use more than one flag.

/* Set basic scan patterns EastScan, SouthScan, SouthWestScan and */
/* NorthEastScan to their maximum step length */

scanPattern			EastScan 			is 1 step [ 1, 0];
			SouthScan			is 1 step [ 0, 1];
			SouthWestScan 	is 7 steps [-1, 1];
			NorthEastScan 	is 7 steps [ 1,-1];

/* Set UpLzigzagScan to the upper half of the JPEG scan with the help of */
/* while and until statements */

			UpLzigzagScan is
			begin
				while (@[<7,]) begin
					EastScan;
					SouthWestScan until @[<=1,];
					SouthScan;
					NorthEastScan until @[,>=1];
				end;
			end;

/* Set LowRzigzagScan to the lower half of the JPEG scan the same way as */
/* UpLzigzagScan */

			LowRzigzagScan is
			begin
				while (@[<8,<8]) begin
					NorthEastScan until @[<=8,];
					EastScan;
					SouthWestScan until @[,<=8];
					SouthScan;
				end;
			end;

/* Set JPEGzigzagScan compound of UpLzigzagScan, EastScan, SouthWestScan */ 
/* and LowRzigzagScan */

			JPEGzigzagScan is begin
				UpLzigzagScan;
				EastScan;
				SouthWestScan;
				EastScan;
				LowRzigzagScan
			end;

/* Set ListScan as a data dependent one */

			ListScan is 63 steps [2,0] dependent on localBranchFlag;

4.5 Array Declaration

With the help of array declarations the size, structure and type of memory can be specified. This memory is the accessible to both, MoM and host-computer. On this memory area the scan window moves in the way the scan pattern specifies. The starting point of the scan window is specified with the move statement in the activation part.

This array declaration is the only reference to the data of the host computer. The run time environment takes care on the memory management.

The syntax of this declaration is quiet simple, thus only a small example of an array with the size of nine by sixteen containing float numbers is given here.


/* Set array CGFFT to the size of 9x16 of floating numbers */

array			CGFFT [1:9, 1:16] of float;

4.6 Activation Part

The declaration part follows after the activation part, which starts the actions on the Xputer.

The statement part consists of different blocks of statements, which are grouped within the with SW_Group_Name do .. end; statement. All operations within this block refer only to this scan window group.

The activation occurs within a block. The rALU has to be configured with the activate statement. Furthermore all scan windows of this group have to be placed with the move Scan_Window_Name to Array_Name Point statement to a position in the array.

The actual activation of the computation is done with a scan pattern call. The scan window in square brackets behind the scan pattern configures the windows that move the way the pattern describes. This scan patterns can also be nested. If this is true, the scan window moves first according to the inner scan pattern, and then the window is placed on its original position, before the first step of the outer loop is executed. The scan ends as soon as the last step of the inner loop is done in the last step of the outer loop.

Within a statement block more than one rALU subnet can be activated, passivated or removed. Furthermore several scan patterns can be applied on scan windows in sequential order.

Now an example of the FFT. The scan window group ThreeW is configured with the help of the with statement. The rALU is loaded using the activate command. The move statement places the scan windows on the starting positions in the data array. Finally the scan pattern are executed with the call to the scan pattern.

Please note, that it is not possible to do "no-scans". That means scan pattern that stay on a specific position for a fixed number of steps. This is a lack of the hardware and should be eliminated in further revisions.

On the other hand, it is possible to make no moves for a certain scan window that is running parallel to other windows. Due to the architecture of the current version of the GAG, all GAGs stop if one has finished its movement. This is why this implicit stay can be made, whereas a explicit stay is not possible.


/* Activation part for the FFT */

begin

/* Specify the scan window group ThreeW */

	with ThreeW do begin

/* Activate the rALU configuration FFT */

		activate FFT;

/* Move the scan windows to their starting positions on the array CGFFT */

		move SW1 to CGFFT [0,0];
		move SW2 to CGFFT [2,0];
		move SW3 to CGFFT [2,8];

/* Start the scan with SP1 and SP23 as the innermost scan for SW1 */
/* SW2 and SW3 that run in parallel */
/* HLScan is the outer scan that moves all three scan windows */

		HLScan (parbegin
			SP1[SW1];
			SP23[SW2];
			SP23[SW3];
		parend;) [SW1, SW2, SW3];

	end; /* of with ThreeW do begin */
end; /* of activation part */

At the end, for completeness reasons, the frequently used tokens like number and identifier are specified. Please note, that the length of identifiers is not limited by the MoPL-3 compiler. The size of integers and floats is limited by the implementation of the hardware.

5. Examples for MoPL-3 Applications

In this chapter an introduction to MoPL-3 is given by using some examples. First the principles are shown with the vector-vector scalar multiplication and the sharpen algorithm. Thereafter an example of the shuffle exchange, the fast fourier transformation and the run length coding of the JPEG algorithm are described. Finally the implementation of a version for the n-body problem is shown.

5.1 Vector-Vector Scalar Multiplication

The multiplication of two vectors is a simple example however that gives a good overview over the most important MoPL-3 language constructors. The algorithm written in C is as follows:

void skalarmult (int VectorP[4], int VectorQ[4], int *Result)
{
	int i;
	*Result = 0;
	for (i=0; i<=3; i++)
		*Result = *Result + VectorP[i] * VectorQ[i];
}
The operations take place on the following data: VectorP, VectorQ and Result. These variables are declared in MoPL-3 the same way. Nevertheless, variables are always two dimensional in MoPL-3. The type of the data is integer.


/* Set memory areas for VectorP, VectorQ and Result containing integers */

array			VectorP		[1:4, 1:1] of int;
			VectorQ		[1:1, 1:4] of int;
			Result		[1:1, 1:1] of int;
The scan pattern has to go down step by step (positive Y-direction) for VectorP and to the right (positive X-direction) for VectorQ.

Altogether three steps have to be done, since the position is determined in the beginning, which evaluates the rALU at this position for the first time, and then three more steps must be executed, until the final position is reached. There the rALU is executed for the last time - the fourth time. The number of steps is equal to the number arrows in figure 5.1, as well as the number of points is the same as the number of executions of the rALU.

The scan window group consists of three individual windows SW_P, SW_Q and SW_R, which have a width and length of one respectively. These variables have also to be declared as integers.

/* Set scan pattern for SW_P and SW_Q according to the algorithm */

scanPattern			RowScan		is 3 steps [1,0];
			ColScan		is 3 steps [0,1];

/* Set all scan windows to the size of 1x1 of integers */

window			ProductW is
			SW_E, SW_P, SW_Q [1:1, 1:1] of int;

The rALU has to execute the operation Result := Result + P[1,i] * Q[i,1]. This Operation refer to the scan windows in the group ProductW. Thus the rALU description is as follows:


/* rALU Product adds the content of SW_P multiplied by SW_Q to SW_R */

rALUsubnet 			Product of ProductW is
			SW_R [1,1] = SW_R [1,1] + SW_P [1,1] * SW_Q [1,1];
After the definition of the operations they have to be executed. First of all, with the help of the with-statement, the appropriate scan window group has to be selected. Furthermore the rALU has to be loaded (activate Product) and the scan windows have to be placed on their starting positions (move). Subsequently the execution of the calculations can started. To achieve the RowScan and the ColScan to be executed simultaneously, these scans are placed in between parbegin and parend. The position of SW_R does not change during the whole execution. This "stay" is no problem for the MoM-4, because the end of the calculation is signed by RowScan and ColScan.


/* Activation part for the vector-vector multiplication */

begin

/* Specify the scan window group ProductW */

	with ProductW do begin

/* Activate the rALU configuration Product */

		activate Product;

/* Move the scan windows to their starting positions on their arrays */

		move SW_P to VectorP [1,1];
		move SW_Q to VectorQ [1,1];
		move SW_R to Result  [1,1];

/* Start the scan with RowScan and ColScan for SW_P and SW_Q respectively */
/* that run parallel */

		parbegin 
			RowScan[SW_P];
			ColScan[SW_Q];
		parend;
	end; /* of with ProductW do begin */
end; /* of activation part */

The attentive reader will have struck, that this example can be specified in C more elegantly. It probably will run only a bit slower than on the MoM. In the practice more parallelism is achieved not only by doing one addition and one multiplication at the same time, but doing more than one. The rALU configuration could look like this:

/* rALU Product4 adds the summations of each of four elements of SW_P */
/* multiplied by the corresponding elements of SW_Q to SW_R */

rALUsubnet 			Product4 of ProductW is
		SW_R [1,1] = SW_R [1,1]					+ SW_P [1,1] * SW_Q [1,1]
							+ SW_P [2,1] * SW_Q [1,2]
							+ SW_P [3,1] * SW_Q [1,3]
							+ SW_P [4,1] * SW_Q [1,4];
With this parallelism a performance increase is achieved, that justifies the additional expenditure.

At the end again the complete program:

/* Example vector-vector scalar multiplication (4x1 by 1x4 to 1x1) */

/* Set memory areas for VectorP, VectorQ and Result containing integers */
array			VectorP		[1:4, 1:1] of int;
			VectorQ		[1:1, 1:4] of int;
			Result		[1:1, 1:1] of int;

/* Set scan pattern for SW_P and SW_Q according to the algorithm */
scanPattern			RowScan		is 3 steps [1,0];
			ColScan		is 3 steps [0,1];

/* Set all scan windows to the size of 1x1 of integers */
window			ProductW is
			SW_E, SW_P, SW_Q [1:1, 1:1] of int;

/* rALU Product adds the content of SW_P multiplied by SW_Q to SW_R */
rALUsubnet 			Product of ProductW is
			SW_R [1,1] = SW_R [1,1] + SW_P [1,1] * SW_Q [1,1];

/* Activation part for the vector-vector multiplication */
begin

/* Specify the scan window group ProductW */
	with ProductW do begin

/* Activate the rALU configuration Product */
		activate Product;

/* Move the scan windows to their starting positions on their arrays */
		move SW_P to VectorP [1,1];
		move SW_Q to VectorQ [1,1];
		move SW_R to Result  [1,1];

/* Start the scan with RowScan and ColScan for SW_P and SW_Q respectively */
/* that run parallel */
		parbegin 
			RowScan[SW_P];
			ColScan[SW_Q];
		parend;
	end; /* of with ProductW do begin */
end; /* of activation part */
5.2 Sharpen Algorithm

The sharpen algorithm is shown here as an example for image processing algorithms. Furthermore this algorithm should be implemented as an example for a library function.

This algorithm is used to emphasize edges on images for image recognition. The algorithm is rather simple, the task is to move a coefficient matrix over the image. The coefficient matrix is multiplied with the corresponding elements of the image and the sum of that multiplications is assigned to the centre element of the image.

First of all the interface of the function has to be declared. The size of the image should be variable, thus the header looks like this:

FUNCTION Sharpen (ImgWidth, ImgHeight)
The identifiers ImgWidth and ImgHeight can be used in the rest of the code. The variables are treated as constants, that's why they have no specific type. This parameters are handled and expanded by a preprocessor that is described in the MoPL-3 Library Concept on page 50.

The rest of the program is as usual. The size of the array for the image is ImgWidth by ImgHeight and the size of the coefficient matrix is 3 by 3. Both arrays are of the type integer.


/* Set memory areas for image and coefficient matrix containing integers */

array 			ImgData  [1:ImgWidth, 1:ImgHeight] of int;
			CoefData [1:3, 1:3] of int;
The scan pattern for the videoscan consists of RowScan that moves from the left to the right and ColScan that runs from top downwards. This scan patterns are nested within the scan pattern call. The scan pattern for this function as well as some snapshots for scan window positions are given in figure 5.2.

/* Set scan pattern for RowScan and ColScan according to the algorithm */
scanPattern 			RowScan is ImgWidth  - 2 steps [1,0];
			ColScan is ImgHeight - 2 steps [0,1];
The scan window consists of two windows with the size of 3 by 3, one for the coefficient matrix that does not move at all and one for the image window.


/* Set the two scan windows Img and Coef to the size of 3x3 of integers */
window 			SharpenW is
				Img, Coef [1:3, 1:3] of int;
The rALU subnet has to implement the calculations of the algorithm, so it is not difficult to get the following:


/* rALU Sharpen sums up the content of every element of Img multiplied by */
/* the corresponding element of Coef to the center element of Img */
rALUsubnet			Sharpen of SharpenW is
				Img [2,2] = 			Img [1,1] * Coef [1,1] +
							Img [2,1] * Coef [2,1] +
							Img [3,1] * Coef [3,1] +
							Img [1,2] * Coef [1,2] +
							Img [2,2] * Coef [2,2] +
							Img [3,2] * Coef [3,2] +
							Img [1,3] * Coef [1,3] +
							Img [2,3] * Coef [2,3] +
							Img [3,3] * Coef [3,3];
Please note, that this rather large rALU results a major performance increase. Nine multiplications and nine additions are executed within one step without reconfiguration.

It is also possible to code the coefficient matrix right into the rALU. This results in another performance increase, but with a lack in flexibility.

The rest of the program is straightforward: The appropriate scan window group has to be selected, the rALU must be loaded, and the scan windows have to be placed on their starting positions.

The algorithm is started with a nested scan, where RowScan is the innermost scan and ColScan is the outer one. The scan patterns can be executed also in reverse order. That means RowScan is the outer scan whereas ColScan is the inner one. It does not mind what kind of order is preferred.

The scan window Img has to move all the time, whereas the scan window Coef remains on its starting position.

/* Activation part for the sharpen algorithm */
begin
/* Specify the scan window group SharpenW */
	with SharpenW do begin

/* Activate the rALU configuration Sharpen */
		activate Sharpen;

/* Move the scan windows to their initial positions on Img- and CoefData */
		move Img  to ImgData  [1,1];
		move Coef to CoefData [1,1];

/* Start the nested scan for Img with RowScan and ColScan */
		RowScan(ColScan[Img];)[Img];
	end; /* of with ProductW do begin */
end; /* of activation part */
5.3 Shuffle Exchange

The algorithm of the shuffle exchange should show how the two dimensional memory allocation can be used to specify an algorithm even if it is not specified in two dimensional order. Figure 6.3 shows the way the scan pattern describes in one dimensional way. Figure 5.3 gives an illustration for both, the one and two dimensional scan patterns.

The scan pattern for the scan window SW1 is in both cases the same. It runs 8 steps downwards. The scan pattern for the scan window SW2 is different for the two versions.

First the inner loop has to perform two steps down by 3 positions. Then one step up has to be done by 5 positions. These two scan patterns have to be placed in a while loop that stops, if the position 9 is reached. The source code for the scan pattern declaration is placed below.

/* Set scan pattern for Down1Scan (move down by one position) for source, */
/* Down3Scan (move down by three positions) and UpScan (move fife */
/* positions back upwards) for the destination data */

scanPattern			Down1Scan      is 8 steps [ 0, 1];
			Down3Scan      is 2 steps [ 0, 3];
			UpScan         is 1 step  [ 0,-5];

/* Set ShuffleScan to iterate Down3Scan and UpScan until the final */ 
/* position at [1,9] is reached */

			ShuffleScan    is begin
				while @[,<=9] begin
					Down3Scan;
					UpScan;
				end;
			end;

Another possible version for this scan pattern can be done with the help of nested scan patterns. The scan pattern for the scan window SW1 is the same as above also the inner down scan. Only the outer down scan differs, this scan has to make 2 steps down by one position, because in nested scans the inner scan does not alter the position of the scan window when it finishes its work. Figure 5.3 shows in the first picture the difference between the two versions.

/* Set scan pattern for Down1Scan (move down by one position) for source, */
/* Down3Scan (move down by three positions) and OuterDownScan (move one */
/* position downwards) for the destination data */

scanPattern			Down1Scan      is 8 steps [ 0, 1];
			OuterDown1Scan is 2 steps [ 0, 1];
			Down3Scan      is 2 steps [ 0, 3];

The complete source for the first version is stated below:


/* Set the arrays for source (DataMap1) and destination (DataMap2) to 1x9 */

array			DataMap1 [1:1, 1:9] of int;
			DataMap2 [1:1, 1:9] of int;

/* Set scan pattern for Down1Scan (move down by one position) for source, */
/* Down3Scan (move down by three positions) and UpScan (move fife */
/* positions back upwards) for the destination data */

scanPattern			Down1Scan      is 8 steps [ 0, 1];
			Down3Scan      is 2 steps [ 0, 3];
			UpScan         is 1 step  [ 0,-5];

/* Set ShuffleScan to iterate Down3Scan and UpScan until the final */ 
/* position at [1,9] is reached */

			ShuffleScan    is begin
				while @[,<=9] begin
					Down3Scan;
					UpScan;
				end;
			end;

/* The scan windows SW1 and SW2 for source and destination have the size */
/* of 1x1 containing integers */

window			ShuffleW is
				SW1, SW2 [1:1, 1:1] of int;

/* The rALU configuration has to move the data from source to destination */

rALUsubnet		Exchange of ShuffleW is
				SW2[1,1] = SW1[1,1];

/* Activation part for the shuffle exchange */

begin

/* Specify the scan window group ShuffleW */
	with ShuffleW do begin

/* Activate the rALU configuration Exchange */
		activate Exchange;

/* Move the scan windows to their initial positions on DataMap1 and 2 */
		move SW1 to DataMap1 [1,1];
		move SW2 to DataMap2 [1,1];

/* Start the scans for source and destination parallel */
		parbegin
			Down1Scan[SW1];
			ShuffleScan[SW2];
		parend;
	end; /* of with ProductW do begin */
end; /* of activation part */

The second version differs beside the different scan pattern declaration also in the execution of the scan patterns. This can be specified as below. The rest of the program is the same as in the first example.


/* Start the scans for source (Down1Scan) and destination parallel */
/* The destination scan is a nested one with Down3Scan and OuterDown1Scan */

		parbegin
			Down1Scan[SW1];
			OuterDown1Scan(Down3Scan[SW2];)[SW2];
		parend;
In contrast to the first example is the two dimensional version. The scan pattern is easier, because it is a normal video scan that can be declared by hand or with the help of the external scan pattern VideoScanX. The self made version will look like the RowScan and ColScan in 5.2 Sharpen Algorithm on page 32.


/* Set the arrays for source (DataMap1) and destination (DataMap2) to 1x9 */

array			DataMap1 [1:1, 1:9] of int;
			DataMap2 [1:3, 1:3] of int;

/* Set scan pattern for DstScan (move down by one position) for source, */
/* SrcScan for the destination data */
/* DstScan is the external VideoScan that moves to the x-direction first */

scanPattern			SrcScan      is 8 steps [ 1, 0];
			DstScan      is external VideoScanX(3,3;1,1);

/* The scan windows SW1 and SW2 for source and destination have the size */
/* of 1x1 containing integers */

window			ShuffleW is
				SW1, SW2 [1:1, 1:1] of int;

/* The rALU configuration has to move the data from source to destination */

rALUsubnet		Exchange of ShuffleW is
				SW2[1,1] = SW1[1,1];

/* Activation part for the shuffle exchange */

begin

/* Specify the scan window group ShuffleW */
	with ShuffleW do begin

/* Activate the rALU configuration Exchange */
		activate Exchange;

/* Move the scan windows to their initial positions on DataMap1 and 2 */
		move SW1 to DataMap1 [1,1];
		move SW2 to DataMap2 [1,1];

/* Start the scans for source and destination parallel */
		parbegin
			SrcScan[SW1];
			DstScan[SW2];
		parend;
	end; /* of with ProductW do begin */
end; /* of activation part */

5.4 Fast Fourier Transformation

The algorithm has been described already in [2] comprehensively. Therefore only a sketch of the scan pattern and of the rALU configuration is given here. The program has been adjusted on the current version of the language MoPL-3.

Figure 5.4 shows the signal flow graph and the storage scheme, whereas in figure 5.5 the nested compound scan pattern is illustrated. Figure 5.6 describes the window adjustments: the 2 by 2 scan window SW1 is the input window, scan windows SW2 and SW3 are single word result windows.





/* Set array CGFFT to the size of 9x16 of floating numbers */
array			CGFFT [1:9, 1:16] of float;

/* Specify scan pattern for scan windows SW1 (HLScan, SP1), */
/* SW2 and SW3 (HLScan, SP23) according to figure 5.5 */
scanPattern			SP1		is 7 steps [0,2];
			SP23		is 7 steps [0,1];
			HLScan		is 3 steps [2,0];

/* Set scan window group ThreeW consisting of one 2x2 window and */
/* two 1x1 windows of floating numbers */
window			ThreeW is
			SW1		[1:2, 1:2] handle [1,1] of float;
			SW2, SW3		[1:1, 1:1] of float;

/* Set Data in SW2 and SW3 to appropriate values for FFT */
/* SW1, SW2 and SW3 are grouped togethter in scan window group ThreeW */
rALUsubnet FFT of ThreeW is
			SW2 [1,1] = SW1[1,1] + SW1[2,1] * SW1[1,2];
			SW3 [1,1] = SW1[1,1] - SW1[2,1] * SW1[1,2];

/* Activation part for the FFT */
begin

/* Specify the scan window group ThreeW */
	with ThreeW do begin

/* Activate the rALU configuration FFT */
		activate FFT;

/* Move the scan windows to their starting positions on the array CGFFT */
		move SW1 to CGFFT [0,0];
		move SW2 to CGFFT [2,0];
		move SW3 to CGFFT [2,8];

/* Start the scan with SP1 and SP23 as the innermost scan for SW1 */
/* SW2 and SW3 that run in parallel */
/* HLScan is the outer scan that moves all three scan windows */
		HLScan (parbegin
			SP1[SW1];
			SP23[SW2];
			SP23[SW3];
		parend;) [SW1, SW2, SW3];
	end; /* of with ThreeW do begin */
end; /* of activation part */
5.5 JPEG Algorithm

The JPEG algorithm has been described in other publications quite often. In this example only a part of the algorithm is treated, the so called run length coding.

This run length coding is the part of the complete JPEG algorithm, where the discrete cosine transformation (DCT) coeffitients are scanned via a zig-zag scan and stored in a linear order. The zig-zag scan runs through the 8 by 8 coefficion matrix from low frequencies to high frequencies, so that in the resultion linear sequence many small values (rounded to zero) are next to each other. These sequential zeros can be stored in a compressed way.

Alltogether there are two things to do. First of all the original data have to be scanned and evaluated with the zig-zag-scan. Furthermore the compressed data has to be written in a linear array.

If the actual data of the source equals zero, the number of zeros in the destination is increased by one, otherwise the destination scan window is moved to the next position and the data is written into the destination window.

With the help of these information one comes to the basic scan patterns as shown in figure 5.7. These scans build the triangle zig-zag scans UpLzigzagScan and LowRzigzagScan. With the help of that scan patterns the complete JPEGzigzagScan is build as shown in figure 5.8.

First to the declaration of the data arrays:

array			PixMap  [1:  8, 1:8] of short;
			ListMap [0:130, 1:1] of short;
Since the scan pattern of the zig-zag-scans is not a simple scan pattern, it has to be constructed out of several patterns. First of all the basic scan patterns have to be declared. These are EastScan, SouthScan, SouthWestScan and NorthEastScan. Altogether they build the basic scan patterns. With the help of the while statement and the until clause, the first upper half of the scan pattern can be constructed. The second half of the pattern has to be declared the same way. Please note, that it is not possible to transform the UpLzigzagScan to the LowRzigzagScan because while and until statements are within that scan. See also Scan Pattern on page 20.

The scan pattern ListScan is a data dependent one. The algorithm states, that if one more zero appears in the input data the destination data window should not move, instead the zero counter should be increased. Otherwise, the input data should be stored in the destination window and the destination window should do a step. The only thing that has do be done in the declaration of the scan pattern, is to add the dependent on localBranchFlag statement after the declaration of the scan pattern.

The localBranchFlag is only useful, if the appropriate rALU configuration has an if-Statement where in one branch the command set localBranchFlag appears.


/* Set basic scan patterns EastScan, SouthScan, SouthWestScan and */
/* NorthEastScan to their maximum step length */

scanPattern			EastScan 			is 1 step [ 1, 0];
			SouthScan			is 1 step [ 0, 1];
			SouthWestScan 	is 7 steps [-1, 1];
			NorthEastScan 	is 7 steps [ 1,-1];

/* Set UpLzigzagScan to the upper half of the JPEG scan with the help of */
/* while and until statements */

			UpLzigzagScan is
			begin
				while (@[<7,]) begin
					EastScan;
					SouthWestScan until @[<=1,];
					SouthScan;
					NorthEastScan until @[,>=1];
				end;
			end;

/* Set LowRzigzagScan to the lower half of the JPEG scan the same way as */
/* UpLzigzagScan */

			LowRzigzagScan is
			begin
				while (@[<8,<8]) begin
					NorthEastScan until @[<=8,];
					EastScan;
					SouthWestScan until @[,<=8];
					SouthScan;
				end;
			end;

/* Set JPEGzigzagScan compound of UpLzigzagScan, EastScan, SouthWestScan */ 
/* and LowRzigzagScan */

			JPEGzigzagScan is begin
				UpLzigzagScan;
				EastScan;
				SouthWestScan;
				EastScan;
				LowRzigzagScan
			end;

/* Set ListScan as a data dependent one */

			 ListScan is 63 steps [2,0] dependent on localBranchFlag;

The scan windows for Src and Dst are also easy to declare, they have the size of one by one and one by two respectively.


/* Set the scan window of the group MoveW to the size of 1x1 for the Src */
/* and 2x1 for the Dst windows containing short integers */

window			MoveW is
			Src [1:1, 1:1] of short;
			Dst [1:2, 1:1] of short;
The rALUsubnet Move consists of the assignment to the data in the destination window. If Src[1,1] equals zero, then the number of zeros is increased by one, otherwise the data is transferred into the destination and the localBranchFlag is set, so that the scan window of the destination can perform a step.


/* IF (Src[1,1] != 0) THEN */
/*  store value from Src[1,1] in Dst [2,1] */
/*  make step to the right */
/* ELSE */
/*  increase number of zeros in Dst[1,1] */
/*  and remain on current position */

rALUsubnet 			Move of MoveW is
			if (Src[1,1] != 0) then begin
				Dst[2,1] = Src[1,1];
				set localBranchFlag;
			end else
				Dst[1,1] = Dst[1,1] + 1;

The part of activation is as usual. First of all, the scan window group MoveW is selected. After that, the rALU is loaded with the Move rALUsubnet, Src and Dst are placed on their starting position in PixMap and ListMap respectively. Ultimately, the scan pattern JPEGzigzagscan and ListScan are executed in parallel.


/* Activation part for the JPEG algorithm */
begin

/* Specify the scan window group MoveW */
	with MoveW do begin

/* Activate the rALU configuration Move */
		activate Move;

/* Move the scan windows to their starting positions on PixMap for */
/* the Src window and ListMap for the window Dst */
		move Src to PixMap [1,1];
		move Dst to ListMap [0,1];

/* Start the JPEGzigzagScan for Src and the ListScan for Dst parallel */
		parbegin 
			JPEGzigzagScan [Src];
			ListScan[Dst];
		parend;
	end; /* of with MoveW do begin */
end; /* of activation part */

5.6 The n-Body Problem

This example shows the implementation of the n-body problem. For the mathematical concepts see [16]. The n-body problem requires a larger program than the last examples. It is not possible to declare variables in MoPL-3, thus it is helpful to use the #define construct to shorten the expressions. First the single array elements are defined, then r, r3 and r5 are declared as a macro to keep the resulting program as short as possible.

The data stored for an object are splitted in the three dimensions x, y and z. Every object contains the values for the actual positions x0, y0, z0. The actual speed is stored in the vector v. Whereas the current acceleration is stored in the vector a and the resulting acceleration is located in the vector ã. The element m0 stores the mass of the object and the number pos0 is the index in the data array.


#define G 9.81
#define x0   SW1[1,1]
#define y0   SW1[1,2]
#define z0   SW1[1,3]
#define vx0  SW1[2,1]
#define vy0  SW1[2,2]
#define vz0  SW1[2,3]
#define m0   SW1[3,1]
#define pos0 SW1[3,2]
#define ax0  SW1[4,1]
#define ay0  SW1[4,2]
#define az0  SW1[4,3]
#define aax0 SW1[5,1]
#define aay0 SW1[5,2]
#define aaz0 SW1[5,3]

#define x1   SW2[1,1]
#define y1   SW2[1,2]
#define z1   SW2[1,3]
#define vx1  SW2[2,1]
#define vy1  SW2[2,2]
#define vz1  SW2[2,3]
#define m1   SW2[3,1]
#define pos1 SW2[3,2]
#define ax1  SW2[4,1]
#define ay1  SW2[4,2]
#define az1  SW2[4,3]
#define aax1 SW2[5,1]
#define aay1 SW2[5,2]
#define aaz1 SW2[5,3]

#define r sqrt ((x0-x1)*(x0-x1) + (y0-y1)*(y0-y1) + (z0-z1)*(z0-z1))
#define r3 r * r * r
#define r5 r * r * r * r * r

After the definitions of the single elements, the memory area has to be declared using the array keyword. The array Data contains in this example 100 elements that have the shape of five by three each. This results in an array with the size of 500 by 3 of floats. Figure 5.9 shows the location of the single elements in the data array.


/* Set the array for 100 objects of the size of 5x3 */

array			Data		[1:500, 1:3] of float;

The algorithm has to perform two iterations over the the whole array. For each element the forces have to be calculated with every other element. Thus the scan pattern has to run twice over the data array. The OuterScan moves the pivot element, whereas InnerScan is responsible for the referencing elements. An illustration of the scan pattern is specified in figure 5.10.


/* Set the scan pattern to 99 steps to the right for both inner and */
/* outer scan. */

scanPattern			InnerScan		is 99 steps [5,0];
			OuterScan		is 99 steps [5,0];

For the nBody algorithm are two scan windows required. One for the pivot element and one for the reference element. They have both the same size, five by three elements of floats. The placement of the single elements in this field can be seen in figure 5.9.


/* The scan windows SW1 and SW2 have the size of 5x3 containing floats */

window			nBodyW is
			SW1, SW2 [1:5, 1:3] of float;

The rALU configuration is based upon the calculations specified in [16]. The theoretic background is specified there also. Figure 5.11 illustrates the rALU configuration.


/* Set the rALU configuration for the nBody problem */
/* according to figure 5.11 */

rALUsubnet 			nBody of nBodyW is begin
				if (pos1 != pos0) then 		begin
					ax1 = ax1 + G * (m1 / r3) * (x0 - x1);
					ay1 = ay1 + G * (m1 / r3) * (y0 - y1);
					az1 = az1 + G * (m1 / r3) * (z0 - z1);
					aax1 = aax1 + G * m1 * ((vx0 - vx1) / (r3)) 
					- (3.0 * (vx0 - vx1) * (x0 - x1) * (x0 - x1) / (r5));
					aay1 = aay1 + G * m1 * ((vy0 - vy1) / (r3)) 
					- (3.0 * (vy0 - vy1) * (y0 - y1) * (y0 - y1) / (r5));
					aaz1 = aaz1 + G * m1 * ((vz0 - vz1) / (r3)) 
					- (3.0 * (vz0 - vz1) * (z0 - z1) * (z0 - z1) / (r5));
				end;
			end;
The execution of the single components is rather easy. The scan window group nBodyW has to be seleected. Then the rALU has to be loaded with the rALU subnet nBody. After that the scan windows have to be placed and finally the scan pattern can be started.


/* Activation part for the nBody algorithm */
begin

/* Specify the scan window group nBodyW */
	with nBodyW do begin

/* Activate the rALU configuration nBody */
		activate nBody;

/* Move the scan windows SW1 and SW2 to their starting positions on Data */
		move SW1 to Data [1,1];
		move SW2 to Data [1,1];

/* Start the nested scan OuterScan(InnerScan) for SW2 and SW1 */
		OuterScan (InnerScan [SW2];) [SW1];
	end; /* of with nBodyW do begin */
end; /* of activation part */

6. Code Generation

This chapter describes the soft side of the Xputer. That includes all necessary information for the GAG, as well as for the rALU configuration. Furthermore the library concept of MoPL-3 is described.

6.1 MoM Assembler

The MoM Assembler is used to configure all parts of the data sequencer and the scan cache. These parts are described in the following:

6.1.1 Configuration of the MAG

The memory address generator is configured with instructions between the keywords ADDRScheme and ENDScheme. The instructions that are available for the memory address generator are stated in Generic Address Generator on page 10.

This configuration program has to load the scan cache. It has to take care that the scan cache gets the right data at the right time, and that the data that have to be written from the scan cache into the memory, are also transferred.

A reference to the MAG program is specified in the GAG configuration with the AIRport and AIRptr register. It is loaded with the other registers in the GAG.

6.1.2 Configuration of the GAG

The generic address generator needs a special set of registers, that configures it. In [29] a comprehensive overview over the GAG and the register configuration is given.

This register configuration is placed within the DEFINE and ENDDEFINE keywords. With thisconfiguration the special scan patterns are generated. It depends on the complexity of the scan pattern whether a GAG can completely handle a scan pattern or not. If it can not, some work has to be done by the instruction sequencer to get the scan pattern as desired.

This registers include the basic scan registers as specified in Figure 3.5 and some more configuration registers like Direction, MemBase, rALUSelect, OperationMode, WindowParam and others. These registers are described in [9] as well as in [8], where the assembler specific parts are described. The paper [30] describes also the purposes of the registers and gives a good overview of several classes of scan pattern that can be generated with the GAG.

To show an overview of the register configuration a small example for the shuffle exchange is given.

The address stepper steps the address register A from the initial value, which is copied from B or L, by dA increments until a final value, defined by the maximum step count or a threshold given by B, L, F, or C. Most non-trivial scan patterns are nested. That is the reason why B or L may be used to step through a master scan pattern, whereas A is used to step through a slave scan pattern. Each dimension of a scan pattern is described by seven parameters. A master stepper (B or L) computes within the outer loop the current base address, from which the inner loop steps A by dA. Two-dimensional scan patterns are composed by coordinated parallelism of the two one-dimensional stepper parts, one for the x address Ax, and one for the y address Ay.

. Parameters for Scan Patterns of the Shuffle Exchange of Figure 6.3
NameParameterMeaningValues in example
GAG 1GAG 2
FFloorAbsolute minimum for base11
DBBase stepStep width for base stepper01
B0Initial baseInitial value for base stepper11
DAAddress stepStep width for address stepper13
L0Initial limitInitial value for limit stepper99
DLLimit stepStep width for limit stepper00
CCeilingAbsolute maximum for limit99

Figure 6.3 illustrates GAG address generation by a shuffle exchange data transfer (size C-F is 9, destination step width dA is 3, see table 1) using two one by one scan windows: SW1 as read-only source and SW2 as write-only data sink. GAG1 steps SW1 through memory space. The shuffle, scan pattern of SW2, is carried out by GAG2. This shuffle is a nested scan pattern example, where the master scan pattern stepping B has B0 or dB as initial location, or, step width, respectively (see figure 6.3). The slave scan pattern uses current B (from the master), or dA, respectively.

6.1.3 Instruction Sequencer

The instruction sequencer is responsible for the control of the operation. That includes the loading of the GAG registers, the activation of them and the handling of outer loops that are not covered by the GAG. Furthermore the instruction sequencer has to load the rALU with the appropriate rALU program.

6.2 The ALE-X rALU Assembler

The programming language ALE-X has been designed to configure the rALU. ALE-X is similar to C, that is adapted to the requirements of the rALU. Beside some minor differences in declaration of the interface, the syntax is the same as used in MoPL-3 that is described in rALU Configuration Grammar on page 16. The complete syntax is described in [15]. To get a flexible design of the compiler a special hardware configuration file is used. This file is described in Hardware Configuration File on page 50.

ALE-X programs contain only simple instructions like assignments, if instructions, while and do-while loops. Furthermore simple expressions can be specified. Every operator can be executed by a single DPU. It is the task of the ALE-X assembler to optimize the spread the operations of the ALE-X source program over the rALU DPUs.

The ALE-X assembler includes a good optimizer. This optimization comprises the parallelisation of assignments, the detection and merging of common subexpressions and the removement of unused code. The optimized program is spread over the DPUs. Finally the optimal order for the data transfer is computed, to get an minimal delay on data transfer.

For more information about the use and the design of the assembler see [15].

Figure 6.4 shows an example ALE-X program file that configures the rALU to sum up the products of the elements of two vectors as used by the multiplication of two vectors.

6.3 Hardware Configuration File

The MoPL-3 compiler uses the same hardware configuration file as the ALE-X assembler. This file is used to specify the available types in the rALU and the available operations on that types. This file is necessary, because currently not all operations are available to the hardware. As soon as more operations are implemented, the MoPL-3 compiler and the ALE-X assembler can use that types and operations.

The hardware configuration file contains special hardware informations like the number of available DPUs and the delay time for the connection between two DPUs. This part of the hardware configuration is not interesting in this context.

The second part of the hardware configuration file contains a list of all operations that are available and the type of the parameters and the return value. For the ALE-X compiler some more information are needed like the delay time of the single operations.

Figure 6.5 shows a brief version of an example hardware configuration file. Please note, that some items that are only relevant for the ALE-X assembler are left out for clarity reasons. This file specifies the two types int and float that are both 16 bit wide. It also specifies some operations on these types. The type cond is a build-in one bit wide type that is used for conditional expressions.

6.4 MoPL-3 Library Concept

Within the CoDeX framework a special library with common used algorithms is used. This algorithms should be generic, that means that the size of the data for the algorithms should be variable. This is in contrast to the MoM. The Xputer has no ability to rearrange the size of data during runtime. That is the reason why a special precompiler has to be build that transforms the library routine into an intermediate MoPL-3 program that can be used by the MoPL compiler.

The library functions are placed in a special library file that is set by the environment variable MOPL-LIB. If no such variable is set the default value of "lib.mopl" in the current directory is used.

The syntax of MoPL-3 library functions is quite easy. Every function begins with the keyword FUNCTION followed by the name of the function and a list of parameter names within brackets. These parameters have no specific type, because a special feature of the MoPL-3 compiler is used, the #define statement. So the compiler takes care, that the actual values are passed through to the library function.

The only exception to this is, if a parameter is used in the library function within a formula as it is done in the following example for the scan pattern declaration, where the number of steps is equal to Size.. -1. This expression is evaluated by the library program and the parameter is treated as an integer value.

FUNCTION MatrixMatrixMultiplication (SizeAx, SizeAyBx, SizeBy)
/* MoPL-3 function for multiplying two matrices A and B with the size 
   SizeAx by SizeAyBx and SizeAyBx by SizeBy respectively resulting in a
   matrix C with the size of SizeAx by SizeBy 
*/
array			matrixA 			[1:SizeAx,   1:SizeAyBx];
			matrixB 			[1:SizeAyBx, 1:SizeBy];
			matrixC 			[1:SizeAx,   1:SizeBy];
scanPattern 			rightACscan			is SizeAx-1     steps [1,0];
			rightBscan			is SizeAyBx-1   steps [1,0];
			downAscan			is SizeAyBx-1   steps [0,1];
			downBCscan			is SizeBy-1     steps [0,1];
...
The program mopllib is this precompiler. It is called by the X-C compiler with a list of parameters. The first one is the name of the MoPL library function and the others are the actual parameters for the library function. The output of this compiler is a message to stdout with either an error text, if the library function is not found or the number of operators does not match, or a string containing the file name of the generated output.

The execution of the following line

mopllib MatrixMatrixMultiplication 7 5 6

results in the output

MatrixMatrixMultiplication_7_5_6.mopl

This file will look like the following:

/* Generic library function MatrixMatrixMultiplication_7_5_6.mopl */
#define SizeAx 7
#define SizeAyBx 5
#define SizeBy 6
/* MoPL-3 function for multiplying two matrices A and B with the size 
   SizeAx by SizeAyBx and SizeAyBx by SizeBy respectively resulting in a
   matrix C with the size of SizeAx by SizeBy 
*/
array			MatrixA			[1:SizeAx,   1:SizeAyBx] of int;
			MatrixB			[1:SizeAyBx, 1:SizeBy] of int;
			MatrixC			[1:SizeAx,   1:SizeBy] of int;
scanPattern			rightACsan			is 6 steps [1,0];
			rightBScan			is 4 steps [1,0];
			downAScan			is 4 steps [0,1];
			downBCScan			is 5 steps [0,1];
...
Please note, that the values of Size.. are evaluated for the number of steps for the scan patterns whereas the definition of the array remain the same as in the original file. The assignment to the actual values is done during the compile time with the help of the #define statement.

The evaluation for variables in the procompiler is limited. Expressions are evaluated from the left to the right and no parenthesis are allowed. This is no real limitation, because only simple operations are needed for the configuration. The operations plus, minus, multiplication and division are available. A formula has to start with a parameter. It is not allowed to start with a constant, because this would not be evaluated in the fulmar.

The X-C compiler does all the work for this configuration, so the user has no interaction with the MoM library precompiler.

6.5 Scan Pattern Library

The scan library has been implemented to close the gap between the scan patterns that can be described in MoPL-3 and the more powerful flexibility of the GAG in the Xputer hardware. There are some scans that can not be specified in an easy way in MoPL-3, like the spiral scan. Thus a library concept has been defined.

This library can be completed and optimized in the future to get a flexible way to achieve best performance on all Xputer revisions. The scan pattern library can become an integrated part of the language. This library can be used as a standard, like, for example, the stdio library in C with the functions printf and scanf.

The scan library is placed in a file called "lib.extscan" or in any other file that is specified in the SCAN-LIB environment variable. The format of the library is very simple. The name of the external scan pattern is stated in the first position of a line, thereafter the GAG register configurations occur, beginning with a tab or space. The configuration ends if an other external scan pattern is declared or if the end of the file is reached. There are always four variables that have to be set by the MoPL-3 program and can be used in the GAG configuration file. These variables are StepX and StepY for the step width in both directions and SizeX and SizeY for the number of steps.

DeltaScanX
	B0.X  = 0
	B0.Y  = 0
	dB.X  = 0
	dB.Y  = StepY
	C.X   = SizeX * StepX
	C.Y   = SizeY * StepY
	L0.X  = StepX
	L0.Y  = 0
	dL.X  = 0
	dL.Y  = 0
	F.X   = 255
	F.Y   = 255
	dA.X  = 1
	dA.Y  = 1
	X_START_ABS
	Y_START_ABS
	Direction1 = XPosLim, YPosLim, XUp, YUp
	TURN_NEVER
	Y_WAIT_X_EIL
DeltaScanY
...
The example above is for a scan pattern library function for the delta scan, which first runs in the x direction as used in the gauss algorithm. The scan pattern looks as figure 6.6 shows.

7. Design of the Compiler

This chapter specifies the design of the compiler. Its purpose is to give a fairly high-level overview of the compiler structure. The emphasize of this chapter is the description of the class relationship. For each class the most important implementation decisions will also be described. Furthermore some implementation details of the compiler and the library preprocessor are given.

This chapter is mostly interesting for people, that have to maintain the compiler and can be skipped without problems.

7.1 Overview

This compiler has been designed object oriented. The syntax tree builds itself out of a MoPL-3 program. The other modules help the syntax tree to build itself and to generate the appropriate code. The code generation is done the same way. Every object itself knows what kind of code it has to generate. In chapter 6 the structure of the files that have to be generated are specified.

In the following a closer look on the modules is given. Most of the design describes itself, so special notes are given only if necessary.

7.2 Basic Design

In the following part the relationships of the single classes are shown. The main program is placed around the class compiler. This class includes the scanner, the symbol table, the hardware definition and finally the syntax tree that generates itself from the input file. The input MoPL file is scanned and parsed during the instantiation of an object of the the class SyntaxTree, which is called during the generate_syntax_tree method of the compiler. That results to an object oriented syntax tree that is used for code generation.

The main program is very simple, because of the object oriented design. In the following a short version of the main procedure is shown. Note, that the file handling and argument parsing is left out for clarity reasons.


main (int argc, char **argv)
{
  char *filename;

  filename = argv[1];

  Compiler compiler(fileneme + ".mopl");
  compiler.generate_syntax_tree();
  compiler.check_ralu_semantic();
  compiler.generate_statement_code(filename);
  compiler.generate_array_code(filename);
  compiler.generate_ralu_code(filename);
  return (0);
};
The compiler generates several files necessary for further operations. In the beginning the main MoM assembler file is generated with the generate_statement_code method. Additionally the DMAP files are written with generate_array_code. Afterwards for every rALUsubnet an ALE-X file is produced with generate_ralu_code. If no errors occurs the compiler stops with return code 0. If an error has been detected, the return code is 1 and the compiler gives an appropriate message. Errors can occur during the generation of the syntax tree. During this pass, beside syntactical errors, also semantic errors are detected, especially if identifiers are used that are not declared. The check_ralu_semantic method can also create error messages. These errors arise, if operations are used that are not available on the hardware (e.g. multiplying an integer with a real giving an integer). An error halts the execution of the compiler, an error recovery function is not part of this work, but can be implemented rather easy.

The scanner covers the file handling of the input file and the macro expansion of the #define compiler directive.

The scanner delivers one token by another with the get_next_token and check_token functions. The method get_next_token reads the next available token from the input stream. The two check_token methods differ the following way: the first method reads the next token from the input stream and verifies it against the expected one, whereas the second takes the current token as an input parameter and verifies it against the expected.

The lookahead_token function returns the type of the next token in the stream, but remains the input stream as it is.

The token itself includes a number for each kind of token (e.g. integer, keyword scanPattern, semicolon) and the string itself, which is important for identifiers and operations like add operation (+, -) multiplication operation (*, /) and others. Furthermore each token has the information of its location in the source file for error handling. The line number and column position is needed to get the right place in the file. Please note that the location for tokens expanded by the define statement is the original position of the term to be expanded.

To get a very flexible design of the compiler, a special hardware file is used. It is the same as is used by the ALE-X compiler.

By instantiating an object of HardwareDefinition the appropriate file is parsed and the database is constructed. This database includes a list of the available types and, more important, a list of the available operations. The members of this list consist of a type for the operation and the types for the first and second operand and a resulting value. It is possible to have the same operation on the same operands giving a different return value.

The methods check_operation and get_operation_result_type are used by the check_ralu_semantic method to calculate the type of an expression and to verify it against the expected type e.g. for an assignment.

The symbol table is used to store symbols in a list. These symbols are identified by their names, for this reason ambiguities are avoided when the same name is used for two different objects, like for rALUsubnet and for scanWindow. The symbol table has three major functions: to check the type of an symbol, to verify the correctness of array accesses and finally to get an pointer to the symbol itself. With this function, it is possible to access objects that are not available from the object structure of the syntax tree. This is very important to the code generation.

The class Symbol only is used to store the necessary informations where not all elements are used the same time. The element range is used for arrays whereas the gagnr is for scan windows only.

The syntax tree consists of a declaration part and an activation part called scanstatementpart.

The declaration part includes lists for window declaration, rALU configuration, scan pattern declaration and array format. Because the elements are stored in discrete lists, the order in the source code gets lost. That is the reason why the semantic of the program is verified during parsing, e.g. rALU configurations use scan window groups that have to be declared before.

The semantic check for the rALU configuration is done in a second pass. During this pass it is verified that no operations are executed on the rALU that are not supported by the hardware.

The code generation of the declaration part handles only the ALE-X code, whereas the creation for the MoM assembler code is left for the scan statement part. From this class methods of the scan pattern declaration are called to specify the GAG register set and to generate the instruction sequencer code.

The array part is responsible for the mapping of the MoPL-3 memory to the local memory of the host. The array has to be declared in the DLL file. This DLL file is used to construct memory configuration files with the DLL compiler. For further informations about DLL see [5].

Scan patterns are the most important part in MoPL-3. The idea behind Xputers is to perform the same operation on a large amount of data following a special kind of scan pattern. This scan patterns are generated by the so called generic address generators. A GAG is very flexible, it can perform more than four million scan patterns [9]. This complexity has a major drawback. It is very hard to configure the GAG according to the source program.

As stated in chapter 8, improvements can be made by reducing the complexity of the GAGs and by increasing the interoperability of them.

The ScanPatternDeclaration class of a MoPL-3 program is a list of simple pattern declarations. This patterns consist of the pattern name that are executed from the activation part, the pattern specification and a flag for data dependent scans.

Pattern specification is a virtual class that is either a simple scan or a scan sequence, that are compound of other scan patterns respectively.

Window declarations specify the scan windows. These are the cache areas that move according to the scan pattern over the memory. A window group consists of all scan windows that are moved at the same time according to scan patterns that run explicitly or implicitly parallel. This is done with the parbegin and parend statements or by specifying more than one scan window for the scan pattern respectively.

In the current version of the MoM, there can be up to seven scan windows in a scan window group, because there are seven GAGs available.

Beside the constructing parse methods, the window declaration consists of two kind of methods. The first collects all names of scan windows in a scan window group. This method is important to check, if GAG register sets have to be declared for the stay scan pattern.

The other kind of method is for the ALE-X code generation. WindowSpecs are stored in the symbol table and access to that window specifications can be made directly through the name of the window. That's why the generate_alex_header and generate_alex_body methods need not to be available in the higher level objects.

The rALU configuration consists of several elements. First of all it includes the name of the rALU configuration that is used in the activation part. The scan window group name specifies the scan windows that can be accessed in the expressions. The flags for residentnes of the rALU is set during parsing, whereas the flag for local branches is set in the pass of semantic check. After that pass, the variable for the condition flag is set according to the value of local branch.

The rALU itself is configured by several, so called top statements. Top statements include do and while loops beside the other common statements. The SubStructure contains a virtual class SubStructureContent that is either an IfStructure, an Assignment, a List of SubStructure or the set localBranchFlag command.

In the SubStructure loops are not allowed, whereas nested if statements are possible. Please note, that the set localBranchFlag is simulated in the rALU configuration. This flag is true if and only if the condition of an if statement is calculated to be true. That's the reason why the then part of an if statement and the else part has to be swapped, if the set localBranchFlag is placed in the else part.

The class design of expressions are rather simple and are based upon the syntactical structure. The parsing method checks for the right syntax, the get_..._type functions generate a set of possible types for expressions and the generate_alex_code procedure writes the syntax tree in almost the same format into an ALE-X file.

An expression can be calculated to more than one resulting type. That's why the result of get_expression_type is an integer coded set. For example, an expression like (A[1,1] - B[1,1]) with A and B integer scan windows, can be used to assign the result to another integer or to be the boolean expression of a while statement.

The ScanStatementPart as specified in figure 7.14 is responsible for execution of MoPL-3 programs.

The single blocks specify a group of scan windows, which have to be used within them. If more than one block is used in an MoPL-3 program, they are executed in sequential order.

A statement block consists of the scan window group name and a list of scan statements. This ScanStatement can be an activation statement, a move statement, a scan pattern call or a parallel scan pattern call.

Activations specify activate, passivate and remove statements of rALU configurations. Move statements set the scan windows to specific positions from where they start with the scan. Pattern calls are either sequential or parallel.

If a scan pattern call is not nested any more, the GAG register set for that pattern is generated for every scan window which is attached to that pattern. The other generate methods construct the code for the instruction sequencer that is necessary to load the GAG and to emulate the nested scan patterns.

Objects that build basic classes are placed in the module Common. This includes a list of strings for scan window specifications. The list of variables represents a set of references to scan windows for the generation of address schemes.

The basic type DataType is used to represent simple types like real or unsigned integer for array declarations and scan windows.

Furthermore Point and Range are used in the syntax tree to store the information of coordinates and the range of coordinates. The numerical representation of Point and Range are called SPoint and SRange respectively.

7.3 Implementation Issues

On the following pages describes all necessary informations about the implementation are described. This includes a list of the files for the compiler, the makefile and of course the implementation of the generator program for library functions.

7.3.1 Files for the Compiler

The following list includes all the program files (including .cc for program file and .h for header files for the compiler and a short description of them in alphabetical order.

activation (.cc, .h)

This files include all the things for the activation part of the compiler like the move statements, the scan patter calls and the rALU activation.

array (.cc, .h)

The array files specify the configuration of the data memory and the code generation of the necessary assembler parts.

bool (.h)

Due to the lack of build in boolean type in C this header file specifies a type bool with two constants true and false.

common (.cc, .h)

This module implements some common used classes like Range and the scalar equivalent SRange. It also includes the handle_commandline function, that parses the command line parameter to the compiler.

compiler (.cc, .h)

Here the main function of the compiler is placed. The compiler class contains instances of the classes Scanner, HardwareDefinition, SyntaxTree and SymbolTable.

cString (.h)

This header file has been used to cover the compatibility problems of the UNIX version of g++ and the DOS Borland C++ compiler. This header includes String.h for the C++ string class and string.h for the old fashioned C string handling functions.

declaration (.cc, .h)

This files include the data for the declaration part. The declaration class contains lists of instances of array objects, scan pattern declaration objects, window declaration objects and objects containing the rALU configuration.

defines (.h)

In this file all kind of constants are defined, including constants for tokens and operation types.

error (.cc, .h)

The collection error includes several methods of error messages with different arguments for filename, position in the files and others.

hardware (.cc, .h)

The class hardware contains the information about the available hardware configuration. This information is extracted out of the hardware configuration file. Methods include check_operation and get_operation_result_type.

pix (.h)

The header file pix contains only the definition of a void pointer type called Pix. This type is required for the list class SLList.

ralu (.cc, .h)

This files include all the necessary informations about the rALU configuration. The complete rALU subnets are handled within this class.

scanner (.cc, .h)

This files contains the program for the scanner class. This class handles also the preprocessor that extracts the #define statements.

scanpattern (.cc, .h)

This files for the scan pattern include beside the parsing methods for the scan pattern also major tasks for the generation of the MoM assembler code. The activation part has access via the symbol table to the single scan pattern. There are methods available for generating all kind of code, like code for the instruction sequencer, GAG register set and many others. This files also handle the external scan pattern library.

scanwindow (.cc, .h)

The scan window informations are handled by this files. Beside the constructing parse method code generation for ALE-X is included.

sllist (.cc, .h)

This files include the code for a container class called SLList. This class is easy to maintain and offers a lot of useful methods.

This class is part of the GNU C++ library, written by Doug Lea. Please see sllist.h for the copyright note.

symboltable (.cc, .h)

The SymbolTable class is coded in this files. All identifiers of a MoPL-3 program are stored in the symbol table. Ranges of arrays, types of operands and other information are extracted out of the symbol table. Beside this, complete objects can be retrieved out of the symbol table by their appropriate name.

syntax (.cc, .h)

The syntax tree consisting of the declaration part and the activation part is coded within this files.

token (.h)

The class token is specified within this header file. Beside two constructors, the == operation is defined for tokens.

7.3.2 Makefile for the MoPL-3 Compiler

This makefile is attached to the document to give an overview of the dependencies for the single modules.

# makefile for the MoPL-3 compiler
# does complete project maintainance
CC = 		g++
CCL = 		g++ -c

DepCompiler = 	error.o common.o declaration.o activation.o array.o \
		ralu.o scanpattern.o scanwindow.o syntax.o symboltable.o \
		hardware.o sllist.o scanner.o

MoPL3comp: $(DepCompiler) compiler.o
	$(CCL) compiler.cc $(DepCompiler) -o MoPL3comp

common.o : common.cc common.h \
		sllist.o error.o scanner.o bool.h token.h
	$(CC) common.cc

declaration.o : declaration.cc declaration.h \
		sllist.o error.o scanner.o symboltable.o array.o \
		scanpattern.o scanwindow.o ralu.o \
		bool.h token.h
	$(CC) declaration.cc

activation.o : activation.cc activation.h \
		sllist.o error.o scanner.o symboltable.o declaration.o \
		bool.h token.h
	$(CC) activation.cc

array.o : array.cc array.h \
		sllist.o error.o scanner.o symboltable.o \
		scanwindow.o common.o \
		bool.h token.h
	$(CC) array.cc
	
ralu.o : ralu.cc ralu.h \
		sllist.o error.o scanner.o symboltable.o \
		scanwindow.o hardware.o \
		bool.h token.h
	$(CC) ralu.cc

scanpattern.o : scanpattern.cc scanpattern.h \
		sllist.o error.o scanner.o symboltable.o hardware.o \
		bool.h token.h
	$(CC) scanpattern.cc

scanwindow.o : scanwindow.cc scanwindow.h \
		sllist.o error.o common.o scanner.o symboltable.o \
		hardware.o \
		bool.h token.h
	$(CC) scanwindow.cc

syntax.o : syntax.cc syntax.h \
		sllist.o declaration.o activation.o \
		bool.h token.h
	$(CC) syntax.cc

symboltable.o : symboltable.cc symboltable.h \
		sllist.o error.o common.o \
		bool.h token.h cString.h defines.h
	$(CC) symboltable.cc

compiler.o : compiler.cc compiler.h \
		scanner.o syntax.o hardware.o symboltable.o
	$(CC) compiler.cc

hardware.o : hardware.cc hardware.h \
		sllist.o error.o \
		bool.h token.h cString.h defines.h
	$(CC) hardware.cc

sllist.o : sllist.cc sllist.h \
		pix.h
	$(CC) sllist.cc

scanner.o : scanner.cc scanner.h \
		error.o sllist.o\
		bool.h token.h cString.h defines.h
	$(CC) scanner.cc

error.o : error.cc error.h \
		cString.h token.h defines.h
	$(CC) error.cc

7.3.3 Design of the MoPL-3 Library Preprocessor

The MoPL-3 library preprocessor is designed in a rather easy way. There is only one class available the MoplLibFile that is placed in the file MoplLibIO. This class does all the handling so that the main program (placed in the file MoplLib.cc) is not long.

01:	#include <stdlib.h>
02:	#include "MoplLibIO.h"
03:	
04:	main (int argc, char **argv)
05:	{
06:		char *functionname;
07:		char *libfile;
08:		char filename [1000];
09:		int i;
10:	
11:		if ((libfile = getenv("MOPL-LIB")) == NULL) {
12:			libfile = "lib.mopl";
13:		}
14:		if (argc > 1)
15:			functionname = argv[1];
16:		else 
17:			 error_message("Wrong argument format !");
18:		strcpy (filename, functionname);
19:		for (i = 2; i < argc; i ++) {
20:			strcat(filename, "_");
21:			strcat(filename, argv[i]);
22:		}
23:		strcat(filename, ".mopl");
24:	
25:		MoplLibFile mopllibfile(libfile);
26:		mopllibfile.search_function(functionname);
27:		mopllibfile.write_function_header(filename, argc, argv);
28:		mopllibfile.write_function_body();
29:	
30:		cout << filename << endl;
31:		return (0);
32:	};

After the handling of the environment variable for the library file and the command line arguments for the function (lines 11-24) an instance of MoplLibFile is created in line 25 with the name of the library file as argument. The appropriate function is searched with the command in line 26. An error message appears if a function is not found in the library file. Please note that it is not possible to overload MoPL-3 library functions.

The function in line 27 writes the header of the file including the actual values for the parameter as #define statements. During this function the number of arguments is checked and an error message is given if the number of arguments does not fit to the specification in the library file. The resulting output filename stored in the variable filename has the shape of FunctionName_arg1_arg2_...argn.

The last method of MoplLibFile in line 28 writes the rest of the library function to the output file. This method calculates the result of compound values for the variables. That means a expression like "ImgWidth * ImgHeight - 1" will be replaced literally by "63" if the arguments of ImgWidth and ImgHeight are both 8. This is required, because the language of MoPL-3 allows no expressions in specification like array range or others. The precompiler of MoPL-3 also does no calculations of expressions because it is responsible only for macro expansions.

The class MoplLibFile has the following definition:

01:	class MoplLibFile {
02:		const char*     libfile;
03:		ifstream*       inf;
04:		ofstream*       outf;
05:		char            line [1000];
06:		string          newline;
07:		SLList <string> mopl_args;
08:		SLList <int>    mopl_vals;
09:		int             mopl_argc;
10:	
11:		void   set_parameters_for_line();
12:		int get_first_substitution(int currentpos, 
13:								int* substval, 
14:								int* substlen);
15:		int    is_operation(int* pos);
16:		int    is_number_or_defined(int orgpos, 
17:								int* pos,
18:								int* newval);
19:	public:
20:		MoplLibFile(const char* filename);
21:		void   search_function(const char* functionname);
22:		void   write_function_header(const char* filename, 
23:								int argc,
24:								char **argv);
25:		void   write_function_body();
26:	};
Beside the public functions described above (line 20-25), some private functions are declared.

int get_first_substitution(int currentpos, int* substval, int* substlen)

This function searches in the current line of the input file beginning at currentpos for the first occurrence of a parameter. It returns the index of the first parameter in the input line. The actual value for that parameter is set in substval, whereas the length of that string is set in substlen.

If no such string is found in the line, the constant NPOS is returned to indicate that no such point is found.

int is_number_or_defined(int orgpos, int* pos, int* newcval)

This method checks for a number or a parameter at the position orgpos in the current line of the input file. Blanks and tabulator characters are skipped, so a call by reference is required to set pos to the right value. The value of the number or the parameter is returned in newcval.

int is_operation(int* pos)

This method checks whether the character at the position pos in the current line of the input file is a character for a operation (+, *, -, /). Blanks and tabulator chars are skipped, so a call by reference is required to set pos to the right value.

void set_parameters_for_line()

This method scans a whole line in the input file for the occurrence of parameters. If no parameter is found, the string newline is only a copy of the input line. If a parameter is found, the values for that parameters are calculated and replaced in the variable newline.

8. Future Improvements

During the design of the MoPL-3 compiler some things came out that are either difficult to handle or that can be approved in another way. I want to give some ideas how these topics can be approved. The topic that are scratched here should be seen as a point for discussion.

8.1 GAG Organisation

It has shown that it is very hard to configure the GAG. One thing is the complexity of the GAG. It is very good to have more than four million scan pattern available, but it is not easy to generate the right register configuration set nor is it possible to generate an optimal register set. Therefore a lot of simulation and testing would have to be done during compilation time. If one register configuration per second can be verified, for the complete process of verification 46 days are required, so a simple simulation is not possible.

The registers of the existing GAG are not organized in a way, that is straightforward to generate the right configuration. This ends in a configuration for the GAG where less than ten percent of the power and flexibility of the GAGs are used. As a matter of fact, only the innermost linear scan of MoPL is implemented as GAG configuration, the rest is done with the help of the instruction sequencer.

On the other hand is it rather hard to make the configuration of a scan pattern by hand, because every one of the 25 registers has to be set in a special way and the use of some of the registers is not that easy to understand.

The other drawback of the current GAG is that there are no synchronisation mechanisms between the single GAGs available. For example, every GAG stops as soon as one finishes his work. It would be nice to have the possibility to run the GAGs independently of each other and what is more important to give the ability to start a GAG from another one. This can result in better performance and would eliminate some drawbacks, like the limitation on a maximum four nested loops with two for the x and y direction each.

8.2 rALU Configuration

One thing that can be improved is the description of the rALU configuration. For some problems, local variables could be helpful that the configuration for some applications is not so much effort as now. There is until now no concept prepared for this purpose. On the other hand that local variables may be implemented in MoPL-3 in a different kind as just with macro expansion.

8.3 Scanpattern Declaration in MoPL-3

The scan pattern that can be described in MoPL-3 are not as flexible as the MoM-3 hardware can support. One point is the lack of langauge constructs for delta scans as figure 8.1 shows.

This can be achived with the help of the new keywords decreasing and increasing. With the help of these statements the number of steps is decremented or incremented by one or more every time the scan pattern is executed within a loop.

To make this improvement available a simpler data sequencing unit has to be constructed. There are some ideas that will be implemented soon in the AG Hartenstein, so that the improvements can be made.

For example, the DeltaScan pattern may be implemented like this:

scanPattern			downscan  is 4 steps [0,1] decreasing by 1;
			rightscan is 3 steps [1,0];
...
rightscan(downscan[SW1];)[SW1];
With the help of the decreasing statement the number of steps is decremented by one every time the scan pattern downscan is executed within a loop.

That statement can lead to a version of the JPEGzigzagScan that has no until clauses and thus the upper half of the scan can be mirrored so that a complete scan occurs. See also figure 5.7 and figure 5.8.

scanPattern		 EastScan   is 1 step  [1,0];
		SouthWestScan is 1 step  [-1,1] increasing by 2;
		DiagonalScan  is 7 steps [-1,1];
		SouthScan     is 1 step  [0,1];
		NorthEastScan is 2 steps [1,-1] increasing by 2;
		UpLzigzagScan is begin
			do 3 times begin
				EastScan;
				SouthWestScan;
				SouthScan;
				NorthEastScan;
			end;
		end;
		MiddleScan is begin
			EastScan;
			DiagonalScan;
			EastScan;
		end;
		JPEGzigzagScan is begin
			UpLzigzagScan;
			MiddleScan;
			rotu(reverse(UpLzigzagScan));
		end;
The statement do 3 times is for the repeated execution of scan patterns. This is equivalent to a nested scan of 3 steps in the direction [0,0]. This scan can only be achieved in the new version of the GAG, because explicit stays can not be performed in the current revision.

Appendix A: - MoPL-3 Library
Appendix B: - Grammar Summary of MoPL-3

Master Thesis - 13 JAN 1997
[Next] [Contents] [Index]

Generated with Harlequin WebMaker

Please refere to the postscript files for a correct version of the script.

Recommendations to Harlequin for a correct implementaion of the Framemaker to HTML converter can be made at the address above