Sunday, September 06, 2015

Extending SIMPL



In the earlier post "Building form the ground up"  I wrote about how Forth could be used to provide a low level development environment for an unfamiliar processor for which a C compiler was either not available, or not desirable to use.

The first task is to write a simple virtual machine for the target processor, and then use this VM to run your application code. This is similar to what Java bytecode does.

The virtual machine need only have a handful of instructions, including basic arithmetic, logical operations and the ability to access and manipulate memory. From these primitive instructions, all subsequent instructions may be synthesised.

This is the approach described in "The Elements of Computer Systems" (TECS) which is also known as "From Nand to Tetris".  The simple machine described, consists of little more than an ALU, a program counter, two registers and a memory interface capable of accessing a 64K x 16 bit address space.

Designing a Virtual Machine

If it is possible to code up a simple virtual machine, on any choice of microcontroller, or softcore CPU FPGA - then it becomes practical to program the virtual machine using the same machine language.

The virtual machine will be a stack machine, and will support a data stack and a return stack. These stack structures will probably be created in RAM on the virtual machine.  If this was then implemented in FPGA hardware, the stacks would be separate RAM blocks, with a means of fast access, without having to go through main memory.

The ALU has access to the top and second items of the stack, and will conduct arithmetic and logical operations on these elements, leaving  the result on the top of the stack. If operands are required from memory, they will first have to be loaded into the top and 2nd stack locations.

Assume that the virtual machine ALU can perform the following operations. The arithmetic and logical operations are performed on the top 2 members of the data stack, returning the result to the top.

ADD
SUB
MUL
DIV

Multiply and Divide can be synthesised from shift and add/sub - but are time consuming without additional specialist hardware.

AND       Bitwise AND
OR        
XOR
NEG

SLL       Shift Left  Logical
SRL       Shift Right Logical

@          Fetch a word from memory and place on the top of stack
!            Store a word from the top of stack to memory

BRZ      Branch if zero
JMP      Unconditional Jump
CALL    Call subroutine
RET      Return from subroutine

LIT       Put a literal number onto the stack

So with approximately 20 instructions, we have the basis of a stack machine that can do useful work.

The virtual machine to perform these operations can be written in a high level language - such as C, or actually implemented as a soft core on a FPGA.  One such stack processor that lends itself to this is James Bowman's J1 Forth CPU.  This compact, no thrills CPU can be defined in under 100 lines of C, or synthesised in FPGA hardware using under 200 lines of Verilog code.

The J1 is a practical CPU, designed for simplicity and optimised for high speed execution of stack based languages.  It offers most of the instructions outlined above and forms the basis of an exploration into stack based CPUs.  Initially it can be simulated on any processor in C code for experimentation and later blown into real high speed FPGA logic.

Even though the instruction set is very small, the 16 bit instruction word length does not lend itself to easy or memorable coding in machine language. It has several instruction fields, and these have to be populated with the correct bit pattern if we are to make any progress off the starting blocks. The instructions to however map very well onto the Forth language, but there are other alternatives which could be explored.

At the bare minimum, we need an assembler to synthesise the instruction words from the various fields, and once we have a list of 16bit hex instructions we need to load them into RAM and have the simulator step through them.

An assembler typically scans through a text file containing instruction mnemonics, such as ADD, JMP, CALL etc and converts these into the instruction opcodes.  It also looks for numerical constants, variables and addresses and assembles these into the instruction. Additionally, it looks for labels that identify subroutine addresses for jumps and includes these in the code.

A relatively simple program:   Mnemonics in - machine language out

Another Option

However there might be a different way to do this - in a more interactive nature - and this is where Forth or one of it's near cousins will come in. For the purpose of my exploration, I want to see if the SIMPL interpreter can be used as a means to perform this assembly step.

As we know, the SIMPL interpreter will read a series of characters from a buffer, one at a time, and execute code associated with that character.

So to add 2 numbers (say 45 and 63) in SIMPL and print out their sum

45 63+p

45 is interpreted as a literal to be pushed onto the stack
The space is used in SIMPL to push the stack down
63 can then be pushed onto the stack
+  adds the top two members of the stack leaving the result on the top
p prints the result to the serial terminal

This is almost Forth, except in SIMPL there is only a limited stack structure and the space is needed to command the stack to push down to accept another number.

Instead of executing the SIMPL instructions directly, we can hijack the SIMPL interpreter to synthesise the assembly language and the machine language needed for the virtual machine.

So   45 63+p  is translated to assembly language

LIT    45
LIT    63
ADD
CALL  PRINT

Where PRINT is a subroutine that outputs the top of stack contents as a printed integer to the serial port

But the translation to standard traditional assembly language with 3 letter mnemonics is an un-required middle step.  The SIMPL interpreter can easily produce the instruction fields and generate the machine language directly:

802D          LIT 45
803F           LIT 63
6200           ADD
5100           CALL 100  (PRINT is at 0x0100)

So by this process of translation, the SIMPL language is the Assembly Language - we can find enough of the SIMPL character set to map directly onto the J1 instruction set, and any of the other command characters (like p) will invoke calls to subroutines.

It might be remembered that SIMPL uses small letters and punctuation characters as primitives, numbers are enumerated as literals and capitals are reserved for the application words. This means that our little language can have approximately 60 primitive instructions, which is enough to do real work, yet takes up a fraction of the space used by the 170 words used in a typical Forth system. Less words, less typing, yet still more readable than assembly language or machine code.

So lets look at the primitive words and how it might map onto the assembly language

+     ADD
-      SUB
*      MUL
/       DIV
%     MOD
&     AND
|       OR
^      XOR
~      INV
#      LIT

@    FETCH
!      STORE

:      CALL
;       RET

<     LT
=     EQ
>     GT

j      JMP
z      JPZ

I find Stack manipulations are never easy to remember in Forth - words like DUP, OVER,  SWAP, NIP, DROP etc but as these are an essential part of the language, they need to have a syntax to code them into the assembly language.  Perhaps it might be possible to express the top two stack items as a two member array between parenthesis.

DUP    (1,1)
OVER (2,1)
SWAP (2,1)
NIP      (x,y)
DROP  (2,3)





          


Text Editor and Assembler

In order to start writing code for our new processor, we need a few simple tools to help us.  We need a means of entering and displaying text - usually a serial terminal interface, so that we can enter machine instructions into memory and have the virtual machine execute them.

In the early days of microprocessors development kits there was often a hexadecimal keypad, 7 segment display for address and data, and a monitor program.  This allowed  the machine instructions to be entered directly into memory, and then the means to run code from a certain address.  It was quite primitive, frustrating and subject to a lot of human typing mistakes. Nowadays, we can use the power and resources of a laptop to help get new systems up and running.

The first tool we need is a simple text editor.  This will take in text from the keyboard, display it in a terminal window and allow basic editing, listing and text file storage and retrieval.

Secondly, no-one wants to code in machine language, so a very basic assembler that converts text mnemonics and numbers - one a line by line basis - to machine language would also be an asset.

For this we need simple text parsing - and in traditional Forth this was done by looking for individual words or numbers separated by spaces.

Forth would take each new word and look it up in the dictionary for a match.  The match was based on the first three characters of the word, and it's length. This is quick to do and suitable for most purposes.

Numerical input, which includes integers up to 65535 were generally not stored in the dictionary, and would be converted from ASCII to integer and then placed on the stack.

The text editor and assembler can exist as one program - as they share a lot of features. Principally we will need the means to parse through the entered text looking for keywords and numbers.  As there are only 20 or so keywords required to implement the instruction set of the virtual machine, the task of programming these into the assembler is not too difficult.

I have chosen mnemonics that simplify this task - the first 2 characters are unique to each mnemonic, and no mnemonic is more than 4 characters long. We can combine the first two characters to produce a unique number, and then use this number in a series of Switch-Case statements to perform the action needed.

The 16 bit virtual machine can handle integer numbers up to 65535.  We need a means of detecting a number within the entered text string, and converting the ASCII characters to an integer. In a similar way to how we uniquely defined the mnemonics by the first 2 characters, we can do a quick test on the string to see if it is a number.

The assembler will convert our inputed mnemonics on a one by one basis into the machine instructions of the virtual machine.  More detail on how the assembler should operate is outlined in Chapter 6 of "TECS".

Sometimes it is easier working in binary or hexadecimal, so additional assembler directives, for example BIN, HEX and DEC, could be used to instruct the assembler which base to use to interpret the numerical strings.

Assemblers make use of other directives such as ORG, and labels, to refer to points in the listing. Assemblers can be single pass or two-pass.   A single pass assembler will require you to keep track of your own labels, which can be quite difficult if the assembly listing is rich in subroutines. So this is possibly one reason why Forth evolved as a language, it has it's roots in assembly language programming,  but the Forth system of "words", provided an efficient and automated means of keeping track of labels and subroutines - it had to, as Forth is composed almost entirely of subroutines.

Charles Moore, created Forth to be an efficient and automated way of creating a common programming environment on each of the wide variety of mainframe machines that he encountered  during the 1960s-70s.  His virtual machine had to be first coded in the native machine language of the processor,  but with the availability of C compilers, the VM can be coded in C.

Moore realised that the tasks of text editor, assembler, compiler and interactive interpreter could be bundled up into one efficient package, which he called Forth.  How exactly this was done was at first fairly obscure, and Moore initially kept much of it to himself, before branching out around 1970 and sharing the inner secrets of Forth within the Radio Astronomy community.

If the primitive assembler can only accept valid mnemonics and numbers, then any other unrecognised text string could be considered to be a new word. A word that cannot be found in the dictionary, is treated as a new definition and is composed from the primitive instructions.

This is similar to the use of a macro label within a (two pass) assembler listing. Once a macro has been encountered and given an address, it can be used again.

So the combination of a simple text editor and line assembler will help us to build up the various Forth word definitions from the primitives. Whilst this C program is not a complete Forth system, it is a tool that helps us create the Forth system.





No comments: