Saturday, September 05, 2015

Building from the Ground Up




In my previous post which turned out to be a rant about the ever increasing complexity of software, I decided to dig out a draft post from earlier this year which describes my continuing investigation into how the lower levels of computing languages are implemented.

The purpose is to create a set of low level tools which allow application code to be developed easily and interactively on a microprocessor or soft CPU with limited memory resources.

The microprocessor may be an ARM or a softcore running on a FPGA - and with no C compiler available how do you start to bootstrap the processor from first principles?

Booting from Scratch

In bygone generations of computer hardware, there would be a set of toggle switches which could be set to represent the binary instruction words. These words would be individually deposited into memory and the program counter advanced. A very short routine, consisting of only a couple of dozen instructions would be manually toggled into memory, allowing a paper tape to be loaded. Great ingenuity was applied to the hardware architecture so that these loader routines were short.

Daniel Bailey has devised a fun project - his C88 computer is a home-brew CPU on a FPGA which takes one back to the very early methods of booting a cpu from scratch. This blogpost and video explains it nicely. Daniel will be presenting his C88 at the forthcoming OSHcamp- at the end of September in Hebden Bridge.

As technoogy progressed, and IC memories became available, interchangeable eproms were used to allow the program code to be quickly modified and re-run.  However there might be a 20 minute delay between detecting a bug and reprogramming another eprom with the corrected code. To speed up the time needed to program and erase eproms - often a nonvolatile RAM,was used to emulate the eprom and hold the program code. I spent a summer in my late teens programming a Z80 dev board in this manner.

These days almost all microcontrollers have on chip flash memory, and some method for in circuit serial programming. (ICSP).  As a PC or laptop is invariably used  for program development and code compilation, virtually all of the development tools are hosted on the PC, and very few systems have the means to edit and recompile code on the target system itself.

There has to be another way

There is however another means of working, where a minimal interactive working environment can be loaded onto the target system, and this allows the development and debugging of programs on the target system itself. Very little additional support is needed other than a serial interface. The code can be developed in a text editor running on the microcontroller. This is no different to the way that interactive languages, such as BASIC were developed on home computers in the early to mid 1980s.

Interactive development is a completely different method of working compared to the usual edit, compile, load, test approach that has arisen out of the almost universal use of compiled languages - in particular C and its derivatives. It was only when PCs gained more RAM, disks and cpu speed that C compilation became practical.

Forth was one of the first interactive languages, - developed in the 1960s by Charles H. Moore, and it is Forth that influences my investigations here.

Forth might be compared to assembly language, as it deals with low level operations involving snippets of machine language that are threaded together to create some program function. However it is much more than just an assembly language, as it provides the means to edit, compile and assemble source code, in an extensible manner, using a very compact language.

Forth does away with much of the clutter of other languages, and gives the programmer direct access to the resources of the processor.  Forth offers high speed execution, especially on FPGA hardware designed to specifically match the primitive instruction set of the Forth language. Modern, low cost FPGAs can run a Forth engine at a clock frequency of some 50 to 200MHz. At this speed the interactive nature of the language becomes extremely fast, and the time taken to go around the edit-compile-exexute process is slashed to just edit-execute.  This is a significant saving in time, and ideas can be tested and rejected in the time otherwise spent compiling a large application.

Forth uses "words" to perform program functions.  A Forth word ultimately causes a series of machine language subroutines to be executed. In some respect - a Forth word may be likened to a LEGO block, which can be used with other blocks to make a more complicated structure. These structures may be further combined to build up the entire object - in our analogy - the program application.

Forth words are threaded together, like beads on a string, with the end of execution of one word, rapidly jumping to the beginning of the next.

Brad Rodriguez has an excellent article "Moving Forth" which explains how this threading process is achieved. Although written over 20 years ago - the fundamentals have not changed.

SIMPL

In an attempt to understand the fundamentals of Forth, I studied another threaded, interpreted language, SIMPL - the Serial Interpreted Minimal Programming Language.  I have blogged several times about SIMPL since May 2013 - when I first came across it.  Refer back to previous posts for details.

SIMPL takes some aspects of Forth, and packages them in an easy to understand manner which may be adapted to run on a variety of microprocessor platforms.  I first used SIMPL on a ATmega328 Arduino, but have subsequently ported it to ARM and FPGA soft core processor targets.  SIMPL is written in more or less standard C, which makes it portable between, processors, without the complexity of dealing with the machine language of each target processor. It is a stepping stone towards a real Forth.

SIMPL provides an easy way of sequencing blocks of code functions, and doing it in an interactive manner from the serial keyboard interface.  It's not so much a programming language - more a way of automating the access to routines - and done in an interactive manner.

As an example:

Suppose you have just written a function in C to plot a graphical character on the screen, and you wish see what it looks like - using a VGA graphics adaptor or a TFT LCD screen. In order to test this function, you need to supply it with parameters for the x,y position that you wish it to be plotted, and possibly the foreground and background colours. If you hardcoded these parameters, and got them wrong, then you would have to go back and edit and recompile until you got the effect you were looking for.

SIMPL provides the mechanism to enter these parameters and execute the function interactively. If you don't like the first result, you can change the parameters with a few key-strokes and test again. You have virtually eliminated the edit-recompile cycle from when you are developing new code functions.

The Mechanics of SIMPL

SIMPL exists as three short functions that provide a complete serial interpreter or shell, to accompany the rest of your program. By calling the SIMPL interpreter each time the main loop is executed provides the means to interact with the rest of the program.

The SIMPL interpreter consists of the following 3 functions executed sequentially within a loop.

txtRead() takes characters from the serial input and loads them into a buffer beginning at address buf - which has been allocated a length of 64 characters

txtChk()  looks at the first character to see if it is a colon :  If so it copies the input text string to a specific address - to form a "colon definition", an idea borrowed from Forth

txtEval(addr)  This is the interpreter - It is pointed to the start address of the text string and evaluates the string a character at a time. The SIMPL interpreter has a few rules on how it treats each character

If it finds numerical characters, it enumerates them to a 16bit integer that it places on the stack.

If the characters are small letters or punctuation characters it will execute a given function each time that character is encountered. For example, you may have allocated the character "g" to plot out your new graphic symbol, each time g is encountered.

Uppercase characters are used as addressing pointers - to point to additional character strings, either in Flash or RAM.  The colon definition allows you to compose a new sequence of characters, and access it every time the uppercase character is typed.

Because the text is always loaded into a buffer - it is executed at full speed - and not influenced by the serial baudrate.

In short, SIMPL provides an easy interactive means to call your code subroutines (functions) in a sequence from the keyboard, and provide serial printed output back to the serial terminal.

SIMPL is essentially providing a subroutine address lookup, and call to that subroutine, encoded into the ASCII value of a character.  Sequences of characters provide sequences of function calls. Numerical parameters may be passed to those subroutines using an elementary stack based method.

Extending SIMPL

SIMPL is not a fully blown language, but a method of automating the calling of functions according to a stored sequence of characters or tokens.  It treats every ASCII character as a new instruction, so you cannot use multiple character words such as ADD to convey a meaning - as this would be interpreted as sequence A followed by sequence D followed by sequence D and so on.  This is where it deviates from Forth.

It's control structures are limited to simple loops - controlled by a single down counting loop counter.

SIMPL is the stringing together of subroutines in Flash, that have been created from compiled C code. A 16 bit ADD in SIMPL is going to be somewhat slower than the same operation in a directly executed subroutine.

SIMPL does provide some pointers to how a more Forth like version could be implemented.

It already has the means to interpret an integer number and place it on a stack.

It also has the means to take a sequence of tokens - in this case ASCII characters, look them up in a jump table and retrieve the start address of a code subroutine, which it then executes, before returning to the interpreter to decode the next token. Each 7-bit token provides an index into a jump table of addresses, whom where the next word is executed.

To extend SIMPL, it will be necessary to create some additional text handling code, so that typed words can be processed, added to a dictionary and have a codefield associated with them.  How exactly this will be achieved will be the subject of a future post.









No comments: