Tiny Languages - for Tiny Boards!
Inspired by these examples, I wish to show how Ward Cunningham's Txtzyme interpreted language can easily be modified to add more features and functionality.
The aim is to have a tiny, generic language that can be embedded into the bootloader of any common microcontroller - providing a minimal operating environment from start-up.
What will such a Tiny Language require in terms of features and functionality?
Wish List
Any small language must provide a minimum vocabulary to be useful. Here are some thoughts on the bare essentials:
Arithmetic and Logical Operations + - * / AND, OR, XOR, INV
Memory Operations - including loading, storing, saving to disk, and bootloading the flash or EEPROM
Program flow control - decision making - branching (IF, THEN, ELSE, SWITCH-CASE etc)
Looping (DO WHILE, FOR- NEXT etc)
I/O commands (digital read/write, analogue read, PWM, etc).
Timing, delays, and scheduling operations (mS_delay, uS_delay, millis() etc)
Peripheral support - eg UART, SRAM, uSD, keyboard, mouse, graphics etc.
We also need a minimum of system support for the language kernel:
Keyboard for text entry,
Terminal output (serial UART routines)
Text Editing (line editor)
Hex Dump
Screen listing of code in RAM
With these facilities we can at least enter, edit and run code using a serial terminal application and get output text to screen.
At a higher level - for a stand alone computing environment we need permanent storage to disk and some tools to make life easier:
Assembler
Compiler
Disk operations, File Handling (on microSD card)
Graphical output to monitor.
SIMPL
Txtzyme offers a small, understandable, elegant and flexible core with UART communications, analog and digital input and output and delays and timing functions. It is almost the ideal platform on which to base other code. I have prepared a stripped down version of the Txtzyme interpreter kernel - which compiles to just 658 bytes. It is available here.
At the moment the cut down kernel doesn't do very much apart from allow a decimal number to be printed out using the "p" command. However, it represents about the minimum needed to support a UART interface, scan a text buffer and execute commands. Additional commands are added in the switch-case statement within the textEval() function.
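As a rough illustration of that structure, here is a minimal sketch in plain C of a loop that scans a text buffer, accumulates a decimal number into x and dispatches on single characters through a switch-case. It is not the 658 byte kernel itself: the function name text_eval, the output buffer out (standing in for the UART so that the sketch runs on a desktop compiler) and the newline after each printed number are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the UART: printed text is appended here. */
static char out[64];

/* Scan a NUL-terminated text buffer and execute each command character. */
void text_eval(const char *buf)
{
    uint16_t x = 0;                 /* the single 16 bit parameter variable */
    char ch;

    out[0] = '\0';
    while ((ch = *buf++) != '\0') {
        switch (ch) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            /* Accumulate a run of decimal digits into x. */
            x = (uint16_t)(ch - '0');
            while (*buf >= '0' && *buf <= '9')
                x = (uint16_t)(x * 10 + (*buf++ - '0'));
            break;
        case 'p': {
            /* Print x as a decimal number (newline is an assumption). */
            size_t used = strlen(out);
            snprintf(out + used, sizeof out - used, "%u\n", (unsigned)x);
            break;
        }
        /* Further commands would be added as extra case labels. */
        default:
            break;
        }
    }
}
```

Calling text_eval("123p") leaves the text of the number 123 in out. On the microcontroller the buffer would be replaced by the real serial output routine.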
This led me to a series of experiments in which the features of Txtzyme were extended - which resulted in SIMPL (Serial Interpreted Minimal Programming Language).
SIMPL has been ported to several microcontrollers - including MSP430, STM32F103, STM32F373, '407 and '746 - as well as an FPGA softcore processor called "ZPUino".
SIMPL offers an easy way to interact with hardware, to compose new functions and to schedule functions to be triggered either sequentially or at regular intervals. It is based on the Txtzyme core functionality - about 1 Kbyte of code, which gives a basic communications channel, text entry and display and the means to exercise common microcontroller hardware peripherals.
Adding Maths and Logic Operations
Txtzyme only used a single 16 bit integer variable x, as a control parameter for the coded functions. Whilst this was OK for running loops, reading the ADC channels and toggling I/O it was a little restrictive.
The first extension to Txtzyme was to create the means to have a second variable y. This immediately allows simple mathematical and logic operations to be performed.
Arithmetical + - * / %
Logical AND OR XOR NOT
Comparison < >
In Txtzyme, typing a number at the keyboard of a serial terminal application causes the interpreter to decode the number's ASCII string and place the number into the single parameter variable x. x can then be used to access I/O, set up loops and delays and other functions.
In SIMPL, there is a second parameter variable y, and this is used to hold a second operand.
Typing
456 123+p (note the space between 456 and 123)
This pushes 456 into x, and the space is SIMPL's method of moving x into y, so that the interpreter is free to accept the second number, 123, into x. In the above example 123 in x is added to 456 in y and the result is printed with p.
This arrangement is a kind of pseudo-stack of just 2 levels, x and y. Note that unlike a true stack, y does not move up to occupy x when x is printed out. Perhaps it should be thought of as a machine with 2 registers x and y, where "space" is equivalent to mov x,y.
Additional registers could be added but there is the complication of naming them.
With 2 registers we can now do simple maths, memory addressing and logical and comparison operations.
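The two register behaviour can be sketched in C as follows. This is an illustrative reconstruction and not SIMPL's actual source: the function name simpl_eval, the choice that each operator leaves its result in x, and the use of the characters & | ^ for AND, OR and XOR are assumptions. Only commutative operators are shown, because the operand order for - / % and the comparisons is not stated above.

```c
#include <stdint.h>

static uint16_t x, y;               /* the two registers */

/* Evaluate a phrase and return the final value of x. */
uint16_t simpl_eval(const char *buf)
{
    char ch;

    x = 0;
    y = 0;
    while ((ch = *buf++) != '\0') {
        switch (ch) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            /* A number always lands in x. */
            x = (uint16_t)(ch - '0');
            while (*buf >= '0' && *buf <= '9')
                x = (uint16_t)(x * 10 + (*buf++ - '0'));
            break;
        case ' ':
            y = x;                  /* "space" copies x into y */
            break;
        case '+': x = (uint16_t)(y + x); break;
        case '*': x = (uint16_t)((uint32_t)y * x); break;
        case '&': x = (uint16_t)(y & x); break;
        case '|': x = (uint16_t)(y | x); break;
        case '^': x = (uint16_t)(y ^ x); break;
        default:
            break;
        }
    }
    return x;
}
```

With this sketch simpl_eval("456 123+") returns 579, and y still holds 456 afterwards, which matches the point that y does not move when x is consumed.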
The addition of these arithmetical and logical operators adds a further 360 bytes to the kernel. This seems quite a lot for just 9 functions - and so at this point I began wondering whether the extensions to the kernel might be best done in AVR assembly language. You may recall that an AVR instruction takes up 2 bytes of flash memory - so my C code case statement selection structure is using on average 20 instructions (40 bytes) per command. (MUL, DIV and MOD will be more).
Whilst Txtzyme was purely a 16 bit implementation, if x and y are cast as long integers, then all operations are extended to 32 bit. This however increases the code size by around 625 bytes (as the 32 bit maths operations all take up more code to implement in C).
It would be possible to make a more compact version of SIMPL by coding up the routines in AVR assembly language rather than C. In the TinyForth code that I have studied, the primitive execution structure is coded such that the commands occupy an average of 24 bytes (for 29 commands). Compared with my current 40 bytes per command, this would be a 40% reduction in code size - so it is well worth considering. The topic of coding SIMPL in assembly language, plus an analysis of the SIMPL virtual machine, will be covered in a later post. It might just be time to fire up Atmel Studio and start writing some AVR assembly code!
Memory Operations
In keeping with Forth parlance the basic memory operations are Fetch @ and Store ! These effectively are the equivalent of Peek and Poke - and operate on a 16 bit wordsize. With fetch and store there is now the means to take a number off the stack and place it anywhere in RAM, or conversely take a word from RAM and place it on the top of the stack. The exact location of the storage could be derived from a variable's name, or it could be in a set of easily accessed named registers, eg. R0, R1, R2.....
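A minimal sketch of fetch and store on a 16 bit word size might look like this in C. The names fetch16 and store16, the 2K byte array standing in for RAM, the low byte first ordering and the wrap-around of out-of-range addresses are all assumptions for illustration. How the address and value are taken from x and y is left open, as it is not specified here.

```c
#include <stdint.h>

#define RAM_SIZE 2048u              /* assumed: 2K of RAM, as on the Arduino */

static uint8_t ram[RAM_SIZE];

/* Store ( ! ): write a 16 bit word at a byte address, low byte first. */
void store16(uint16_t addr, uint16_t value)
{
    ram[addr % RAM_SIZE] = (uint8_t)(value & 0xFF);
    ram[(addr + 1u) % RAM_SIZE] = (uint8_t)(value >> 8);
}

/* Fetch ( @ ): read the 16 bit word stored at a byte address. */
uint16_t fetch16(uint16_t addr)
{
    return (uint16_t)(ram[addr % RAM_SIZE] |
                      (ram[(addr + 1u) % RAM_SIZE] << 8));
}
```

A set of named registers R0, R1, R2 and so on could then simply be fixed addresses passed to these two routines.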
Text Input and Output
Txtzyme offered a very simple mechanism of loading up a character buffer with the inputted text from the UART serial communications. It would carry on filling the input buffer until a newline character was found, or the end of the buffer was reached. The input buffer essentially held the whole Txtzyme phrase, and on detecting the newline character would begin to interpret each character in turn, decoding it and calling the various functions associated with each character.
When I wrote the first draft of SIMPL, I realised that I could store the inputted characters into any buffer I wished, by redirecting the input characters to a new named buffer. I used uppercase ASCII letters for these buffers, named A through to Z. When I wanted to redirect the input characters to a new named buffer, all I had to do was use a colon : at the start of the line, followed by the naming character - eg M.
:M - the start of a new buffer called M
I chose for convenience to make all of these buffers the same fixed length (48 bytes), so that the start address of the buffer could easily be calculated from the ASCII code of the letter.
In this example M is code 77 in ASCII, and A is 65, so it's easy to allocate 48 byte buffers just by multiplying the ASCII value by the 48 byte length of the buffer (plus an offset).
It becomes clear that these named buffers are in their own right small snippets of code - in effect subroutines - and in order to execute the subroutine it's just a case of typing its naming letter.
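The address calculation can be sketched in a few lines of C. The names buffers and buffer_for are hypothetical. Subtracting 'A' before multiplying is equivalent to multiplying the raw ASCII value by 48 and then applying a fixed offset.

```c
#include <stddef.h>

#define BUF_LEN  48                 /* fixed length of every named buffer */
#define NUM_BUFS 26                 /* one buffer per letter, A through Z */

static char buffers[NUM_BUFS * BUF_LEN];

/* Return the start of the buffer named by an uppercase letter, or NULL. */
char *buffer_for(char name)
{
    if (name < 'A' || name > 'Z')
        return NULL;
    return &buffers[(name - 'A') * BUF_LEN];
}
```

For M the index is 77 - 65 = 12, so its buffer starts 12 x 48 = 576 bytes into the block. The full set of 26 buffers occupies 26 x 48 = 1248 bytes.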
Txtzyme subroutines are by their very nature quite short, and it was easy to fit almost everything that might conceivably wish to be done into a few tens of characters - hence it was found that 48 was a good size for the buffer - especially when the Arduino only has 2K of RAM to play in.
Forth readers will recognise this as a very crude means of creating a colon definition: storing the definition's executable code at a fixed address in memory, based on the ASCII name of the routine, and being able to effect a simple jump of the interpreter to that routine. It's a little like the restart RSTxx instructions on the Z80/8080, which provide a few restart addresses that the program counter can be set to, in order to quickly call the most commonly used subroutines. In our case we have 26 to make full use of!
A "line printing" command to print out a string of the stored text from memory is the underscore character _. This is used as a pair surrounding the text message to be output, for example
_This is a message_
We could include this with the previously defined "M" as follows
:M_This is a message_
The text handler would save this in the buffer associated with M, and every time the interpreter encountered the letter M it would print out the message
This is a message
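The definition and replay of M can be sketched in C as below. This is a simplified model and not the SIMPL source: enter_line, run, emit and the out buffer (standing in for the UART) are hypothetical names, no newline is added after the message, and the nesting limit of 8 is an assumption added so that a buffer that calls itself cannot recurse forever.

```c
#include <string.h>

#define BUF_LEN 48

static char buffers[26][BUF_LEN];   /* named buffers A..Z */
static char out[128];               /* stand-in for the serial output */

/* Append one character to the output, if there is room. */
void emit(char c)
{
    size_t n = strlen(out);
    if (n + 1 < sizeof out) {
        out[n] = c;
        out[n + 1] = '\0';
    }
}

/* Interpret a phrase: _text_ prints text, A..Z runs a named buffer. */
void run(const char *buf, int depth)
{
    char ch;
    while ((ch = *buf++) != '\0') {
        if (ch == '_') {
            while (*buf != '\0' && *buf != '_')
                emit(*buf++);
            if (*buf == '_')
                buf++;              /* skip the closing underscore */
        } else if (ch >= 'A' && ch <= 'Z' && depth < 8) {
            run(buffers[ch - 'A'], depth + 1);
        }
    }
}

/* A line starting with :X is stored in buffer X, anything else is run. */
void enter_line(const char *line)
{
    if (line[0] == ':' && line[1] >= 'A' && line[1] <= 'Z') {
        char *dst = buffers[line[1] - 'A'];
        strncpy(dst, line + 2, BUF_LEN - 1);
        dst[BUF_LEN - 1] = '\0';
    } else {
        run(line, 0);
    }
}
```

After enter_line(":M_This is a message_") nothing is printed. A later enter_line("M") jumps to the stored buffer and the message appears in out.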
Displaying and Editing Text
In a microcontroller system with only 2Kbytes of RAM, much of this can be taken up with the screen buffer - after all a 25 line x 80 column screen is 2000 bytes!
In a language such as Txtzyme and SIMPL, the source code has been tailored to consist of printable ASCII characters, with some degree of human readability. The system can easily perform a memory dump to the screen - which is effectively a program listing.
As the program consists of short subroutines threaded together, it would be possible to have a single stepping mode, where a cursor is highlighted as it steps through the current code. With a 115200 baud serial connection, the whole screen of text could be refreshed 5 times per second, more if only the active code is displayed.
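The refresh figure can be checked with a little arithmetic, sketched here in C. The function name is hypothetical, and the 10 bits per character passed in below assumes 8N1 framing (one start bit, eight data bits, one stop bit), which is not stated above. Any cursor positioning overhead is ignored.

```c
/* Full screen refreshes per second over an asynchronous serial link. */
double refreshes_per_second(long baud, int bits_per_char, int rows, int cols)
{
    double chars_per_second = (double)baud / bits_per_char;
    return chars_per_second / ((double)rows * cols);
}
```

At 115200 baud this gives 11520 characters per second, and a 25 x 80 screen of 2000 characters therefore refreshes a little over 5 times per second.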
Editing text is probably best done on a line by line basis. A line that requires editing can be brought up using the ? command and then copied into the input buffer, changes made and then pasted back to memory.
Some Forth systems use 1Kbyte blocks for storing and presenting their source code. A similar approach could be employed with SIMPL, with a block command to bring up a numbered block to the screen.
SIMPL is expressed as a series of tokens for compactness in memory and the convenience of not having to scan for whole words - however there is no reason why it could not be expanded to a more verbose format for screen listings, with each lower case ASCII character expanded to a more easily read form. The table to do this would be retained in flash ROM - something that there is no real shortage of. For example:
a ANALOG
b BLOCK
c CALL
d DIGITAL
e END
f FOR
g GOTO
h HIGH
i INPUT
j JUMP
k COUNT
l LOW
m mS_DELAY
n NUMBER
o OUTPUT
p PRINT
q QUIT
r RETURN
s SAVE
t TIME
u uS_DELAY
v VARIABLE
w WHILE
x 1st operand
y 2nd operand
z SLEEP
All of these keywords will fit into 8 characters - so the additional space needed for the verbose form is 26 x 8 = 208 bytes.
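The expansion itself is just a table lookup indexed by the letter, as in this C sketch. The names verbose_names and verbose_name are hypothetical. For simplicity the table holds pointers to ordinary strings rather than packed 8 byte slots, so its storage is not the 208 bytes calculated above.

```c
#include <stddef.h>

/* Verbose form of each lowercase command letter, a through z. */
static const char *const verbose_names[26] = {
    "ANALOG", "BLOCK", "CALL", "DIGITAL", "END", "FOR", "GOTO",
    "HIGH", "INPUT", "JUMP", "COUNT", "LOW", "mS_DELAY", "NUMBER",
    "OUTPUT", "PRINT", "QUIT", "RETURN", "SAVE", "TIME", "uS_DELAY",
    "VARIABLE", "WHILE", "1st operand", "2nd operand", "SLEEP"
};

/* Return the verbose form of a command letter, or NULL if it has none. */
const char *verbose_name(char c)
{
    if (c < 'a' || c > 'z')
        return NULL;
    return verbose_names[c - 'a'];
}
```

A listing routine would call verbose_name for each character and print any character without an entry (digits, punctuation, uppercase buffer names) unchanged. On an AVR the table would normally be placed in flash using avr-libc's PROGMEM attribute, which this desktop sketch leaves out.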
In Summary - So Far
So we have a language largely represented by single character instructions which when decoded by the interpreter cause a jump to a subroutine that defines the program's action. These character command words may then be assembled into sequences or phrases and stored into named buffer spaces for execution. New phrases may be compiled and stored at fixed addresses for later interpretation and execution. There is the means to enter and retrieve text from the RAM either by loading and storing single characters or numbers, or by using the text handler and the string printing command.
In Part 3 we look at the program flow control structures like LOOPS, SWITCH-CASE and constructs made up from conditional jumps.