Saturday, May 18, 2013

Txtzyme - A minimal interpretive language and thoughts on simple extensions

Imagine a very simple programming language that could run on any microcontroller using a minimum of on-chip resources: a language that could invoke complex instructions, yet could also exercise the microcontroller's basic I/O functions through a simple serial command interface.

I have always been a proponent of minimalist programming languages, so it was nice to encounter something new this week: a language whose commands are invoked by single ASCII characters, allowing complex procedures to be assembled from simple arrays of characters.

It reminded me of the Tiny BASICs and tiny Forths that were written in the late 1970s for resource-limited microprocessor systems.

On Thursday evening, I attended the regular OSHUG (Open Source Hardware User Group) meeting, which is held each month at C4CC near King's Cross.

The third speaker was Romilly Cocking, who reported on his quick2link project, a very simple, extensible protocol for controlling sensor/actuator networks. quick2link provided the means to control an Arduino as the I/O subsystem of a Raspberry Pi project.

The Pi is limited in its I/O capabilities, so adding an Arduino to handle the I/O seems a sensible idea. For the Pi to control the Arduino, a very simple protocol is needed, and since the Arduino is essentially a microcontroller with a serial interface, a serial command interpreter seems appropriate.

quick2link was inspired by the work of Ward Cunningham (creator of the first wiki) and his minimalist txtzyme command interpreter for controlling the I/O of simple microcontrollers.

I was intrigued by txtzyme and its simplicity, so I downloaded the txtzyme interpreter for the Arduino and decided to have a play. A simple command interpreter written in C, it compiles to just 3.6 KB, a very low overhead for even the smallest of today's microcontrollers.

Txtzyme allows the control of the Arduino I/O, using a simple interpreted language.

The commands are reduced to single lower-case ASCII characters, such as i for input and o for output. Each command can take a single numerical parameter. That parameter is limited to 65535 because the C interpreter holds it in an unsigned int, which is only 16 bits wide on the Arduino's 8-bit AVR processors.
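
As a rough illustration of that limit, the sketch below shows one way the digits might be accumulated into the parameter. It is not the txtzyme source. parse_number is a hypothetical name, and uint16_t models the 16-bit unsigned int of an AVR-based Arduino.

```c
#include <stdint.h>

/* Sketch of txtzyme-style number entry (hypothetical helper).
   uint16_t models the 16-bit unsigned int of an AVR Arduino, so any
   value above 65535 silently wraps around modulo 65536. */
uint16_t parse_number(const char **p)
{
    uint16_t x = 0;
    while (**p >= '0' && **p <= '9') {
        x = (uint16_t)(x * 10u + (unsigned)(**p - '0'));
        (*p)++;                  /* consume the digit */
    }
    return x;                    /* *p now points at the command letter */
}
```

Under this model, typing 70000u would quietly request a 4464 µs delay, because 70000 wraps to 70000 - 65536 = 4464.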

The interpreter, a mere 90 lines of C code, parses a serial string. It accumulates numeric characters into the parameter and executes the alphabetic characters as commands.

Only a few commands are implemented, leaving a lot of scope for language extensions.

0-9 enter number
p print number
a-f select pin
i input
o output
m msec delay
u usec delay
{} repeat
k loop count
_ _ print words
s analog sample
v print version
h print help
t pulse width

Whilst appearing limited, this simple command set is essentially all that is needed to get a microcontroller project up and running.

The interpreter makes use of Arduino functions including digitalWrite, digitalRead, Serial.println, delay, delayMicroseconds and analogRead. With these simple functions the interpreter can execute a string of commands, manipulate I/O and print results to the terminal emulator.
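
To show how little is involved, here is a host-side sketch of such an evaluation loop in C. It is an approximation built from the command list above rather than Ward Cunningham's actual code. The trace buffer and its tokens are hypothetical stand-ins for the Arduino calls, and only digits, p, d, o, m, u, k, _ and {} are handled.

```c
#include <stdio.h>
#include <string.h>

/* Host-side sketch of a txtzyme-style evaluator. On an Arduino the
   'o', 'm', 'u' and 'p' cases would call digitalWrite(), delay(),
   delayMicroseconds() and Serial.println(); here they append tokens to
   a trace string (a hypothetical stand-in) so the logic runs on a PC. */

static char trace[1024];
static unsigned int x;   /* the single numeric parameter */
static unsigned int d;   /* currently selected pin */

static void emit(const char *s)
{
    strncat(trace, s, sizeof trace - strlen(trace) - 1);
}

void txtEval(const char *buf)
{
    unsigned int k = 0;        /* loop counter, local to this call */
    const char *loop = buf;    /* first character of the {...} body */
    char tok[24];
    char ch;

    while ((ch = *buf++)) {
        switch (ch) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            x = ch - '0';                        /* enter a number */
            while (*buf >= '0' && *buf <= '9')
                x = x * 10 + (*buf++ - '0');
            break;
        case 'p':                                /* print number */
            snprintf(tok, sizeof tok, "%u ", x);
            emit(tok);
            break;
        case 'd':                                /* select pin */
            d = x;
            break;
        case 'o':                                /* digitalWrite(d, x % 2) */
            snprintf(tok, sizeof tok, "%u:%u ", d, x % 2);
            emit(tok);
            break;
        case 'm':                                /* delay(x) */
            snprintf(tok, sizeof tok, "m%u ", x);
            emit(tok);
            break;
        case 'u':                                /* delayMicroseconds(x) */
            snprintf(tok, sizeof tok, "u%u ", x);
            emit(tok);
            break;
        case 'k':                                /* remaining loop count */
            x = k;
            break;
        case '_':                                /* print words up to '_' */
            while (*buf && *buf != '_') {
                char one[2] = { *buf++, '\0' };
                emit(one);
            }
            if (*buf == '_')
                buf++;
            emit(" ");
            break;
        case '{':                                /* repeat body x times */
            k = x;
            loop = buf;
            while (*buf && *buf != '}')          /* skip the body once... */
                buf++;
            if (*buf == '}')
                buf++;
            /* fall through: ...then jump back while k is non-zero */
        case '}':
            if (k) {
                k--;
                buf = loop;
            }
            break;
        default:                                 /* spaces etc. are ignored */
            break;
        }
    }
}
```

With this sketch, txtEval("13d3{1o500m0o500m}") records three high/low pairs on pin 13, each followed by a 500 ms delay. Because each call has only one loop counter and one loop pointer, {} blocks cannot be nested within a single string.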

The hello world of the Arduino is to flash the LED. The following txtzyme string implements this efficiently:


This will flash the LED ten times, 1000 ms on and 1000 ms off.

Change the parameters a little and you can produce an audible tone on a speaker connected to an output pin.


A quick play with the command set showed that, as well as simple port manipulation, txtzyme could toggle port lines at up to 47 kHz, read and display ADC channels, create tones on a piezo speaker and perform simple time delays.

txtzyme uses a compact syntax, but is nevertheless quite human-readable. The following command produces a short beep on a speaker connected to an output port:

400{1o200u0o200u}

400 is the loop count, so the instructions enclosed between { and } are performed 400 times:

1o   - set the designated port pin high
200u - wait 200 microseconds
0o   - set the designated port pin low
200u - wait 200 microseconds
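
The pitch and length of this beep follow directly from the numbers in the command. The small helper below is a back-of-envelope sketch. tone_loop_timing is a hypothetical name, and the calculation ignores the time the interpreter itself takes to execute each command.

```c
/* Back-of-envelope timing for a txtzyme tone loop "N{1oTu0oTu}":
   each cycle is T microseconds high plus T microseconds low.
   Interpreter overhead is ignored, so real hardware runs a little slower. */
typedef struct {
    unsigned long period_us;    /* one high + low cycle */
    unsigned long freq_hz;      /* approximate tone frequency */
    unsigned long duration_us;  /* approximate length of the beep */
} tone_timing;

tone_timing tone_loop_timing(unsigned long cycles, unsigned long half_period_us)
{
    tone_timing t;
    t.period_us = 2 * half_period_us;
    t.freq_hz = t.period_us ? 1000000UL / t.period_us : 0;
    t.duration_us = cycles * t.period_us;
    return t;
}
```

For 400 cycles of 200 µs high and 200 µs low, it gives a 400 µs period. That is roughly a 2.5 kHz tone lasting about 160 ms. On real hardware the interpreter overhead will make the note slightly lower and longer.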


I then started thinking about how txtzyme could be extended to include new functions, and soon had simple integer arithmetic operations working using +, -, * and /. There are 32 printable punctuation and symbol characters in ASCII. Apart from the few already taken ({, } and _), all of them could be utilised by adding a new case statement to the interpreter and writing the code to handle each function.
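
Binary operators need two operands, but txtzyme keeps only a single number. The sketch below is one possible arrangement under that constraint, not taken from any released code. A second register y receives the previous number whenever a new one is typed. arithEval and y are hypothetical names, and only the arithmetic cases and p are implemented.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of integer arithmetic for a txtzyme-style
   interpreter. Txtzyme holds a single number x; this sketch assumes a
   second register y that receives the old x whenever a new number is
   typed, so "3 4+" leaves 7 in x (a two-deep, RPN-like stack). */

static char out[256];
static unsigned int x, y;

void arithEval(const char *buf)
{
    char tok[16];
    char ch;

    while ((ch = *buf++)) {
        if (ch >= '0' && ch <= '9') {
            y = x;                               /* keep the previous number */
            x = ch - '0';
            while (*buf >= '0' && *buf <= '9')
                x = x * 10 + (*buf++ - '0');
            continue;
        }
        switch (ch) {
        case '+': x = y + x; break;
        case '-': x = y - x; break;              /* unsigned: wraps if x > y */
        case '*': x = y * x; break;
        case '/': x = x ? y / x : 0; break;      /* assumed: n/0 gives 0 */
        case 'p':                                /* print number */
            snprintf(tok, sizeof tok, "%u ", x);
            strncat(out, tok, sizeof out - strlen(out) - 1);
            break;
        default:                                 /* other commands omitted */
            break;
        }
    }
}
```

Operations then chain in a postfix, Forth-like way: typing 1 2+3*p prints 9.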

Many of the lower-case characters (i, o, m, u, p, k, s, v, h and t among them) are already used in the core interpreter, so it might be sensible to reserve all lower-case characters for future extensions to the core interpreter. This would leave the upper-case characters for user-defined functions.

Txtzyme is still a rather limited language. It can concatenate blocks of code and loop through a code block multiple times. It also needs a simple mechanism to pass numerical parameters to the code blocks, and a way to build an if...then construct. Creating useful language structures whilst keeping the syntax very simple will clearly need some further thought.

What txtzyme lacks is the ability to store and retrieve these simple routines. Once you have pressed return at the end of the terminal text buffer, the input characters are lost forever, unless you copy them to the clipboard first. For the language to be extensible, there needs to be an easy way to manage the storage and retrieval of these text strings.

One solution might be to borrow from the Forth community and create a "colon definition", giving each routine a single upper-case letter as its name. That would allow for 26 unique code snippets and an easy way to manage and execute them, solely by typing their name.

Let's call the routine above "Beep" and assign it capital B as its name. We could use the : ; structure to define the body of the routine in the same way a Forth word is defined. So the beep word definition becomes:

:B400{1o200u0o200u};

txtzyme allocates 64 bytes of RAM to its input buffer. A very simple addressing scheme could use the ASCII value of B to assign a start address to the RAM segment holding the body of the code.

On encountering the opening colon, the interpreter needs to switch to a colon-definition compiler mode. In that mode the next character, B, selects the start address of the buffer that will hold the definition. The interpreter then stores the following characters in this buffer until it encounters the closing semicolon.

Once this colon definition has been stored in RAM, any occurrence of the letter B is interpreted as a pointer to the buffer from which to start executing commands.

Whilst wasteful of unused RAM locations, this scheme would be easy to implement and would allow simple routines to be stored. Regularly used routines could be stored in flash as "primitives".
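
A minimal sketch of this colon-definition scheme follows. It assumes one 64-byte slot per capital letter, selected by its ASCII value. colonEval, words and the nesting limit are hypothetical. Instead of executing txtzyme commands, the sketch copies each character to an output buffer so that the stored text and its expansion can be checked. The RAM cost is easy to see: 26 slots of 64 bytes take 1664 bytes, most of the 2 KB of SRAM on the ATmega328 used in the Arduino Uno.

```c
#include <string.h>

/* Sketch of the proposed colon definitions: ":B...;" copies the text
   between the name and ';' into a 64-byte slot selected by the ASCII
   value of the capital letter ('A' -> slot 0), and a later "B" runs that
   slot. Names are hypothetical; real commands are replaced by echoing
   each character into out[] so the expansion can be inspected. */

#define SLOT_SIZE 64
#define MAX_DEPTH 8                   /* assumed guard against ":AA;" */

static char words[26][SLOT_SIZE];     /* 26 x 64 = 1664 bytes of RAM */
static char out[256];
static int depth;

void colonEval(const char *buf)
{
    char ch;

    while ((ch = *buf++)) {
        if (ch == ':' && *buf >= 'A' && *buf <= 'Z') {
            char *dst = words[*buf++ - 'A'];     /* compile mode */
            size_t n = 0;
            while (*buf && *buf != ';') {
                if (n < SLOT_SIZE - 1)           /* truncate long bodies */
                    dst[n++] = *buf;
                buf++;
            }
            dst[n] = '\0';
            if (*buf == ';')
                buf++;
        } else if (ch >= 'A' && ch <= 'Z') {
            if (depth < MAX_DEPTH) {             /* run a stored definition */
                depth++;
                colonEval(words[ch - 'A']);
                depth--;
            }
        } else {
            char one[2] = { ch, '\0' };          /* stand-in for executing ch */
            strncat(out, one, sizeof out - strlen(out) - 1);
        }
    }
}
```

Once :B400{1o200u0o200u}; has been typed, entering BB replays the stored beep text twice.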

After a little head-scratching, I realised that I could store the txtzyme characters that perform a function in an array and give the array a name. For example, this produces a low note on a speaker connected to digital pin 6 (50 cycles of 2 ms on, 2 ms off):

char A[64] = "6d50{1o2m0o2m}";

To execute the function, I just had to pass a pointer to the array (i.e. A) to the txtEval function, and include this case statement within the routine that evaluates the serial input buffer:

      case 'A':  txtEval(A);  break;

I then wrote a couple of txts which produce a medium tone and a high tone, and assigned them to B and C.

char B[64] = "6d100{1o1m0o1m}";           // A medium tone beep
char C[64] = "6d100{1o500u0o500u}";     // A high tone beep

I then included their "names" in the case statements of the interpreter:

      case 'B':  txtEval(B);  break;

      case 'C':  txtEval(C);  break;

Now it is possible to type any combination of A B and C into the serial input buffer and have the tones played in order.

These capital characters can also be inserted into loops, so to play the sequence A B C ten times, all you need is:

10{ABC}

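The sketch below shows why calling a stored txt from inside a loop can work in a txtzyme-style interpreter. It assumes that the loop counter k and the loop pointer are local variables of txtEval. That is true of this sketch, though the real interpreter may differ. Under that assumption, a capital letter inside {} can run its own loops and return without disturbing the outer one. The hardware calls are replaced by hypothetical counters: pin_writes counts digitalWrite calls and total_us adds up the requested delays.

```c
/* Sketch of capital letters calling stored txts via case statements.
   Hardware calls are replaced by hypothetical counters so it runs on a
   PC: pin_writes counts digitalWrite() calls, total_us sums the delays. */

char A[64] = "6d50{1o2m0o2m}";        /* low tone    */
char B[64] = "6d100{1o1m0o1m}";       /* medium tone */
char C[64] = "6d100{1o500u0o500u}";   /* high tone   */

static unsigned long pin_writes, total_us;
static unsigned int x, d;

void txtEval(const char *buf)
{
    unsigned int k = 0;       /* local, so a nested txtEval() can't clobber it */
    const char *loop = buf;
    char ch;

    while ((ch = *buf++)) {
        switch (ch) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            x = ch - '0';
            while (*buf >= '0' && *buf <= '9')
                x = x * 10 + (*buf++ - '0');
            break;
        case 'd': d = x;                   break;  /* select pin */
        case 'o': pin_writes++;            break;  /* digitalWrite(d, x % 2) */
        case 'm': total_us += 1000UL * x;  break;  /* delay(x) */
        case 'u': total_us += x;           break;  /* delayMicroseconds(x) */
        case '{':
            k = x;
            loop = buf;
            while (*buf && *buf != '}')
                buf++;
            if (*buf == '}')
                buf++;
            /* fall through */
        case '}':
            if (k) {
                k--;
                buf = loop;
            }
            break;
        case 'A': txtEval(A); break;               /* user-defined tones */
        case 'B': txtEval(B); break;
        case 'C': txtEval(C); break;
        default: break;
        }
    }
}
```

Ignoring interpreter overhead, each pass of ABC lasts about half a second (200 + 200 + 100 ms), so 10{ABC} plays for roughly five seconds.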
I have created txt arrays to generate six musical notes, A to F; see my GitHub Gist for the code.

In the same way that the native Arduino function calls are used to exercise the basic I/O, further library functions could be accessed and called by a single alphabetic character. For example, S for servo control and P for PWM could be invoked with a simple numerical parameter:

160S   moves the servo, connected to the nominated port pin, to 160 degrees.

128P   outputs a 50% duty cycle PWM waveform on the nominated port pin.
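
These two extensions might take the shape sketched below. extEval and record are hypothetical. On an Arduino, the record calls would become the write() method of a Servo object and analogWrite(d, value), both real Arduino APIs; here they only log what would be sent. The clamps to 0-180 degrees and 0-255 are assumptions chosen to match those functions' usual ranges.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of S (servo) and P (PWM) extensions acting on the
   selected pin d. On an Arduino the record() calls would become a Servo
   object's write(angle) and analogWrite(d, value); here they only log
   what would be sent. The clamps are assumed, matching usual ranges. */

static char out[128];
static unsigned int x, d;

static void record(const char *what, unsigned int pin, unsigned int val)
{
    char tok[32];
    snprintf(tok, sizeof tok, "%s%u=%u ", what, pin, val);
    strncat(out, tok, sizeof out - strlen(out) - 1);
}

void extEval(const char *buf)
{
    char ch;

    while ((ch = *buf++)) {
        if (ch >= '0' && ch <= '9') {            /* enter a number */
            x = ch - '0';
            while (*buf >= '0' && *buf <= '9')
                x = x * 10 + (*buf++ - '0');
            continue;
        }
        switch (ch) {
        case 'd': d = x; break;                                   /* select pin */
        case 'S': record("servo", d, x > 180 ? 180 : x); break;  /* degrees */
        case 'P': record("pwm", d, x > 255 ? 255 : x); break;    /* duty = x/255 */
        default: break;
        }
    }
}
```

Since analogWrite uses an 8-bit scale, 128P gives a duty cycle of 128/255, or just over 50%.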

To be continued.


Unknown said...

For the i and o functions, should the code be the following?

case 'i':
  x = digitalRead(d);
  break;
case 'o':
  digitalWrite(d, x%2);
  break;


Ken Boak said...


Yes, you are correct.

Ward Cunningham's original code used i to set the pin as input and then r to read the port.

You can see that here:

Following those early experiments with Txtzyme, I extended it with more functions and the ability to define new macro commands using the capital letters.

The new code was rechristened SIMPL (serial interpreted minimal programming language) and has been ported in C to the STM32, Arduino and MSP430.

It was also ported to MSP430 assembly language where a small image of less than 1000 bytes of code provided a basic communication and programming interface, conceived as an idea for a smart bootloader.

SIMPL has evolved over the years, and it is often the first tool I reach for when bringing up a new processor, as new functions can be defined and executed over nothing more than a serial UART connection.

Many of the ideas in SIMPL were inspired by Forth (only much smaller and more compact), and of course Ward Cunningham's Txtzyme was what ignited the spark.

Please contact me if you need any further information.