Monday, September 21, 2015

How SIMPL Can You Get?

In the last post, I explained how I had slimmed down the kernel of SIMPL - at the same time removing much of the Arduino specific code so that it would fit into 2Kbytes of Flash plus a lower RAM requirement.  This also makes it highly portable to other microcontrollers.

My intention was to be able to put an image of SIMPL onto any microcontroller target system that I happened to be working on at the time - and give myself a friendly, predictable environment with which to exercise the hardware. In some cases, SIMPL could even be loaded into the bootloader space of a processor - so that it was always accessible.

SIMPL fundamentally allows interaction with the microcontroller, because of its interpreted nature. The interpreter is flexible enough to form the basis of a series of simple tools, such as cross assemblers, debuggers and simulators. It is whatever you want it to be - you have absolute control over what action the CPU performs in response to your keystrokes.

Kernel Functionality 

SIMPL communicates with a PC using a serial UART interface.  It can be driven from any terminal application.

It really only needs getchar() and putchar() routines that interface with the on-chip UART.

These, together with a printnum() function which prints out a 16 bit unsigned number, are all that is needed to communicate with the PC in a meaningful manner.  It's old-school, but it works - and it is easy to set up on almost any microcontroller or SoC device.
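As a rough illustration of how little is needed, here is a minimal sketch of a printnum() built on nothing but putchar(). The helper name format_u16() is hypothetical and is split out only so that the digit conversion can be checked without a UART; the real kernel may well do this differently.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a 16 bit unsigned value to decimal text in buf (at least 6 bytes).
 * Returns the number of digits written, excluding the terminating '\0'.
 * format_u16 is a hypothetical helper name, not taken from the SIMPL source. */
int format_u16(uint16_t n, char *buf)
{
    char tmp[5];                      /* 65535 has at most 5 digits */
    int len = 0;
    do {                              /* produce digits, least significant first */
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    for (int i = 0; i < len; i++)     /* reverse into the caller's buffer */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}

/* printnum() using only putchar(), which on a micro would write to the UART. */
void printnum(uint16_t n)
{
    char buf[6];
    int len = format_u16(n, buf);
    for (int i = 0; i < len; i++)
        putchar(buf[i]);
}
```

On a target without a C library, putchar() would simply be replaced by the routine that writes one byte to the UART data register.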

SIMPL is a low overhead program - a kind of interactive tiny OS, that only takes a few Kbytes, yet provides all the means of accessing and controlling the micro.

A brief list of functionality.

The digital I/O is limited to writing to or reading from a single I/O pin. In most cases this will be one that drives an LED.  The I/O functions can be extended to whatever is needed by the application - for example, in one application, an LED chaser display, I needed to write a 13 bit number to an array of LEDs, each connected to an output pin of the Arduino.

Analogue input (ADC) and PWM output functions may be enabled if required - but these will add approximately a further 600 bytes to the kernel code.

The kernel uses the delay() and delayMicroseconds() functions to allow accurate timing of I/O operations. With these the microcontroller can generate pulse sequences (up to 100kHz on Arduino), generate musical tones, sound effects or animate LED displays.

As well as the functions that interact with the hardware peripherals, SIMPL also has a range of arithmetic and bitwise logic operators to allow simple integer maths and logical decision making.

There is a simple looping function which permits a block of code to be repeated - up to 32K times.

Recently added functions allow the printing of strings and the manipulation of ASCII characters.

Extend-ability

On top of the 2K kernel core is some further code which allows the user to define their own functions and store them in RAM.  Up to 26 user functions can be defined under the current system.  It's not exactly Forth - but borrows a whole lot of ideas from that remarkable language.

The system could be extended to include SPI or I2C functions to exercise specific peripheral chips or access a microSD card for program storage.

One of my designs, "WiNode", is an Arduino compatible target board but with a 433MHz wireless module, external SRAM, RTC, motor driver/speaker driver, and microSD card. SIMPL may be used to exercise and interact with all of these peripherals.

32bit Math Extensions

This was remarkably easy to implement. Retyping the x and y variables as long forced all of the arithmetic routines to 32 bit.  Whilst this pushed the code size up by about 900 bytes - some of this was offset by rewriting the printnum() function as a 32 bit printlong() and deleting the original printnum().  The code now stands at 4826 bytes and can be found on this GitHub Gist.

This means that SIMPL can do 32 bit integer maths - straight from the can.
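To see what the wider type buys, here is a small sketch comparing the same multiplication at 16 and 32 bits. The function names mul16() and mul32() are made up for this illustration, and unsigned types are used so that the 16 bit wrap-around is well defined in C; the kernel itself uses its own x and y variables.

```c
#include <stdint.h>

/* Multiply two values but keep only 16 bits of the result,
 * as a 16 bit kernel would: the product wraps modulo 65536. */
uint16_t mul16(uint16_t x, uint16_t y)
{
    return (uint16_t)((uint32_t)x * y);
}

/* The same operation once the variables have been widened to 32 bits. */
uint32_t mul32(uint32_t x, uint32_t y)
{
    return x * y;
}
```

With these definitions, 300 x 300 wraps to 24464 at 16 bits, but gives the expected 90000 at 32 bits.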

Whatever Next?

SIMPL has been an ongoing project for over 2 years, and as it has developed - so have my C coding skills.  As the code becomes larger, things become easier  - as the switch from 16 bit to 32 bit integer maths has proven - it was literally a 10 minute hack.

I am very aware that SIMPL is not yet a fully fledged language - it can flap its wings and tweet a bit - but is not ready to leap out of the nest and fly. Perhaps I am a bad mother bird - too eager to experiment with new ideas rather than concentrate on the basics. Time will tell.

I have ported SIMPL to STM32Fxxx ARM microcontrollers and seen a 25X increase in speed.  Now there are ARMs that run at 240 and 300MHz (Atmel SAM E7) that will give even more of a performance boost.

The final intention is to create a SIMPL virtual machine (SVM) that can be hosted on nearly any micro - including FPGA soft core stack processors - such as James Bowman's J1b. With these we hope to see a large leap in performance.

In the meanwhile, I still have an Arduino plugged into my laptop - as my preferred development platform - if it will run on Arduino - it will run on everything else a whole lot better!

Next Time - more uses for the SIMPL Interpreter.




A Closer Look at the SIMPL Interpreter

Keeping it SIMPL

Since May 2013, I have been slowly developing a tiny interpreted language that can be used to initialise and exercise hardware when developing with a new processor.

SIMPL is primarily intended to be a very low overhead language, requiring only a serial UART (or bit banged serial) for communication to a PC hosted terminal program.

Commands are in plain, human readable ASCII text - with an emphasis on being easy to remember.

SIMPL is based on Ward Cunningham's Txtzyme interpreter - originally for Arduino - but ported onto several other microcontrollers - as it is written mainly in C.

The kernel or SIMPL interpreter needs only a few resources:

2K bytes of program memory (Flash)
35 bytes of RAM
UART  getchar and putchar functions
microsecond delay
millisecond delay

On the Arduino these delays are provided by the delay() and delayMicroseconds() functions but can be provided with simple delay loops.

Once you have this 2K of code on-board, you can then start to add more functionality - tailored to your particular application.

Slimming Down the Interpreter Kernel.

As originally written, Ward Cunningham's Txtzyme compiles to 5032 bytes of flash and 209 bytes of RAM. (The exact number of compiled bytes may vary depending on which version of the Arduino IDE you are using.)

As it made use of several of the high level functions available in Arduino - such as Serial.print, digitalWrite etc. - it was certainly not optimised for code size.

I rewrote and enhanced the interpreter - so that now it fits into just short of 2048 bytes, and is written in more generic standard C for easier porting to other processors.

I have also added more functions including arithmetic, bitwise logic and memory operations.

I am sure that if the routines were hand-coded in AVR assembly language, further reductions in code size could be achieved. However, I wanted a useful kernel that would fit in 2K and was easy to understand.

I have placed the SIMPL kernel here as a Github Gist.

Growing the Kernel

It has long been my intention to make SIMPL an extensible language, and so for this approach I have chosen to use some of the ideas used in Forth.

The kernel can easily be extended from some 30 basic functions to about 85, just by extending the switch/case statement that forms the basic subroutine calling mechanism at the heart of the kernel.

In keeping with Charles Moore's philosophy of "Problem Oriented Languages", the kernel of SIMPL may be extended in whatever way is needed for solving the problem, and should as such be considered to be a minimum common starting point - for any CPU.

Once the 2K core of the kernel was established, it was time to add in the extra functionality that allows users to add their own functions.  This is done in the spirit of Forth - but with certain limitations to keep the code size down.  However, with the added functionality - the code grew from 2Kbytes  to 3982 bytes.  The main difference is in the amount of RAM that is used - the extra code allocates a User RAM array of 1248 bytes.
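As a sketch of how such a user RAM area might be organised, the code below assumes that the 1248 bytes are divided into 26 fixed slots of 48 bytes, one per capital letter. That split is only a guess (26 x 48 = 1248), and the names define_function() and lookup_function() are hypothetical rather than taken from the Gist.

```c
#include <string.h>

#define NUM_SLOTS 26
#define SLOT_SIZE 48    /* assumed: 26 * 48 = 1248 bytes in total */

static char user_ram[NUM_SLOTS][SLOT_SIZE];

/* Store the text of a definition under a capital letter 'A'..'Z'.
 * Returns 0 on success, -1 if the name is invalid or the body is too long. */
int define_function(char name, const char *body)
{
    size_t len = strlen(body);
    if (name < 'A' || name > 'Z' || len >= SLOT_SIZE)
        return -1;
    memcpy(user_ram[name - 'A'], body, len + 1);   /* copy the '\0' as well */
    return 0;
}

/* Look up a definition: NULL for an invalid name, "" for an empty slot. */
const char *lookup_function(char name)
{
    if (name < 'A' || name > 'Z')
        return NULL;
    return user_ram[name - 'A'];
}
```

Fixed-size slots waste some RAM on short definitions, but they make the lookup a single index calculation, which keeps the code small.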

If you would like to look at the code and try it out on an Arduino - I have created a Github Gist here.

If you are using a standard Arduino with the LED on Pin 13 change line 64 to:

int d = 13;          // d is used to denote the digital port pin for LED operation

As this is a work in progress - more details will emerge in a later post.

Monday, September 07, 2015

A Simple Assembler for the J1 Forth Processor

Foreword

I am currently exploring the use of soft core processors implemented in FPGAs, with a view to developing an image capture, process and display unit for a friend's scanning electron microscope.

Earlier this year I bought a Papilio Duo FPGA board and a computing shield from The Gadget Factory.  Some early experimentation with the ZPUino soft processor - programmed using the Arduino language, gave a taste for what could be achieved using soft core cpus and accompanying VGA hardware defined within the FPGA.

Very recently, James Bowman has released his J1b Forth processor implemented on a Papilio Duo and computing shield.  It is this processor core that ultimately I want to use.

However, there is quite a steep learning curve: I need to learn not only FPGA programming in Verilog and VHDL, but also the J1 instruction set and the Forth language - to the point where I can program my intended application.

This is quite an ambitious journey for me - an opportunity to stretch my skill set and programming abilities. As with any long journey, it starts with the smallest steps.

This week I am looking at how the J1 instruction set works with the help of a simulator and an assembler, both written in C and ported onto either an Arduino or an STM32F407.

With these simple tools, I hope to learn enough about how the J1 executes its native machine language, to the point where I can implement a small Forth-like language.



Background

Inspired by Frank Carver's blog Raspberry Alpha Omega, I decided to dig up some of the work I did earlier in the year - to create a tiny Forth-like language that would run on virtually any processor.

My ambition is to have a language nucleus that resides in about 2K of memory which provides a means to debug and bootstrap an application with limited tools or resources. I imagine it as a common core, which can be accessed via a serial UART, and can be used right from the start of a project and form a foundation onto which an extendable application can be built.

Whilst the usual image of code development is typing into a text editor or IDE and then compiling before flashing the machine code into the microcontroller, my plan is to take a huge chapter out of Charles Moore's book and make my language interpreted and have the means to compile and edit code right there on the microcontroller itself.  Indeed very Forthlike.

However over the years and through the various ANSI standardisation processes, Forth has become large and bloated - and that was never what Chuck Moore intended or wanted.  So I am going to pick and choose from the characteristics of Forth, and come up with something very much simpler.

The plan is to have a compact language kernel which resides on the microcontroller - regardless of whether it is an AVR or ARM  or a specialist stack processor burned into an FPGA.  In each case, it will present me the same user interface and experience - for low level hacking or code development.

From a hardware developer's perspective, every microcontroller I work on needs to have the means to print to a terminal and waggle a port pin - right from the get-go.

However, this language need not just be for human interaction. As the commands are very compact, they lend themselves to being packetised, and sent from machine to machine by whatever appropriate communications channel - be it wireless, BLE, TCP/IP or 140 characters at a time via SMS or Twitter. It also allows a microcontroller such as that on the Raspberry Pi, to communicate with other task specific hardware - solely using a UART connection - the speed of control and interaction is not restricted to the speed of a few characters a second that a human can type.

Creating a Virtual Machine

For all this to work, we need to establish a virtual machine on the chosen microcontroller. The virtual machine could initially be coded in C, to run on the target, but later it can be created as a specialist soft core processor on a FPGA.  On the Arduino, the virtual machine codes into about 2Kbytes  - or 3K when you add the Serial.begin() function for UART output.

Once the virtual machine has been installed it will happily execute its way through the memory on its own, until it crashes or is reset. The challenge now becomes writing the low level inner interpreter application code in the assembly language of the virtual machine. This step is something that I will put off until I have generated a means of creating and assembling the language.

Txtzyme Revisited

To create an assembler, I am going to use the tiny Txtzyme interpreter, written by  Ward Cunningham,  which was the original inspiration for this project.  It allows very basic parsing of a text buffer and then performs one of a series of function calls depending on the character typed, or read from the buffer. Numerical characters are converted into an integer and placed in a variable x.
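The parsing scheme described above can be sketched in a few lines of C. This is not Ward Cunningham's code: it is a cut-down illustration in which digits accumulate into x and every other character selects a case in a switch statement. The 'p' (print x) command follows Txtzyme, while the space and '+' commands, the y variable and the output buffer are assumptions added so that the example does something visible without any hardware.

```c
#include <stddef.h>
#include <stdio.h>

/* Interpret the commands in src, appending printed output to out.
 * out must point to at least out_size bytes, with out_size >= 1.
 * Returns the final value of x. */
long interpret(const char *src, char *out, size_t out_size)
{
    long x = 0, y = 0;
    size_t used = 0;
    out[0] = '\0';

    while (*src != '\0') {
        char ch = *src++;
        if (ch >= '0' && ch <= '9') {          /* digits accumulate into x */
            x = ch - '0';
            while (*src >= '0' && *src <= '9')
                x = x * 10 + (*src++ - '0');
            continue;
        }
        switch (ch) {                          /* one case per command */
        case ' ':                              /* hypothetical: save x in y */
            y = x;
            break;
        case '+':                              /* hypothetical: x = x + y */
            x = x + y;
            break;
        case 'p': {                            /* print x as decimal */
            int n = snprintf(out + used, out_size - used, "%ld\n", x);
            if (n > 0 && (size_t)n < out_size - used)
                used += (size_t)n;
            break;
        }
        default:                               /* unknown characters are ignored */
            break;
        }
    }
    return x;
}
```

For example, interpreting the string "12 30+p" leaves 42 in x and appends "42" and a newline to the output buffer. Turning this into an assembler is then a matter of replacing the actions in each case with code that emits an instruction word.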

For a simple implementation of an assembler using Txtzyme, let's consider that the instruction word starts with a class field, which takes one of 5 values:

Class           Class Field

Literal         0x8000
Jump            0x0000
Jump if zero    0x2000
Call            0x4000
ALU             0x6000

For the first four of these - the assembler just needs to OR the class field with the literal number or the target address.  It may be worth ensuring that the literal is constrained to 15 bits and the target address is constrained to 13 bits.
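A minimal sketch of that step in C might look like the following. The function names are hypothetical; the class values and the 15 bit and 13 bit masks come from the table and paragraph above, and the mnemonic characters '#', 'j', 'z' and ':' are the ones proposed for the assembler.

```c
#include <stdint.h>

#define CLASS_LIT   0x8000u
#define CLASS_JMP   0x0000u
#define CLASS_JZ    0x2000u
#define CLASS_CALL  0x4000u

/* Literal: class field ORed with the value, constrained to 15 bits. */
uint16_t asm_literal(uint16_t value)
{
    return (uint16_t)(CLASS_LIT | (value & 0x7FFFu));
}

/* Jump, conditional jump and call: class field ORed with a 13 bit target. */
uint16_t asm_jump(uint16_t target)
{
    return (uint16_t)(CLASS_JMP | (target & 0x1FFFu));
}

uint16_t asm_jz(uint16_t target)
{
    return (uint16_t)(CLASS_JZ | (target & 0x1FFFu));
}

uint16_t asm_call(uint16_t target)
{
    return (uint16_t)(CLASS_CALL | (target & 0x1FFFu));
}

/* Dispatch on the single character mnemonic, with x as parsed by the
 * interpreter. Returns 1 and fills *word if the mnemonic is recognised. */
int assemble(char mnemonic, uint16_t x, uint16_t *word)
{
    switch (mnemonic) {
    case '#': *word = asm_literal(x); return 1;
    case 'j': *word = asm_jump(x);    return 1;
    case 'z': *word = asm_jz(x);      return 1;
    case ':': *word = asm_call(x);    return 1;
    default:  return 0;
    }
}
```

Masking rather than rejecting an out-of-range value is a design choice made here for brevity; a friendlier assembler would probably report the error instead.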

We will use the following characters to define the instruction class

#  literal
j   unconditional jump
z  conditional jump when T=0
:   call



For the ALU instruction there are 4 more sub-fields to populate depending on the nature of the instruction.

1.   ALU op-code
2.   Transfer field
3.    Pointer field
4.    Return field

ALU op-code

The 2nd nibble of the instruction word controls the ALU.  Its 16 operations are decoded thus:

0       t       NOP
1       n       COPY    (T = N)
2       +       ADD
3       &       AND
4       |       OR
5       ^       XOR
6       ~       INV
7       =       T = -(T == N)   Sets T to status of EQ flag
8       >       T = -(N < T)    Sets T to status of GT flag
9       /       RShift
A       d       DEC     (T = T-1)
B       r       r-fetch
C       @       fetch
D       *       LShift
E       d       depth   (shows dsp + 1)
F       u       U<

So we have about 20 instructions - that fall into one of 5 categories

LIT     - load the included 15 bit literal onto the top of the stack
CALL    - call the subroutine at the enclosed 13 bit address
JMP     - non-conditional jump to the enclosed 13 bit address
JPZ     - conditional jump - only if the top=0
ALU     - ALU and stack operations

Class       Form (in hex)

Literal     8xxx to Fxxx
Jump        0xxx or 1xxx
JPZ         2xxx or 3xxx
Call        4xxx or 5xxx
ALU         6xxx, or 7xxx if you include the "return"
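Going the other way, the class of any 16 bit word can be recovered from its top bits, which is the first step of a disassembler or simulator. The sketch below is a hedged illustration with made-up names (j1_class, j1_classify, j1_operand); the bit patterns are the ones tabulated above.

```c
#include <stdint.h>

typedef enum { J1_LIT, J1_JMP, J1_JPZ, J1_CALL, J1_ALU } j1_class;

/* Work out which of the five instruction classes a word belongs to. */
j1_class j1_classify(uint16_t insn)
{
    if (insn & 0x8000u)
        return J1_LIT;              /* bit 15 set: literal */
    switch (insn >> 13) {           /* top three bits, bit 15 known to be 0 */
    case 0:  return J1_JMP;         /* 0xxx or 1xxx */
    case 1:  return J1_JPZ;         /* 2xxx or 3xxx */
    case 2:  return J1_CALL;        /* 4xxx or 5xxx */
    default: return J1_ALU;         /* 6xxx or 7xxx */
    }
}

/* Extract the 15 bit literal, or the low 13 bits for the other classes
 * (a target address, or the packed sub-fields of an ALU word). */
uint16_t j1_operand(uint16_t insn)
{
    if (insn & 0x8000u)
        return insn & 0x7FFFu;
    return insn & 0x1FFFu;
}
```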

So the plan is to adapt the txtzyme interpreter to convert text input into machine language in the form of the 16 bit instruction words.



The 3rd nibble of the instruction word insn controls the data flow from the stack to memory

N     Insn[7]  Top transfers to Next (2nd)
R     Insn[6]  Top transfers to Return
@     Insn[5]  Next transfers to address pointed by Top
_     Insn[4]  Not used



The lower nibble of the instruction word is used to control the incrementing or decrementing  of the data and return stack pointers dsp and  rsp.  Pushes to the stack involve incrementing the dsp, whilst popping from the stack means that the dsp needs to be decremented.  Some actions are stack neutral, and involve no net gain or loss in stack items.

The parentheses can be conveniently used to represent push and pop operations -  memorable that you start with a push  (left bracket) and end with a pop (right bracket)

(    push ds
)    pop ds

[   push rs
]   pop rs


ds field

1    dsp++
2    dsp--

rs field

1   rsp++
2   rsp--
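Putting the ALU sub-fields together, an ALU instruction word could be built as sketched below. The op-code nibble, the transfer bits and the "return" bit follow the descriptions above. The placement of the rs field in bits 3..2 and the ds field in bits 1..0 of the lower nibble is an assumption based on my reading of James Bowman's J1 design, not something stated here. The function and macro names are hypothetical, and the raw 2 bit field values are passed straight through without interpretation.

```c
#include <stdint.h>

#define ALU_CLASS   0x6000u
#define ALU_RETURN  0x1000u   /* "return" bit: turns 6xxx into 7xxx */
#define ALU_T_TO_N  0x0080u   /* insn[7]: Top transfers to Next */
#define ALU_T_TO_R  0x0040u   /* insn[6]: Top transfers to Return */
#define ALU_STORE   0x0020u   /* insn[5]: Next stored at address in Top */

/* Build an ALU instruction from its sub-fields.
 *   op       - 4 bit ALU op-code (2nd nibble)
 *   flags    - any OR of the ALU_* bits above
 *   rs_field - raw 2 bit return stack field (assumed position: bits 3..2)
 *   ds_field - raw 2 bit data stack field   (assumed position: bits 1..0) */
uint16_t asm_alu(unsigned op, uint16_t flags, unsigned rs_field, unsigned ds_field)
{
    return (uint16_t)(ALU_CLASS
                      | ((op & 0xFu) << 8)
                      | (flags & (ALU_RETURN | ALU_T_TO_N | ALU_T_TO_R | ALU_STORE))
                      | ((rs_field & 0x3u) << 2)
                      | (ds_field & 0x3u));
}
```

For instance, asm_alu(0x2, 0, 0, 0) gives 0x6200 (a bare ADD) and asm_alu(0x0, ALU_STORE, 0, 0) gives 0x6020.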

The basic core of the assembler, which accepts the text input and generates instructions as 16-bit hex words, fits into under 200 lines of C.

Assembler Instruction set Summary

Implemented so far:

Arithmetic/Logical

t       NOP
n       Copy

+       ADD
&       AND
|       OR
^       XOR
~       INV

Comparison

<
=
>

r                 Right Shift
l                 left shift

d                T - 1  (Decrement)



Memory

@    Fetch
!      Store

Data Transfer

N         T-> N
R         T -> R
A         T-> A


Stack Ops

(       Push Data Stack
)       Pop Data Stack
[       Push Return Stack
]       Pop Return Stack


Literals













Sunday, September 06, 2015

Emulating a J1 Forth Processor on an Arduino

WTF!

Emulation is a useful technique - especially when you don't actually have the processor that you are writing code for.

In the Spring of 1975, 19 year old William Gates III did not possess an 8080 microprocessor, but he and friend Paul Allen had committed to writing a BASIC interpreter for the company supplying the new 8080 based Altair microcomputer.

Fortunately Paul Allen had written an emulator for the similar 8008, which ran on a PDP-10 mainframe at Harvard, and working nights for several weeks on the PDP-10, they managed to produce the first Microsoft BASIC product - and the rest is history.

Back in April, when I had a little spare time, I started to work on a program to emulate James Bowman's J1 Forth CPU - and it was the subject of my post "One Song to the Tune of Another".
At that time I had it running on a FPGA soft core - the ZPUino, and it was complete with a VGA display. I am now taking a step back to just isolate the J1 simulation part of that project, so that I can build it into a set of simple tools that I am developing.

Now as my thought processes are starting to converge, I thought I'd dust off the code and start to see how it will fit into my grand scheme for a stand alone code development system based on a J1 running on a FPGA.

James has put a lot of effort into writing his "swapforth" for the J1, but I am treating this as a learning exercise, so rather than use James's swapforth, I am setting about writing my own tiny language - it's the journey, not the destination I am interested in at the moment.

Not being as ambitious (precocious) as Bill Gates, I set my sights a little lower and in just 200 lines of code, I have a J1 emulator that runs on an Arduino.  The code I am using has been adapted for the Arduino from Samawati's J1 simulator on  GitHub.

Slow Forth

Not renowned for high speed or vast resources, the Arduino munches through the J1 code at a pedestrian  63,000 instructions per second. That's about  1600 times slower than an actual J1.
Slow, but nevertheless useful. I can now write snippets of assembly language to run on my "J1" and test them out.

The J1 machine code is stored in an array of 16 bit integers m[xxx ] set up in memory. As the ATmega328 only has 2K bytes of RAM, I kept the array size down to  768 words.

Here is the first J1 program - a simple counter

// Load up a simple count program into first 7 locations of the memory array m[ ]

   m[0] = 0x8020;      // LIT 0x20  (0x20 is the address of the count variable)
   m[1] = 0x6C00;      // Fetch   [0020]
   m[2] = 0x8001;      // LIT 1   We are going to add 1
   m[3] = 0x6200;      // ADD
   m[4] = 0x8020;      // LIT 0x20
   m[5] = 0x6020;      // Store
   m[6] = 0x0000;      // JMP 0000

Translating these 7 instructions into Forth  we get

32 @ 1 + 32 !  followed by a jump back to the first instruction

Forth is clearly a little easier than assembly language, but note how the J1 instructions translate on a one to one basis into Forth, so validating the idea from yesterday's post about using the SIMPL interpreter to create assembly language - this is the next step.
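To show how little machinery is needed to run the counter program, here is a self-contained sketch of a J1-style emulator in C. It is not the emulator adapted from Samawati's simulator: it is a simplified model written for this illustration, and the type and function names (j1_t, j1_step, run_counter), the 32-entry stacks, the depth result and the signed interpretation of the 2 bit stack fields are all assumptions based on my understanding of the J1 design.

```c
#include <stdint.h>

#define MEM_WORDS 768              /* array size quoted in the text */

typedef struct {
    uint16_t m[MEM_WORDS];         /* program and data memory, word addressed */
    uint16_t d[32], r[32];         /* data stack (below T) and return stack */
    uint16_t t, pc;                /* top of data stack, program counter */
    uint8_t dsp, rsp;              /* 5 bit stack pointers */
} j1_t;

static void j1_push(j1_t *c, uint16_t v)
{
    c->dsp = (uint8_t)((c->dsp + 1) & 31);
    c->d[c->dsp] = c->t;
    c->t = v;
}

static uint16_t j1_pop(j1_t *c)
{
    uint16_t v = c->t;
    c->t = c->d[c->dsp];
    c->dsp = (uint8_t)((c->dsp + 31) & 31);
    return v;
}

/* Fetch, decode and execute a single instruction. */
void j1_step(j1_t *c)
{
    static const int delta[4] = { 0, 1, -2, -1 };  /* assumed 2 bit signed adjust */
    uint16_t insn = c->m[c->pc % MEM_WORDS];
    uint16_t next = (uint16_t)(c->pc + 1);
    uint16_t target = insn & 0x1FFFu;

    if (insn & 0x8000u) {                          /* literal */
        j1_push(c, insn & 0x7FFFu);
    } else if ((insn >> 13) == 0) {                /* jump */
        next = target;
    } else if ((insn >> 13) == 1) {                /* jump if T is zero */
        if (j1_pop(c) == 0) next = target;
    } else if ((insn >> 13) == 2) {                /* call */
        c->rsp = (uint8_t)((c->rsp + 1) & 31);
        c->r[c->rsp] = next;
        next = target;
    } else {                                       /* ALU */
        uint16_t t = c->t, n = c->d[c->dsp], rt = c->r[c->rsp], res;
        if (insn & 0x1000u) next = rt;             /* return bit */
        switch ((insn >> 8) & 0xF) {
        case 0x0: res = t; break;
        case 0x1: res = n; break;
        case 0x2: res = (uint16_t)(t + n); break;
        case 0x3: res = t & n; break;
        case 0x4: res = t | n; break;
        case 0x5: res = t ^ n; break;
        case 0x6: res = (uint16_t)~t; break;
        case 0x7: res = (n == t) ? 0xFFFFu : 0; break;
        case 0x8: res = ((int16_t)n < (int16_t)t) ? 0xFFFFu : 0; break;
        case 0x9: res = (uint16_t)(n >> (t & 15)); break;
        case 0xA: res = (uint16_t)(t - 1); break;
        case 0xB: res = rt; break;
        case 0xC: res = c->m[t % MEM_WORDS]; break;
        case 0xD: res = (uint16_t)((uint32_t)n << (t & 15)); break;
        case 0xE: res = c->dsp; break;             /* simplified depth */
        default:  res = (n < t) ? 0xFFFFu : 0; break;
        }
        c->dsp = (uint8_t)((c->dsp + 32 + delta[insn & 3]) & 31);
        c->rsp = (uint8_t)((c->rsp + 32 + delta[(insn >> 2) & 3]) & 31);
        if (insn & 0x80u) c->d[c->dsp] = t;        /* T -> N */
        if (insn & 0x40u) c->r[c->rsp] = t;        /* T -> R */
        if (insn & 0x20u) c->m[t % MEM_WORDS] = n; /* N -> [T] */
        c->t = res;
    }
    c->pc = next;
}

/* Load the seven word counter program and run it for a number of steps. */
uint16_t run_counter(unsigned steps)
{
    static const uint16_t prog[7] = {
        0x8020, 0x6C00, 0x8001, 0x6200, 0x8020, 0x6020, 0x0000
    };
    j1_t cpu = { 0 };
    for (int i = 0; i < 7; i++)
        cpu.m[i] = prog[i];
    while (steps-- > 0)
        j1_step(&cpu);
    return cpu.m[0x20];
}
```

Each pass through the loop takes seven instructions, so run_counter(21) returns a count of 3. In this model the seven words as listed never adjust the data stack pointer in their ALU instructions, so the stack gains three entries per pass and simply wraps around its 32 entries; the count at address 0x20 is unaffected.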

Slightly Quicker Forth

Further experiments with an STM32F407 Discovery board - an ARM Cortex M4 clocked at 168MHz - showed that the emulator would run at approximately 700,000 J1 instructions per second - about 1% of the speed of the proposed hardware.




Exploring Forth for Low Level Hacking

Background

Forth is an interactive, low level language which shares a lot in common with machine code. It allows low level access to the processor and its resources and can therefore be quick and powerful - in the right hands.
It allows a degree of interaction which has now been lost in the higher level compiled languages, but for the right applications it provides all the flexibility needed.

The following videos illustrate some aspects of Forth when used for controlling hardware.

Open Firmware




It Was Twenty Years Ago Today.......

In the mid-1990s Chuck Moore, Jeff Fox and others worked towards Forth computing engines that would achieve burst speeds of around 500MIPS.

Chuck Moore developed custom VLSI devices - a series of  processors where the machine language instructions were essentially Forth primitives.  These processors all used a minimal instruction set - and were known as MISC processors.

Dr C.H. Ting had also shown with his eForth model that a working Forth could be composed from just 31 Forth primitives, and that all other definitions could be assembled from this core set. Thus a processor with a 5 bit instruction length could potentially be used for Forth execution.  Dr. Ting explored this further with a series of chip designs - where 5-bit instructions were packed into 16 bit or 32 bit words - allowing 3 or 6 instructions to be fetched from memory at a time, which better suited the slower RAM access.
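The packing idea is easy to demonstrate in C. The sketch below packs three 5-bit op-codes into a 16 bit word; the slot order (first instruction in the highest bits) and the unused top bit are assumptions made for this illustration and are not taken from Dr. Ting's designs. The names pack3() and unpack3() are hypothetical.

```c
#include <stdint.h>

/* Pack three 5 bit op-codes into one 16 bit word. Slot 0 is assumed to be
 * executed first and sits in the highest bits; bit 15 is left unused. */
uint16_t pack3(uint8_t op0, uint8_t op1, uint8_t op2)
{
    return (uint16_t)(((op0 & 0x1Fu) << 10) | ((op1 & 0x1Fu) << 5) | (op2 & 0x1Fu));
}

/* Extract the op-code held in slot 0, 1 or 2 of a packed word.
 * Returns 0 for an out-of-range slot number. */
uint8_t unpack3(uint16_t word, unsigned slot)
{
    if (slot > 2)
        return 0;
    return (uint8_t)((word >> (10u - 5u * slot)) & 0x1Fu);
}
```

One memory fetch therefore supplies three instructions, which is how the scheme hides slow RAM; the same approach with a 32 bit word yields six slots with two bits to spare.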

Keeping the speed up...

When Forth is implemented on a register based load-store architecture - such as the ARM - the overheads of running the Forth inner interpreter, in particular NEXT, mean that around 10 machine instructions need to be executed in order to execute a Forth primitive. This suggests that an ARM clocked at 100MHz will only achieve around 10MIPS.

Forth requires the right architecture in the processor in order to be able to execute the Forth primitives efficiently - preferably as single cycle instructions.  The processor should have a stack-based architecture, and the machine instructions should map directly onto the Forth primitives for efficient execution. Using this approach allows a simple Forth processor to be designed as a soft-core CPU for a FPGA - and maintain a performance of around 50 to 100 million Forth instructions per second (Forth MIPS).

Affordable FPGAs

Whilst much of this work was done about 20 years ago using custom VLSI chips, progressive improvements to FPGAs, falling memory prices and greater access to sophisticated design and simulation tools have brought the creation of FPGA soft-core microcontrollers within the reach of the hobbyist.  Low cost FPGA dev-boards are available in the $50 to $80 price range.

There have, however, been a number of stack machine CPU designs developed over recent years, several of which have been implemented on a low cost FPGA.  Notably ZPUino - by Alvaro Lopes, and J1 - by James Bowman, although several others exist.

James Bowman's J1 design is of interest because it is close in architectural design to Chuck Moore's 1985 Novix NC4000, but much simpler because the data and return stacks are implemented in on-chip RAM. This gives it the potential for 100 Forth MIPS - when implemented in a Xilinx Spartan 3E - and described in under 200 (160) lines of Verilog code.

J1 is incorporated into the Gameduino Shield - a gaming graphics and sound generator for Arduino. Versions are also available from Olimex - which include a PS2 keyboard connector and an additional 32MB of SDRAM for extended resolution - although Olimex leave you high and dry when it comes to implementing firmware to make full use of the extra 32MB!

https://www.olimex.com/Products/Modules/Video/MOD-VGA-32MB/open-source-hardware

The J1 Processor Model

The J1 processor is simple enough that it may be modelled in about 100 lines of C code. I used the model available from ddb's GitHub repository.

The J1 model is created from James Bowman's original documentation "J1: a small Forth CPU Core for FPGAs" and is very similar to the Verilog code that defines the J1 implementation in hardware.

More documentation is available at James's J1 site.  As can be seen, the J1 has been used in a variety of projects including the Gameduino shield - which is a graphics engine in the form of an Arduino shield.

The J1 has just 5 categories of instruction coded up into a 16 bit instruction word:

Literal                           a 15 bit literal pushed onto the data stack
Jump                            Jump to a 13 bit target address
Conditional Jump          Jump if T is zero to a 13 bit target address
Call                              Call a subroutine at a 13 bit target address
ALU

The ALU uses a 4 bit field to determine its action, and there are additional bit fields to control access to the stacks and memory.

T -> N        Copy T to Next
R -> PC      Put the return stack into the PC to get a free Return
N -> [T]     Store Next at the location addressed by T
T  -> R       Copy T to Return stack

Additionally there are two, 2-bit, bit fields that allow for the increment and decrement of the data stack pointer, and the return stack pointer - this enables items placed further down the stack to be accessed.

Simulation

The instruction set of any proposed processor may be simulated in software. Once a model of the various stacks, registers and memory has been devised, it becomes a relatively straightforward task to create a C program, with text output, that simulates the operation of the cpu and instruction execution. Whilst the output of the simulator is either text or graphics, the process can be further developed to the point where any processor can emulate the instruction set of another - but with a vast speed penalty.

Fortunately the relatively simple J1 processor may be quite easily simulated in C - even using an Arduino.

The model consists of a 512 word memory (as an Arduino Uno only has 2K of on-chip RAM).

Snippets of machine language are loaded into the RAM during the setup() function - for example

 m[0] = 0x6000;      // NOP
 m[1] = 0x8020;      // LIT 0x20
 m[2] = 0x8010;      // LIT 0x10
 m[3] = 0x6200;      // ADD
 m[4] = 0x6600;      // INV
 m[5] = 0x6000;      // NOP
 m[6] = 0x0001;      // JMP 0001

In this trivial example two literals are loaded onto the stack, added together, inverted and the whole process is repeated as an endless loop - by the unconditional jump back to the beginning. This is definitely not a particularly good example to illustrate Forth, but it's a good test case to show that the processor model is correctly fetching, decoding and executing code, and that the ALU and PC are working properly together.

The information contained in the machine instructions contains the following

Numerical constants or literals.  These are 15 bits packaged into a word that has bit 15 set - i.e. 0x8xxx in hexadecimal.

Target Addresses - these are 13 bit addresses, which are used to force the processor to branch to a new subroutine address or jump to a new address. The jump is unconditional, but the branch may be conditional - in that the top of the stack needs to equal zero for the branch to be executed. As the target is an absolute address, this gives the processor a jump range of 8192 addresses (0x0000 to 0x1FFF).

ALU Instructions.

The ALU has 16 possible instructions as controlled by a 4-bit field.  Instructions of the type 0x6X00 are ALU instructions - where the X is the 4 bit op-code.


Code is Code

It might be worth stating that the entire operation of the processor is controlled by the various fields coded within the instruction.  This is what makes machine language very powerful, and yet very easy to make mistakes in.  A single mistake in a field might send your processor off into an unintended area of RAM, where it can misinterpret your stored data as a program, and then start indiscriminately writing to RAM.  Invariably this ends up as a system crash.

As writing in machine code has always been a thankless task and prone to mistakes, it is best to spend time writing an assembler to help "assemble" programs from the processor's instruction set.

Assemblers use human readable mnemonics such as ADD, OR, JMP and allow numbers to be entered in decimal or hex. The assembler will use a text file which contains the source code, and which can be edited using a text editor. This can then be processed by the assembler to produce a binary or hex file that may then be loaded into the RAM of the processor.

Assembly language is the first layer of abstraction above the processor's own machine language.  As a tool it makes programming simpler, faster and less prone to mistakes.

These tools first started becoming widely available to home users in the early 1980s. Early 8-bit home micros often had an assembler/disassembler available as part of their toolkit.

An excellent reference book on assemblers has been written by David Salomon.

Forth as an Assembly Language

In the early 1960s, Charles Moore - the creator of Forth - realised that there may be a better way of writing programs than the traditional assembler or high level language compiler method.

He knew that any program consisted of small snippets of code, each performing some small function within the program.  These functions and routines would be stitched together with calls and jumps to form the structure of the program.

He came up with the concept of the  Forth word,  where the word is the name of such a function - for example SQUARE.

Running on the processor was a small interpreter program, which could take the text input and compile it into executable machine code.

The word SQUARE could be written at the keyboard, or typed into a text file, and every time it was encountered it would perform the function of calculating the square of a number.

For this to work, SQUARE had to be created using the colon definition method of defining new words - which is written like this:

: SQUARE DUP  *  ;

: This colon is the word that tells the interpreter that this is a new definition
SQUARE this is the name of our new word, and that will be put into the dictionary
DUP is a Forth word that duplicates the top word on the stack
* multiplies the top two entries on the stack, leaving the product on the stack
;  Semi-colon  - this denotes the end of the definition and a return to the inner interpreter
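The stack effect of this definition can be mimicked in C, as sketched below. This is only an analogy: a real Forth compiles SQUARE into the dictionary as threaded code, whereas here each word is simply a C function acting on a shared stack. All of the names (forth_push, forth_dup and so on) and the 16-entry stack depth are invented for the illustration.

```c
#include <stdint.h>

#define STACK_DEPTH 16

static int32_t stack[STACK_DEPTH];
static int sp = 0;                       /* number of items on the stack */

void forth_push(int32_t v)
{
    if (sp < STACK_DEPTH)                /* silently ignore overflow */
        stack[sp++] = v;
}

int32_t forth_pop(void)
{
    return sp > 0 ? stack[--sp] : 0;     /* underflow yields 0 */
}

/* DUP: duplicate the top item on the stack. */
void forth_dup(void)
{
    int32_t a = forth_pop();
    forth_push(a);
    forth_push(a);
}

/* *: multiply the top two items, leaving the product on the stack. */
void forth_mul(void)
{
    int32_t b = forth_pop();
    int32_t a = forth_pop();
    forth_push(a * b);
}

/* : SQUARE DUP * ; */
void forth_square(void)
{
    forth_dup();
    forth_mul();
}
```

Pushing 7 and calling forth_square() leaves 49 on the stack, just as typing 7 SQUARE would at a Forth prompt.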

For a much fuller explanation of how this works - have a read of Brad Rodriguez' excellent article "Moving Forth".

Suffice to say, that the Forth system provides the assembly, compilation and run-time execution environment needed for a self contained system, and it does it in a user interactive manner.

This video shows a typical Forth work session.

N.I.G.E Machine


In another post, I describe my project to combine a simulation of the J1 Processor with a set of simple graphical tools to allow assembly, disassembly and memory viewing.

A Graphical User Interface for Low Level Hacking

Disassembler Window
Over the last couple of days - spare time permitting, I have written a simple application to assist in the development of code for a FPGA soft core processor.

So far, this consists of a memory view, a register view, stacks and a disassembler window.  The windows into memory are animated such that the actions of the instruction set on memory and registers may be viewed whilst single-stepping through the code.

The novel thing about this simple application, is that it has been written in Arduino C++ code, and once compiled, it runs on a ZPUino softcore processor hosted on a FPGA.  Additionally, the hardware which generates the 800x600 VGA display is also hosted on the FPGA. So we have a complete computer system consisting of cpu, memory and video generation hardware supplied on the Papilio Duo FPGA board.

The first part of this exercise was to get the graphical parts of the user interface working.  These consist of the hex memory dump window and the various stack, register and disassembler windows.

Now, whilst the ZPUino is itself a stack based processor, and these tools will eventually be used to examine its operation, it was decided that initially I would use the ZPUino to emulate an even simpler stack processor.  The candidate is James Bowman's J1 Forth Processor (also available as a softcore for FPGA use) - which has the advantage of a very small instruction set, and a processor behaviour that is easily modelled in C code.

This may appear a somewhat round-about route but was chosen for the following reasons:

1.  The ZPUino can be programmed in "Arduino code" using DesignLab - the Papilio Duo IDE
2.  The ZPUino interfaces in hardware to the 800x600 VGA engine
3.  Adafruit's GFX graphics library has been ported to ZPUino
4.  A compact C model, and sufficient documentation exist for the J1 processor
5.  I wanted to understand how the J1 works, and what its limitations are
6.  This is a programming project that meets my elementary coding skills

So my approach is to make use of the tools available.  James Bowman is working on an implementation of the J1 to run specifically on the Papilio Duo board, and make use of its 2Mbyte of SRAM. Whether he will develop it to the point where a VGA engine is supported is unknown - so for the moment I have to be content emulating the J1 with the ZPUino in C, with the heavy burden of the GFX library calls.

If we can develop sufficient momentum, then there might be a strong case to put a fast Forth soft core on a FPGA with VGA. This however is beyond my coding skills - but on my wish list for the future.

Disassembler Window

This does a very simple disassembly on the instructions in memory. The jump, branch, call and ALU instructions are decoded to their mnemonics for easier reading. The animated display shows the instructions highlighted in cyan as they are executed by the processor emulator.

More Interaction

So far only the graphical code has been prototyped - just enough to see the animation of the J1 processor emulation.  For complete user interaction, it will require more code,  in particular that to support keyboard, mouse and a text editor window.

I have ordered a Classic Computing shield for the Papilio Duo from Gadget Factory in Denver. This includes sockets for PS2 keyboard and mouse, VGA output, microSD card and a pair of Atari style joystick connectors. This will allow keyboard and mouse interaction to be developed, plus program and data storage on the microSD card.  The thought had occurred to me that the Atari ports might be useful to accept switch presses from some form of custom keypad - a bit like Chuck Moore uses with his OKAD and colorForth environment.

Text Editor

The existing graphical layout allows for a text window of about 90 character columns by 75 rows.  This should be sufficient for 80 column mode plus a few clickable buttons.  The mouse will be used extensively for click and drag type operations - so a routine that links mouse position to the position of objects on the screen will be central to the user interaction.

As most coding languages are text based - the efficient manipulation of text leads to high productivity whilst programming. The use of colour text and highlighting of selected areas will enhance the user experience. The text window will additionally be used for serial output and command line input, and the use of the microSD card will allow source code to be saved and retrieved from "disk".

Assembler and Compiler

The J1 is a "Forth" processor, in as much that it is stack based, and almost all of its instructions are Forth primitives. This allows it to execute the Forth language efficiently.  However, a modern Forth consists of about 200 definitions, and these have to be encoded in the native instruction set of the processor.  Fortunately, this is something that Bowman and others have already done.







Extending SIMPL



In the earlier post "Building from the Ground Up"  I wrote about how Forth could be used to provide a low level development environment for an unfamiliar processor for which a C compiler was either not available, or not desirable to use.

The first task is to write a simple virtual machine for the target processor, and then use this VM to run your application code. This is similar to what Java bytecode does.

The virtual machine need only have a handful of instructions, including basic arithmetic, logical operations and the ability to access and manipulate memory. From these primitive instructions, all subsequent instructions may be synthesised.

This is the approach described in "The Elements of Computer Systems" (TECS) which is also known as "From Nand to Tetris".  The simple machine described, consists of little more than an ALU, a program counter, two registers and a memory interface capable of accessing a 64K x 16 bit address space.

Designing a Virtual Machine

If it is possible to code up a simple virtual machine, on any choice of microcontroller, or softcore CPU FPGA - then it becomes practical to program the virtual machine using the same machine language.

The virtual machine will be a stack machine, and will support a data stack and a return stack. These stack structures will probably be created in RAM on the virtual machine.  If this was then implemented in FPGA hardware, the stacks would be separate RAM blocks, with a means of fast access, without having to go through main memory.

The ALU has access to the top and second items of the stack, and will conduct arithmetic and logical operations on these elements, leaving  the result on the top of the stack. If operands are required from memory, they will first have to be loaded into the top and 2nd stack locations.

Assume that the virtual machine ALU can perform the following operations. The arithmetic and logical operations are performed on the top 2 members of the data stack, returning the result to the top.

ADD
SUB
MUL
DIV

Multiply and Divide can be synthesised from shift and add/sub - but are time consuming without additional specialist hardware.

AND       Bitwise AND
OR        
XOR
NEG

SLL       Shift Left  Logical
SRL       Shift Right Logical

@          Fetch a word from memory and place on the top of stack
!            Store a word from the top of stack to memory

BRZ      Branch if zero
JMP      Unconditional Jump
CALL    Call subroutine
RET      Return from subroutine

LIT       Put a literal number onto the stack

So with approximately 20 instructions, we have the basis of a stack machine that can do useful work.
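The instruction list above can be sketched as a small interpreter loop in C. This is only an illustration under stated assumptions: the opcode numbering, the memory and stack sizes, the inline operand after LIT, BRZ, JMP and CALL, and the extra HALT opcode are all mine, not part of the post or of any real processor. There are no stack or memory bounds checks beyond masking the address for @ and !.

```c
#include <stdint.h>

/* Hypothetical opcode numbering; HALT is an addition so the loop can stop. */
enum {
    OP_HALT, OP_LIT, OP_ADD, OP_SUB, OP_MUL, OP_DIV,
    OP_AND, OP_OR, OP_XOR, OP_NEG, OP_SLL, OP_SRL,
    OP_FETCH, OP_STORE, OP_BRZ, OP_JMP, OP_CALL, OP_RET
};

#define MEM_WORDS   256                /* assumed sizes for the sketch */
#define STACK_WORDS 32

typedef struct {
    uint16_t mem[MEM_WORDS];
    uint16_t ds[STACK_WORDS];          /* data stack */
    uint16_t rs[STACK_WORDS];          /* return stack */
    int dsp, rsp;
    uint16_t pc;
} vm_t;

static void     push(vm_t *vm, uint16_t v) { vm->ds[vm->dsp++] = v; }
static uint16_t pop(vm_t *vm)              { return vm->ds[--vm->dsp]; }

/* Run until HALT (or an unknown opcode); return the top of the data stack. */
uint16_t vm_run(vm_t *vm)
{
    for (;;) {
        uint16_t op = vm->mem[vm->pc++];
        uint16_t a, b;
        switch (op) {
        case OP_LIT:   push(vm, vm->mem[vm->pc++]); break;
        case OP_ADD:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)(a + b)); break;
        case OP_SUB:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)(a - b)); break;
        case OP_MUL:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)((uint32_t)a * b)); break;
        case OP_DIV:   b = pop(vm); a = pop(vm); push(vm, b ? (uint16_t)(a / b) : 0); break;
        case OP_AND:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)(a & b)); break;
        case OP_OR:    b = pop(vm); a = pop(vm); push(vm, (uint16_t)(a | b)); break;
        case OP_XOR:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)(a ^ b)); break;
        case OP_NEG:   a = pop(vm); push(vm, (uint16_t)(0u - a)); break;
        case OP_SLL:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)((uint32_t)a << (b & 15))); break;
        case OP_SRL:   b = pop(vm); a = pop(vm); push(vm, (uint16_t)(a >> (b & 15))); break;
        case OP_FETCH: a = pop(vm); push(vm, vm->mem[a % MEM_WORDS]); break;
        case OP_STORE: a = pop(vm); b = pop(vm); vm->mem[a % MEM_WORDS] = b; break; /* ( value addr -- ) */
        case OP_BRZ:   a = vm->mem[vm->pc++]; if (pop(vm) == 0) vm->pc = a; break;
        case OP_JMP:   vm->pc = vm->mem[vm->pc]; break;
        case OP_CALL:  vm->rs[vm->rsp++] = (uint16_t)(vm->pc + 1); vm->pc = vm->mem[vm->pc]; break;
        case OP_RET:   vm->pc = vm->rs[--vm->rsp]; break;
        default:       return vm->dsp ? vm->ds[vm->dsp - 1] : 0;   /* OP_HALT or unknown */
        }
    }
}
```

Loading the words LIT 45, LIT 63, ADD, HALT at address 0 of a zeroed vm_t and calling vm_run() returns 108.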

The virtual machine to perform these operations can be written in a high level language - such as C, or actually implemented as a soft core on a FPGA.  One such stack processor that lends itself to this is James Bowman's J1 Forth CPU.  This compact, no-frills CPU can be defined in under 100 lines of C, or synthesised in FPGA hardware using under 200 lines of Verilog code.

The J1 is a practical CPU, designed for simplicity and optimised for high speed execution of stack based languages.  It offers most of the instructions outlined above and forms the basis of an exploration into stack based CPUs.  Initially it can be simulated on any processor in C code for experimentation and later blown into real high speed FPGA logic.

Even though the instruction set is very small, the 16 bit instruction word length does not lend itself to easy or memorable coding in machine language. It has several instruction fields, and these have to be populated with the correct bit pattern if we are to make any progress off the starting blocks. The instructions do, however, map very well onto the Forth language, but there are other alternatives which could be explored.

At the bare minimum, we need an assembler to synthesise the instruction words from the various fields, and once we have a list of 16bit hex instructions we need to load them into RAM and have the simulator step through them.

An assembler typically scans through a text file containing instruction mnemonics, such as ADD, JMP, CALL etc and converts these into the instruction opcodes.  It also looks for numerical constants, variables and addresses and assembles these into the instruction. Additionally, it looks for labels that identify subroutine addresses for jumps and includes these in the code.

A relatively simple program:   Mnemonics in - machine language out

Another Option

However there might be a different way to do this - in a more interactive manner - and this is where Forth or one of its near cousins will come in. For the purpose of my exploration, I want to see if the SIMPL interpreter can be used as a means to perform this assembly step.

As we know, the SIMPL interpreter will read a series of characters from a buffer, one at a time, and execute code associated with that character.

So to add 2 numbers (say 45 and 63) in SIMPL and print out their sum

45 63+p

45 is interpreted as a literal to be pushed onto the stack
The space is used in SIMPL to push the stack down
63 can then be pushed onto the stack
+  adds the top two members of the stack leaving the result on the top
p prints the result to the serial terminal

This is almost Forth, except in SIMPL there is only a limited stack structure and the space is needed to command the stack to push down to accept another number.
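The steps above can be sketched in C. This is not the actual SIMPL source, just a minimal model of the behaviour described: x stands for the top of the limited stack and y for the item below it, and the function name simpl_eval is my own.

```c
#include <stdio.h>
#include <stdint.h>

/* Evaluate a SIMPL-like string; returns the final top-of-stack value. */
uint16_t simpl_eval(const char *s)
{
    uint16_t x = 0, y = 0;             /* x = top of stack, y = second item */

    while (*s) {
        char c = *s++;
        if (c >= '0' && c <= '9') {    /* enumerate a literal into x */
            x = (uint16_t)(c - '0');
            while (*s >= '0' && *s <= '9')
                x = (uint16_t)(x * 10 + (*s++ - '0'));
        } else if (c == ' ') {         /* space pushes the stack down */
            y = x;
        } else if (c == '+') {         /* add the top two members */
            x = (uint16_t)(x + y);
        } else if (c == 'p') {         /* print the top of stack */
            printf("%u\n", (unsigned)x);
        }
    }
    return x;
}
```

Calling simpl_eval("45 63+p") prints 108 and returns the same value.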

Instead of executing the SIMPL instructions directly, we can hijack the SIMPL interpreter to synthesise the assembly language and the machine language needed for the virtual machine.

So   45 63+p  is translated to assembly language

LIT    45
LIT    63
ADD
CALL  PRINT

Where PRINT is a subroutine that outputs the top of stack contents as a printed integer to the serial port

But the translation to standard traditional assembly language with 3 letter mnemonics is an un-required middle step.  The SIMPL interpreter can easily produce the instruction fields and generate the machine language directly:

802D          LIT 45
803F          LIT 63
6200          ADD
4100          CALL 100  (PRINT is at 0x0100)

So by this process of translation, the SIMPL language is the Assembly Language - we can find enough of the SIMPL character set to map directly onto the J1 instruction set, and any of the other command characters (like p) will invoke calls to subroutines.
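A rough sketch of that direct translation in C follows. The function name simpl_to_j1 and the macros are hypothetical. The field layout follows the published J1 encoding as I understand it: a set top bit marks a literal, and the top three bits 010 mark a call, so a call to 0x0100 comes out as 0x4100. The ADD word uses the 0x6200 value quoted above; a complete J1 encoding would probably also set the stack pointer adjustment bits, which are left at zero here.

```c
#include <stdint.h>
#include <stddef.h>

#define J1_LIT(n)   ((uint16_t)(0x8000u | ((n) & 0x7FFFu)))  /* 1nnn nnnn nnnn nnnn */
#define J1_CALL(a)  ((uint16_t)(0x4000u | ((a) & 0x1FFFu)))  /* 010a aaaa aaaa aaaa */
#define J1_ADD      ((uint16_t)0x6200u)                      /* ALU class, T' = T+N  */
#define PRINT_ADDR  0x0100u                                  /* PRINT address from the post */

/* Translate a SIMPL string into J1 instruction words; returns the word count. */
size_t simpl_to_j1(const char *s, uint16_t *out, size_t max)
{
    size_t n = 0;

    while (*s && n < max) {
        char c = *s++;
        if (c >= '0' && c <= '9') {            /* number becomes a literal */
            unsigned v = (unsigned)(c - '0');
            while (*s >= '0' && *s <= '9')
                v = v * 10u + (unsigned)(*s++ - '0');
            out[n++] = J1_LIT(v);
        } else if (c == '+') {
            out[n++] = J1_ADD;
        } else if (c == 'p') {                 /* non-primitive: call a subroutine */
            out[n++] = J1_CALL(PRINT_ADDR);
        }
        /* a space only separates numbers; nothing is emitted for it */
    }
    return n;
}
```

For the input "45 63+p" this emits four words: 0x802D, 0x803F, 0x6200 and 0x4100.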

It might be remembered that SIMPL uses small letters and punctuation characters as primitives, numbers are enumerated as literals and capitals are reserved for the application words. This means that our little language can have approximately 60 primitive instructions, which is enough to do real work, yet takes up a fraction of the space used by the 170 words used in a typical Forth system. Fewer words, less typing, yet still more readable than assembly language or machine code.

So let's look at the primitive words and how they might map onto the assembly language

+     ADD
-      SUB
*      MUL
/       DIV
%     MOD
&     AND
|       OR
^      XOR
~      INV
#      LIT

@    FETCH
!      STORE

:      CALL
;       RET

<     LT
=     EQ
>     GT

j      JMP
z      JPZ

I find Stack manipulations are never easy to remember in Forth - words like DUP, OVER,  SWAP, NIP, DROP etc but as these are an essential part of the language, they need to have a syntax to code them into the assembly language.  Perhaps it might be possible to express the top two stack items as a two member array between parenthesis.

DUP    (1,1)
OVER (2,1)
SWAP (2,1)
NIP      (x,y)
DROP  (2,3)





          


Text Editor and Assembler

In order to start writing code for our new processor, we need a few simple tools to help us.  We need a means of entering and displaying text - usually a serial terminal interface, so that we can enter machine instructions into memory and have the virtual machine execute them.

In the early days of microprocessors development kits there was often a hexadecimal keypad, 7 segment display for address and data, and a monitor program.  This allowed  the machine instructions to be entered directly into memory, and then the means to run code from a certain address.  It was quite primitive, frustrating and subject to a lot of human typing mistakes. Nowadays, we can use the power and resources of a laptop to help get new systems up and running.

The first tool we need is a simple text editor.  This will take in text from the keyboard, display it in a terminal window and allow basic editing, listing and text file storage and retrieval.

Secondly, no-one wants to code in machine language, so a very basic assembler that converts text mnemonics and numbers - on a line by line basis - to machine language would also be an asset.

For this we need simple text parsing - and in traditional Forth this was done by looking for individual words or numbers separated by spaces.

Forth would take each new word and look it up in the dictionary for a match.  The match was based on the first three characters of the word, and its length. This is quick to do and suitable for most purposes.
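As an illustration of that matching rule, here is a small C sketch. The name_t structure and the functions make_name and same_name are hypothetical, not taken from any particular Forth; names shorter than three characters are padded with spaces, which is an assumption of mine.

```c
#include <string.h>

/* A dictionary name reduced to its length and first three characters. */
typedef struct {
    char head[3];
    unsigned char len;
} name_t;

name_t make_name(const char *word)
{
    name_t n = { {' ', ' ', ' '}, 0 };
    size_t len = strlen(word);

    n.len = (unsigned char)len;        /* assumes words shorter than 256 characters */
    for (size_t i = 0; i < 3 && i < len; i++)
        n.head[i] = word[i];
    return n;
}

/* Two names match when the length and the first three characters agree. */
int same_name(name_t a, name_t b)
{
    return a.len == b.len && memcmp(a.head, b.head, 3) == 0;
}
```

The price of the shortcut is the occasional collision: SQUARE and SQUASH compare as equal under this rule, whereas SQUARE and SQUARED do not.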

Numerical input, which includes integers up to 65535, was generally not stored in the dictionary; it would be converted from ASCII to an integer and then placed on the stack.

The text editor and assembler can exist as one program - as they share a lot of features. Principally we will need the means to parse through the entered text looking for keywords and numbers.  As there are only 20 or so keywords required to implement the instruction set of the virtual machine, the task of programming these into the assembler is not too difficult.

I have chosen mnemonics that simplify this task - the first 2 characters are unique to each mnemonic, and no mnemonic is more than 4 characters long. We can combine the first two characters to produce a unique number, and then use this number in a series of Switch-Case statements to perform the action needed.
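A minimal sketch of this two-character lookup in C is shown below. The KEY macro, the opcode numbering and the function name mnemonic_opcode are assumptions for illustration; the mnemonics are the letter-based ones from the virtual machine list earlier (the @ and ! symbols would be handled separately).

```c
/* Hypothetical opcode numbering for the letter-based mnemonics. */
enum {
    OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_AND, OP_OR, OP_XOR, OP_NEG,
    OP_SLL, OP_SRL, OP_BRZ, OP_JMP, OP_CALL, OP_RET, OP_LIT
};

/* Combine two characters into one number usable as a case label. */
#define KEY(a, b) (((a) << 8) | (b))

/* Return the opcode for a mnemonic, or -1 if it is not recognised. */
int mnemonic_opcode(const char *word)
{
    if (word[0] == '\0' || word[1] == '\0')
        return -1;

    switch (KEY((unsigned char)word[0], (unsigned char)word[1])) {
    case KEY('A', 'D'): return OP_ADD;
    case KEY('S', 'U'): return OP_SUB;
    case KEY('M', 'U'): return OP_MUL;
    case KEY('D', 'I'): return OP_DIV;
    case KEY('A', 'N'): return OP_AND;
    case KEY('O', 'R'): return OP_OR;
    case KEY('X', 'O'): return OP_XOR;
    case KEY('N', 'E'): return OP_NEG;
    case KEY('S', 'L'): return OP_SLL;
    case KEY('S', 'R'): return OP_SRL;
    case KEY('B', 'R'): return OP_BRZ;
    case KEY('J', 'M'): return OP_JMP;
    case KEY('C', 'A'): return OP_CALL;
    case KEY('R', 'E'): return OP_RET;
    case KEY('L', 'I'): return OP_LIT;
    default:            return -1;
    }
}
```

Because only the first two characters are examined, a misspelling such as ADX would still be accepted as ADD; a stricter assembler would compare the whole word after the switch.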

The 16 bit virtual machine can handle integer numbers up to 65535.  We need a means of detecting a number within the entered text string, and converting the ASCII characters to an integer. In a similar way to how we uniquely defined the mnemonics by the first 2 characters, we can do a quick test on the string to see if it is a number.

The assembler will convert our input mnemonics on a one by one basis into the machine instructions of the virtual machine.  More detail on how the assembler should operate is outlined in Chapter 6 of "TECS".

Sometimes it is easier working in binary or hexadecimal, so additional assembler directives, for example BIN, HEX and DEC, could be used to instruct the assembler which base to use to interpret the numerical strings.
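The number test and the base directives can be sketched together in C. The function name parse_number and its return convention are my own; the base argument stands in for whatever the BIN, DEC or HEX directive last selected (2, 10 or 16).

```c
#include <stdint.h>

/* Convert a text token to a 16 bit number in the given base (2, 10 or 16).
   Returns 1 on success, 0 if the token is not a valid number or exceeds 65535. */
int parse_number(const char *s, unsigned base, uint16_t *out)
{
    uint32_t v = 0;

    if (*s == '\0')
        return 0;

    for (; *s; s++) {
        unsigned d;
        if (*s >= '0' && *s <= '9')
            d = (unsigned)(*s - '0');
        else if (*s >= 'A' && *s <= 'F')
            d = (unsigned)(*s - 'A') + 10u;
        else
            return 0;                  /* not a digit in any supported base */

        if (d >= base)
            return 0;                  /* digit not valid in this base */

        v = v * base + d;
        if (v > 65535u)
            return 0;                  /* does not fit in 16 bits */
    }
    *out = (uint16_t)v;
    return 1;
}
```

One wrinkle worth noting: in hexadecimal mode a token such as ADD is also a valid number (0x0ADD), so the assembler would need to try the mnemonic lookup before the number test, or require a prefix on hex constants.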

Assemblers make use of other directives such as ORG, and labels, to refer to points in the listing. Assemblers can be single pass or two-pass.   A single pass assembler will require you to keep track of your own labels, which can be quite difficult if the assembly listing is rich in subroutines. This is possibly one reason why Forth evolved as a language: it has its roots in assembly language programming, but the Forth system of "words" provided an efficient and automated means of keeping track of labels and subroutines. It had to, as Forth is composed almost entirely of subroutines.

Charles Moore created Forth to be an efficient and automated way of creating a common programming environment on each of the wide variety of mainframe machines that he encountered during the 1960s-70s.  His virtual machine had to be first coded in the native machine language of the processor,  but with the availability of C compilers, the VM can be coded in C.

Moore realised that the tasks of text editor, assembler, compiler and interactive interpreter could be bundled up into one efficient package, which he called Forth.  How exactly this was done was at first fairly obscure, and Moore initially kept much of it to himself, before branching out around 1970 and sharing the inner secrets of Forth within the Radio Astronomy community.

If the primitive assembler can only accept valid mnemonics and numbers, then any other unrecognised text string could be considered to be a new word. A word that cannot be found in the dictionary, is treated as a new definition and is composed from the primitive instructions.

This is similar to the use of a macro label within a (two pass) assembler listing. Once a macro has been encountered and given an address, it can be used again.

So the combination of a simple text editor and line assembler will help us to build up the various Forth word definitions from the primitives. Whilst this C program is not a complete Forth system, it is a tool that helps us create the Forth system.





Saturday, September 05, 2015

Building from the Ground Up




In my previous post which turned out to be a rant about the ever increasing complexity of software, I decided to dig out a draft post from earlier this year which describes my continuing investigation into how the lower levels of computing languages are implemented.

The purpose is to create a set of low level tools which allow application code to be developed easily and interactively on a microprocessor or soft CPU with limited memory resources.

The microprocessor may be an ARM or a softcore running on a FPGA - and with no C compiler available how do you start to bootstrap the processor from first principles?

Booting from Scratch

In bygone generations of computer hardware, there would be a set of toggle switches which could be set to represent the binary instruction words. These words would be individually deposited into memory and the program counter advanced. A very short routine, consisting of only a couple of dozen instructions would be manually toggled into memory, allowing a paper tape to be loaded. Great ingenuity was applied to the hardware architecture so that these loader routines were short.

Daniel Bailey has devised a fun project - his C88 computer is a home-brew CPU on a FPGA which takes one back to the very early methods of booting a cpu from scratch. This blogpost and video explain it nicely. Daniel will be presenting his C88 at the forthcoming OSHcamp at the end of September in Hebden Bridge.

As technology progressed, and IC memories became available, interchangeable eproms were used to allow the program code to be quickly modified and re-run.  However there might be a 20 minute delay between detecting a bug and reprogramming another eprom with the corrected code. To speed up the time needed to program and erase eproms, a nonvolatile RAM was often used to emulate the eprom and hold the program code. I spent a summer in my late teens programming a Z80 dev board in this manner.

These days almost all microcontrollers have on chip flash memory, and some method for in circuit serial programming (ICSP).  As a PC or laptop is invariably used for program development and code compilation, virtually all of the development tools are hosted on the PC, and very few systems have the means to edit and recompile code on the target system itself.

There has to be another way

There is however another means of working, where a minimal interactive working environment can be loaded onto the target system, and this allows the development and debugging of programs on the target system itself. Very little additional support is needed other than a serial interface. The code can be developed in a text editor running on the microcontroller. This is no different to the way that interactive languages, such as BASIC were developed on home computers in the early to mid 1980s.

Interactive development is a completely different method of working compared to the usual edit, compile, load, test approach that has arisen out of the almost universal use of compiled languages - in particular C and its derivatives. It was only when PCs gained more RAM, disks and cpu speed that C compilation became practical.

Forth was one of the first interactive languages, - developed in the 1960s by Charles H. Moore, and it is Forth that influences my investigations here.

Forth might be compared to assembly language, as it deals with low level operations involving snippets of machine language that are threaded together to create some program function. However it is much more than just an assembly language, as it provides the means to edit, compile and assemble source code, in an extensible manner, using a very compact language.

Forth does away with much of the clutter of other languages, and gives the programmer direct access to the resources of the processor.  Forth offers high speed execution, especially on FPGA hardware designed to specifically match the primitive instruction set of the Forth language. Modern, low cost FPGAs can run a Forth engine at a clock frequency of some 50 to 200MHz. At this speed the interactive nature of the language becomes extremely fast, and the time taken to go around the edit-compile-execute process is slashed to just edit-execute.  This is a significant saving in time, and ideas can be tested and rejected in the time otherwise spent compiling a large application.

Forth uses "words" to perform program functions.  A Forth word ultimately causes a series of machine language subroutines to be executed. In some respect - a Forth word may be likened to a LEGO block, which can be used with other blocks to make a more complicated structure. These structures may be further combined to build up the entire object - in our analogy - the program application.

Forth words are threaded together, like beads on a string, with the end of execution of one word, rapidly jumping to the beginning of the next.

Brad Rodriguez has an excellent article "Moving Forth" which explains how this threading process is achieved. Although written over 20 years ago - the fundamentals have not changed.

SIMPL

In an attempt to understand the fundamentals of Forth, I studied another threaded, interpreted language, SIMPL - the Serial Interpreted Minimal Programming Language.  I have blogged several times about SIMPL since May 2013 - when I first came across it.  Refer back to previous posts for details.

SIMPL takes some aspects of Forth, and packages them in an easy to understand manner which may be adapted to run on a variety of microprocessor platforms.  I first used SIMPL on an ATmega328 Arduino, but have subsequently ported it to ARM and FPGA soft core processor targets.  SIMPL is written in more or less standard C, which makes it portable between processors, without the complexity of dealing with the machine language of each target processor. It is a stepping stone towards a real Forth.

SIMPL provides an easy way of sequencing blocks of code functions, and doing it in an interactive manner from the serial keyboard interface.  It's not so much a programming language - more a way of automating the access to routines - and done in an interactive manner.

As an example:

Suppose you have just written a function in C to plot a graphical character on the screen, and you wish to see what it looks like - using a VGA graphics adaptor or a TFT LCD screen. In order to test this function, you need to supply it with parameters for the x,y position that you wish it to be plotted, and possibly the foreground and background colours. If you hardcoded these parameters, and got them wrong, then you would have to go back and edit and recompile until you got the effect you were looking for.

SIMPL provides the mechanism to enter these parameters and execute the function interactively. If you don't like the first result, you can change the parameters with a few key-strokes and test again. You have virtually eliminated the edit-recompile cycle from when you are developing new code functions.

The Mechanics of SIMPL

SIMPL exists as three short functions that provide a complete serial interpreter or shell, to accompany the rest of your program. Calling the SIMPL interpreter each time the main loop is executed provides the means to interact with the rest of the program.

The SIMPL interpreter consists of the following 3 functions executed sequentially within a loop.

txtRead() takes characters from the serial input and loads them into a buffer beginning at address buf - which has been allocated a length of 64 characters

txtChk()  looks at the first character to see if it is a colon :  If so it copies the input text string to a specific address - to form a "colon definition", an idea borrowed from Forth

txtEval(addr)  This is the interpreter - It is pointed to the start address of the text string and evaluates the string a character at a time. The SIMPL interpreter has a few rules on how it treats each character

If it finds numerical characters, it enumerates them to a 16bit integer that it places on the stack.

If the characters are small letters or punctuation characters it will execute a given function each time that character is encountered. For example, you may have allocated the character "g" to plot out your new graphic symbol, each time g is encountered.

Uppercase characters are used as addressing pointers - to point to additional character strings, either in Flash or RAM.  The colon definition allows you to compose a new sequence of characters, and access it every time the uppercase character is typed.

Because the text is always loaded into a buffer - it is executed at full speed - and not influenced by the serial baudrate.

In short, SIMPL provides an easy interactive means to call your code subroutines (functions) in a sequence from the keyboard, and provide serial printed output back to the serial terminal.

SIMPL is essentially providing a subroutine address lookup, and call to that subroutine, encoded into the ASCII value of a character.  Sequences of characters provide sequences of function calls. Numerical parameters may be passed to those subroutines using an elementary stack based method.
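The colon definition mechanism described above can be modelled in a few lines of C. This is a sketch of the idea rather than the real SIMPL source: the names chk_colon and eval, the one-slot-per-capital storage layout and the tiny set of primitives (digits, space and +) are all assumptions of mine; only the 64 character buffer length comes from the description.

```c
#include <stdio.h>
#include <stdint.h>

#define BUF_LEN 64                     /* buffer length quoted in the post */

static char defs[26][BUF_LEN];         /* hypothetical: one slot per capital letter */
uint16_t x, y;                         /* top and second stack items */

/* If the line starts with ':' and a capital, store the rest as that
   letter's definition. Returns 1 if a definition was stored, else 0. */
int chk_colon(const char *buf)
{
    if (buf[0] != ':' || buf[1] < 'A' || buf[1] > 'Z')
        return 0;
    snprintf(defs[buf[1] - 'A'], BUF_LEN, "%s", buf + 2);
    return 1;
}

/* Evaluate a string one character at a time. */
void eval(const char *s)
{
    while (*s) {
        char c = *s++;
        if (c >= '0' && c <= '9') {            /* enumerate a number */
            x = (uint16_t)(c - '0');
            while (*s >= '0' && *s <= '9')
                x = (uint16_t)(x * 10 + (*s++ - '0'));
        } else if (c >= 'A' && c <= 'Z') {     /* capital: run the stored string */
            eval(defs[c - 'A']);
        } else if (c == ' ') {
            y = x;
        } else if (c == '+') {
            x = (uint16_t)(x + y);
        }
    }
}
```

After chk_colon(":A10 5+") has stored the definition, eval("A") leaves 15 in x. A definition that refers to itself would recurse without end in this sketch, so a real implementation needs some guard against that.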

Extending SIMPL

SIMPL is not a fully blown language, but a method of automating the calling of functions according to a stored sequence of characters or tokens.  It treats every ASCII character as a new instruction, so you cannot use multiple character words such as ADD to convey a meaning - as this would be interpreted as sequence A followed by sequence D followed by sequence D and so on.  This is where it deviates from Forth.

Its control structures are limited to simple loops - controlled by a single down counting loop counter.

SIMPL is the stringing together of subroutines in Flash, that have been created from compiled C code. A 16 bit ADD in SIMPL is going to be somewhat slower than the same operation in a directly executed subroutine.

SIMPL does provide some pointers to how a more Forth like version could be implemented.

It already has the means to interpret an integer number and place it on a stack.

It also has the means to take a sequence of tokens - in this case ASCII characters, look them up in a jump table and retrieve the start address of a code subroutine, which it then executes, before returning to the interpreter to decode the next token. Each 7-bit token provides an index into a jump table of addresses, from where the next word is executed.
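A jump table of this kind is easy to express in C as an array of function pointers indexed by the 7-bit character code. The sketch below is illustrative only: the handler names, table_init and dispatch are hypothetical, and only three tokens are wired up.

```c
#include <stdint.h>

uint16_t x, y;                         /* top and second stack items */

static void op_nop(void)  { }                            /* unassigned tokens */
static void op_push(void) { y = x; }                     /* ' ' */
static void op_add(void)  { x = (uint16_t)(x + y); }     /* '+' */
static void op_sub(void)  { x = (uint16_t)(y - x); }     /* '-' */

typedef void (*handler_t)(void);

static handler_t table[128];           /* one entry per 7-bit ASCII token */

void table_init(void)
{
    for (int i = 0; i < 128; i++)
        table[i] = op_nop;
    table[' '] = op_push;
    table['+'] = op_add;
    table['-'] = op_sub;
}

/* Execute each token by indexing the table with its character code. */
void dispatch(const char *s)
{
    while (*s)
        table[(unsigned char)*s++ & 0x7F]();
}
```

With x set to 63 and y to 45, dispatch("+") leaves 108 in x. Adding a new primitive is then just a matter of writing a handler and assigning it to a free character in table_init().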

To extend SIMPL, it will be necessary to create some additional text handling code, so that typed words can be processed, added to a dictionary and have a codefield associated with them.  How exactly this will be achieved will be the subject of a future post.









Simple Machines

This is a follow on from yesterday's post, where I described some of the cpu technology I currently deal with and how it would be great if there was a common language that would run on them all.

I am currently working with STM32Fxxx Cortex M3  M4 and M7 ARM devices, having designed half a dozen pcbs based around them in the last couple of years. I have also worked with ATmega parts - mostly derivatives of Arduino and now I am branching out into FPGA softcore CPUs.

I am not a natural or native software engineer, and I feel the need to simplify the processes to the point where I understand what is going on and I am in total control of the device. This low level approach rather constrains the scope of what I can achieve, but it gives me the satisfaction of understanding the whole picture.  This approach has led me into the territory of baremetal programming - something that I am slowly starting to get a taste for.

To make life easier, we surround ourselves with familiar useful objects and tools. As we become accustomed to these new things and learn how to use them efficiently, they improve the quality of our output and enrich our (working) lives. It's been the same for the last 50 years, since computers first started to appear in quantity.

But more computers doesn't always mean better. I have about 7 within arms reach, and one in my pocket. Fortunately 6 are switched off, and the one in my pocket has not disturbed me for at least 30 minutes. There's the laptop I am writing this on, that I bought about 5 years ago and takes 5 minutes to boot, and the Android in my pocket which I use for making calls and sending texts. Most of my interactions with modern operating systems are solely as a user of pre-packaged applications, the inner workings of which I know nothing about, and probably don't have the time to care.

I must confess I don't know how to drive a Linux machine through its command line, I only powered a Raspberry Pi for the first time in April this year and I am totally lost on a Mac. To make things worse, I have no idea why my unbranded Android keeps popping up text box warnings in Chinese!

So - as a dinosaur of the digital age, I will stick to what I know and remain in my comfort zone - which is as close to the bare metal as possible.

Most microcontrollers are programmed in C or one of its close cousins, with the language tools normally residing on a much more powerful machine, such as a laptop - with whatever flavour of operating system suits the user.  As confessed above, I have grown up with Windows, as have most of the software tools I use in my day to day work and play. It's taken me several years to master one pcb design package, so why should I struggle like an infant, trying to learn another - just because it's the flavour of the month.  My "muscle memories" are tuned to driving EagleCAD, so when out of interest I tried KiCAD, I found it one of the most frustrating, unproductive 2 hours of my adult life. I am beginning to think what they say about old dogs and new tricks is not only true, but just a polite way of saying "move over, Grandad, you had your chance".

So this is increasingly why tech people of my generation cling to the foundations - and avoid climbing the towers of Babylon that have been built, ever increasingly skyward, on top of them.

This is probably not what I intended to write this time - but I feel much better for getting it off my chest.


Friday, September 04, 2015

If you've got a problem .....

If you've got a problem, don't care what it is
If you need a hand, I can assure you this

I can help, I've got two strong arms, I can help
It would sure do me good to do you good
Let me help

Few people today are aware of Problem Oriented Languages - a term first coined by Charles H. Moore - just over 45 years ago in June 1970.

Here is his unpublished book, retyped into HTML some years ago, which I came across again today, this time on Frank Carver's Raspberry Alpha Omega blog.

Frank is clearly an experienced and talented software engineer, and understands the inner workings of computer software in ways that I just struggle to grasp. So it was a pleasant surprise to find that Frank and I have been following virtually parallel journeys for the last couple of years, he from a software perspective - and myself from a hardware perspective.

This post is not specifically about Problem Oriented Languages, but more about how they are still relevant today and can be applied to a range of fields of computing science.

A Problem Aired....

My immediate problem is that I am now working with a range of microcontrollers, both AVRs and ARMs of several different flavours, and I need to find some common ground between them, and establish a comfortable development environment - founded on mutual territory.  Throw into the mix some stack-based soft cores running on FPGAs, and the chance of finding my Nirvana rapidly fades into dust.

All of these processors fit into the 20MHz to 200MHz clock speed range, and really don't have enough memory resources to support an operating system like Linux.  Some of the larger parts can be programmed in MicroPython or JavaScript but that's not much use for the soft cores or the smaller parts.

I wondered for a while whether Arduino might become the lingua franca, at least for "hobby" projects - because as well as the ATmega, it has been implemented on the STM32F ARM M3 and M4 parts and also the ZPUino soft core - hosted on the Spartan 6 of the Gadget Factory Papilio Duo.

So for the moment, you have a range of different microcontrollers, with which you can share Arduino sketches and libraries.

A word about FPGAs and Soft Core CPUs

In the last few years, FPGAs have become commonplace, and the toolchains needed to develop applications on them have become free to use. Several tech suppliers are offering low cost dev-boards based around a small FPGA, with sufficient hardware support to allow programming via serial or JTAG, plus support chips - such as external SRAM, SDRAM and configuration flash.

So far my journey has taken me to the Xilinx Spartan 6 - as there are several hobby-boards based around this device.  Plus this part appears to have a lot of support from the emerging FPGA community.

Jack Gassett and his team at Gadget Factory have democratised the FPGA, in a similar way to how the Arduino Team democratised the ATmega and C++.

Because they have ported an open source soft core 32-bit CPU, plus a library of other cool hardware resources - such as sound chips and music synthesisers - you can now make your own hardware designs relatively easily and control it all with the ZPUino softcore - executing Arduino sketches at 100MHz.

The great thing about the ZPUino FPGA soft core for projects is that there is a hardware VGA generator plus the Adafruit graphics library available, which can support 800x600 resolution VGA. You just plug in a VGA monitor, and you can draw graphics to your heart's content - all in a few lines of Arduino code.  This is a super-quick Arduino, with colourful graphic output that you are totally in control of.  Other libraries allow you to add a PS/2 keyboard and mouse - you have the makings of a complete development environment on an FPGA.  It's the computing equivalent of an early 1990s workstation - but unconstrained by any operating system - effectively a blank canvas onto which you paint your computing dreams!

If you want to look further - this was built using a Papilio Duo FPGA board fitted with a Computing Shield.  If you want to go this route, make sure you buy the Papilio Duo with the 2MB SRAM - as the cheaper 512KB part will severely limit your graphics play.

So far, more interesting options - and no one solution. And also, the allure of FPGA soft core processors was becoming a great distraction.

J1 - A Forth CPU

Almost exactly 3 years ago to the week, I was in New York for the Open Hardware Summit.  One speaker was James Bowman, who introduced us to GameDuino - a bolt on video system for an Arduino. In the form of a shield, with VGA and audio connectors - it could handle the whole video graphics - based on a neat Forth processor that James had designed, known as the J1.  James had developed the J1 for a robotic video system based on a Xilinx Spartan 3E FPGA, which has more than enough grunt to handle the sprites, backgrounds and tone generators of a typical 1980s arcade video game.

The original J1, presented at EuroForth 2010, is described here: http://www.excamera.com/files/j1.pdf
Further links to the J1 and GameDuino are here.

James has since ported it to the Spartan 6, made a few demon tweaks, and it can now run on the Papilio boards.  The J1 can be instantiated as either a 16-bit or a 32-bit processor.  Clock speeds approaching 100MHz are achievable.

In addition to developing the J1 hardware, James has created a version of Forth to run on it known as SwapForth, and has leveraged some open source FPGA tools to create an open core processor, running open source software - developed with open source tools. This video gives a quick demo.

I am still very much digesting the last 3 years of posts on Frank's blog site.  The next post here will try to make sense of it all.