I am currently exploring the use of soft core processors implemented in FPGAs, with a view to developing an image capture, processing and display unit for a friend's scanning electron microscope.
Earlier this year I bought a Papilio Duo FPGA board and a computing shield from The Gadget Factory. Some early experimentation with the ZPUino soft processor, programmed using the Arduino language, gave me a taste of what could be achieved using soft core CPUs and accompanying VGA hardware defined within the FPGA.
Very recently, James Bowman has released his J1b Forth processor implemented on a Papilio Duo and computing shield. It is this processor core that ultimately I want to use.
However, there is quite a steep learning curve: I need to learn not only FPGA programming in Verilog and VHDL, but also the J1 instruction set and the Forth language, to the point where I can program my intended application.
This is quite an ambitious journey for me - an opportunity to stretch my skill set and programming abilities. As with any long journey, it starts with the smallest steps.
This week I am looking at how the J1 instruction set works with the help of a simulator and an assembler, both written in C and ported onto either an Arduino or an STM32F407.
With these simple tools, I hope to learn enough about how the J1 executes its native machine language to be able to implement a small Forth-like language.
Inspired by Frank Carver's blog Raspberry Alpha Omega, I decided to dig up some of the work I did earlier in the year - to create a tiny Forth-like language that would run on virtually any processor.
My ambition is to have a language nucleus that resides in about 2K of memory which provides a means to debug and bootstrap an application with limited tools or resources. I imagine it as a common core, which can be accessed via a serial UART, and can be used right from the start of a project and form a foundation onto which an extendable application can be built.
Whilst the usual image of code development is typing into a text editor or IDE and then compiling before flashing the machine code into the microcontroller, my plan is to take a huge chapter out of Charles Moore's book and make my language interpreted, with the means to compile and edit code right there on the microcontroller itself. Indeed, very Forth-like.
However over the years and through the various ANSI standardisation processes, Forth has become large and bloated - and that was never what Chuck Moore intended or wanted. So I am going to pick and choose from the characteristics of Forth, and come up with something very much simpler.
The plan is to have a compact language kernel which resides on the microcontroller - regardless of whether it is an AVR or ARM or a specialist stack processor burned into an FPGA. In each case, it will present me the same user interface and experience - for low level hacking or code development.
From a hardware developer's perspective, every microcontroller I work on needs to have the means to print to a terminal and waggle a port pin - right from the get-go.
However, this language need not just be for human interaction. As the commands are very compact, they lend themselves to being packetised and sent from machine to machine by whatever communications channel is appropriate - be it wireless, BLE, TCP/IP or 140 characters at a time via SMS or Twitter. It also allows a microcontroller such as that on the Raspberry Pi to communicate with other task-specific hardware solely using a UART connection. The speed of control and interaction is then no longer restricted to the few characters a second that a human can type.
Creating a Virtual Machine
For all this to work, we need to establish a virtual machine on the chosen microcontroller. The virtual machine could initially be coded in C, to run on the target, but later it can be created as a specialist soft core processor on an FPGA. On the Arduino, the virtual machine compiles to about 2Kbytes - or 3K when you add the Serial.begin() function for UART output.
Once the virtual machine has been installed it will happily execute its way through the memory on its own, until it crashes or is reset. The challenge now becomes writing the low level inner interpreter application code in the assembly language of the virtual machine. This step is something that I will put off until I have generated a means of creating and assembling the language.
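A minimal sketch of such a virtual machine in C might look like the following. It is an illustration under stated assumptions, not the simulator used for this project. The names vm_t, vm_step, vm_run, bump and delta2 are hypothetical. The bit positions of the transfer, return and stack-pointer fields follow James Bowman's published J1 encoding as I understand it, comparisons return the Forth true flag 0xFFFF, shifts move N by T places, and addresses are treated as word addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 32
#define MEM_WORDS 8192

typedef struct {
    uint16_t pc;               /* word address of the next instruction */
    uint16_t t;                /* T: top of the data stack */
    uint16_t ds[STACK_DEPTH];  /* rest of the data stack; ds[dsp] is N */
    uint16_t rs[STACK_DEPTH];  /* return stack; rs[rsp] is R */
    uint8_t dsp, rsp;
    uint16_t mem[MEM_WORDS];   /* shared code and data memory */
} vm_t;

/* Move a stack pointer by delta (-2..+1), wrapping within the stack. */
static uint8_t bump(uint8_t sp, int delta)
{
    return (uint8_t)((sp + STACK_DEPTH + delta) % STACK_DEPTH);
}

/* Interpret a 2-bit field as a signed change (assumed two's complement). */
static int delta2(unsigned bits) { return (bits & 2u) ? (int)bits - 4 : (int)bits; }

/* Compute the new T for one of the 16 ALU op-codes. */
static uint16_t alu(const vm_t *vm, unsigned op)
{
    uint16_t t = vm->t, n = vm->ds[vm->dsp];
    switch (op) {
    case 0x0: return t;
    case 0x1: return n;
    case 0x2: return (uint16_t)(t + n);
    case 0x3: return (uint16_t)(t & n);
    case 0x4: return (uint16_t)(t | n);
    case 0x5: return (uint16_t)(t ^ n);
    case 0x6: return (uint16_t)~t;
    case 0x7: return (n == t) ? 0xFFFFu : 0u;
    case 0x8: return ((int16_t)n < (int16_t)t) ? 0xFFFFu : 0u;
    case 0x9: return (uint16_t)(n >> (t & 15u));
    case 0xA: return (uint16_t)(t - 1u);
    case 0xB: return vm->rs[vm->rsp];
    case 0xC: return vm->mem[t % MEM_WORDS];
    case 0xD: return (uint16_t)((unsigned)n << (t & 15u));
    case 0xE: return vm->dsp;
    default:  return (n < t) ? 0xFFFFu : 0u;
    }
}

/* Execute one instruction. */
static void vm_step(vm_t *vm)
{
    assert(vm != NULL);
    uint16_t insn = vm->mem[vm->pc % MEM_WORDS];
    uint16_t next = (uint16_t)(vm->pc + 1u);
    uint16_t target = insn & 0x1FFFu;

    if (insn & 0x8000u) {                       /* LIT: push 15-bit value */
        vm->dsp = bump(vm->dsp, 1);
        vm->ds[vm->dsp] = vm->t;
        vm->t = insn & 0x7FFFu;
        vm->pc = next;
    } else if ((insn & 0x6000u) == 0x0000u) {   /* JMP */
        vm->pc = target;
    } else if ((insn & 0x6000u) == 0x2000u) {   /* JPZ: consumes T */
        vm->pc = (vm->t == 0) ? target : next;
        vm->t = vm->ds[vm->dsp];
        vm->dsp = bump(vm->dsp, -1);
    } else if ((insn & 0x6000u) == 0x4000u) {   /* CALL: save return address */
        vm->rsp = bump(vm->rsp, 1);
        vm->rs[vm->rsp] = next;
        vm->pc = target;
    } else {                                    /* ALU */
        uint16_t old_t = vm->t, old_n = vm->ds[vm->dsp];
        uint16_t result = alu(vm, (insn >> 8) & 15u);
        uint16_t ret = vm->rs[vm->rsp];
        vm->dsp = bump(vm->dsp, delta2(insn & 3u));
        vm->rsp = bump(vm->rsp, delta2((insn >> 2) & 3u));
        if (insn & 0x0080u) vm->ds[vm->dsp] = old_t;            /* T -> N */
        if (insn & 0x0040u) vm->rs[vm->rsp] = old_t;            /* T -> R */
        if (insn & 0x0020u) vm->mem[old_t % MEM_WORDS] = old_n; /* N -> [T] */
        vm->pc = (insn & 0x1000u) ? ret : next;
        vm->t = result;
    }
}

/* Run a fixed number of instructions; a real target would loop forever. */
static void vm_run(vm_t *vm, unsigned steps)
{
    while (steps-- > 0) vm_step(vm);
}
```

Loading the three words 0x8005, 0x6081 and 0x6203 at address 0 of a zeroed vm_t and calling vm_run for three steps pushes 5, duplicates it and adds, which is a quick way to check that the stack pointers move as expected.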
To create an assembler, I am going to use the tiny Txtzyme interpreter, written by Ward Cunningham, which was the original inspiration for this project. It allows very basic parsing of a text buffer and then performs one of a series of function calls depending on the character typed, or read from the buffer. Numerical characters are converted into an integer and placed in a variable x.
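The parsing style described here can be sketched in a few lines of C. This is not Ward Cunningham's code, only an illustration of the idea: the names interpret, command_fn, record and struct trace are hypothetical, and the command characters used in any example are placeholders.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical handler type: called once per command character with the current x. */
typedef void (*command_fn)(char command, unsigned x, void *context);

/* Scan a NUL-terminated buffer. A run of decimal digits is converted into x;
   any other non-space character is passed to the handler along with x.
   Returns the final value of x. */
static unsigned interpret(const char *buffer, command_fn handler, void *context)
{
    unsigned x = 0;
    assert(buffer != NULL && handler != NULL);
    while (*buffer != '\0') {
        unsigned char ch = (unsigned char)*buffer++;
        if (isdigit(ch)) {
            x = (unsigned)(ch - '0');
            while (isdigit((unsigned char)*buffer)) {
                x = x * 10u + (unsigned)(*buffer++ - '0');
            }
        } else if (!isspace(ch)) {
            handler((char)ch, x, context);
        }
    }
    return x;
}

/* Example handler: remember up to 8 (command, x) pairs for inspection. */
struct trace {
    char command[8];
    unsigned x[8];
    size_t count;
};

static void record(char command, unsigned x, void *context)
{
    struct trace *trace = context;
    if (trace->count < 8) {
        trace->command[trace->count] = command;
        trace->x[trace->count] = x;
        trace->count++;
    }
}
```

Calling interpret("13j 200z", record, &trace) with a zeroed struct trace would hand the handler 'j' with x = 13 and then 'z' with x = 200. In an assembler, the handler would emit an instruction word instead of recording the pair.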
For a simple implementation of an assembler using Txtzyme, let's consider that the instruction word consists of 4 fields:
Class            Class Field
Literal          0x8000
Jump             0x0000
Jump if zero     0x2000
Call             0x4000
ALU              0x6000
For the first four of these - the assembler just needs to OR the class field with the literal number or the target address. It may be worth ensuring that the literal is constrained to 15 bits and the target address is constrained to 13 bits.
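As a small sketch of that OR-and-mask step, assuming the class field values implied by the hex patterns quoted in this post (0x8000 literal, 0x0000 jump, 0x2000 jump if zero, 0x4000 call), the encoding could be written as below. The helper names encode_literal and encode_branch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Class fields, taken from the hex patterns quoted in this post. */
enum {
    CLASS_JMP  = 0x0000, /* unconditional jump */
    CLASS_JPZ  = 0x2000, /* jump if T == 0     */
    CLASS_CALL = 0x4000, /* subroutine call    */
    CLASS_LIT  = 0x8000  /* 15-bit literal     */
};

/* Hypothetical helper: constrain the literal to 15 bits, then OR in the class. */
static uint16_t encode_literal(uint16_t value)
{
    return (uint16_t)(CLASS_LIT | (value & 0x7FFF));
}

/* Hypothetical helper: constrain the target to 13 bits, then OR in the class. */
static uint16_t encode_branch(uint16_t class_field, uint16_t target)
{
    assert(class_field == CLASS_JMP || class_field == CLASS_JPZ ||
           class_field == CLASS_CALL);
    return (uint16_t)(class_field | (target & 0x1FFF));
}
```

With these, encode_branch(CLASS_JPZ, 0x0123) gives 0x2123, and an over-wide target such as 0x3FFF passed with CLASS_CALL is trimmed to 13 bits, giving 0x5FFF rather than corrupting the class field.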
We will use the following characters to define the instruction class
j unconditional jump
z conditional jump when T=0
For the ALU instruction there are 4 more sub-fields to populate depending on the nature of the instruction.
1. ALU op-code
2. Transfer field
3. Pointer field
4. Return field
The 2nd nibble of the instruction word controls the ALU. Its 16 instructions are decoded thus:
Code  Char  Operation
0     t     NOP
1     n     COPY (T = N)
2     +     ADD
3     &     AND
4     |     OR
5     ^     XOR
6     ~     INV
7     =     T = !(T == N)   Sets T to status of EQ flag
8     >     T = !(N < T)    Sets T to status of GT flag
9     /     RShift
A     d     DEC (T = T-1)
B     r     r-fetch
C     @     fetch
D     *     LShift
E     d     depth (shows dsp +1)
F     u     U<
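One way to turn this table into code is a lookup string whose index is the op-code. The sketch below assumes the op-code sits in bits 8 to 11 (the 2nd nibble) and that the ALU class field is 0x6000. The names alu_chars, alu_opcode and encode_alu_op are hypothetical. Note that 'd' appears twice in the table, so in this sketch the character resolves to the first match (DEC) and depth cannot be reached by character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLASS_ALU 0x6000u

/* One character per op-code, in table order 0..F. */
static const char alu_chars[] = "tn+&|^~=>/dr@*du";

/* Return the op-code (0..15) for a character, or -1 if it is not in the table. */
static int alu_opcode(char ch)
{
    const char *hit = (ch != '\0') ? strchr(alu_chars, ch) : NULL;
    return hit ? (int)(hit - alu_chars) : -1;
}

/* Place the op-code in the 2nd nibble of an ALU-class instruction word. */
static uint16_t encode_alu_op(char ch)
{
    int op = alu_opcode(ch);
    assert(op >= 0);
    return (uint16_t)(CLASS_ALU | ((unsigned)op << 8));
}
```

For example, encode_alu_op('+') yields 0x6200, which still needs the transfer and pointer fields ORed in before it is a complete instruction.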
So we have about 20 instructions, each of which falls into one of 5 categories:
LIT - load the included 15 bit literal onto the top of the stack
CALL - call the subroutine at the enclosed 13 bit address
JMP - non-conditional jump to the enclosed 13 bit address
JPZ - conditional jump - only if the top=0
ALU - ALU and stack operations
Literal instructions take the form 8xxx to Fxxx (in hex), as only the top bit marks the class and the other 15 bits hold the literal
Jumps 0xxx or 1xxx
JPZ 2xxx or 3xxx
Calls 4xxx or 5xxx
ALU 6xxx or 7xxx if you include the "return"
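Going the other way, a simulator or disassembler can recover the category from the top bits of a word. The sketch below follows the hex patterns listed above; the names insn_class, classify, operand and class_name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { INSN_LIT, INSN_JMP, INSN_JPZ, INSN_CALL, INSN_ALU } insn_class;

/* Bit 15 marks a literal; otherwise bits 14..13 select the class. */
static insn_class classify(uint16_t insn)
{
    if (insn & 0x8000u) {
        return INSN_LIT;
    }
    switch (insn & 0x6000u) {
    case 0x0000u: return INSN_JMP;
    case 0x2000u: return INSN_JPZ;
    case 0x4000u: return INSN_CALL;
    default:      return INSN_ALU;
    }
}

/* Strip the class bits: 15 bits for a literal, 13 bits for everything else. */
static uint16_t operand(uint16_t insn)
{
    return (uint16_t)(insn & ((classify(insn) == INSN_LIT) ? 0x7FFFu : 0x1FFFu));
}

/* Mnemonic for printing, using the category names from this post. */
static const char *class_name(insn_class c)
{
    static const char *const names[] = { "LIT", "JMP", "JPZ", "CALL", "ALU" };
    assert((unsigned)c <= (unsigned)INSN_ALU);
    return names[c];
}
```

So 0x5ABC classifies as a CALL with target 0x1ABC, and 0x700C classifies as an ALU instruction with the return bit set.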
So the plan is to adapt the txtzyme interpreter to convert text input into machine language in the form of the 16 bit instruction words.
The 3rd nibble of the instruction word insn controls the data flow from the stack to memory
N Insn Top transfers to Next (2nd)
R Insn Top transfers to Return
@ Insn Next transfers to address pointed by Top
_ Insn Not used
The lower nibble of the instruction word is used to control the incrementing or decrementing of the data and return stack pointers dsp and rsp. Pushes to the stack involve incrementing the dsp, whilst popping from the stack means that the dsp needs to be decremented. Some actions are stack neutral, and involve no net gain or loss in stack items.
The parentheses can be conveniently used to represent push and pop operations. They are easy to remember, because you start with a push (left bracket) and end with a pop (right bracket):
( push ds
) pop ds
[ push rs
] pop rs
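As a sketch of how the bracket characters could map onto the lower nibble, the code below assumes the layout of James Bowman's published J1 encoding: bits 1 to 0 hold the data stack change and bits 3 to 2 the return stack change, each as a 2-bit two's complement number. The assembler described here may lay the nibble out differently, and the names encode_delta, decode_delta and pointer_nibble are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a stack-pointer change of -1, 0 or +1 as a 2-bit two's complement field. */
static unsigned encode_delta(int delta)
{
    assert(delta >= -1 && delta <= 1);
    return (unsigned)delta & 3u;
}

/* Recover the signed change from a 2-bit field. */
static int decode_delta(unsigned field)
{
    field &= 3u;
    return (field & 2u) ? (int)field - 4 : (int)field;
}

/* Build the lower nibble from bracket characters: ( ) act on dsp, [ ] on rsp.
   Other characters are ignored; a later bracket of the same kind wins. */
static uint16_t pointer_nibble(const char *brackets)
{
    int ds = 0, rs = 0;
    assert(brackets != 0);
    for (; *brackets != '\0'; ++brackets) {
        switch (*brackets) {
        case '(': ds = 1;  break;
        case ')': ds = -1; break;
        case '[': rs = 1;  break;
        case ']': rs = -1; break;
        default: break;
        }
    }
    return (uint16_t)((encode_delta(rs) << 2) | encode_delta(ds));
}
```

Under these assumptions a data stack pop together with a return stack push, written ")[", gives the nibble 0x7, and a stack-neutral instruction leaves the nibble at 0.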
The basic core of the assembler, which accepts the text input and generates instructions as 16-bit hex words, fits into under 200 lines of C.
Assembler Instruction Set Summary
Implemented so far:
r Right shift
l Left shift
d T - 1 (decrement)
N T -> N
R T -> R
A T -> A
( Push Data Stack
) Pop Data Stack
[ Push Return Stack
] Pop Return Stack
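To show how these characters might combine into one instruction word, here is a hedged sketch that simply ORs each character's bits into an ALU-class word. It is not the 200-line assembler itself. The bit positions are assumptions based on Bowman's J1 layout (op-code in bits 8 to 11, transfer bits 0x80, 0x40 and 0x20, stack changes in the lower nibble), 'A' is assumed to select the third transfer bit, and the names field_bits and assemble_alu are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLASS_ALU 0x6000u

/* Bits contributed by one summary character; 0 for unknown characters.
   All bit positions here are assumptions, not taken from the original code. */
static uint16_t field_bits(char ch)
{
    switch (ch) {
    case 'r': return 0x0900u; /* right shift: ALU op 9 */
    case 'l': return 0x0D00u; /* left shift:  ALU op D */
    case 'd': return 0x0A00u; /* T - 1:       ALU op A */
    case 'N': return 0x0080u; /* T -> N */
    case 'R': return 0x0040u; /* T -> R */
    case 'A': return 0x0020u; /* T -> A (assumed third transfer bit) */
    case '(': return 0x0001u; /* dsp + 1 */
    case ')': return 0x0003u; /* dsp - 1 */
    case '[': return 0x0004u; /* rsp + 1 */
    case ']': return 0x000Cu; /* rsp - 1 */
    default:  return 0;
    }
}

/* OR together the fields named in a string such as "lN(". */
static uint16_t assemble_alu(const char *text)
{
    uint16_t insn = CLASS_ALU;
    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        insn = (uint16_t)(insn | field_bits(*text));
    }
    return insn;
}
```

With these assumptions, assemble_alu("r)") produces 0x6903 and assemble_alu("R[") produces 0x6044. The sketch does no checking, so two op-code characters or both brackets of a pair in one string would OR into a meaningless word; a real assembler would want to reject those.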