Sunday, November 22, 2015

A New Compact Microcontroller Board

The new board, shown fitted with a 40 pin DIL package - e.g. ATmega1284

A Low Cost Generic Microcontroller and FPGA Board

The Arduino board format is now looking dated with its bulky footprint and only 20 useful I/O lines.

I realised that there was an opportunity to redesign the board and make it more useful for prototyping or developing with larger pin-count microcontrollers - yet retain nominal compatibility with the Arduino connector format, so that it will also accept most original Arduino shields.

The proposed new board footprint is just 70% of the original area yet provides up to 58 I/O pins, direct USB programming and on board wireless communications.

The pcb makes use of a standard 50mm x 50mm board footprint - which is now manufactured very cheaply (as little as $14 for 10) by various low cost board houses.

The board format may also be used as a basis of a 50mm x 50 mm expansion shield.

Pin Naming.

Arduino started life  with 6 Analogue inputs and 14 Digital I/O pins. Over the years these have often been labelled A for analogue and D for digital.

The naming convention I have settled upon keeps the A and D headers for backwards compatibility, but adds extra headers - labelled B, C, E and F. Alphabetical port names make sense.

These additional 0.1" pitch headers are placed in-board of the existing headers - giving an inner row of headers on a 1.70" width, which makes them entirely compatible with most breadboards and 50mm x 70mm 0.1" prototyping boards.

Header A is 6 pins - Arduino standard - providing analogue inputs.
Header B is 6 pins - providing additional lines with higher resolution analogue capability.
Header C is 8 pins - providing a mix of analogue, digital, communication and timer functions.
Header D is digital and has been extended to include the two extra I2C pins.
Header E is 16 pins - E for Expansion - and is exclusively digital GPIO.
Header F is for Future use - and may provide up to 5 GPIO lines.

The layout of the headers has been chosen so as not to be entirely symmetrical - this hopefully prevents any shield from being plugged in back to front.

Making it Compatible with Arduino Shields.

A brief word about Arduino.  Arduino originally offered 6 analogue input pins and 14 digital pins. Unfortunately, due to a CAD error, the digital pins are not on a standard consecutive 0.1" spacing - there is a gap of 0.160" (rather than 0.100") between D7 and D8.

The first task was to come up with a shield footprint that could be compatible with this layout - yet  fit into the narrower width of a 50mm square pcb.  This was done by careful customisation of the size and shape of the header pads - so that they will just fit into a 50mm width.

The second task was to provide some additional 16 pin header strips, inboard of the original Arduino headers, which would give access to an additional 32 GPIO lines.

This was done in a way that would also allow two M3 fixing holes in opposite corners.  Finally, 4 additional signals not present on the original Arduino headers were added: the two I2C pins and the 2 extra pins on the R3 power header.

The 50x50 pcb fitted with 100 pin LQFP and mini-USB connector 

Choice of Processor.

The 50 x 50 board layout could be used for any microcontroller that offers around 50 to 60  I/O lines and can be readily adapted to suit various packages - up to 100 pin LQFP (Like the STM32F746).  For most projects it is a good match with 48 pin or 64 pin LQFP packages.

It may also be used with DIL footprint ICs - and it is just possible to shoehorn a 40 pin DIL onto the pcb - such as the ATmega1284.

However because my recent experience lies with the STM32Fxxx range of ARM Cortex M3 and M4 microcontrollers, these were the obvious first choice.

Conveniently, a board designed for one particular variant can also be populated with another close family member - so I chose the STM32F103 workhorse, and the STM32F373 - which has a faster M4 core, a floating point unit and significantly more analogue ADC capability - in terms of both ADC resolution and signal lines.

Each of these processors has a maximum of 51 or 52 GPIO lines, but once you remove two for the crystal, two for the USB, two for the ST-link and two for the RTC  - you are down to a more manageable 44 lines.

The designation "PA" refers to the physical pins of GPIO Port PA on the STM32 mcu package - and not the A pins on the header. I hope this does not cause undue confusion.

Port PA   12 signals
Port PB   12 signals
Port PC   14 signals
Port PD    2 signals
Port PE    2 signals
Port PF    2 signals

Total     44 signals

This is 24 more than the original Arduino, so at least three additional 8 pin headers will be needed to accommodate them.

The problem is how best to map the various GPIO ports on the ARM to the physical pins of the connectors - in a way that makes sense and clusters them by function.  Separating them into nominally analogue and digital is a good starting point.

Layout of the Ports.

In addition to the Arduino's A0-A5, the proposed board offers a further 10 analogue inputs - giving A0 to A15 - plus 6 additional analogue or digital lines, C0 to C5.

These are provided on a 16 pin header on the same side as the existing analogue and power headers.

On the "digital side" of the board there is also an additional 16 pin connector.  This is the Expansion, or Extra, port - designated E0 to E15. If you want a 16 bit bus - say for an FPGA project - then this would be a good use of port E.

Furthermore, later Arduino UNO R3 models offer two pins for I2C devices  - these are added as D14 and D15.


The proposed 50 x 50 board size is convenient, compact and versatile.  It has sufficient pins for the more demanding applications, and sufficient board area to allow plug in modules to be added.

The board can sensibly accept microcontrollers or FPGAs up to about 144 pin LQFP - which makes it viable for projects incorporating the STM32F7xx Cortex M7 or the Xilinx Spartan 6 range of FPGAs - both of which are available in LQFP, and thus solderable by the hobbyist/enthusiast.

Results of J1 Simulation

First Up  - Some Results

In the last post I looked at running the J1 simulator on various platforms, from the humble 16MHz Arduino to the red-hot STM32F746 - running at 216MHz.  Here are the results of those tests in J1 instructions per second - or JIPS as I call them.

Arduino        16MHz    ATmega328          67,000 JIPS
ZPUino         96MHz    Soft CPU          152,000 JIPS
STM32F103      72MHz    ARM Cortex M3     404,000 JIPS
STM32F407     168MHz    ARM Cortex M4   3,000,000+ JIPS *
STM32F746     216MHz    ARM Cortex M7   9,000,000+ JIPS *

*  The last 2 results are based on level 00 Compiler Optimisation.  With more aggressive optimisation, the '746 was returning 27 million JIPS.

So now that we have a means of simulating the processor at about 1/20th of full speed, the time has come to decide exactly how we are going to port a useful high level language onto this processor model.

James Bowman has done excellent work porting Forth onto his J1 soft core, but I am not quite ready to plunge into Forth - for me it's about the journey of exploration in reaching a high level language implementation - under my own steam.

A small revelation

At this point it is interesting to note that if the 27 million JIPS figure is indeed correct, then the 216MHz Cortex M7 core is executing about 8 instructions for every emulated J1 instruction - in this particular (non-demanding) test program.  So it would probably be fair to say that most modern ARM cores (M7 and above) would achieve a similar decimation ratio whilst simulating the J1.
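That ratio is easy to check: the host clock divided by the measured JIPS figure gives the host cycles burned per emulated J1 instruction. A trivial sketch of the arithmetic, using the figures quoted above:

```c
#include <stdint.h>

/* Host clock cycles spent per emulated J1 instruction:
   simply the host core clock divided by the measured JIPS figure. */
static unsigned cycles_per_j1_insn(uint32_t clock_hz, uint32_t jips)
{
    return (unsigned)(clock_hz / jips);
}
```

At 216MHz and 27 million JIPS this comes out at 8 cycles per emulated instruction; the 72MHz '103 at 404,000 JIPS is spending around 178.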

If this is the case, then a 1GHz ARM could simulate a 100MHz J1 - or, put the other way, a 100MHz J1 would have a similar overall performance to a 1GHz ARM executing some sort of stack based virtual machine bytecode - e.g. Java.

As a lot of applications are written in Java (e.g. the Arduino IDE), the overhead of running a virtual stack machine on a register based cpu slows them down by a factor of about 10.  If, however, the Java bytecode were translated into an intermediate form (possibly J1 Forth), it would likely run appreciably faster.

The point I am making is that a customised soft core stack cpu, tailored to Java bytecode and running on an FPGA, could make Java run a lot faster on less powerful hardware. Some ARM ICs already have the ability to run Java bytecode directly - known as Jazelle. This is how some games were made to run faster on small platforms - such as mobile phones.

Running the J1 Simulator on ZPUino.

The ZPUino has shown itself to be a convenient and useful 32 bit processor, implemented on FPGA hardware. It is Arduino code compatible, runs my simulations at about twice the speed of an Arduino, and allows easy use of the Adafruit GFX graphics library - which permits 800 x 600 VGA text and graphics to be displayed on a flat screen monitor.

Whilst not a particularly fast processor, the ZPUino does allow easy and unrestricted access to the graphics library - such that it is easy to create a series of animated display screens for high level output, using what is effectively an Arduino sketch. This technique is particularly flexible, and allows you to interact creatively with the problem at hand - rather than get bogged down in someone else's system calls and drivers.

I took the very short J1 test program used in the simulations - a simple loop consisting of 7 instructions - and used the ZPUino to run it as an animated simulator, graphically showing the contents of memory as a hex dump, plus the main J1 registers, the stack and the instructions as they were stepped through. Repeated re-drawing of the hex dump memory display slowed execution right down to about 1 instruction per second - roughly a hundred-millionth of real J1 execution speed.

The Missing Assembler

What was missing from this exercise was the ability to easily write J1 test programs in J1 machine code - and this rather hampered progress. So it is for this reason that the first application of the SIMPL text interpreter will be at the core of the J1 cross assembler.

Whilst the J1 is intended to run Forth, and has the tools to support it, my Forth skills are not great - and in any case I'm trying to challenge myself to learn C to a reasonable standard.  So a coding project written in C, one that taxes my language and thinking skills, is a good way to learn and to achieve something useful.

The interpreter can take a set of mnemonics, tailored for the J1 processor and by the process of direct substitution, create the series of 16 bit instructions that can then be run on the J1 virtual machine. I really want this to be an interactive process working in a Forth-like manner - so that small snippets or blocks of J1 assembly language can be assembled and tested individually as an iterative process.
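As a sketch of that direct-substitution idea - scan a table of mnemonics and emit the matching 16-bit instruction word - something like the following is enough to get started. The mnemonic names and opcode values here are invented for illustration, and are not the real J1 encoding:

```c
#include <stdint.h>
#include <string.h>

/* Direct substitution: each mnemonic maps straight to one
   16-bit instruction word. Opcode values are hypothetical. */
struct mnemonic { const char *name; uint16_t opcode; };

static const struct mnemonic table[] = {
    { "DUP",  0x6081 },
    { "DROP", 0x6103 },
    { "ADD",  0x6203 },
    { "EXIT", 0x700C },
};

/* Return the instruction for a mnemonic, or 0xFFFF if not found. */
static uint16_t assemble_word(const char *m)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, m) == 0)
            return table[i].opcode;
    return 0xFFFF;
}
```

Interactive, Forth-like use then just means calling this on each word of an input line and appending the result to the code buffer.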

It's many years since I wrote any code in assembler - and that was Z80 which had a reasonable mix of registers to play with.

Writing in a minimal instruction set language, is going to be interesting.

In order to gen up on the processes involved within a typical assembler, I returned to "NAND to Tetris" Chapter 6 - there is a good description of what is needed there.  I then went on to refresh myself on the contents of Chapter 7, "Virtual Machine I - Stack Arithmetic", and Chapter 8, "Virtual Machine II - Program Flow".  Having re-read these chapters in the fresh light of a new day, I believe that my musings about the J1 cpu are not only very relevant, but completely on track with the content and approach outlined in "NAND to Tetris".

More on this in a later post.

NAND to Tetris - A Personal Journey

From NAND to Tetris (N2T) is the popular name for an open study Computing Science course devised by Shimon Schocken and Noam Nisan.  It is accompanied by a book, by the same authors "The Elements Of Computing Systems - Building a Modern Computer from First Principles" and a series of online and downloadable study materials.

For anyone who wishes to get a more in-depth understanding of the interaction between the hardware, operating system and software application layers of a modern computer, or to consolidate existing knowledge, I would highly recommend purchasing this book, following the course materials and supporting this project as a whole.  We need a whole new generation of Computer Scientists and Electronic Engineers who understand this stuff from first principles.

After first hearing about the course from contacts at the London Hackspace, I bought the book last year and I am slowly working my way through it.  By this, I mean that I am making my own personal tour of the country that it describes  - and not necessarily by the direct linear route outlined in the book.  I dip into it occasionally, rather like a travel guide, as if I were planning a trip to the next major city.  I believe that I will reach the final destination, but it will be the wealth of experience gained from meandering on the journey, rather than the final destination, that currently is my driving factor.

I embarked on the book having spent most of my career in digital hardware design, but with very little real experience of writing software tools. Whilst I found the chapters on hardware fairly easy to follow, I hoped that the book would lead me gently into picking up new software skills.

The first 5 chapters of the book illustrate and reinforce the principles of combinatorial and sequential digital logic, by having the student design the logic function of the various "chips" that go to make up a simple cpu.  From basic gates you combine increasingly complex designs to build the arithmetic logic unit (ALU), the program counter and the various registers that make up the cpu.

A hardware design language package allows the design, simulation and testing of the various logical components and gives the student confidence that their design meets the test spec of the required item.  It soon becomes apparent that there is no one way to implement the logic of the ALU - but some ways are quicker, more flexible or have a more efficient use of silicon.

I completed the hardware design chapter exercises of the book during an intensive week of evenings in spring last year.  Then I got more than a little bogged down in the software section, as I realised at the time that I did not have the programming skills in any language to do justice to the demands of the software exercises - beginning at Chapter 6, "Assembler".

Rather than be defeated by a complete road-block, I have spent the last year surveying the surrounding territory for an alternative route to complete the mission. To this end, I have invested in FPGA hardware, designed pcbs for ARM processors and written simulator code for simple stack based processors.  I have now got to the point where the next logical step is to write an assembler.

I have picked up enough C skills to put together a simple text interpreter and use it to parse through tables of mnemonics, looking for a match and associating a machine instruction with each scanned mnemonic.  It is the basis of a "poor man's" assembler, but it has the flexibility to be applied to whatever novel processor's instruction set I wish to explore.  I can now go back to Chapter 6 - with my new knowledge and software tool - and make new progress.

In the intervening year - and at this stage in life we view projects in terms of years of involvement - I have also learned a bit of Verilog and done a bit of FPGA logic design. These are skills I will need to develop if I am to keep up with the modern world. And whilst I may no longer be able to see (without glasses) some of the hardware I am working with, I can still type, and I have the option of increasing the font size. That should keep me viable in the workplace for the next decade or so - although I do increasingly have my "dinosaur days".

This move was partly inspired by the N2T book, and also my desire to get involved in the new wave of low cost FPGAs that have now become available to the hobbyist.  I might be as bold to say that in 2015, they are to the enthusiast what the 6502 was in 1975, and the Arduino was in 2005.  User friendly FPGA hardware is definitely going to be a growth area for the next few years.

FPGAs allow you to design your own custom hardware, or recreate vintage or retro-hardware computers from years ago.  Soft core processors, featuring custom instruction sets are one area of involvement - and these will require software tools to simulate operation and allow code to be written.

In addition, I have moved on from being constrained by just 1 or 2 microcontrollers. I am now experiencing the portability of software written in C, and discovering how easy it can be to switch between processors - even though I have some concerns about the complexity of modern IDEs.

One of the tasks I set this year was to benchmark several microcontrollers with dhrystone and whetstone benchmarks - in an attempt to get a better understanding of how they perform under different applications.

By characterising the relative performance and resources of a few common cpus - I am now able to make informed decisions about which might be more suitable for a particular job. Currently I am impressed with  the ARM Cortex M7,  and I am eagerly awaiting 400MHz versions of this M7 core - expected in late 2016-2017.

Whilst 400MHz might appear puny to those who regularly use twin-core 1GHz parts in their mobile phone or Raspberry Pi, to them I offer the challenge of writing from scratch an Assembler!

Saturday, November 21, 2015

Beating the Bloat


This post is by way of a minor rant about the current state of the tools and methods we use to produce embedded firmware.

In order to perform the benchmark tests on the series of processors yesterday, I had to use 4 individual IDEs and spend 12 hours of my life fighting the flab of blobby bloatware that is the embodiment of the modern IDE.

My grief really started when I wanted to port the J1 simulator to the Cortex M7. For this I needed a "professional"  tool chain.

The Long and Winding Road.......

In order to blink an LED on my STM32F746 breakout board, I had to install the 32K codesize limited version of Keil's uVision 5 and their ARM MDK. This takes about an hour to install and set up.

Then I had to find an example project of something that was close to what I wanted to do - i.e. blink a LED. I found their generic Blinky example - and then found that it had been tailored for a couple of commercial dev boards - and the files that set up the port allocation were locked from editing within the IDE.

So I opened the files in Notepad++, edited the dozen or so lines of code that controlled the GPIO port allocation, and then wrote my edited version in place of the original - so far, so good.

Had I known that at 6pm I was still about 2 hours away from blinking a LED, I would have probably thrown in the towel and gone to the pub.  I eventually tracked down the problem to my particular port pin being re-assigned as an input in the example code, immediately after I had set it up as an output. There was also a minor problem with the clock generation set up for the wrong PLL ratio - that prevented the code from running.

Now, I have learnt that ARM processors are fairly complex beasts - and the peripherals take a fair time to set up, with their myriad of different options - but when I looked at the project files to blink a LED, I saw that it took about 100 code modules to set up the peripherals - and some of those modules were each 1000+ lines of code.

However - as a fairly recent newcomer to the Keil compiler and the ST Microelectronics hardware abstraction layer - who was I to know which of the 100 files I needed and which I didn't?

This leads me nicely on to  Shotgun, Voodoo and Cargo Cult coding practices. I'll let the interested follow up the definitions, but the point that I am making is that the modern IDE and methods of using a hardware abstraction layer do absolutely nothing to help simplify the problem or reduce the amount of bloat that has to be compiled - regardless of whether it is being used or not.

In order to flash a LED on and off, a single bit in a register needs to change state - why then do I need to compile 10,000 lines of somebody else's code into a 9.5k byte binary, in order to make this happen?
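For comparison, the single-bit change itself needs only a read-modify-write of the port's output data register. A minimal sketch - an ordinary variable stands in for the memory-mapped register here so the logic can run anywhere; on real STM32 hardware the pointer would target something like GPIOA->ODR, after the port clock had been enabled:

```c
#include <stdint.h>

/* Toggle one bit of a (memory-mapped) 32-bit register.
   On an STM32, 'reg' would be e.g. &GPIOA->ODR - the entire
   "blink" operation is this one exclusive-OR. */
static void toggle_bit(volatile uint32_t *reg, unsigned pin)
{
    *reg ^= (uint32_t)1u << pin;
}
```

That is the whole of the functional requirement - everything else in the 9.5k byte binary is setup and abstraction.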

Compilation times of over a minute really do nothing to boost one's productivity. Yet we persist with this madness making our compilation tools even more sophisticated - with the excuse that the processors that we are compiling for are getting more complex - and the commercial suppliers of compilation tools - need to be seen to be keeping ahead of the competition.

It has been ever thus for about the last 50 years or more - with the computing industry peddling us over-bloated, over-expensive tools that we neither want nor need.

HAL: Just what do you think you're doing, Dave? 

Well perhaps Dave should be asking HAL just WTF he thinks he is doing.  

And in this case, HAL is the new hardware abstraction layer - cooked up by the teams of clever code monkeys at ST Microelectronics.  I understand that as code gets more complex it needs to be better managed, and that somewhere out there, someone writing code for a Cortex M0 may have an epiphany moment and realise that he should port his code to a Cortex M7......

However, it appears that ST Microelectronics has employed a million monkeys with typewriters to undertake the mammoth task of writing the HAL modules - put them in separate rooms (or countries) and made it difficult for them to talk to one another.

Not surprisingly, the HAL reference manual runs to 963 pages - and it took another team of our simian chums to cook that one up. The manual is actually for the STM32F4xx Cortex M4 processors - it appears that the M7 version has not been published yet.

So in reverence to the computer Holly, from Red Dwarf  - I will call this code behemoth HOL  - or the hardware obfuscation layer - as that is exactly what it does.  It makes it difficult to know what your hardware is doing, nor what you need to do, in order to make it work for you.

There has to be a better way - and if Carlsberg wrote compilation tool chains - they would probably be the best in the World.

OK  - time for the pub...........

Friday, November 20, 2015

A J1 Virtual Machine - Gimme some Jips!

BOB is no slouch when it comes to simulating a virtual stack cpu!
Historical Note.

Way back in 1991, when I was half the age I am now, I did my pcb design work using OrCAD on a 25MHz 486 desktop. The picture above is of my latest experimental pcb - a breakout board for the 216MHz STM32F746 ARM Cortex M7 microcontroller.  BOB (above) can emulate a 16 bit minimal instruction set processor faster than the 25MHz '486 box - and for about $20!  Now that's progress.

Implementing a Stack Processor as a Virtual Machine

This post examines the role of a virtual machine, created to run on a given processor for the purpose of simulating another processor - one that performs operations the host might not do easily. One example was Steve Wozniak's "Sweet 16" - a 16 bit bytecode interpreter he wrote for the 6502, allowing the Apple II to readily perform 16 bit maths and 16 bit memory transfers.

In his closing remarks, Woz wrote:

"And as a final thought, the ultimate modification for those who do not use the 6502 processor would be to implement a version of SWEET16 for some other microprocessor design. The idea of a low level interpretive processor can be fruitfully implemented for a number of purposes, and achieves a limited sort of machine independence for the interpretive execution strings. I found this technique most useful for the implementation of much of the software of the Apple II computer. I leave it to readers to explore further possibilities for SWEET16."

The main limitation of the VM approach is that execution speed is often one or two orders of magnitude slower than the host running native machine code, but with processors now available at clock speeds of 200MHz, this is not so much of a problem.

It is more than offset by the ability to design a processor with an instruction set that is hand-crafted for a particular application, or the means to explore different architectures and instruction sets, and to simulate these in software, before committing to FPGA hardware.

Stack Machines

Whilst Woz's Sweet 16 was a 16 bit register based machine, I had ideas more along the lines of a stack machine, because of its simpler architecture and low hardware resource requirement.

I had become interested in an interpreted bytecode language that I believed would be a good fit for a stack machine, and so in order to get the ball rolling, I needed a virtual stack machine to try out the language.

Earlier this year, I invested in a Papilio Duo FPGA board, and with this came access to the ZPUino soft-core stack processor - devised, and much enhanced from an existing design, by Alvie Lopez. The advantage of the ZPUino was that it was one of the few soft core processors with GCC support, so the task of porting the Arduino flavour of C++ to it was not overly arduous (for those accustomed to that sort of task - not me!).

However, porting C to a stack machine is never a very successful fit - as C prefers an architecture with lots of registers - such as ARM.

As a result, the ZPUino, whilst clocked at 6 times the speed of the standard Arduino, only achieved about twice the performance when running a Dhrystone benchmark test written in C.  The other factor limiting the ZPUino is that it executes code from external RAM - and there is a time overhead in fetching instructions.

Despite these limitations, the ZPUino has been a useful tool to run simulators, as it supports VGA hardware and the Adafruit Graphics library - allowing text and video output from an Arduino-like environment.

The other stack processor that caught my attention is James Bowman's J1 Forth processor.  This became available as an implementation on the Papilio Duo  in early September to run on readily available FPGA hardware at speeds of up to 180MHz. So I have been working towards trying it out - first using a software simulator.

A J1 Simulator - written in C - and tried on a number of processors.

Back in the spring, I found a bit of C code that allowed a J1 processor to be run as a virtual machine on almost any processor.

Initially, I implemented it on Arduino, but I quickly moved to the faster ZPUino - which, as stated above, is a stack based processor implemented on a FPGA.  This was a stop-gap, whilst I was waiting for James to release his J1 in a form that I could use.

The simulator is about 100 lines of standard C code, and implements a 16-bit processor with integer maths and a 64K word addressing space.

I then wrote a test routine, in J1 assembler, consisting of just  a simple loop - executing 7 instructions and incrementing (by 1) a 16-bit memory location, every time around the loop.
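To give a flavour of both the simulator and the test, here is a heavily condensed sketch of the same idea: a J1-style 16-bit stack machine stepped one instruction at a time, exercised by a tiny increment loop. The instruction encoding below is invented for illustration - literals with the top bit set, plus two made-up opcodes - whereas the real J1 has a much richer ALU bit field:

```c
#include <stdint.h>

#define MEM_WORDS 256

/* A heavily simplified J1-style 16-bit stack machine. */
struct j1 {
    uint16_t pc;
    uint16_t t;               /* top of data stack */
    uint16_t stack[32];       /* rest of the data stack */
    int      sp;
    uint16_t mem[MEM_WORDS];  /* unified code/data memory */
};

/* Execute one instruction. Encoding (illustrative only):
     1xxxxxxxxxxxxxxx  push 15-bit literal
     0000dddddddddddd  unconditional jump to d
     0001............  mem[T] += 1, then drop T (invented op) */
static void j1_step(struct j1 *m)
{
    uint16_t insn = m->mem[m->pc++ & (MEM_WORDS - 1)];
    if (insn & 0x8000) {                     /* literal: push */
        m->stack[++m->sp] = m->t;
        m->t = insn & 0x7FFF;
    } else if ((insn >> 12) == 0x0) {        /* jump */
        m->pc = insn & 0x0FFF;
    } else if ((insn >> 12) == 0x1) {        /* increment mem[T]; drop */
        m->mem[m->t & (MEM_WORDS - 1)] += 1;
        m->t = m->stack[m->sp--];
    }
}
```

The test program is then three words - push an address, increment it, jump back - and counting calls to j1_step against wall-clock time gives the JIPS figure.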

Running this test code, the standard 16MHz Arduino managed 67,000 J1 instructions per second (Jips).

I then transferred the sketch to the ZPUino, running on the Papilio Duo board.  This provided a useful boost in performance - to about 152,000 Jips.

A 72MHz STM32F103 running the same code under STM32-Duino managed 404,000 Jips - about 6 times the speed of the Arduino, and a healthy performance boost.

The difference in performance between the 8-bit Arduino and the 32 bit STM32F103 can be explained partly by the 4.5 times increase in clock speed, and partly by the fact that a 32 bit microcontroller can implement a 16 bit virtual machine somewhat more efficiently than an 8-bit device - giving an additional 30% boost over clock speed scaling alone.

In addition, the test code only adds one to the memory cell. If it were, say, adding a 16 bit value into that location, the 16 bit transfer would slow the 8-bit AVR down considerably.

I then proceeded to port the simulator to a 168MHz STM32F407 Discovery board, which returned a slightly puzzling 764,000 Jips.

Based on the increase in clock speed it should have been about 940,000 Jips - in theory, 2.33 times the speed of the 72MHz part. This appeared to be a bit slow, and needed checking to ensure that it was not a compiler optimisation issue holding the '407 back.
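The expected figure comes straight from scaling the measured '103 result by the ratio of the two clocks - a one-line sketch of the arithmetic:

```c
#include <stdint.h>

/* Expected JIPS if performance scaled purely with clock frequency:
   scale the measured baseline by the ratio of the two clocks. */
static uint32_t expected_jips(uint32_t base_jips,
                              uint32_t base_mhz, uint32_t new_mhz)
{
    return base_jips * new_mhz / base_mhz;
}
```

404,000 Jips at 72MHz scales to roughly 942,000 at 168MHz - hence the ~940,000 expectation.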

I tried again with the various optimisation levels of the  GCC compiler:

Optimisation -O0      733,333 Jips
Optimisation -O1    3,083,333 Jips
Optimisation -O2    3,333,333 Jips
Optimisation -O3    3,583,333 Jips

With only modest optimisation the '407 is returning around 3 million Jips!

Meet BOB - the fastest, newest kid on the block.

Back in the summer I made up a break out board BOB for the 216MHz STM32F746  Cortex M7 microcontroller.  Whilst ST Microelectronics had released their $50 F7 Discovery board - complete with LCD, I wanted a very simple board, with the same pin-out as the previous F4 Discovery to try out relative performance checks.

So, it's now time to port the J1 simulator onto the STM32F746 - and see how it performs.

The '746 has an M7 ARM core with a six-stage, dual issue pipeline - which means that it can effectively execute two instructions at once.  This feature and the higher clock frequency give it a 2.2 times speed advantage over the '407.

With all this working, the '746 BOB board should be able to simulate the J1 at around 7.8 million J1 instructions per second - welcome back to the 1980's in terms of performance!

Whilst we can emulate the J1 in C at around  8 Million Jips, the real J1 should manage nearly 200 Million Jips, so when I get real J1 hardware up and running - it should really fly!


After a long day and half a night of battling with compilers, I just got the figures for the STM32F746 running the J1 interpreter at 216MHz. Initial measurements suggest that it's running at close to 15 million Jips with minor optimisation, and about 27 million Jips with the most aggressive optimisation!

Optimisation -O0     9,000,000 JIPS
Optimisation -O1    15,000,000 JIPS
Optimisation -O3    27,000,000 JIPS

Thursday, November 12, 2015

Minimal Text Interpreter - Part 3

The operation and main routines of a minimal text interpreter  - Part 3

This post is merely a description of the first implementation of the text interpreter, looking at the principal routines. It's so I can remember what I did in 6 months' time.

Currently only the basics have been implemented - by way of a proof of concept, and running on a 2K RAM Arduino. Later this will be ported to various ARM Cortex parts, the FPGA - softcore ZPUino and ultimately the J1 Forth processor.

There are probably many ways in which this could be implemented - some giving even more codespace and memory efficiency.  As a rookie C programmer, I have stuck to really basic coding methods - that I understand. A more experienced programmer would probably find a neater solution using arrays, pointers and the strings library - but for the moment I have kept it simple.

The interpreter resides in a continuous while(1) loop and consists of the following routines:


Reads the text from the UART into a 128 character buffer using u_getchar.
Checks that each character is printable - i.e. resides between space (32) and tilde ~ (126) in the ascii table - and stores it in the buffer.
Keeps accepting text until it hits the buffer limit of 128 characters, or breaks out early if it sees a return or newline (\r or \n) character.
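The read-and-filter loop can be sketched as follows - with the UART stream replaced by a plain string so the logic can be tried anywhere (in the real routine each character would come from u_getchar()):

```c
#include <stddef.h>

#define BUF_LEN 128

/* Copy printable characters (ASCII space..tilde) from the input
   stream into buf, stopping at buffer-full or at \r / \n.
   A string stands in for the UART here so the logic is testable.
   Returns the number of characters stored. */
static size_t read_line(char *buf, const char *in)
{
    size_t n = 0;
    for (; *in && *in != '\r' && *in != '\n'; in++) {
        if (n >= BUF_LEN - 1)
            break;
        if (*in >= 32 && *in <= 126)   /* printable ASCII only */
            buf[n++] = *in;
    }
    buf[n] = '\0';
    return n;
}
```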


This checks if the text starts with a colon, and so is going to be a new colon definition.
sets flag colon=1
calls the build_buffer function


If the leading character is not a colon, this function determines that the word is either within the body of the definition, or is for immediate execution.  It calls build_buffer, but only builds the header to allow a word match. If it gets a match, the word is already in the dictionary and should not be added again.


This checks the first 3 characters of the word and puts them into a new header slot in the headers table.
It also calculates the word length by counting the characters as it stores them into the dictionary table, which it continues until it sees a terminating space character.
It increments the dictionary pointer ready for the next word.


This compares the 4 characters of the header of the newly input word with all the headers in the header table.
If all 4 characters match then it drops out with a match_address (an index into the jump address look-up table) and sets a match flag, match = 1.


This is a utility routine which prints out a list of all the headers in the order they are stored in the headers table.


This is a utility routine which prints out a list of all the words in the dictionary in the order they were stored in the dictionary table.


This is the main character interpretation function which implements the SIMPL language core


Not yet implemented.  Returns true if it finds a word and invokes build_buffer and word_match


Not yet implemented.  Converts the ascii text to a signed integer and stores it in a parameter table.
Might possibly use ascii 0x7F (DEL) to signify to the header builder that the following bytes are a number.  Will need a conversion routine to go between printable and internal storage formats.

UART Routines

These provide getchar and putchar support directly via the ATmega328 UART, saving a huge amount of codespace compared to Serial.print etc.


Initialises the ATmega328 UART to the correct baudrate and format.


Waits until the Tx register is empty and then transmits the next character


Waits until a character is present in the UART receive register and returns with it

Printing Routines

Having banished Serial.print, I had to implement some really basic replacement functions.


Sends a 16 bit integer to the UART for serial output


Sends a 32 bit integer to the UART for serial output

A Minimal Text Interpreter - Part 2

A Text Interpreter to run on a Resource Limited Microcontroller  - Part 2

In the previous post, I described the basics of a tiny text interpreter, written in C, intended for use on resource limited microcontrollers. The text interpreter would offer a natural language user interface, allowing programming and command line control of various microcontroller projects.

It will also form the basis of a wider range of self-written computing tools, including assembler and compiler, editor and file handler - all of which could be hosted, if necessary on a resource limited target board.

However, for the moment, and for ease of experimentation, the intention was to get the interpreter to run with only 2K of RAM (as per the Arduino Uno).

I envisioned the text interpreter as being a universal resident utility programme (almost akin to a bootloader) that would be initially flashed onto the microcontroller thus allowing a serial command interface and the means to exercise the hardware or develop small interactive programmes.

At work and at home, there are many instances of when I want some degree of interactive control over a small microcontroller project - even if it is just manipulating some port lines or sending and receiving a few serial responses on a terminal programme.

Some Practical Limitations

In order to keep the demands on the interpreter program reasonable, it is necessary to put some limits on its capabilities - in particular, the number of words it can recognise and create jump addresses for. For convenience I used a look up table to hold the jump addresses.  If the look up table is to remain reasonably compact, then a limit of 256 entries seems reasonable.  Restricting the word capacity will also help keep the dictionary and its headers to a manageable size in RAM. This is important when you only have 2K to play with!

As the 4 byte header is in fact a shortform, or compact coding convenience, that represents the dictionary, it could be said that in very RAM limited systems it is not actually a requirement to keep the dictionary in RAM on chip.  The only role that the dictionary performs is to allow the header entries to be expanded to the full word at times of listing.

As small micros generally have plenty of Flash available, the dictionary for all the standard words could be programmed into flash - as indeed could their headers.  If necessary, a shell application on a PC could host the various dictionaries and source code files needed for particular applications. However, the original aim is that this interpreter vastly increases the user-friendliness of the microcontroller target - even with just a serial terminal as the user interface.

Additionally, I have imposed a word length limit of 16 characters.  Imposing this limit means that the word length can be coded as a single hexadecimal digit - which makes it displayable in ascii and human readable. If you can't name something uniquely in 16 characters then you are probably of German extraction.


Different tasks need different tools, and as the interpreter will be used for a variety of tasks, then it seems reasonable that it can be augmented or tailored towards a particular task. This can be done very conveniently with the use of vocabularies - with a particular vocab being used for a particular task.  A vocab that contains the mnemonics of a particular processor's instruction set would be one sensible use when using the interpreter within an assembler, compiler or disassembler.  


Those of you that are familiar with Forth, will say that I am just creating a portable Forth-like environment, but rather than being coded in the native machine language of the target processor, it has been written in C for convenience and portability.

This is indeed partly true, as the utility I am creating has been inspired by the Forth language - especially in its compactness and low resource requirements.  Even in the 1960s Charles Moore was concerned how the tools provided for computing at that time hampered progress, and so set about redefining the whole man-machine interface. He compressed the previously separate editor, compiler, and interpreter programmes (none of which could be co-resident in memory at the same time) into a single compact, convenient package that did the job of all three.

When Forth was first introduced in the late 1960s, mini computers had sub-MHz clock frequencies, and very little RAM, and so benefited greatly from a moderately fast and compact language like Forth. Nowadays, typical microcontrollers have clock frequencies in the 20MHz to 200MHz range and so are not so hindered by what is essentially a virtual machine implementation of a stack processor written in C.

Virtual and Real Stack Machines

I have embarked on this journey because of my wider interest in soft-core and open core processors implemented on FPGAs. Several of these cores are based on stack machines, partly because they may be readily implemented in surprisingly few lines of VHDL or Verilog. Indeed James Bowman's J1 Forth processor is fully described in fewer than 200 lines of verilog.

Whilst a virtual stack machine might not be the easiest fit for a register based processor without performance penalties, it is a wonderful fit for a real stack machine.  A number of open-core processors including the ZPUino and James Bowman's J1 are true stack machines.  Here the instruction set of the virtual machine has a near one-to-one mapping to the machine instructions of the stack processor.  In this case the text interpreter can be rewritten in the native assembly language of these cpus, to benefit from the vast increase in speed of running without an additional layer of virtual machine.

In order to do this, an assembler will be required that is tailored to the instruction set of the Forth processor, and this is one of the first tasks that the text interpreter will be used for - the assembly of machine code for a custom processor.

One of the reasons why I am concerning myself with such low level primitive tools, is the need to understand them from the ground up so that they can be implemented on a variety of non-conventional processors.

Whilst the ZPUino will execute Arduino code directly (albeit very inefficiently - because of the C to stack machine programming conflicts), the J1 will need the tools to write its own language from the ground up - and if you already have the mechanisms of a language in place, plus an easily customisable assembler, then it makes the job a lot easier.

In a later post, I will give an update on the text interpreter and its application to custom code assemblers.

Wednesday, November 11, 2015

The Subtle Art of Substitution - Part 1

A simple text interpreter that allows code to be invoked by natural language words.     Part 1.

Over the weekend and in various bits of spare time I have been developing a tiny text interpreter in C, as part of the larger project of creating some low-overhead tools to run on various microcontroller targets.  The toolset will eventually include assemblers and compilers for some custom soft-core processors - but first I need the means to interpret typed text words and execute blocks of code if the word is recognised.

Why this is useful

This text interpreter is intended to provide a more human friendly interface layer to my SIMPL interactive programming language.  Writing in high level, more meaningful natural language will greatly enhance the speed at which SIMPL code can be generated.

A natural language interface makes programming tasks much easier.  Real words are more memorable than individual ascii characters, and it all makes for more readable code listings. Whilst SIMPL might use lower case "a" to initiate the analog read function, typing "analog" is a lot more reader friendly. An interpreter that follows a few simple parsing rules can offer a much increased speed of programming, yet be modest in the amount of on-chip resources utilised to do this.  The code to implement the interpreter is about a 2K to 3K overhead on top of SIMPL - but that will include listing, editing and file handling utilities too.

Substitution and Assemblers

A text interpreter and its ability to execute blocks of code based on parsing the text commands or file it receives is a fundamental part of utility programmes such as assemblers and compilers. Here a set of keyword mnemonics representing instructions can be interpreted and used to assemble machine code instructions by direct substitution.

With a simple text interpreter we can move out from the realms of numerical machine language, and implement the likes of assemblers, disassemblers and even compilers.

In the case of an assembler, the wordset will comprise the mnemonics used by the target processor - and the interpreter will merely replace each human readable mnemonic with the machine instruction's numerical opcode.

For example, a certain processor may have an instruction set including mnemonics such as ADD, AND, SUB, XOR etc. The role of the text interpreter is to find these words within the text buffer or input file and create a table consisting of direct machine code instructions, subroutine call addresses and other variables to be passed via the stack to those subroutines.

At a level above the assembler is the compiler.  This also takes text based input and generates machine code to run on a specific processor.  However, compilers are very complex pieces of software, and it is more likely that I will find an alternative solution  - given long enough.

Why do this?

The purpose of the text interpreter is to provide a natural language text interface for a small, resource limited microcontroller - in a similar style to what was provided with the various BASICs of the late 1970s. It's remarkable to think that some fully functioning BASICs fitted into 4K of ROM and 1K of RAM - solely by some very clever programming tricks - in raw assembly language.

Fortunately most embedded programming these days does not have to resort to raw assembler, and C has become the preferred interchange language for portability.  C code written for an Arduino may be fairly easily ported to an ARM - provided that low level I/O routines - such as getchar and putchar - are available for the target processor.

Coding up a text interpreter is a good exercise in C coding - and as I am not a natural born coder, any meaningful coding exercise is good practice.  I also enjoy the challenge of making something work from scratch block by block - rather than being over reliant on other peoples' code, that I don't even pretend to understand.


As a bare minimum, we assume that the microcontroller target can provide a serial UART interface for communicating with a terminal program. I have recoded Serial.print and its associated functions to use a much smaller routine - which saves memory space.

Ideally the microcontroller should have at least a couple of kilobytes of RAM for holding the dictionary and headers, making it possible to implement this on anything from an Arduino upwards.

The text interpreter is an extension of the SIMPL interpreter, and can be used for programming tools such as text editors, assemblers, compilers and disassemblers. It provides the means to input text, analyse it for recognised keywords and build up a dictionary and jump table.

Borrowing from Forth experience, the text interpreter (or scanner) will look for a match on the first 3 characters of the input and the length of the word.  As a word is typed in, it will initiate a search of the dictionary (of already known words). If a match is found, the word will be substituted for a 4 digit  (16 bit) jump address. If the word is not matched, it will be added in full to the dictionary table.

This sounds all very Forth-like, and indeed it is, because it is a proven means to input new text data into a processor's memory using minimum of overheads. The dictionary structure is simple enough that it can easily be parsed for a word-match, and also processed for editing and printing.

As each Forth definition is effectively just a line of text it can easily be  handled with a line-editor - again a simple task for a resource limited processor.

Numbers are handled as literals. A quick scan of the text with "is_a_num" will reveal whether it is numerical text - if so it should be converted to a signed integer and put onto the stack.

The output of the text interpreter should be a series of call addresses relating to the functional blocks of code that perform the routines associated with the keyword.  In the case of the assembler example, the mnemonics can be translated directly using a look-up table which converts them directly into the machine instruction of the target processor - this is especially relevant if the target is a stack machine - such as the J1 forth processor.

Charles Moore struck on the idea of a language that was designed for solving problems.  He envisioned having separate vocabularies for each problem he wanted to solve.  For example his assembler application would use a vocabulary tailored to that application - namely the mnemonics as keywords, similarly the SIMPL language would utilise a vocabulary that supported the SIMPL command set. Thus by pointing to a different vocabulary in flash, the processor can readily swap between contexts.

Hop, Skip and Jump, - the Mechanics of Text Interpretation

Short of providing a flow chart - the description below describes the operation of the text interpreter.

The text interpreter will parse through lines of text, taking each "word" as defined by a group of characters terminated by white-space, and check through a list of dictionary words for a match. If there is a match, then the newly scanned word is either a system keyword or a new one that the user has previously added to the dictionary.

If the word does not generate a match with any existing keywords then it is added to the end of the dictionary - thus allowing a match the next time it is used.

In addition to the dictionary, there is a separate array of records that will be known as the "headers". The headers consist of a shortform record of all of the words in the dictionary.  The purpose of the headers is to allow an efficient search to be performed on the dictionary entries - as words are listed in the headers by their first three characters and their length.  A match on the first 3 characters and the length was proven many years ago to be an effective and efficient means of word recognition - see section 3.2.3 here

Once the header of a scanned word has been deemed to match a header already in the header table, a jump address pointer can easily be calculated - it's actually generated as part of the matching routine.  This jump address pointer is decoded by a look up table to generate an actual 16-bit jump address.

For compactness and efficiency, the word matching routine is limited to a maximum vocabulary of 255 words - which is more than enough for most applications.

The text interpreter deals with lines of code.  At some point there will be an accompanying package that implements a line editor, as the first step towards a full screen editor.

The input buffer of a terminal program may be some 250 to 300 characters long. This is more than adequate space to define most sequences of command words.  Indeed - it may be beneficial to restrict the input buffer to say 128 characters - as this is what can be displayed sensibly on an 800 x 600 VGA screen.

Word storage format

The shortform entries stored in the dictionary headers can be saved as a group of 4 bytes, consisting of the first 3 characters and the length byte. The routine that searches the headers for the match automatically generates the jump address pointer allowing a lookup to the actual jump address from a table.

        Byte:    0       1       2       3
                 Char1   Char2   Char3   Len

So a word can be expanded by knowing its length and the dictionary pointer to its 1st character.
The jump address is shortened to a single byte for fast look-up from a table.


It's taken a bit longer than expected, but after an intensive day thinking, re-thinking, then coding, the tiny (2K) text interpreter is now starting to take shape.

I have put an interim version (#45) on this github gist

The interpreter is written in fairly standard C so it can be ported to a number of devices.  If implemented on an Arduino using the Serial.print library it uses about 4142 bytes of flash and 1897 bytes of RAM.  By using much more efficient custom UART routines for serial input and output, this can be massively reduced to just 2002 bytes of Flash and 1710 bytes of RAM.

Part 2 of this posting will look further at features of the text interpreter and the SIMPL toolset.

Sunday, November 08, 2015

Open Inverter - Part 7 - In search of Sine Waves

In Search of Sine Waves

A high quality sine wave, synthesised using an Arduino

The ability to generate a high quality sinusoidal waveform at a specific frequency and amplitude is at the heart of the Open Inverter project.

Most microcontrollers these days have on-chip timers that can synthesise these waveforms  from a look-up table stored in flash memory.

Whilst the early developments of sinusoidal waveform generation were done on the Arduino using "Fast PWM", the pwm control registers could only produce a limited number of PWM clock frequencies:

62.5 kHz
31.25 kHz
7812.5 Hz

My BTN8960 ICs really needed something above the audio threshold frequency of 16kHz but below 25kHz.  My initial experiments were done at 7800Hz - but this produces a painful howl from the transformer windings, and is really not ideal for transformer efficiency.  With the Arduino I had a couple of options left - reduce the crystal frequency to 8MHz or 12MHz, giving me access to 15.625kHz or 23.4375kHz, or write some custom code.  These will be tried at some point when I have the right crystals available.

At the latest open inverter workshop, we reconfigured the Arduino clock to 8MHz allowing us a 15.625 kHz pwm clocked sinewave output. This worked well with the BTN8960 driver ICs.

An ARM Solution

In the meantime, I was keen to press on and get some sinusoid generation code at 25kHz running on at least one of my available STM32Fxxx dev-boards.

At the latest open-inverter hack session held in Snowdonia in early November, I managed to get the STM32F103 sine wave generation code running, and successfully drove the inverter using the BTN8960 half H-bridge ICs.

I must point out at this stage that I have been actively designing pcbs for the STM32Fxxx range of ARM microcontrollers for the last 2 years, and I have several designs that I could adapt to the purpose. The cheapest is the $5 Maple Mini clone from Baite Electronics which uses an STM32F103, but I also have boards with the STM32F373, STM32F407 and STM32F746 at hand.

Of all of these, the STM32F373 is probably the best suited, as it has 3 ADCs with 16 bit resolution, and a lot of useful analogue peripherals - ideal for monitoring a 3-phase inverter.

However, there is quite a following behind the cheaper STM32F103 boards - so I think this is where I will start.

It is hoped that there will soon be some modular pcbs available  allowing either a dual BTN8960 power board or conventional FET power board to be stacked with  AVR and ARM microcontroller boards.

Watch this space.

Autumn Almanac

Early morning in South Snowdonia 2nd November 2015
Part 1 of a series of posts describing open source activities in the Autumn quarter of 2015.

This last week I have been travelling on my "Autumn Tour", which incorporated a trip to the Isle of Man to see family and friends, then on to Liverpool for Oggcamp 2015, and then to Snowdonia, to spend a few days at the Open Energy Monitor premises - for further development of the Open Inverter. Finally home by way of Birmingham's NEC, to spend a few hours at the Advanced Engineering Exhibition and trade show.

Unfortunately because of the timing of the Isle of Man ferry service, I missed a big chunk of Oggcamp on the Saturday, but made up for it on Saturday evening having a good chance to talk to a range of people at the excellent Halloween Night social gathering.

Sailing into the sunset-en route to the Isle of Man 28th Oct 2015

On Sunday, I helped out at the "Hardware Hacking" workshop - and engaged some local youngsters with some simple hardware hacks - where we added some extra LEDs to some "blaster guns" bought from the local Poundshop.

After leaving Oggcamp 15, it was a quick buzz down the Wirral's M53 and along the North Wales coast, where I encountered the weather extremes of a beautiful sunset - and also "peasoup" fog  - on what was the warmest November day on record - in mid Wales.

Meeting up with Trystan Lea in Llanberis, we had a chat over a beer about open source design tools, before attending a meeting about an innovative project to restore a former chapel in Caernarfon and convert it into a community workspace.

In the hills above Ysbty Ifan heading for the A5

Returning that evening with Trystan to his family home in south Snowdonia, we had 3 days ahead of open source hardware and software hacking - and time to push forward with the Open Inverter project.

The Bothy - with potential micro-hydro stream in the foreground
The Open Inverter forms just one small part of a small scale renewable energy system.

Some of the other things that are under consideration in the wider project are :

1. A low voltage DC "ring main" with programmable charger outlets for powering portable devices.

2. A battery storage system based on Li Ion electric bike batteries.

3. Simple, practical methods of mounting a 250W pv panel on a variety of properties.

4. A motorised solar tracker-mount for a single 250W panel.

5. A low water volume shower (35 litres), with direct heating as a solar dump load.

6. Integrated energy monitoring of micro-solar systems within the domestic environment

7. A micro-hydro system used to extract dc power from a nearby stream.

In the last month I have experimented with BTN8960 half-H-Bridge ICs and more traditional H-bridges using off the shelf N-FETs and driver ICs.  After my Snowdon trip, we hope to finalise the design of a simple, low cost H-bridge pcb and stackable microcontroller pcb - so that anyone can experiment with the open-inverter design.

This work is ongoing and we hope to have a few prototype pcbs ready for the next Snowdonia meetup - that is planned for the weekend before Christmas.

Looking forwards to 2016, there is a meetup at the British Computer Society premises (just off the Strand) as part of the Open Source Hardware User Group (OSHUG) "Open Energy Interoperability" hacking workshop planned for Thursday 18th February.  This will be an all day workshop event with lightning talks, workshop sessions and an Oshug meeting in the evening to disseminate the day's proceedings.

Tools for the Job - an Open Source Utopia!

Recently I read Chuck Moore's paper on Problem Oriented Languages and it inspired me to take a fresh minimalist approach to how I view my engineering design tools.

As an electronic engineer, I use various computer tools on a daily basis to help me design electronic products. These include:

Circuit Simulation
Schematic Capture
PCB Layout
2D/3D Mechanical CAD
3D printer Applications

Most of these packages have their roots in the early 1980s - if not earlier - and over the intervening decades they have grown to soak up the available growth in computing power. Unfortunately, this has resulted in significant bloatware, and so really these tools are not much better than they were 30 years ago.

So this got me thinking: what are the common features of these packages, and could they be rewritten for minimum bloat, so that they can run with acceptable performance on low specification machines?

So after some investigation I found a few open-source EDA (electronic design automation) tools, including KiCAD and gEDA.

At the heart of all these tools is a graphics drawing library, with line, polygon and primitive graphic drawing functions. These objects are represented in memory by their vertex co-ordinates and other parameters - such as colour, layer etc. Objects, once defined in memory, may then be assigned to a group - allowing them to be manipulated as a single object - eg a pcb layout footprint of an IC package. Finally there is a routine to manipulate the position of these objects, giving the means to move them around the screen.

What may appear to be a very sophisticated package has, as its basis, three simple programs, and these programs could be shared across all the tools I have listed above.

Whilst a pcb layout package is often thought of as a simple 2D design tool,  pcbs are built as a series of stacked layers, and so the package has the ability to create these various layers and manipulate them as a stack.  Similarly a 2D drawing package, can be extended into 3D, by extruding 2D shapes along the z-axis.  What is a 3D printed part, other than a whole series of 2D slices, stacked layer upon layer?

If you examine the way in which ICs are designed, they are also a series of 2D features, built into multiple layers of silicon and metalisation, and created by selective exposure of photo-resist by the use of photographic masks.  In the same way that we design pcbs with layers of copper,  ICs are designed with tools that lay down transistor structures in the silicon and connect them with metal interconnect layers.

This is a large oversimplification, but the point I am making is that the tools have very similar foundations, and as such could be written in a way that purposefully makes use of this common codebase. Once you have established a common graphical object representation and manipulation structure, the user interface and the file interchange format could also be standardised.  In this way, possibly about 75% of the tool is common code, with an application layer built on top. The application layer would customise the tool for a specific task, such as schematic capture, pcb layout or 3D modelling, and as such would export the design data in an industry standard format - such as netlist, Gerber or G-code.

So a complete tool box of design tools could be written with a common open-source core, and standard open source file exchange formats.

The common core could be written to be fast and efficient, even on small resource limited systems such as the Raspberry Pi - so that for minimum outlay, youngsters could have their own open-source CAD workstation.

Harking back, in the very early 1990s I did schematic capture and pcb layout design on a 25MHz i486 platform - a Pi 2 should be approximately 100 times faster than that! (based on figures from Roy Longbottom's PC Benchmark Site).

With open source CAD tools running on a low cost platform, engineering CAD would be opened up to a much wider population - including those that arguably need it most, such as developing countries, who so far have been denied access to modern CAD tools on the grounds of cost, but whose economies would benefit so much from access to modern manufacturing methods such as community 3D printing and low cost pcb manufacture.

Conversely, the graphics core software could be ported to run (as OpenCL) on the ultra-quick GPU of a modern gaming platform - and allow blistering performance when rendering models in 3D.

This post may have described some Utopian outcome, but all the building blocks for good integrated open source CAD tools are in place. It just needs a group of collaborative software developers to make it happen.

Tuesday, October 27, 2015

SIMPL Revisited - Again, but this time it's personal

In May 2013, I learned about Ward Cunningham's Txtzyme language - a very simple serial command interpreter which allowed basic control of microcontroller function using a serial interface.

I wrote about it here:

Txtzyme - A minimal interpretive language and thoughts on simple extensions

Txtzyme was about as easy as a "language" could get - it offered digital input and output, analogue input, rudimentary timing in µs and ms, and simple repetitive loops.  It was so simple, and offered so many opportunities for extension, that I decided to write some new functions - and called the extended language SIMPL - serial interpreted minimal programming language.

In late May 2013, I described the extensions here:

SIMPL - A simple programming language based on Txtzyme

In the last 30 months I have ported SIMPL to Arduino,  ARM and FPGA soft core processors. I have also used the Txtzyme interpreter to help to create assembly language for an entirely new soft core FPGA cpu.

Very often during initial developments we need some low level human interaction with a new microcontroller, and SIMPL seems to provide that coding framework, allowing more complicated applications to be tested and coded.

SIMPL is coded in C - which allows good portability between micros, but recently I have been wondering whether SIMPL would be better coded in Forth, and act as a human interface layer between a Forth virtual machine and the human programmer.

Forth can be a difficult language to master - partly because it involves considerable manipulation of the data stack, and the need to keep track of the stack presents quite a challenge to the new programmer.

Standard Forth consists of a vocabulary of about 180 compound words, which can be entirely synthesised from about 30 primitives.  When ported to a FPGA soft core CPU, optimised for Forth,  most of those 30 primitives are actual native instructions. That makes it fast in execution, but still not the easiest of languages to grasp.

Can we use SIMPL to access a sufficient subset of the Forth vocabulary to allow real programs to be written, but without having to tie our brains in knots over keeping track of the stack?

The beauty of Forth is that you can compile a new word and give it any name you wish.  In SIMPL terms, this name is a single alphabetic or punctuation character. Lower-case letters are reserved for the keywords, whilst capital letters are available for user-defined words.

This gives us access to

26 lower case primitives
26 Upper case User words
33 punctuation characters

This gives us a repertoire of 85 possible words - roughly halving the scope and complexity of the standard Forth vocabulary.

Forth text entry consists solely of words and whitespace. This is intentional because it makes it more readable from a human point of view, and the spaces between words allows the compiler to know where one word ends and the next begins.  The carriage return is usually used to denote the end of a line, and thus signal the compiler to start compilation of the contents of the input buffer.

SIMPL borrows from these ideas, but attempts to simplify further by removing the whitespace. In fact the space character (ASCII 32) may have its own unique meaning to the SIMPL interpreter.

Numbers also present a burden to the traditional Forth interpreter - which has to search the whole dictionary, fail to find a match, and only then attempt to interpret the token as a number.  In SIMPL we assume that all numbers are decimal unless prefixed with a character that changes the base to hexadecimal.

As the compiler is only dealing with 85 possible input characters, the dictionary is simplified to an address lookup.  For example if the interpreter encounters the character J, it looks up the address associated with the function called J, executes that code and then returns so as to interpret the next character in the buffer.

There are only 85 characters to interpret - so only 85 action routines to execute.

Here are some of the primitives

+    ADD
-    SUB
*    MUL
/    DIV

&    AND
|    OR
^    XOR
~    INV

@    FETCH
!    STORE

<    LT
=    EQ
>    GT

j    JMP
z    JPZ
:    CALL
;    RET

#    LIT
%    MOD
£    DEC
$    HEX

(    Push R    >R
\    R@
)    Pop R     R>

?    Query
_    Text String

a    ALLOT
b    BEGIN
c    CONSTANT
d    DO
e    ELSE
f    FOR
i    IF
l    LOOP
o    OVER
p    PRINT
r    REPEAT
s    SWAP
t    THEN
u    UNTIL
v    VARIABLE
w    WHILE

Tuesday, October 06, 2015

Open Inverter - Part 6 - Thinking Allowed

130 VA Toroidal Transformer Inverter using BTN8960 H-Bridge ICs

It's been a frenetic week of progress on the Open Inverter project. Both Trystan and I have managed to cobble together inverters from FETs, H-bridge ICs and other random, readily available modules - bought cheaply from China.

I have powered up the inverter with a DC input of 24V close to 5A and successfully powered a couple of 60W 230V lightbulbs.  The efficiency looks promising with neither the H-bridge nor the transformer getting anything more than warm.

In this post, I pause for thought to decide upon the future direction of the project, based on our findings so far in this week of discovery.

Here's the current wish-list:

1. An Open Source Inverter, of modular construction that is scalable in blocks of 125W or 250W.
2. Built from readily available, understandable, low cost electronics.
3. Rugged, robust, reliable - delivering reasonable efficiency and power quality.
4. Grid synchronisable - if required - will synchronise to external source.
5. Built in power monitoring, with wireless communications compatible with emonCMS monitoring
6. "Arduino" or similar microcontroller for hackability.
7. Supports a variety of power conversion topologies, including boost, buck, peak power tracking and split-pi.
8. Uses include micro-solar, LiPo4 battery charging, dc ring main schemes etc.
9. Under $20 for primary building block.
10. Easy to build, easy to repair, extendable, hackable.

Expanding on some of the above points.

The proposed inverter will be built from low cost modules that can be plugged together as required, depending on the application.

The basic design will consist of a microcontroller, one or more H-bridge power boards and a 125VA or 250VA toroidal transformer.

Choice of Microcontroller.

The microcontroller could be an Arduino or derivative, or one of the very low cost, easily breadboarded STM32F103 ARM boards - based on the Maple Mini - which can be programmed with Arduino code - using STM32-Duino.
The breadboardable Baite STM32F103 "Maple  Mini" 
The advantage of using the STM32F103 is that it has more I/O and Flash/RAM than the standard Arduino and runs at about 5 times the speed.  It has more versatile and numerous pwm outputs and higher resolution AD converters. Remarkably these little boards are available very cheaply (<£5) from numerous vendors on ebay.

Using STM32duino, these boards can be programmed from the Arduino IDE - using an additional board file which caters for the STM32Fxxx range of ARM devices.  This allows sketches developed for the Arduino to be readily converted to run on the much higher-performance STM32 range.

Choice of H-Bridge.

This handy power board is one I designed earlier in the year to drive a 100W DC Motor, it uses 2 x BTN8960 ICs

The H-bridge board can either be based on standard n-type FETs with driver ICs, or using the more sophisticated BTN8960 H-bridge ICs.

The FET solution may be more hackable and applicable to other projects, but the H-bridge based on the BTN8960 is a quick and cheap solution.  I had already designed a power board to drive a low voltage dc motor, and it was very easy to adapt it to drive the secondary of the 120VA toroidal transformer, using an Arduino-like mcu to generate the 50Hz sine waves.

I am currently testing both solutions so that I can give more informed advice based on my findings.

The BTN8960 power board runs very efficiently with a 120W load. However, the IC is limited to a 25kHz switching frequency - and this is not an easy frequency to create using the fast PWM options on a standard Arduino, short of running it from an 8MHz or 12MHz crystal.

The tabs of the H- bridge ICs are soldered to large copper areas - approximately 30 square cm each. This allows them to run cool even at 120W power - without additional heatsinking.

Choice of Toroidal Transformer.

This can be any toroidal transformer in the 50VA to 250VA range - with the secondary voltage chosen to be approximately that expected from the solar panel, or the battery.  For convenience I use a 130VA toroid with a 24V secondary.

The toroidal transformer does two things for us, it conveniently steps-up the output voltage from the H-bridge to that of the ac mains, and  effectively isolates the low voltage power electronics from the high voltage mains - so that high voltages are contained in the transformer, and not on the H-bridge board - making for a much safer project.

The toroidal transformer is also fairly efficient at converting the low voltage to mains - depending on its VA rating about 91 - 95% is typical.

In the UK, a suitable Vigortronix  120VA transformer 0-12V 0-12V from Rapid Electronics (88-3814) is £15 or less, on ebay.

Putting it in a box.

As the inverter has mains voltages present, it is recommended that it is put into a plastic or metal enclosure.

The largest, heaviest component is the toroidal transformer, which should be securely mounted to the case.  The 120VA inverter should fit in a case of about 100 x 160 x 50mm; some of the extrusions used for Eurocard-sized pcbs could be used to advantage.  The Vero 14-1003 or the Hammond aluminium extrusions (Rapid 30-1574 or 30-1535) at 105 x 165 x 60 or similar would be ideal.

Using components sourced from the UK, a DIY 120VA inverter in a case could be made for about £50.

A few points on efficiency.

The losses in a toroidal transformer are the sum of the iron loss and the copper loss. The iron loss is effectively the power drawn by the magnetising current required to set up the field in the core, plus that dissipated in eddy currents. For a 230V 120VA transformer, this magnetising current is about 9mA and the iron loss is 0.98W.  The iron loss remains constant at all loads.

The copper loss is the sum of the I2R losses in both the primary and secondary.

For a 2 x 12V  120VA transformer running with 5A secondary current and 0.5A primary current

Secondary loss = (5 x 5 x 0.24) = 6W

Primary loss  =    (0.5 x 0.5 x 14.6) = 3.65W

The copper losses will rise with temperature because of the increase in the resistance of the windings with temperature.

In total there will be about 11W lost in the transformer when working at its rated power.

Losses in the FETs.

These are outlined in the BTN8960 datasheet.  At 25C, the on-state path resistance is 14.2 milliohms.  With a 5A drain current, the I2R losses in the FETs are (5 x 5 x 0.0142) = 0.355W. There will also be some switching losses, plus the power taken by the remainder of the circuit.

I measured the input current to the H-bridge, which also includes the Arduino, a relay and a 24V-to-5V "simple switcher" voltage regulator.  When not driving the transformer, the circuit consumed 50mA at 24.25V - a loss of 1.21W.

With no load on the transformer primary, the current into the inverter was 0.23A.  This puts the no load driver losses as (24.25 x 0.23) = 5.58W.

Adding all the losses, the best estimate (unconfirmed) for system losses is

No load losses     5.58   W
FET I2R losses    0.355 W
Iron losses           0.98   W
Copper Losses     9.65  W

Total                    16.565 W

Regarding overall efficiency

Inverter Efficiency   (120-16.565)/120  =  86.2%.

Room for Improvement

The proposed micro-inverter is intended to be used when there is no alternative to using ac mains - for certain low wattage devices.  The second part of this proposal is to use dc for direct charging of consumer electronics and mobile computing devices.

By increasing the system voltage to, say, 48V, the current switched in the FETs is halved, and so the I2R FET losses are quartered; however, a transformer with a 48V secondary will have more winding resistance.

The losses in the toroidal transformer are more or less fixed for a given core size,  however by using a larger core than is actually needed, it will have lower resistance windings in both the primary and secondary - and so the copper losses will be reduced, but the iron losses will be up a little.

For example using a 250VA toroid - but running it at 5A

Secondary loss = (5 x 5 x 0.08) = 2W   (previously 6W)

Primary loss  =    (0.5 x 0.5 x 6.1) = 1.525W  (previously 3.65W)

Iron Loss      = 1.62W  (previously 0.98W)

No load losses     5.58   W
FET I2R losses    0.355 W
Iron losses           1.62  W
Copper Losses     3.525  W

Total                    11.08 W

Inverter Efficiency   (120-11.08)/120  =  90.8%.

So a half-loaded 250VA transformer will run cooler, with about half the total losses of a fully loaded 120VA toroid.  This increases the overall efficiency of the inverter by about 5 percentage points.  It also allows some extra capacity when other loads are switched in, and the voltage droop under load will be less.

Open Inverter - Part 5, More Experimentation

Today was a day in the lab to try out the various H-bridge modules I have access to, plus the toroidal transformers, which are more efficient than the E-core types that Trystan and I were using last week.

Low Cost IBT-2 Modules

Some months ago, I  bought a pair of the IBT-2  H-bridge modules from an ebay seller. These are advertised as 43A and 24V. They use the now obsolete Infineon BTS7960 half H-bridge IC, that I discussed in Part 3.

The IBT-2 Module uses the Infineon BTS7960 half-bridge module
These boards are 50mm x 50mm and the holes (3.2mm) are on 45mm centres - this is an ideal size to take advantage of the low cost 5 x 5 cm pcb services being offered.  I paid about £7 each for these - including free shipping from China!  There are several variants of this module around, as it has been widely copied in China by various manufacturers.

It uses two of the Infineon BTS7960 half bridge modules, and a 74HC244D buffer-driver IC to provide some isolation between the "Arduino" and the H-bridge.

Not entirely visible is the anodised aluminium heatsink that is screwed to the underside of this module - to take the heat from the BTS7960 devices.

The BTS7960 is quite easy to drive - it has a single PWM pin and an active-low /INH (inhibit) pin. Only 4 port lines are required from the Arduino to drive this module.

At first I was getting some very unusual waveforms from the output tabs of the ICs.  I soon found out that they are limited to a 25kHz PWM signal - and I was using 62.5kHz. A quick change to the PWM timer control register reduced the PWM to 7.8125kHz and all was well - I was getting good clean sinusoidal signals from the IC tabs, with the scope set to 1kHz low-pass filter mode.

Connecting the Transformer

I happen to work for a company that uses a lot of toroidal transformers of various VA capacities. Today I selected our smallest and cheapest - which is 120VA with a nominal 24V rms secondary and 5A current.

Our standard 120VA toroidal transformer has a nominal 24Vrms secondary winding @ 5A
The transformer is intended for both 115V and 230V operation - so it is a case of connecting the split primary windings in series in order to get 230Vrms output.  If you get them the wrong way around, the phases will cancel and you will see no output.  If you want 115V, you have to connect the primaries in parallel, the correct way around - otherwise you will short them out, which is bad   :-(

In initial tests, I found that the "Magnetising Current" for this particular transformer was 0.22A from a 24V dc supply.  So the inverter burns 5.28W when idle - before an ac load is attached.

The BTN8960 H-Bridge

My company makes products that uses brushed dc motor drives - between 100W and 600W.
Earlier this year I designed an experimental board to use the newer Infineon BTN8960 half-H-bridge driver IC. This replaces the BTS7960 - which is now end-of-life and becoming harder to find.

On the left is an "Arduino" providing the pwm drive signals to the BTN8960 H-bridge board on the right.
The "Arduino" board is an ATmega328P-AU with a 16MHz crystal, reset circuit and FTDI connection. Most of the I/O is brought out to connectors for easy plugging - this allows simple generation of 50Hz complementary PWM sine waveforms using Timer 2.

The board on the right is an experimental motor drive board I cooked up earlier this year to evaluate the BTN8960 devices for dc motor control. The BTN8960 devices, IC1, IC2 are located in the middle of the upper and lower edges of the pcb - with the large dc-link capacitor located between them. A thermistor (thin yellow wires) allows the temperature of the upper BTN8960 to be measured.  This board has no external heatsinking -  but relies heavily on large "flag" copper areas both on the upper and lower surfaces of the pcb to dissipate the heat.

The 24V dc input to the board is on the lower left, and the output to the toroidal transformer is on the right edge of the pcb.  The orange device is a relay which allows the toroid to be disconnected from the transformer.  The board also includes a LM2576  5V "simple switcher"  voltage regulator - for powering the microcontroller.

Here we see the full set-up.

The Driver Board, Toroidal Transformer and switched socket outlet complete the prototype Inverter.

So that was the state of play at 6pm this evening. I had the opportunity to connect my Weller soldering iron station to the output of the transformer. Off load the mains ac output was 240V dropping to 238Vrms when the soldering iron was plugged in. It used 24V dc at 1.15A from the 24V bench supply to power the iron.

More Testing Tomorrow.

Today I was lacking in suitable 230V loads to try. Tomorrow I have a bunch of 240V 60W incandescent lightbulbs to try out, and will slowly push this inverter up in power output to characterise its performance and efficiency.