Saturday, May 23, 2015

Using Piano Forte with the Espruino JavaScript Interpreter

Piano Forte - with ESP8266 WiFi and HM-11 Bluetooth Low Energy 


In the last few weeks, a few things have come to light that make working with the STM32Fxxx series of ARM microcontrollers a lot more fruitful.

Espruino - JavaScript On-Board

The Espruino project, which provides a JavaScript interpreter for several of the STM32F ARM devices, has blossomed and matured. It brings a whole host of powerful new functionality, along with the immediacy and interactivity of an interpreted language.

Also - there is now a large user base for Espruino, and a forum offering help and guidance for the newcomer - as well as code examples.

As the standard variant of Piano Forte uses virtually the same microcontroller as the Espruino board - (they are both STM32F103 with 256K of flash and 48K of RAM), it's just a case of downloading the latest code image from their site and programming it in using the STM32 bootloader utility.


Making Connection

Espruino uses the STM full speed USB com port driver, but if no USB cable is plugged in, it defaults to USART1, which appears on PA9 and PA10.  This is useful for testing - as it allows you to get a board up and running with just an FTDI or similar serial cable.  The serial port defaults to 9600 baud. This is intentional, because it allows a Bluetooth module such as the HM-11 to be used directly for programming and communicating with the Piano Forte.

The Piano Forte was also designed to be an I/O board for the Raspberry Pi.  The Pi can be used to send the JavaScript text across to the Piano Forte - again at 9600 baud - using its UART pins (8 and 10) on the GPIO connector.


The FTDI debug cable is connected via USART1 - this also has access to the serial bootloader:

PA9      Tx
PA10     Rx

If the Piano Forte is to be fitted to a Raspberry Pi, the FTDI cable is redundant and USART1 is connected to the Pi GPIO header via a couple of series resistors on the underside of the PCB.

Wireless Options

Piano Forte was designed from the start to allow various wireless connectivity options. These will be most useful when the board is intended for stand-alone operation - and not as an I/O slave for the Raspberry Pi.

An 8 pin header is included on the pcb to allow the ESP-01 to be fitted.

The ESP8266 module is connected via USART2 which appears on PA2 and PA3

PA2  Tx
PA3  Rx

Additionally there is a footprint on the pcb which accepts the HM-11 Bluetooth module.

The HM-11 BLE module is connected via USART3 on ports PB10 and PB11

PB10 Tx
PB11 Rx

On the first set of prototype boards this will need a couple of wire links.

Friday, May 08, 2015

Interfacing a Chord Keypad to Papilio Duo FPGA Board

Five keys plus thumb operated shift/control key
This is a simple chord style keypad that I made up a few years ago.  It is loosely based on the Microwriter - an early UK designed portable keypad / notewriter.

The five main keys are located under the fingertips and thumb of the right hand, plus an additional shift key that can be held down with the thumb.  This arrangement allows up to 64 key combinations (2^6, counting the state with no key pressed) - which is enough for simple ASCII letters and numerals.

However, this time the application is not for text entry, but to allow very rapid access to menu items, tools and colour options for a CAD program - without having to break concentration and use the keyboard.

In the early 1990s, Charles Moore - the inventor of the Forth programming language - devised a similar simple keypad to allow him fast, direct and immediate control of his OKAD suite of VLSI CAD design tools. It means that all the frequently used options, for example the toolset in CAD, are immediately below your fingertips and accessible without remembering a series of keystrokes or moving the mouse off the working drawing area to the menu bar to select a new tool.

If this multi-keypad were combined with the positioning and scrolling functions currently provided by a mouse, then a considerable amount of program interaction could be done from the "mousekey" without having to take one's hand off the mouse to type at the keyboard.

Wiring Up

The keypad will interface directly into the Atari joystick port of a Classic Computing Shield that I purchased from Gadget Factory - as part of the Papilio Duo development kit. I am slowly developing a stand alone CAD workstation, based around a 32 bit soft processor running on the Papilio Duo FPGA.

The wiring is very simple - just 6 microswitches connected to input pins, which are shorted to ground when pressed.  The Computing Shield provides 47K pull-up resistors, so the port lines all read high until a keyswitch is pressed. The wiring schematic is provided in the Computing Shield hardware guide.



For the moment, the keypad inputs will have to be polled, but a little more logic - such as 6 diodes, would allow a further line to indicate an interrupt from the keypad - if any switch is pressed, the diode AND  (inverted logic OR) would bring the interrupt line low.

Decoding the key inputs in code is fairly simple - we assign binary weights to each of the key inputs, and then the unique combination of switch presses is the binary sum of these weights.

For example - assign a weight of:
1 to switch 1,
2 to switch 2,
4 to switch 3,
8 to switch 4,
16 to switch 5,
32 to switch 6.

If switches 1 and 2 are pressed together, the decoding returns a value of (2+1) = 3.
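The weighting scheme above can be sketched in a few lines of C++. This is only an illustration: the function names are made up for the example, and the port version assumes that switch n is wired to bit (n - 1) of an input port and that the pull-ups make a pressed switch read as 0.

```cpp
#include <array>
#include <cstdint>

// Five finger keys plus the thumb-operated shift key.
constexpr int kNumSwitches = 6;

// Hypothetical helper: pressed[i] is true when switch (i + 1) is held down.
// Switch n carries the weight 2^(n-1), so the keycode is the sum of the
// weights of all the switches that are pressed.
std::uint8_t decodeChord(const std::array<bool, kNumSwitches>& pressed) {
    std::uint8_t code = 0;
    for (int i = 0; i < kNumSwitches; ++i) {
        if (pressed[i]) {
            code = static_cast<std::uint8_t>(code + (1u << i));
        }
    }
    return code;
}

// Hypothetical helper for a raw port read. Assumes switch n sits on bit
// (n - 1) and that a pressed switch pulls its line low, so the raw value
// is inverted and masked down to the six keypad bits.
std::uint8_t decodePort(std::uint8_t rawPort) {
    return static_cast<std::uint8_t>(~rawPort & 0x3F);
}
```

With switches 1 and 2 pressed, decodeChord returns 3, matching the example above.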

Some consideration was given to switch debounce - multiple sampling of the key-inputs until a stable, repeatable sum is achieved.

The keypad is first scanned to see if any key is pressed.  If one is, we wait for 30ms and scan again to take a first reading. We then wait a further 30ms and scan once more to take a second reading.  If the first and second readings are equal, we know we can return with a valid keycode.
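A minimal sketch of that debounce sequence is shown below. The scan and delay routines are passed in as callables so the logic can be tried out away from the hardware; readDebounced, scanKeys and waitMs are names invented for this example, and returning 0 for "no valid key" is an assumption.

```cpp
#include <cstdint>
#include <functional>

// Returns a stable keycode, or 0 if no key is pressed or the readings
// were still changing. scanKeys() is assumed to return the binary-weighted
// sum of the pressed switches; waitMs() is assumed to block for the given
// number of milliseconds (delay() on an Arduino).
std::uint8_t readDebounced(const std::function<std::uint8_t()>& scanKeys,
                           const std::function<void(unsigned)>& waitMs) {
    if (scanKeys() == 0) {
        return 0;                        // initial scan: nothing pressed
    }
    waitMs(30);
    const std::uint8_t first = scanKeys();   // first reading
    waitMs(30);
    const std::uint8_t second = scanKeys();  // second reading
    return (first == second) ? first : 0;    // only accept a stable chord
}
```

On the real board, scanKeys would read the joystick port and waitMs would simply call the delay routine.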

Why a Custom Keypad?

The keypad provides a simple and fast means of navigating and selecting a wide range of options from a menu.

It may be used to trigger a quick change in operation - for example as a games controller.

It can also be used for text input - but this requires a bit of practice and learning the character set.







Saturday, May 02, 2015

Benchmarking Arduino - and his Chums.

Background

The standard Arduino based on the ATmega328 is an 8-bit device with a 16MHz clock frequency and 2K bytes of RAM.

I have for some time been exploring more powerful alternatives to the Arduino - especially the 32-bit STM32Fxxx range of ARM Cortex-M microcontrollers, and some softcore processors implemented in an FPGA.

The one thing that these processors have in common is that they can all be programmed using the Arduino IDE - so in theory, code written for the Arduino will run on all devices - almost without modification.

The flavour of C++ used by Arduino has become a kind of lingua franca for these widely varying processors, allowing access to a vast knowledge base and range of libraries that permit the easy interfacing of hardware devices. In truth, if you wish to use an integrated device or sensor, then someone will already have created an Arduino library for it.

Since the earliest days of commercial computers, both manufacturers and users have had a strong interest in their computing performance. Computers were expensive, and computing time was equally expensive. Any way of increasing performance and reducing programming costs was sought after. As memory technologies improved, processor cycle times reduced to match the shorter access time of the memory.

When launched in 1965, the PDP-8 was capable of 312,500 12-bit additions per second.  How does the Arduino compare with that figure?

Into Practice

The Arduino is a great platform for trying things out.  Whilst not the fastest board available, its resources are easily accessible, and the millis() and micros() timer functions allow simple benchmarking to be done.

Remembering the claimed performance for the PDP-8, I decided to set up a simple addition test for Arduino. First I formed an array of 16 bit integers - remembering that in the Arduino there is only sufficient RAM space for about 500 16-bit words. Exceeding this gives a risk of overwriting some of the stack, heap and system variables.

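I then loaded up the array with random integers:

uint16_t m[500];   // declaration assumed (not shown in the original): 500 16-bit words

void setup()
{
  Serial.begin(115200);

  // Fill every element; i < 500 keeps the index inside the array bounds
  for (int i = 0; i < 500; i++)
  {
    m[i] = random(0, 65536);   // upper bound is exclusive, so values span 0..65535
  }
}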

The main routine would then add two of the memory locations together, working its way through the array. The time taken for the 500 iterations was calculated using the micros() function.
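The timing loop might look something like the sketch below. It is not the code from the Gist: microsPerAdd is a name made up here, the clock is passed in as a callable standing in for micros() so the routine can also be exercised on a desktop compiler, and adjacent elements are added so that an array of N words gives N - 1 additions.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Adds each pair of adjacent array elements into a variable and returns
// the average time per addition in microseconds. nowMicros stands in for
// the Arduino micros() call (an assumption made for portability).
double microsPerAdd(const std::vector<std::uint16_t>& m,
                    const std::function<std::uint32_t()>& nowMicros) {
    if (m.size() < 2) {
        return 0.0;                      // nothing to add
    }
    volatile std::uint16_t sum = 0;      // volatile stops the loop being optimised away
    const std::uint32_t start = nowMicros();
    for (std::size_t i = 0; i + 1 < m.size(); ++i) {
        sum = static_cast<std::uint16_t>(m[i] + m[i + 1]);
    }
    // Unsigned subtraction still gives the right answer if the timer wraps.
    const std::uint32_t elapsed = nowMicros() - start;
    return static_cast<double>(elapsed) / static_cast<double>(m.size() - 1);
}
```

The figure returned includes the loop overhead as well as the addition itself, so it is best read as the cost of the operation in a typical program rather than of the add instruction alone.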

Results were as follows:

1. Adding a constant to memory    1µs
2. Adding contents of two memory locations into a variable   1.4µs
3. Adding contents of two memory locations and storing back into a third memory location   1.6µs

So based on this, the Arduino is performing addition of memory located operands at between 2 and 3 times the speed of the 1965 PDP-8.
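The comparison with the PDP-8 is simple arithmetic: convert the time per addition into additions per second, then divide by the 312,500 additions per second quoted above. A small sketch, with function names invented for the example:

```cpp
// PDP-8 figure quoted in the text above.
constexpr double kPdp8AddsPerSecond = 312500.0;

// Additions per second for a measured time per addition in microseconds.
constexpr double addsPerSecond(double microsPerAdd) {
    return 1.0e6 / microsPerAdd;
}

// How many times faster than the PDP-8 a given measurement is.
constexpr double speedVsPdp8(double microsPerAdd) {
    return addsPerSecond(microsPerAdd) / kPdp8AddsPerSecond;
}
```

For the measured 1.4µs and 1.6µs, this gives roughly 714,000 and 625,000 additions per second, or about 2.3 and 2.0 times the PDP-8 figure.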

However - we should bear in mind that at 16MHz, the Arduino can execute up to 16 single-cycle instructions per microsecond.  Whilst an 8-bit add is a single cycle instruction, by the time we have used it within a 16-bit add routine, and involved a memory access, the Arduino is taking roughly a microsecond to achieve a common operation in a typical program.

I then conducted the same test on the 72MHz STM32F103 board programmed using Arduino_STM32.

1. Adding a constant to memory    0.156µs    (6.4X faster)
2. Adding contents of two memory locations into a variable  0.294µs  (4.76X faster)
3. Adding contents of two memory locations and storing back into a third memory location  0.32µs  (5X faster).

Next was the turn for the 96MHz ZPUino - a softcore running in a Xilinx Spartan 6 - on a Papilio Duo FPGA board.

1. Adding a constant to memory    0.58µs    (1.72X faster than Arduino)
2. Adding contents of two memory locations into a variable  0.708µs  (1.98X faster)
3. Adding contents of two memory locations and storing back into a third memory location  0.706µs  (2.27X faster).

The skeleton Arduino code for these addition speedtests is available on this GitHub Gist.

The results for the ZPUino were a little disappointing - bearing in mind it is being clocked at 6X the speed of the Arduino. However it is a stack based processor, which uses external RAM. The external bytewide RAM will slow down RAM accesses and perhaps C does not compile efficiently to its stack based architecture.
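One way to separate the effect of clock speed from the efficiency of the processor is to express each result in clock cycles: the time per operation in microseconds multiplied by the clock frequency in MHz. The sketch below is only an illustration of that arithmetic, with names made up for the example.

```cpp
// Clock cycles consumed per operation: microseconds multiplied by MHz.
constexpr double clockCyclesPerOp(double microsPerOp, double clockMHz) {
    return microsPerOp * clockMHz;
}

// Speed-up of a candidate processor relative to a baseline measurement.
constexpr double speedup(double baselineMicros, double candidateMicros) {
    return baselineMicros / candidateMicros;
}
```

For the first test (adding a constant to memory), this gives 16 cycles on the 16MHz Arduino, about 11 cycles on the 72MHz STM32F103, and about 56 cycles on the 96MHz ZPUino, which shows how much of the ZPUino's clock advantage is being lost per operation.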

Whetstone and Dhrystone Benchmarks

A more representative approach is to use standard benchmark code, of which there are several well documented programs, designed to test various aspects of processor performance. These include:

Dhrystone  - an integer arithmetic benchmark
Whetstone  - a floating point benchmark
CoreMark   - an EEMBC benchmark for embedded processor cores
LINPACK    - a floating point linear algebra benchmark

Dhrystone - an integer benchmark

The Dhrystone benchmark code - suitable for small microcontrollers - is available here. However, the Dhrystone caused me some difficulties in converting it to an Arduino compatible format - particularly because of the shortage of RAM on the ATmega328.

Fortunately, I was contacted by Magnus of Saanlima Electronics - with an Arduino friendly version. You will however need an Arduino MEGA - because the Uno does not have enough RAM to run this benchmark.

http://www.saanlima.com/download/dhry21a.zip

Results:

Arduino MEGA1260 board 16MHz

Microseconds for one run through Dhrystone: 78.67
Dhrystones per Second:12711.22
VAX MIPS rating = 7.23


72MHz STM32F103 programmed using Arduino_STM32 

Microseconds for one run through Dhrystone: 11.66
Dhrystones per Second: 85762.68
VAX MIPS rating = 48.81

I then ported it to ZPUino2.0 - and after a little fiddling got the following:

Microseconds for one run through Dhrystone: 37.95
Dhrystones per Second: 26351.79
VAX MIPS rating = 15.00
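The three figures in each Dhrystone result are related by simple arithmetic. The VAX MIPS rating is the Dhrystones per second divided by 1757, the score of the VAX 11/780 that serves as the 1 MIPS reference machine. A short sketch, with function names chosen for the example:

```cpp
// Dhrystones per second achieved by the VAX 11/780 reference machine.
constexpr double kVaxDhrystonesPerSecond = 1757.0;

// Dhrystones per second from the time for one run in microseconds.
constexpr double dhrystonesPerSecond(double microsPerRun) {
    return 1.0e6 / microsPerRun;
}

// VAX MIPS rating from Dhrystones per second.
constexpr double vaxMips(double dhrystonesPerSec) {
    return dhrystonesPerSec / kVaxDhrystonesPerSecond;
}
```

Feeding in the reported Dhrystones per second reproduces the ratings above: 7.23 for the Arduino MEGA, 48.81 for the STM32F103 and 15.00 for the ZPUino.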


Whetstone - a floating point benchmark

The Whetstone test code adapted for Arduino by Thomas Kirchner is here.
When run on a standard 16MHz Arduino Duemilanove, the Whetstone produced the following result:

Starting Whetstone benchmark...
Loops: 1000 Iterations: 1 Duration: 81740 millisec.
C Converted Double Precision Whetstones: 1.22 MIPS

On the STM32F103 board with a 72MHz clock

Starting Whetstone benchmark...
Loops: 1000 Iterations: 1 Duration: 19691 millisec.
C Converted Double Precision Whetstones: 5.08 MIPS

So the STM32F103 appears to be running at approximately four times the speed of the Arduino. This speed increase is dominated by the faster clock on the STM32F103 (72MHz against 16MHz, a ratio of 4.5), rather than by the ARM processor executing the compiled code any more efficiently than the AVR.
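The MIPS figures in the Whetstone output can be reproduced from the loop count and the duration. The sketch below assumes the usual convention of the C conversion of Whetstone, in which each loop stands for 100,000 Whetstone instructions; that constant is an assumption here, although it is consistent with both results above. The function name is made up for the example.

```cpp
// Assumed: Whetstone instructions represented by one benchmark loop.
constexpr double kWhetstoneInstructionsPerLoop = 100000.0;

// Millions of Whetstone instructions per second for a given run.
constexpr double whetstoneMips(double loops, double iterations, double durationMs) {
    return (loops * iterations * kWhetstoneInstructionsPerLoop)
           / (durationMs / 1000.0)   // duration in seconds
           / 1.0e6;                  // instructions per second to MIPS
}
```

With 1000 loops and 1 iteration, 81740 milliseconds works out at about 1.22 MIPS and 19691 milliseconds at about 5.08 MIPS, a ratio of roughly 4.15.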

Whilst an indication of processor performance, the benchmarks are a somewhat artificial test, and the actual performance of one processor compared to another will depend on the application. Additionally, the manner in which the compiler interprets the C source code and efficiently converts it into the native machine language of the processor has an effect on the overall processing speed.


Conclusions

On the Whetstone floating point benchmark, the 16MHz Arduino manages around 1.22 million Whetstone instructions per second.

Moving up to a 72MHz STM32F103 ARM gives roughly a 4X to 7X speed advantage over the Arduino in these tests. A lot of this is down to the faster clock, and some is down to the fact that double precision arithmetic will be easier (fewer cycles) on a 32-bit processor than on an 8-bit device.

Soft core FPGA processors are interesting, but may be constrained by the use of external RAM and the restriction of an 8-bit external bus when accessing multi-byte words. It should therefore be possible to improve their performance with the use of internal (on-chip) RAM.