Saturday, November 21, 2015

Beating the Bloat

Aaaaargh.......!

This post is by way of a minor rant about the current state of the tools and methods we use to produce embedded firmware.

In order to perform the benchmark tests on the series of processors yesterday, I had to use 4 individual IDEs and spend 12 hours of my life fighting the flab of blobby bloatware that is the embodiment of the modern IDE.

My grief really started when I wanted to port the J1 simulator to the Cortex M7. For this I needed a "professional"  tool chain.

The Long and Winding Road.......

In order to blink an LED on my STM32F746 breakout board, I had to install the 32K codesize limited version of Keil's uVision 5 and their ARM MDK. This takes about an hour to install and set up.

Then I had to find an example project of something that was close to what I wanted to do - i.e. blink a LED. I found their generic Blinky example - and then found that it had been tailored for a couple of commercial dev boards - and the files that set up the port allocation were locked from editing within the IDE.

So I opened the files in Notepad++, edited the dozen or so lines of code that controlled the GPIO port allocation, and then wrote my edited version in place of the original - so far, so good.

Had I known that at 6pm I was still about 2 hours away from blinking a LED, I would probably have thrown in the towel and gone to the pub. I eventually tracked down the problem to my particular port pin being re-assigned as an input in the example code, immediately after I had set it up as an output. There was also a minor problem with the clock generation being set up for the wrong PLL ratio - which prevented the code from running.

Now I have learnt that ARM processors are fairly complex beasts - and the peripherals take a fair time to set up, with their myriad of different options - but when I looked at the project files to blink a LED, I saw that it was taking about 100 code modules to set up the peripherals - and some of those modules were each 1000+ lines of code.

However - as a fairly recent newcomer to the Keil compiler and the ST Microelectronics hardware abstraction layer - who was I to know which of the 100 files I needed and which I didn't?

This leads me nicely on to Shotgun, Voodoo and Cargo Cult coding practices. I'll let the interested follow up the definitions, but the point that I am making is that the modern IDE and the practice of using a hardware abstraction layer do absolutely nothing to simplify the problem or reduce the amount of bloat that has to be compiled - regardless of whether it is being used or not.

In order to flash a LED on and off, a single bit in a register needs to change state - why then do I need to compile 10,000 lines of somebody else's code into a 9.5k byte binary, in order to make this happen?
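To put the "single bit in a register" point in concrete terms, here is a minimal sketch in C. It assumes the GPIO register order used on the STM32F4/F7 parts (MODER first, ODR at offset 0x14), which should be checked against the reference manual. The names gpio_regs_t, gpio_make_output and gpio_toggle are made up for this illustration and are not part of Keil's or ST's libraries. On real hardware the port's clock also has to be enabled first, and the pointer would be the port's base address rather than a variable in RAM.

```c
#include <assert.h>
#include <stdint.h>

/* One GPIO port's registers. The order is an assumption based on the
 * STM32F4/F7 layout; verify it against the reference manual. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00: two mode bits per pin (01 = output) */
    volatile uint32_t OTYPER;   /* 0x04: output type */
    volatile uint32_t OSPEEDR;  /* 0x08: output speed */
    volatile uint32_t PUPDR;    /* 0x0C: pull-up / pull-down */
    volatile uint32_t IDR;      /* 0x10: input data */
    volatile uint32_t ODR;      /* 0x14: output data */
    volatile uint32_t BSRR;     /* 0x18: bit set / reset */
} gpio_regs_t;

/* Configure one pin as a general purpose output, leaving the other
 * pins' mode bits untouched (hypothetical helper). */
static void gpio_make_output(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    uint32_t mode = port->MODER;
    mode &= ~(3u << (pin * 2u));   /* clear both mode bits for this pin */
    mode |= (1u << (pin * 2u));    /* 01 = general purpose output */
    port->MODER = mode;
}

/* Flip the output state of one pin: the "single bit" in question
 * (hypothetical helper). */
static void gpio_toggle(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    port->ODR ^= (1u << pin);
}
```

On a PC this can be exercised against a zeroed gpio_regs_t variable. Note that any later write to MODER that clears the pin's two mode bits turns it back into an input, which is exactly the sort of thing that cost me two hours above.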

Compilation times of over a minute really do nothing to boost one's productivity. Yet we persist with this madness, making our compilation tools even more sophisticated - with the excuse that the processors we are compiling for are getting more complex, and that the commercial suppliers of compilation tools need to be seen to be keeping ahead of the competition.

It has been ever thus for about the last 50 years or more - with the computing industry peddling us over-bloated, over-expensive tools that we neither want nor need.

HAL: Just what do you think you're doing, Dave? 

Well perhaps Dave should be asking HAL just WTF he thinks he is doing.  

And in this case, HAL is the new hardware abstraction layer - cooked up by the teams of clever code monkeys at ST Microelectronics.  I understand that as code gets more complex then it needs to be better managed, and that somewhere out there, someone writing code for a Cortex M0 may have an epiphany moment and realise that he should port his code to a Cortex M7......  

However, it appears that ST Microelectronics has employed a million monkeys with typewriters to undertake the mammoth task of writing the HAL modules - put them in separate rooms (or countries) and made it difficult for them to talk to one another.

Not surprisingly, the HAL reference manual runs to 963 pages - and took another team of our simian chums to cook that one up. This link is actually for the STM32F4xx Cortex M4 processors - because it appears that the M7 version has not been published yet.

So in reverence to the computer Holly, from Red Dwarf - I will call this code behemoth HOL - or the hardware obfuscation layer - as that is exactly what it does. It makes it difficult to know what your hardware is doing, or what you need to do in order to make it work for you.

There has to be a better way - and if Carlsberg wrote compilation tool chains - they would probably be the best in the World.

OK  - time for the pub...........



4 comments:

  1. steph_tsf 2:47 pm

    Hi Ken, I faced the same issue with the blinking LED. This prevented me from experimenting with the SPDIF-in peripheral, and the TDM (multichannel audio) peripheral hooked to a pair of TDA7801 quad-DACs. Have you tried developing STM32F7 firmware with M$ Visual Studio Community 2015 (free) and OpenOCD, adding the initialization code that's generated by STM32CubeMX (graphical tool)?

  2. Steph,

    Thanks for the tip. I will look at using Cube MX in the future - but for now I found that the 'f746 is supported by the mbed tools - and I find coding with their stuff a lot easier, and more productive.


    Ken

  3. Anonymous 10:47 pm

    The world needs a tutorial, showing the STM32F7 breakout board receiving audio (stereo) through the SPDIF-in interface, and outputting audio through the I2S (stereo) interfaces.

    The STM32F7 also features TDM (multichannel audio) interfaces, as input and as output, that you can configure as SPDIF-out. A lot of "headroom", you might say. Such is the STM32F7 where audio is concerned. The STM32F7 is the inexpensive ARM-based digital audio brick everybody has waited so long for, to set against the XMOS family, the Analog Devices SigmaDSP family, and the NXP-Freescale DSP56K family.

    The STM32F7 SPDIF-in interface is asynchronous, delivering a high-jitter LRCK interrupt that you can route to some GPIO output pin. The STM32F7 doesn't deliver a MCLK anyway. You thus need a modern high-quality audio DAC requiring no MCLK-in pin, only a LRCK-in pin, embedding a PLL acting as internal MCLK and LRCK regenerator.

    This is precisely what the PCM5100 audio DAC series is, designed by Burr-Brown, marketed by T.I. since T.I. purchased Burr-Brown. You will benefit from a high quality line-level audio output anywhere between 200 mV RMS and 2000 mV RMS, depending on how you configure the audio DAC sensitivity. You are supposed to present such analog audio voltage to a power amplifier having a volume control. You may try reducing the audio amplitude by dividing the audio in the digital domain, at the expense of dynamic range, signal/noise ratio and distortion.

    One can design a STM32F7 board featuring two audio DAC ports, for outputting up to 4 audio channels, for implementing some experimental speaker management system, aka speaker crossover. Digital audio processing enables delay lines, FIR filters, IIR filters, and bidirectional IIR filters.

    Adding a 2-channel audio ADC enables hooking up microphones, for real-time measurement of speaker transfer functions fed by a white noise stimulus, and for automatically conforming such a transfer function to a wanted reference transfer function, using a FIR filter based on the FFT and the inverse FFT.

    There are 4-channel audio DACs available nowadays, embedding the volume controls and the power amplifiers. Those are the TDA7801 4-channel audio DAC, and the TDA7802 4-channel audio DAC. Made by STM, best STM32F7 friends ? ... (to be followed)

  4. Anonymous 10:47 pm

    At this stage, I don't know if the PLL that's in the TDA7801 (or TDA7802) audio DAC, is as good as the PLL that's in the PCM5100 audio DAC. One could design the two audio DAC ports, to be compatible with two PCM5100 audio DACs, and to be compatible with one TDA7801 (or TDA7802) DAC.

    The STM32F7 supports the audiophile option consisting of hooking up an 11.2896 MHz quartz or a 12.288 MHz quartz, as audio clock source directly feeding the I2S and TDM interfaces. A Raspberry Pi board is incapable of this. Buffer overflows and buffer underruns can be avoided by loading both quartz crystals with a varicap diode within a frequency-locked-loop scheme. The TDA7801 (or TDA7802) audio DAC may require this for delivering their datasheet specs. Anyway, such a frequency-locked-loop may be beneficial to PCM5100 audio DACs. I remain in the context of exploiting the audio that's coming from SPDIF-in.

    Having a quartz as audio clock source is mandatory once you exploit audio coming from USB. This time, the voltage that's feeding the varicap diode must stay fixed, at some intermediate value.

    Let's detail how to exploit the audio coming from the SPDIF-in. The STM32F7 SPDIF-in routine would write the audio data into the middle of a circular 4096-sample audio buffer. The STM32F7 frequency-locked-loop routine would continuously measure the long-term frequency mismatch between the audio sampling frequency that's originating from the SPDIF-in (data write pointer) and the audio sampling frequency that's originating from the local quartz (data read pointer). The STM32F7 I2S routine would read the audio data that's at the read pointer, and write such audio data into the I2S-out interface. This way, the I2S-out interface will output a high-quality LRCK signal (as master feeding the audio DAC), that's the quartz signal, hardware-divided inside the STM32F7.

    The STM32F7 frequency-locked-loop routine needs to continuously monitor the write and read pointer mismatch, filter the mismatch using an IIR lowpass filter, and convert what's now representing the long-term frequency mismatch into a 16-bit word feeding a hardware PWM pin (GPIO). A RC 1st-order lowpass filter outputs a quasi-DC voltage evolving between 1 Volt and 3.3 Volt, feeding the varicap through a 100 k Ohm resistor. This is the way the frequency-locked-loop works. This way the two clock domains remain in sweet sync. One will never need to drop audio samples for avoiding a buffer overflow. One will never need to repeat audio samples for avoiding a buffer underrun.

    I have a slight doubt about mbed in such a context. Some massive pedagogy will be required. Currently, mbed people don't care about audio, and barely know the difference between serial, SPI, I2C, I2S, SPDIF, master and slave devices, and master and slave clocks.

    I remain in demand for a decent compiler / debugger environment able to ingest the STM32CubeMX code, and also able to communicate with a homemade STM32F7 breakout board, relying on the SWD (Serial Wire Debug) and/or relying on a plain Serial Interface (FTDI and USB).

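The frequency-locked-loop described in the last comment (pointer mismatch, IIR lowpass, 16-bit PWM word) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function and type names, the filter constant FLL_SHIFT, the gain FLL_GAIN and the mid-scale PWM centre are all invented here, and the polarity (fuller buffer gives higher duty) depends on how the varicap is actually wired. Only the 4096-sample circular buffer and the 16-bit PWM word come from the comment itself.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SAMPLES 4096u  /* circular buffer length, from the comment */
#define FLL_SHIFT   6      /* assumed: IIR lowpass moves 1/64 of the way per update */
#define FLL_GAIN    16     /* assumed: PWM counts per sample of filtered error */

typedef struct {
    int32_t filtered_q8;   /* lowpass-filtered fill error, Q8 fixed point */
} fll_state_t;

/* Number of samples between the read and write pointers, modulo the
 * buffer length (BUF_SAMPLES must be a power of two). */
static uint32_t buffer_fill(uint32_t write_idx, uint32_t read_idx)
{
    return (write_idx - read_idx) & (BUF_SAMPLES - 1u);
}

/* One loop update: returns the 16-bit word for the PWM that steers
 * the varicap. Half-full buffer means zero error and mid-scale duty. */
static uint16_t fll_update(fll_state_t *s, uint32_t write_idx, uint32_t read_idx)
{
    int32_t error = (int32_t)buffer_fill(write_idx, read_idx)
                  - (int32_t)(BUF_SAMPLES / 2u);

    /* One-pole IIR lowpass: y += (x - y) / 2^FLL_SHIFT, in Q8. */
    s->filtered_q8 += ((error * 256) - s->filtered_q8) / (1 << FLL_SHIFT);

    /* Map the filtered error around mid-scale and clamp to 16 bits. */
    int32_t duty = 32768 + (s->filtered_q8 / 256) * FLL_GAIN;
    if (duty < 0) {
        duty = 0;
    }
    if (duty > 65535) {
        duty = 65535;
    }
    return (uint16_t)duty;
}
```

In a real design fll_update would be called at a fixed, slow rate (for example once per DMA half-buffer interrupt), and the constants would have to be tuned against the pulling range of the quartz and the RC filter on the PWM pin.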