Speex encoder for use in embedded systems

Speex is a well-known open source audio codec that can be used to compress speech audio data into a very compact format.

Encoding data into Speex format, and decoding it again, can even be done on a microcontroller! This requires at least a 32-bit microcontroller running on a 48 MHz clock. A Cortex-M3 device can be used to perform this task.

Compression ratios of 90% are not uncommon with Speex, while comprehensibility remains intact. That means that a 25-second, 8 kHz, 12-bit (stored as 16-bit) audio fragment does not require 391 KBytes but a little more than 39 KBytes of storage memory. In a microcontroller, where embedded flash rarely exceeds 2 MBytes, this can be a major advantage when the designer does not want to add external flash, for security or other reasons. A typical Cortex-M3 from ST with 1 MByte of flash memory can therefore (when using 80% of it for Speex data) store up to 8.5 minutes of Speex audio in its own flash.
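The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 90% ratio is the rough figure quoted above, not a measured Speex bitrate, and the function names are made up for illustration.

```python
def pcm_bytes(seconds, sample_rate_hz, bytes_per_sample=2):
    """Size of uncompressed PCM audio; 12-bit samples are stored in 16 bits."""
    return seconds * sample_rate_hz * bytes_per_sample


def speex_bytes(raw_bytes, compression_percent=90):
    """Estimated compressed size, assuming the rough ratio quoted in the text."""
    return raw_bytes * (100 - compression_percent) // 100


raw = pcm_bytes(25, 8000)   # 25 s at 8 kHz, 2 bytes per sample
packed = speex_bytes(raw)   # about one tenth of the raw size

# Flash budget: 80% of 1 MByte reserved for Speex data.
flash_budget = 1024 * 1024 * 80 // 100
bytes_per_second = packed / 25
minutes = flash_budget / bytes_per_second / 60

print(f"raw PCM : {raw / 1024:.0f} KBytes")
print(f"Speex   : {packed / 1024:.1f} KBytes")
print(f"capacity: {minutes:.1f} minutes")
```

The capacity comes out in the same range as the figure quoted above; the exact value depends on the compression ratio your audio actually achieves.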

STMicroelectronics has an application note (STSW-STM32017) on embedding the Speex encoder and decoder on an STM32F103.

Energy Micro describes a similar set-up for their EFM32 Gecko line.

Both app notes concern themselves with providing both encoder and decoder on the embedded target. However, in many cases embedded developers want to encode data into Speex format on their workstation, store the encoded data in microcontroller flash, and run only the decoder to convert Speex back to PCM and output it through a DAC or codec.

The trouble is: how do you store Speex data in embedded flash? Speex stores data in (variable-length) frames. In addition, to use the decoded Speex data, the original sampling frequency and the number of frames must be known.
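To see why both numbers matter: the decoder produces a fixed number of PCM samples per frame, so the frame count gives the size of the decoded output, and the sampling frequency tells the DAC how fast to play it. A minimal sketch, assuming Speex's usual 20 ms frames (160 samples at 8 kHz); the helper names are hypothetical.

```python
FRAME_MS = 20  # assumption: each Speex frame covers 20 ms of audio


def samples_per_frame(sample_rate_hz):
    """PCM samples produced by decoding one frame."""
    return sample_rate_hz * FRAME_MS // 1000


def decoded_size(frame_count, sample_rate_hz):
    """Return (total PCM samples, duration in seconds) for a fragment."""
    total = frame_count * samples_per_frame(sample_rate_hz)
    return total, total / sample_rate_hz


samples, seconds = decoded_size(1250, 8000)
print(samples, seconds)
```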

In your embedded application you can, of course, hard-code this information if you just have one or two Speex fragments. But once you decide to include many Speex fragments, doing this manually becomes extremely tedious and error-prone.

Also, the raw binary Speex file on the workstation cannot be used in the microcontroller directly. It must be converted to an array that can be included in the firmware code. Various tools exist, but on some platforms (Windows) this is harder than on others (Unix-based systems).
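On Unix-like systems, xxd -i is one common way to do this conversion. As a platform-independent illustration, here is a minimal Python sketch that renders a block of bytes as C source. The function name, the array name, and the layout of the generated code are assumptions for this example, not the output format of the tool described below.

```python
def to_c_array(data, name, per_line=12):
    """Render bytes as C source defining a const array and its length."""
    if not data:
        raise ValueError("refusing to emit a zero-length C array")
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[{len(data)}] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


# Stand-in bytes; a real run would read them from an encoded file.
print(to_c_array(bytes(range(20)), "greeting_spx"))
```

The generated text can be saved as a .c or .h file and compiled into the firmware, which places the array in flash on a typical Cortex-M toolchain because it is declared const.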

I designed and wrote a tool that does all this. It takes one or more audio files in WAV format and converts them into a file that can be included directly in microcontroller source code. The audio data is accompanied by a small header that contains length and sampling frequency information.
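To give an idea of what such a header could look like, here is a sketch in Python. The layout is purely hypothetical (three little-endian 32-bit fields: sampling frequency, frame count, payload length) and is not the format the tool actually emits; the function names are invented for the example as well.

```python
import struct

# Hypothetical layout (NOT the tool's documented format):
# sample rate in Hz, frame count, payload length in bytes,
# each a little-endian unsigned 32-bit integer.
HEADER = struct.Struct("<III")


def pack_fragment(sample_rate_hz, frame_count, payload):
    """Prefix the encoded payload with the header fields."""
    return HEADER.pack(sample_rate_hz, frame_count, len(payload)) + payload


def unpack_fragment(blob):
    """Split a fragment back into (sample rate, frame count, payload)."""
    sample_rate_hz, frame_count, length = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("fragment is truncated")
    return sample_rate_hz, frame_count, payload


blob = pack_fragment(8000, 2, b"\x00" * 40)
print(unpack_fragment(blob)[:2])
```

With a fixed-size header like this, the firmware can read the fields at the start of each array and needs no hand-maintained table of lengths and sampling frequencies.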

speexembencoder

In addition to Speex data, the tool can also generate (AD)PCM data, should you prefer one of those formats in your code. The ADPCM generator was taken from the STMicroelectronics ADPCM Library (STSW-STM32022).

The tool can also be used to crop 16-bit audio data to a smaller range, in case the D/A converter you are using (like the one in the STM32F103) has a lower resolution.
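One plausible reading of this reduction, assuming a 12-bit DAC that expects unsigned codes, is to offset each signed 16-bit sample and drop the low bits. The sketch below illustrates that idea only; it is not necessarily how the tool does it, and the function name is hypothetical.

```python
def to_dac_code(sample, dac_bits=12):
    """Map a signed 16-bit PCM sample to an unsigned DAC code."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample is outside the signed 16-bit range")
    unsigned = sample + 32768           # shift the range to 0..65535
    return unsigned >> (16 - dac_bits)  # drop the low-order bits


print([to_dac_code(s) for s in (-32768, 0, 32767)])  # [0, 2048, 4095]
```

Doing this on the workstation saves the microcontroller a shift per sample and lets the PCM data be fed to the DAC as stored.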

The tool itself was written with wxWidgets and is therefore available in Windows, Mac, and FreeBSD versions.

The downloads are available:

Please be advised that the tool is still alpha-quality. The Python encoding format hasn’t been implemented yet. Not all language translations are complete and there may be bugs creeping around. I tried and tested the tool output successfully on an STM32F4 board, but I cannot guarantee anything.