Saturday, July 25, 2015

First steps coding and speed testing the STM32F746 Break Out Board



It's a few days since I received my prototype STM32F745 Breakout boards and I have slowly been writing some basic firmware functions to test out the hardware.

Whilst I have done a fair bit of code development for the STM32F4, F373 and F103 microcontrollers, this was my first exposure to the awesome Cortex M7.

Every programmer wants to create a comfortable coding environment around any new microcontroller platform, and often this is done by porting familiar routines across to the new target and re-using tried and tested libraries in order to develop the application.

With a few basic functions, you can do a surprising amount - here's one approach

digital input and output - flash a LED to confirm code is running
serial USART - allows text output and a simple command interface from PC
timers - by establishing a millisecond and microsecond tick you have precision timing
timers also allow pwm to be generated
ADCs - 12bit or even 16bit resolution measurements become possible
SPI  - connect to SD-Card,  Colour TFT LCD, etc
I2C - add an eeprom
RTC - and battery backup.  The STM32F series often have on chip RTC and non-volatile RAM

The Arduino project has done this time and again for the various microcontrollers used on their boards - proving that code originally written for an AVR, can be ported to various manufacturers ARM processors.  The STM32duino project has gained a fair amount of traction porting the Arduino environment so that it will run on STM32F1xx  M3 and STM32F4xx M4 processors.

The Arduino language successfully removes much of the complexity, or possibly quirkiness, from the C programming language, and at the same time creates a robust application framework onto which the hardware specific functions and libraries may be placed.

Purists, or old-school C programmers, may delight in criticising it's oversimplification of the language, but for beginners or those that have had no exposure to traditional C and C++, it's a fast way of getting simple applications running - and now with the benefit of moving between a range of more capable processors.

Tool Chain

I found that I could download a codesize limited (32KB) version of the Keil ARM-MDK, which runs under their uVision 5 toolchain. This was the easiest route to take, as it includes all the library headers, and numerous examples of how to configure the various on-chip peripherals.

If you are familiar with the STM32F  Standard Peripheral Libraries,  then this may be a mixed blessing.  STM have ported their functionality across to their new Hardware Abstraction Layer HAL, under their new development suite known as STM32 Cube.

There is no question that STM32 Cube will provide faster development for complex applications, as it combines graphical libraries, ethernet and TCP/IP stacks in an integrated design environment. However, the initial barrier to learning enough about HAL to reach the point of understanding it's various structures can be quite high, leaving the user somewhat perplexed that what should be easy, turns out to be really quite a challenge.

Once over the HAL Hurdle - progress will be much quicker.

Get it Running

The usual route to getting a new board up and running is to start with the bare minimum of code to prove that it will actually run.  Traditionally this is normally a simple LED flashing routine.
With the STM32 series of microcontrollers - and any 32bit mcu, for that matter, there is quite often a considerable quantity of registers to initialise first.

I started with the LED flash example, that came with the ARM-MDK download. It is often wiser to start with a known working example, rather than start from scratch, especially with an unfamiliar microcontroller.

As it happened, my problems were initially hardware related - as I had forgotten to connect the all-important analogue VDDA supply. Once this was connected, the microcontroller accepted code and sprang into life.  Without VDDA, the clock oscillator and the PLL blocks are not powered - so the mcu cannot run.

Once I had the LED flashing, it was time to check out the system timing, and make sure when you used the 100mS delay routine - that it was actually 100mS.  I had used an 8MHz crystal and so I had to change the PLL divisor ratio, in order to get the right timing, as the code in the example was for an EVAL kit - which used a 25MHz crystal.

Once this was done, I wrote the code to configure the GPIO pins so that I could toggle more of the I/O pins.

Next - USART

Basic GPIO is always the starting point for code development, but pretty soon you need to get a serial connection to a terminal program - so you can see numerical and text output plus the means of controlling the application with a simple command interpreter. With about 3 or 4 basic functions you can implement a surprisingly flexible user interface.

1.  putchar  - send a single ASCII character to the serial port.  This is the basis of all serial output.
2.  getchar  - receive a character from the serial interface
3.  print_num  - this will send an integer number to the serial interface
4.  Get_Serial  - a routine that accepts simple alpha-numeric serial commands for machine control

The USART proved somewhat trickier to implement, mainly because the HAL routines were somewhat different to the old STM32Fxxx_USART library code that I was used to. After a day spent failing to import the USART initialisation into my code, I decided to port my code into the HAL USART example - starting with a known working example.

This approach was a lot more successful, and soon I had printf(), putchar() and getchar() all working nicely.

Benchmarking the STM32F7xx.

Having got the first prototype running to the point where I could run code and get numerical output, I decided that it was time to run a standard benchmark test on it - so that I could compare it with other devices.

The dhrystone is the classic integer math benchmark that has been run on various computers for over 30 years.  A C code version (actually Arduino) of it can be downloaded from here

 http://www.saanlima.com/download/dhry21a.zip

Some months ago, I tested the standard Arduino MEGA, which uses an ATmega2560 running with a 16MHz clock.

The results were as follows

ATmega2560 16MHz

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 300000 runs through Dhrystone
Execution ends
Microseconds for one run through Dhrystone: 78.67
Dhrystones per Second: 12711.22
VAX MIPS rating = 7.23


STM32F746 216MHz

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 300000 runs through Dhrystone
Execution ends
Microseconds for one run through Dhrystone: 3.33
Dhrystones per Second: 300000
VAX MIPS rating = 170.74

So the STM32F7xx is approximately 24 times faster than the Arduino on integer maths. But when you look at the whole package:

1M Flash
320KB  SRAM
4 USARTS
USB full-speed
Up to 6 SPI
Up to 18 timers
etc, etc

The list of peripherals is most comprehensive.  Best check the datasheet for full details.

And all available in LQFP packages that the hobbyist can actually solder.

Conclusion.

This week I have made great progress getting the basics of a code development environment put together for the STM32F746.

The board was cheap and easy to build, with a record short development time of just 14 days from starting the pcb layout to first having the LED flashing, and so far all appears to be working as intended.

This is certainly a powerful processor, and with the right coding environment, such as Arduino or mbed will make a great little target for those wanting to move into 32bit  ARM bare metal programming.








1 comment:

Unknown said...

Hello. Can you explain how to use dhrystone benchmark for stm32?