Jun 28, 2018
As the final installment in the posts about my LED Wristwatch project, I wanted to write about the self-programming bootloader I made for an STM32L052 and describe how it works. So far it has shown itself to be fairly robust, and I haven't had to get out my STLink to reprogram the watch for quite some time.
The main objective of this bootloader is to facilitate reprogramming of the device without requiring an external programmer. There are two ways that a microcontroller can generally accomplish this:
1. The user program carries the bootloader with it as a blob, along with some bootstrapping code that switches over into it when reprogramming is requested.
2. A section of flash is reserved for a standalone bootloader that is always present, separate from the user program.
Each approach has its pros and cons. Option 1 allows the user program to use all available flash (aside from the blob size and bootstrapping code). It also might not require a relocatable interrupt vector table (something that some ARM Cortex microcontrollers lack). However, it also means that if you somehow mess up the switchover into the bootloader, there is no recovery short of reflashing the microcontroller over JTAG or SWD. Option 2 allows for a fairly fail-safe bootloader. The bootloader is always there, even if the user program is not working right. So long as the device provides a hardware method for entering bootloader mode, the device can always be recovered. However, an Option 2 bootloader is difficult to update (you have to flash it with a special program that overwrites the bootloader), wastes any unused space in the bootloader-reserved section, and requires some features that not all microcontrollers have.
Because the STM32L052 has a large amount of flash (64K) and implements the vector-table-offset register (allowing the interrupt vector table to be relocated), I decided to go with Option 2. Example code for this post can be found here:
**https://github.com/kcuzner/led-watch**
There are a few pieces of the bootloader that I'm going to describe here, all of which are necessary for it to function.
When the watch first boots, the bootloader is going to be the first thing that runs. Not all bootloaders work like this, but this is one of the simplest ways to get things rolling.
First, there are a few #defines and global variables that are worth knowing about for context:
#define EEPROM_SECTION ".eeprom,\"aw\",%nobits//" //a bit of a hack to prevent .eeprom from being programmed
#define _EEPROM __attribute__((section (EEPROM_SECTION)))

/**
 * Mask for RCC_CSR defining which bits may trigger an entry into bootloader mode:
 * - Any watchdog reset
 * - Any soft reset
 * - A pin reset (aka manual reset)
 * - A firewall reset
 */
#define BOOTLOADER_RCC_CSR_ENTRY_MASK (RCC_CSR_WWDGRSTF | RCC_CSR_IWDGRSTF | RCC_CSR_SFTRSTF | RCC_CSR_PINRSTF | RCC_CSR_FWRSTF)

/**
 * Magic code value to make the bootloader ignore any of the entry bits set in
 * RCC_CSR and skip to the user program anyway, if a valid program start value
 * has been programmed.
 */
#define BOOTLOADER_MAGIC_SKIP 0x3C65A95A

static _EEPROM struct {
    uint32_t magic_code;
    union {
        uint32_t *user_vtor;
        uint32_t user_vtor_value;
    };
} bootloader_persistent_state;
There are a few things that can be gathered from this:
The first thing that the bootloader does is ask the following question to determine if it should run the user application:
void bootloader_init(void)
{
    //if the user_vtor field is set and there are no entry bits set in the CSR (or the magic code is programmed appropriately), start the user program
    if (bootloader_persistent_state.user_vtor &&
            (!(RCC->CSR & BOOTLOADER_RCC_CSR_ENTRY_MASK) || bootloader_persistent_state.magic_code == BOOTLOADER_MAGIC_SKIP))
    {
...
Reading here, we can see that if there is a user_vtor value and there was either no reset condition forcing an entry into bootloader mode or the magic number was programmed to our state, we're going to continue and load the user program rather than staying in bootloader mode.
The most important part here is the CSR check. This is what gives this bootloader some "recoverability" facilities. Basically if there's any reset except a power-on reset, it will assume that there's a problem with the application program and that it shouldn't execute it. It will stay in bootloader mode. This aids in writing application firmware since a hard fault followed by a WDT reset will result in the microcontroller safely entering bootloader mode. The downside to this is that it could make debugging difficult if you are trying to figure out why something like a hard fault occurred in the first place (though I could argue that you should be using the SWD dongle anyway to debug your program).
The next thing to explain is the purpose of the magic_code value. The idea is to have some number that is highly unlikely to appear randomly in the EEPROM, which we use to "override" the CSR check. It is written when the program has finished being flashed for the first time. The bootloader then executes a soft reset to start the newly flashed user program, and without the override the CSR check would see that soft reset and refuse to run the user program.
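To make the interaction between the CSR check and the magic code easier to see, here is a minimal host-side sketch of the same decision written as a pure function. The function name, the CSR_* constants, and their bit positions are stand-ins introduced for illustration; on the real device the flags come from the vendor header as shown earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the RCC_CSR reset flags. On real hardware these come from the
 * vendor header; the values here are assumptions for host-side illustration. */
#define CSR_FWRSTF   (1u << 24)
#define CSR_PINRSTF  (1u << 26)
#define CSR_PORRSTF  (1u << 27)
#define CSR_SFTRSTF  (1u << 28)
#define CSR_IWDGRSTF (1u << 29)
#define CSR_WWDGRSTF (1u << 30)

#define ENTRY_MASK (CSR_WWDGRSTF | CSR_IWDGRSTF | CSR_SFTRSTF | CSR_PINRSTF | CSR_FWRSTF)
#define MAGIC_SKIP 0x3C65A95Au

/* Returns true when the bootloader should hand control to the user program. */
static bool should_run_user_program(uint32_t user_vtor, uint32_t csr, uint32_t magic_code)
{
    if (user_vtor == 0)
        return false; /* no program start address has been stored yet */
    if (magic_code == MAGIC_SKIP)
        return true;  /* one-time override of the CSR check */
    return (csr & ENTRY_MASK) == 0; /* none of the entry flags may be set */
}
```

With this shape, a soft reset alone keeps the device in bootloader mode, while the same soft reset with the magic code programmed lets the user program start.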
After the bootloader determines that it needs to run the user's program, it will execute the following:
if (bootloader_persistent_state.magic_code)
    nvm_eeprom_write_w(&bootloader_persistent_state.magic_code, 0);
__disable_irq();
uint32_t sp = bootloader_persistent_state.user_vtor[0];
uint32_t pc = bootloader_persistent_state.user_vtor[1];
SCB->VTOR = bootloader_persistent_state.user_vtor_value;
__asm__ __volatile__("mov sp,%0\n\t"
        "bx %1\n\t"
        : /* no output */
        : "r" (sp), "r" (pc)
        : "sp");
while (1) { }
The first step here is to reset the magic_code value, since this is a one-time CSR-check override. Next, interrupts are disabled and some steps are taken to start executing the user program:
1. The initial stack pointer is read from the first word of the user program's vector table.
2. The address of the reset handler is read from the second word of that table.
3. VTOR is pointed at the user program's vector table.
4. The stack pointer is loaded and execution branches to the reset handler.
After these steps are performed, the user program will begin to run. Since this whole process occurs from the initial reset state of the processor and doesn't modify any clock enable values, the user program runs in the same environment that it would if it were the program being executed at reset.
In summary, the bootloader is entered immediately upon device reset. It then decides to either run the user program (exiting the bootloader) or stay in bootloader mode, based on the reset flags in the RCC CSR register, whether a user_vtor value has been stored, and whether the magic code is set.
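Since the jump trusts whatever sits in the first two words of the user's vector table, one could add a sanity check before branching. The sketch below is not part of the bootloader as described; the function name and address constants are assumptions based on the 8K RAM and 8K-bootloader flash layout used in this post, and it runs on a host against plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_START        0x20000000u
#define RAM_END          0x20002000u /* 8K of RAM, exclusive end */
#define USER_FLASH_START 0x08002000u
#define USER_FLASH_END   0x08010000u /* 64K of flash, exclusive end */

/* Checks the first two vector table entries: word 0 is the initial stack
 * pointer and word 1 is the reset handler address. */
static bool vector_table_looks_valid(const uint32_t *table)
{
    uint32_t sp = table[0];
    uint32_t pc = table[1];

    /* the stack pointer must be word aligned and land in (or at the top of) RAM */
    if (sp <= RAM_START || sp > RAM_END || (sp & 0x3u))
        return false;
    /* Cortex-M handler addresses carry the Thumb bit in bit 0 */
    if (!(pc & 1u))
        return false;
    pc &= ~1u;
    return pc >= USER_FLASH_START && pc < USER_FLASH_END;
}
```

A check like this would reject an erased or half-written image instead of branching into it.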
One main goal I had with this bootloader is that it should be driverless and cross-platform. To facilitate this, the bootloader enumerates as a USB Human Interface Device. Here is my report descriptor for the bootloader:
static const USB_DATA_ALIGN uint8_t hid_report_descriptor[] = {
    HID_SHORT(0x04, 0x00, 0xFF), //USAGE_PAGE (Vendor Defined)
    HID_SHORT(0x08, 0x01), //USAGE (Vendor 1)
    HID_SHORT(0xa0, 0x01), //COLLECTION (Application)
    HID_SHORT(0x08, 0x01), //  USAGE (Vendor 1)
    HID_SHORT(0x14, 0x00), //  LOGICAL_MINIMUM (0)
    HID_SHORT(0x24, 0xFF, 0x00), //  LOGICAL_MAXIMUM (0x00FF)
    HID_SHORT(0x74, 0x08), //  REPORT_SIZE (8)
    HID_SHORT(0x94, 0x40), //  REPORT_COUNT(64)
    HID_SHORT(0x80, 0x02), //  INPUT (Data, Var, Abs)
    HID_SHORT(0x08, 0x01), //  USAGE (Vendor 1)
    HID_SHORT(0x90, 0x02), //  OUTPUT (Data, Var, Abs)
    HID_SHORT(0xc0), //END_COLLECTION
};
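The HID_SHORT macro itself isn't shown in this post. In the HID short item format, the low two bits of the prefix byte encode how many data bytes follow, so a reasonable guess is that the macro fills those bits in from its argument count. The function below is a hypothetical run-time equivalent, written only to illustrate the encoding; its name and signature are not from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one HID short item into out and returns the number of bytes
 * written (1 + n), or 0 if n is not a legal short-item data size. */
static size_t hid_encode_short(uint8_t prefix, const uint8_t *data, size_t n, uint8_t *out)
{
    uint8_t size_code;

    switch (n) {
    case 0: size_code = 0; break;
    case 1: size_code = 1; break;
    case 2: size_code = 2; break;
    case 4: size_code = 3; break; /* a size code of 3 means four data bytes */
    default: return 0;
    }

    /* keep the tag and type bits, replace the two size bits */
    out[0] = (uint8_t)((prefix & 0xFCu) | size_code);
    for (size_t i = 0; i < n; i++)
        out[1 + i] = data[i];
    return 1 + n;
}
```

Under that reading, the USAGE_PAGE line in the descriptor would expand to the bytes 0x06, 0x00, 0xFF, which is the vendor-defined page 0xFF00 in little-endian order.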
Our reports are very simple: We have a 64-byte IN report and a 64-byte OUT report. Although the report descriptor only describes these as simple arrays, the bootloader will actually type-pun them into something a little more structured as follows:
static union {
    uint32_t buffer[16];
    struct {
        uint32_t last_command;
        uint32_t flags;
        uint32_t crc32_lower;
        uint32_t crc32_upper;
        uint8_t data[48];
    };
} in_report;

static union {
    uint32_t buffer[16];
    struct {
        uint32_t command;
        uint32_t *address;
        uint32_t crc32_lower;
        uint32_t crc32_upper;
    };
} out_report;
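On the device the type-punning works because pointers are 32 bits wide and the core is little-endian. On a PC neither is guaranteed, so anything on the host side that builds or inspects these reports is safer decoding the bytes explicitly. The sketch below mirrors the first four fields of the OUT report; the struct and function names are made up for this example, and the meaning of each field is left to the protocol README.

```c
#include <assert.h>
#include <stdint.h>

#define REPORT_SIZE 64

/* Host-friendly mirror of the OUT report header. The address is kept as a
 * plain 32-bit value because pointers are usually wider than that on a PC. */
struct out_report_header {
    uint32_t command;
    uint32_t address;
    uint32_t crc32_lower;
    uint32_t crc32_upper;
};

_Static_assert(sizeof(struct out_report_header) == 16, "header must be 4 words");

/* Reads a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Unpacks the header fields from the first 16 bytes of a 64-byte report. */
static struct out_report_header parse_out_report(const uint8_t report[REPORT_SIZE])
{
    struct out_report_header h;
    h.command     = read_le32(&report[0]);
    h.address     = read_le32(&report[4]);
    h.crc32_lower = read_le32(&report[8]);
    h.crc32_upper = read_le32(&report[12]);
    return h;
}
```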
To program the device, this bootloader implements a state machine that interprets sequences of OUT reports and issues IN reports as follows:
A more detailed description of this protocol can be found at https://github.com/kcuzner/led-watch/blob/master/bootloader/README.md.
I'll cover briefly the process for writing the flash on the STM32. On my particular model, flash pages are 128 bytes and writes are always done in 64-byte groups. This is fairly standard for NOR flash that is seen in microcontrollers. When self-programming, one of the main issues I ran into was that the processor is not allowed to access the flash memory while a flash write is occurring. This is a problem since the flash write process requires the program to poll registers and wait for events to finish. Since this code by default resides in the flash memory, that will cause the write to fail. The solution to this is fairly straightforward: We have to ensure that the code that actually performs flash writes lives in RAM. Since RAM is executable on the STM32, this is just as simple as requesting the linker to locate the functions in RAM. Here's my code that does flash erases and writes:
/**
 * Certain functions, such as flash write, are easier to do if the code is
 * executed from the RAM. This decoration relocates the function there and
 * prevents any inlining that might otherwise move the function to flash.
 */
#define _RAM __attribute__((section (".data#"), noinline))

/**
 * RAM-located function which actually performs page erases.
 *
 * address: Page-aligned address to erase
 */
static _RAM bool nvm_flash_do_page_erase(uint32_t *address)
{
    //erase operation
    FLASH->PECR |= FLASH_PECR_ERASE | FLASH_PECR_PROG;
    *address = (uint32_t)0;
    //wait for completion
    while (FLASH->SR & FLASH_SR_BSY) { }
    if (FLASH->SR & FLASH_SR_EOP)
    {
        //completed without incident
        FLASH->SR = FLASH_SR_EOP;
        return true;
    }
    else
    {
        //there was an error
        FLASH->SR = FLASH_SR_FWWERR | FLASH_SR_PGAERR | FLASH_SR_WRPERR;
        return false;
    }
}

/**
 * RAM-located function which actually performs half-page writes on previously
 * erased pages.
 *
 * address: Half-page aligned address to write
 * data: Array of 16 32-bit words to write
 */
static _RAM bool nvm_flash_do_write_half_page(uint32_t *address, uint32_t *data)
{
    uint8_t i;

    //half-page program operation
    FLASH->PECR |= FLASH_PECR_PROG | FLASH_PECR_FPRG;
    for (i = 0; i < 16; i++)
    {
        *address = data[i]; //the actual address written is unimportant as these words will be queued
    }
    //wait for completion
    while (FLASH->SR & FLASH_SR_BSY) { }
    if (FLASH->SR & FLASH_SR_EOP)
    {
        //completed without incident
        FLASH->SR = FLASH_SR_EOP;
        return true;
    }
    else
    {
        //there was an error
        FLASH->SR = FLASH_SR_FWWERR | FLASH_SR_NOTZEROERR | FLASH_SR_PGAERR | FLASH_SR_WRPERR;
        return false;
    }
}
The other thing to discuss about self-programming is the way the STM32 protects itself against erroneous writes. It does this by "locking" and "unlocking" using writes of magic values to certain registers in the FLASH module. The idea is that the flash should only be unlocked for just the amount of time needed to actually program the flash and then locked again. This prevents program corruption due to factors like incorrect code, ESD causing the microcontroller to wig out, power loss, and other things that really can't be predicted. I do the following to actually execute writes to the flash (note how the following code uses the _RAM-located functions I noted earlier):
/**
 * Unlocks the PECR and the flash
 */
static void nvm_unlock_flash(void)
{
    nvm_unlock_pecr();
    if (FLASH->PECR & FLASH_PECR_PRGLOCK)
    {
        FLASH->PRGKEYR = 0x8c9daebf;
        FLASH->PRGKEYR = 0x13141516;
    }
}

/**
 * Locks all unlocked NVM regions and registers
 */
static void nvm_lock(void)
{
    if (!(FLASH->PECR & FLASH_PECR_PELOCK))
    {
        FLASH->PECR |= FLASH_PECR_OPTLOCK | FLASH_PECR_PRGLOCK | FLASH_PECR_PELOCK;
    }
}


bool nvm_flash_erase_page(uint32_t *address)
{
    bool result = false;

    if ((uint32_t)address & 0x7F)
        return false; //not page aligned

    nvm_unlock_flash();
    result = nvm_flash_do_page_erase(address);
    nvm_lock();
    return result;
}

bool nvm_flash_write_half_page(uint32_t *address, uint32_t *data)
{
    bool result = false;

    if ((uint32_t)address & 0x3F)
        return false; //not half-page aligned

    nvm_unlock_flash();
    result = nvm_flash_do_write_half_page(address, data);
    nvm_lock();
    return result;
}
More information about these magic numbers and the unlock-lock sequencing can be found in the documentation for the PRGKEYR register in the FLASH module on the STM32L052.
By combining the bootloader state machine with these methods for writing the flash, we can build a self-programming bootloader. Internally, it also checks that we aren't trying to overwrite anything we shouldn't by ensuring that each write only applies to areas of user flash, not to the bootloader's reserved segment. In addition, it verifies every page written against the original data to be programmed.
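The erase, write, and verify sequence for a single 128-byte page can be sketched on a host by swapping the real flash routines for stand-ins that operate on an ordinary array. Everything here (the fake_* functions, the fake_page buffer, and program_and_verify_page) is hypothetical scaffolding for illustration; the actual sequencing lives in the bootloader's state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_WORDS      32 /* 128-byte page */
#define HALF_PAGE_WORDS 16 /* 64-byte half page */

/* Host-side stand-in for one page of flash, so the sketch runs on a PC. */
static uint32_t fake_page[PAGE_WORDS];

/* Stand-in for nvm_flash_erase_page: clears the simulated page. */
static bool fake_erase_page(uint32_t *page)
{
    memset(page, 0, PAGE_WORDS * sizeof(uint32_t));
    return true;
}

/* Stand-in for nvm_flash_write_half_page: copies 16 words into place. */
static bool fake_write_half_page(uint32_t *dest, const uint32_t *data)
{
    memcpy(dest, data, HALF_PAGE_WORDS * sizeof(uint32_t));
    return true;
}

/* Erases one page, writes both halves, then compares against the source. */
static bool program_and_verify_page(uint32_t *page, const uint32_t *data)
{
    if (!fake_erase_page(page))
        return false;
    if (!fake_write_half_page(page, data))
        return false;
    if (!fake_write_half_page(page + HALF_PAGE_WORDS, data + HALF_PAGE_WORDS))
        return false;
    /* read back what was written and compare it to the original data */
    return memcmp(page, data, PAGE_WORDS * sizeof(uint32_t)) == 0;
}
```

On the device, the two stand-ins would be replaced by the real erase and half-page write functions shown above.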
I do recommend reading through the code for the bootloader state machine (just bootloader.c in the bootloader directory). The state machine is table-based (see the "fsm" constant table variable and the "bootloader_tick" function) and I find that to be a very maintainable model for writing state machines in C.
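For readers who haven't seen the pattern, here is a generic, stripped-down example of a table-driven state machine in C. It is not the bootloader's actual table: the states, events, and the demo_* names are invented for this sketch, and it only shows the lookup-and-dispatch idea.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_IDLE, ST_RECEIVING, ST_DONE } state_t;
typedef enum { EV_BEGIN, EV_DATA, EV_END } event_t;

typedef struct {
    state_t state;                /* state this row applies to */
    event_t event;                /* event that triggers the row */
    state_t next;                 /* state to move to */
    void (*action)(int *counter); /* optional side effect, may be NULL */
} transition_t;

/* Example action: counts how many data chunks were seen. */
static void count_chunk(int *counter) { (*counter)++; }

/* Each row is one legal transition; adding behavior means adding a row. */
static const transition_t demo_fsm[] = {
    { ST_IDLE,      EV_BEGIN, ST_RECEIVING, NULL },
    { ST_RECEIVING, EV_DATA,  ST_RECEIVING, count_chunk },
    { ST_RECEIVING, EV_END,   ST_DONE,      NULL },
};

/* Looks up (state, event) in the table; unknown pairs leave the state alone. */
static state_t demo_tick(state_t current, event_t event, int *counter)
{
    for (size_t i = 0; i < sizeof(demo_fsm) / sizeof(demo_fsm[0]); i++) {
        if (demo_fsm[i].state == current && demo_fsm[i].event == event) {
            if (demo_fsm[i].action)
                demo_fsm[i].action(counter);
            return demo_fsm[i].next;
        }
    }
    return current;
}
```

The appeal is that the legal transitions are all visible in one place, and the tick function never needs to change when a state is added.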
One big thing we haven't yet covered is how exactly the user application needs to be changed in order to be compatible with the bootloader. Due to how the bootloader is structured (it just lives in the first bit of flash) and how it is entered (any reset other than power-on will enter bootloader mode), the only real change needed to make a user program compatible is to relocate where the linker script places the user program in flash (leaving the first section of it blank). In my linker script for the LED watch, I changed the MEMORY directive to read as follows:
MEMORY
{
    FLASH (RX) : ORIGIN = 0x08002000, LENGTH = 56K
    RAM (W!RX) : ORIGIN = 0x20000000, LENGTH = 8K
    PMA (W) : ORIGIN = 0x40006000, LENGTH = 512 /* 256 x 16bit */
}
The flash segment has been shortened from 64K to 56K and the ORIGIN has been moved up to 0x08002000. The first 8KB of flash are now reserved for the bootloader. The bootloader is linked just like any other program, with the ORIGIN at 0x08000000, but its LENGTH is set to 8K instead.
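This layout also pins down the bounds check that keeps writes out of the bootloader's reserved segment. A minimal sketch of such a check, assuming the 8K/56K split above (the function and macro names are made up for this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BASE_ADDR  0x08000000u
#define BOOTLOADER_SIZE  0x2000u  /* 8K reserved for the bootloader */
#define FLASH_TOTAL_SIZE 0x10000u /* 64K of flash in total */

#define USER_START (FLASH_BASE_ADDR + BOOTLOADER_SIZE)
#define USER_END   (FLASH_BASE_ADDR + FLASH_TOTAL_SIZE) /* exclusive */

/* Returns true if [address, address + length) lies entirely in user flash. */
static bool is_user_flash_range(uint32_t address, uint32_t length)
{
    if (length == 0 || address < USER_START || address >= USER_END)
        return false;
    /* compare against the remaining space to avoid 32-bit overflow */
    return length <= USER_END - address;
}
```

Comparing the length against the remaining space, rather than computing address + length, keeps a huge length from wrapping around and slipping past the check.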
When the user program wishes to enter bootloader mode, it just needs to issue a soft reset. The LED watch has a command for this that is issued over USB and just executes the following when it receives that command:
//entering bootloader mode with a simple soft reset
NVIC_SystemReset();
Very simple, very easy.
The host software is written in Python and uses pyhidapi to talk to the bootloader. It really is nothing complicated, since it just reads Intel HEX files and dumps them into the watch by operating the state machine. When it is finished, it issues the "exit" command, which tells the bootloader where the program starts so that the bootloader can read the initial stack pointer and the address of the reset function. This also boots into the user program. Pretty much all the heavy lifting and "interesting" stuff for a bootloader happens in the bootloader itself, rather than in host software.
One small hack is that the host software hardcodes where it believes the program should start (address 0x08002000). One possible resolution for this hack is to take ELF files instead of Intel HEX files, or to just assume the lowest address in the hex file is the starting point.
This is the first bootloader that I've written for one of my projects. There were challenges getting it to work at first, but I hope that I've shown it isn't an incredibly complex thing to write. I actually got better performance flashing over USB than over SWD, which is an additional win. If I didn't use SWD for debugging so much, I would probably always use a bootloader like this on my projects.
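The "lowest address" idea is straightforward to prototype. The real host tool is Python; the sketch below is in C only to match the rest of the code in this post, and its function names are invented. It validates each record's checksum, tracks extended linear address (type 04) records, and reports the lowest address of any data (type 00) record. Extended segment address (type 02) records are ignored, which is an assumption that the toolchain emits linear addressing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes a ":LLAAAATT...CC" record (no trailing newline) into raw bytes.
 * Returns the number of decoded bytes, or 0 if the record is malformed. */
static size_t hex_decode_record(const char *line, uint8_t *out, size_t max)
{
    size_t len = strlen(line);
    uint8_t sum = 0;

    if (len < 11 || line[0] != ':' || (len - 1) % 2 != 0 || (len - 1) / 2 > max)
        return 0;
    size_t n = (len - 1) / 2;
    for (size_t i = 0; i < n; i++) {
        int hi = hex_digit(line[1 + 2 * i]);
        int lo = hex_digit(line[2 + 2 * i]);
        if (hi < 0 || lo < 0)
            return 0;
        out[i] = (uint8_t)((hi << 4) | lo);
        sum = (uint8_t)(sum + out[i]);
    }
    /* all bytes including the checksum sum to zero mod 256, and the
     * length field must match the payload size */
    if (sum != 0 || out[0] != n - 5)
        return 0;
    return n;
}

/* Scans records and stores the lowest data address seen in *lowest.
 * Returns false on a malformed record or if no data record was found. */
static bool hex_lowest_address(const char *const *lines, size_t count, uint32_t *lowest)
{
    uint8_t rec[5 + 255];
    uint32_t upper = 0;
    bool found = false;

    for (size_t i = 0; i < count; i++) {
        if (hex_decode_record(lines[i], rec, sizeof(rec)) == 0)
            return false;
        if (rec[3] == 0x04 && rec[0] == 2) {       /* extended linear address */
            upper = ((uint32_t)rec[4] << 24) | ((uint32_t)rec[5] << 16);
        } else if (rec[3] == 0x00 && rec[0] > 0) { /* data record */
            uint32_t addr = upper | ((uint32_t)rec[1] << 8) | rec[2];
            if (!found || addr < *lowest)
                *lowest = addr;
            found = true;
        }
    }
    return found;
}
```

For a hex file whose first data record sits at the start of user flash, this yields 0x08002000 without the host having to hardcode it.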
I hope this has been a useful read and I do encourage actually checking out the source code, since I've been pretty brief about some parts of the bootloader.