Skip to content

ports/nrf: Enable Link-time optimizations #118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 3, 2017

Conversation

aykevl
Copy link

@aykevl aykevl commented Oct 24, 2017

I was experimenting with LTO. I've got it working for nRF51 without SoftDevice but couldn't get it to work when it is compiled with a SoftDevice (goes straight to a HardFault). Nonetheless, I wanted to share what I've got so far, if you're interested. Of course, it will need a lot of testing, as LTO support is relatively new and no other ports have implemented it.

It saves roughly 1kB of flash and 52 (?) bytes of RAM.

@glennrub
Copy link
Collaborator

glennrub commented Oct 27, 2017

Great, thanks Ayke!
I recall from earlier that there might be some issue with -flto using SD. I will cross check with nrf52 softdevices, and see whether this is also an issue there.

Ref posts on Nordic dev zone like this one:
https://devzone.nordicsemi.com/question/29727/error-compiling-blinky_rtx-example-nrf51/?answer=29806#post-id-29806

It also looks like people have been struggling getting debug symbols when using -00, due to -flto stripping out the debug symbols. Would it be an idea to only add this -flto to targets using -0s ?
https://github.com/tralamazza/micropython/blob/master/ports/nrf/Makefile#L93

@tralamazza do you have any insight here? i see you have responded to the specific dev-zone question I've linked to.

@dhalbert
Copy link

@glennrub pointed me here and asked me to share some space-saving tips I had shared with him. @tannewt and I are working on CircuitPython for Adafruit, a fork of MicroPython. Our main work is for SAMD21 (M0+) and SAMD51 (M4F).

We use -Os for our builds.

Scott initiated using -flto and found it saved significant space for us, right now not quite 6kB flash space. We are using the latest (6.3.1) arm-non-eabi-gcc. There are a couple of spurious warnings caused by the internal frozen module code; there's a gcc bug for those.

We have also saved space by judicious use of the gcc flag -finline-limit=<n>, which limits the size of inlined routines. At n=80 or so, the size is about the same; reducing this can save significant space. Below 20 things seem to break.

We also found we were able to save a lot of space by using the libm in the MicroPython tree rather than the toolchain libm (saves 8.8kB), and by avoiding any accidental double-precision arithmetic elsewhere (saves 3kB). The toolchain libm uses double precision in several places. The internal libm routines don't do that, and are generally quite a bit smaller.

For M4F, will may not save as much space, since the software implementations of single precision will not be needed, and the double-precision routines will be smaller. But I just tested the SAMD51 (M4F) port by turning -lm back on, and we are still saving 5.5kB by using the internal libm. This does not include whatever savings we already gained by avoiding double precision.

Some PRs and issues of ours with more details (also see the referenced issues):

adafruit#118 (LTO caused a bug-inducing rearrangment of something that should have been 32-bit-aligned but was not).

Math-related:
adafruit#325
adafruit#343
adafruit#329

@tralamazza
Copy link
Owner

The only issues I ever had with lto was regarding weak symbols

@aykevl
Copy link
Author

aykevl commented Oct 28, 2017

I had another try today with hints from this post (adding a few more __attribute__((used)) statements to ISRs) but it didn't work sadly. Next step is trying to step through the code from reset and see where it hits the HardFault.

@glennrub I compiled with all default flags otherwise which includes -Os, I think it's unrelated.

@dhalbert thanks for all the ideas! We don't use floating point in this port so I doubt switching libm will help.

@tralamazza good point, should try to replace those weak symbols. But then again, it works fine without SD so it can't (?) be the ISR vector.

Another (unrelated) space saver I want to try is storing code/data in the first page, after the ISR pointers (just like stm32 does). That's about 800 bytes on the nRF51 but nearly 4k on nRF52 (I think).

@aykevl
Copy link
Author

aykevl commented Oct 28, 2017

I noticed it does work with SD enabled (at least I get a REPL) when I first write firmware.hex and then write the SoftDevice itself. When I again write firmware.hex it hits the HardFault again on startup. So it appears that -flto causes parts of the SD to be overwritten??

I didn't get very far using gdb.

@aykevl
Copy link
Author

aykevl commented Oct 31, 2017

Ok, I made a mistake. I flashed the .elf file instead of the .hex file, which apparently resulted in overwriting some critical flash area. Flashing the .hex file solves the problem (verify_image in OpenOCD shows that both SD and firmware are properly flashed, without overwriting each other).

I've also done a quick test for BLE: enabling advertising via the serial console and enabling the BLE console both work. So that means that after nRF52 is tested, I think it can be merged.

@glennrub
Copy link
Collaborator

Ah, i see. That could explain it!

At the moment i'm not able to test the nrf52 target because i'm out travelling this week. I'll try to perform a test this weekend, unless someone else verifies the nrf52 target before that. Code-wise i do not see anything that should block this PR.

@glennrub
Copy link
Collaborator

glennrub commented Nov 3, 2017

Looks good to me! I tried it with a nrf52 target (pca10040), and as well with microbit just to log some numbers on memory. I froze ublupy_eddystone.py and connected over BLE REPL (nus_console), then forked the eddystone example over the BLE link.

microbit:

bluetooth_conf.h: MICROPY_PY_BLE_NUS (1)
freeze/ubluepy_eddystone.py
SD=s110

Before: 
   text	   data	    bss	    dec	    hex	filename
 149052	    380	   4656	 154088	  259e8	build-microbit-s110/firmware.elf

with -flto:
   text	   data	    bss	    dec	    hex	filename
 147796	    320	   4660	 152776	  254c8	build-microbit-s110/firmware.elf

pca10040:

bluetooth_conf.h: MICROPY_PY_BLE_NUS (1)
freeze/ubluepy_eddystone.py
SD=s132

Before:
   text	   data	    bss	    dec	    hex	filename
 173968	   1560	  20140	 195668	  2fc54	build-pca10040-s132/firmware.elf

with -flto:
   text	   data	    bss	    dec	    hex	filename
 172404	   1488	  20152	 194044	  2f5fc	build-pca10040-s132/firmware.elf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants