Stuck in Safe Mode after Battery depletion #2694

ATMakersBill · 2020-03-10T01:36:51Z

While I'm not trying to solve a solar-specific problem here, I have hit a problem on a solar project that seems like it would be an issue in other settings.

When my Feather nRF52840 Sense's battery drains down, CP goes into Safe Mode. In that mode it doesn't draw much power, so the battery doesn't die completely (it will eventually, but very slowly).

While in this mode, if power is restored (i.e. the sun comes up and starts charging the battery), the Feather never resets. Even though there is now enough power (low power was the cause of Safe Mode), it never recovers.

I don't know enough about Safe Mode... is there any code running? Is there a chance to put in a config setting where, if Safe Mode is activated by a brownout (vs. other issues), it continues to watch for power to come back to a reasonable level and, if so, resets the board? In that scenario one of two things would happen:

1. Power is restored... reboot occurs and life is good.
2. The battery truly dies... then power is restored, reboot occurs, and life is good.

@dhalbert suggested just not going into Safe Mode on brownout... I'm not sure what happens then... but I'm sure he'll explain it.

Bill

dhalbert · 2020-03-10T02:08:23Z

Thinking about this, perhaps we can add some hysteresis to the process, and not go into safe mode on a brownout. Instead, if we get a brownout interrupt, we should wait for some period of time and check the voltage again. If the voltage is normal (3.3V), we can treat it as if it were a hard reset. If the voltage is still low, then we can loop and check again after some period of time. Eventually either the battery will drain completely, or the power will be restored to normal.
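A minimal sketch of the hysteresis loop described above, written in C to match the firmware. The hooks `read_vdd_mv` and `sleep_ms`, the `power_hooks_t` struct, and the replay helper are hypothetical names invented for this illustration, not CircuitPython APIs. The 3300 mV threshold comes from the comment; the 1000 ms interval in the usage note is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks (not CircuitPython APIs): on a board these would
 * wrap a VDD measurement and a low-power delay. Bundled in a struct so the
 * loop can also run on a host against recorded samples. */
typedef struct {
    uint32_t (*read_vdd_mv)(void *ctx);
    void (*sleep_ms)(void *ctx, uint32_t ms);
    void *ctx;
} power_hooks_t;

/* Used instead of resetting into safe mode after a brownout: sleep, re-check
 * VDD, and repeat until it is back at normal_mv. Returns the number of checks
 * taken. If power never returns, the loop runs until the battery is fully
 * drained and the chip's own power-on reset takes over. */
uint32_t wait_for_power_recovery(const power_hooks_t *hw, uint32_t normal_mv,
                                 uint32_t interval_ms) {
    uint32_t checks = 0;
    for (;;) {
        hw->sleep_ms(hw->ctx, interval_ms);
        checks++;
        if (hw->read_vdd_mv(hw->ctx) >= normal_mv) {
            return checks; /* caller then proceeds as if after a hard reset */
        }
    }
}

/* Host-side stand-in for the hooks: replays recorded samples (repeating the
 * last one when they run out) and totals the requested sleep time instead
 * of sleeping. */
typedef struct {
    const uint32_t *samples;
    size_t n, next;
    uint32_t slept_ms;
} replay_t;

static uint32_t replay_read(void *ctx) {
    replay_t *r = ctx;
    return r->next < r->n ? r->samples[r->next++] : r->samples[r->n - 1];
}

static void replay_sleep(void *ctx, uint32_t ms) {
    ((replay_t *)ctx)->slept_ms += ms;
}
```

For example, replaying the samples 2600, 2900, 3300 mV with `wait_for_power_recovery(&h, 3300, 1000)` returns after the third check, having "slept" 3000 ms. The loop deliberately has no exit other than recovery, which matches the two outcomes in the comment.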

This mode might be selectable. There are two use cases I can think of which want to detect brownouts in different ways:

1. @ATMakersBill's example of a discharging battery which will become charged again.
2. Some external device, like a motor, is drawing too much power, causing a brownout. In that case the scenario above (waiting to recover full voltage and then restarting normally) is not a good choice. The program will run again, the motor will draw too much power, and then the brownout will happen again. This could eventually cause physical damage to something (e.g. if there was a short somewhere).

I think our original idea of safe-mode brownout was to handle cases like over-current. We had not thought a lot about sagging and recharging batteries. It's two different plausible scenarios.

tannewt · 2020-03-10T18:51:55Z

> I don't know enough about Safe Mode... is there any code running? Is there a chance to put in a config setting where, if Safe Mode is activated by a brownout (vs. other issues), it continues to watch for power to come back to a reasonable level and, if so, resets the board? In that scenario one of two things would happen:

Safe mode runs all of the CircuitPython supervisor to give USB access to the filesystem but it doesn't run boot.py or code.py because it assumes something in the user code is fatal to the system.

> Thinking about this, perhaps we can add some hysteresis to the process, and not go into safe mode on a brownout. Instead, if we get a brownout interrupt, we should wait for some period of time and check the voltage again. If the voltage is normal (3.3V), we can treat it as if it were a hard reset. If the voltage is still low, then we can loop and check again after some period of time. Eventually either the battery will drain completely, or the power will be restored to normal.
>
> This mode might be selectable. There are two use cases I can think of which want to detect brownouts in different ways:
> 1. @ATMakersBill's example of a discharging battery which will become charged again.
> 2. Some external device, like a motor, is drawing too much power, causing a brownout. In that case the scenario above (waiting to recover full voltage and then restarting normally) is not a good choice. The program will run again, the motor will draw too much power, and then the brownout will happen again. This could eventually cause physical damage to something (e.g. if there was a short somewhere).
>
> I think our original idea of safe-mode brownout was to handle cases like over-current. We had not thought a lot about sagging and recharging batteries. It's two different plausible scenarios.

I'm not sure it's the job of the micro to monitor its own power. I know @ladyada just pointed out the UM803 power management IC for use with the i.MX RT, whose sole job is to hold a micro in reset until power is adequate.

The other thing to consider is the different implementations of brown out detection. The SAMDs detect the brown out as a reset source. So by this time, the chip had enough trouble with power that it was reset. I implemented this because I was accidentally setting all NeoTrellis pixels to full bright and dipping the power. Without safe mode it is impossible to recover from this without wiping the whole filesystem.

The nRF currently takes a different approach to brownout: an interrupt triggers the reset. Brownout isn't a reset reason we can read on startup. The warning level is configurable, but the reset level is a fixed 1.7V (based on section 5.3.1.6 of the datasheet). Ideally we'd set the reset threshold to our own value and simply start up as normal once above it.

The final wrinkle I can think of is the SPI flash. Although parts come in both 1.8V and 3.3V versions, I believe we always use the 3.3V versions. So even if the nRF is fine below 3.3V, the flash won't be. This is an argument for a configurable reset level or an external UM803, which is itself configurable.

My feeling is that a solar + battery powered circuit shouldn't attempt to start up the MCU and flash until the supply is outside the brownout range.

ATMakersBill · 2020-03-29T18:25:19Z

@tannewt I'm not sure I'm asking for that kind of support - it's not that I'm asking the system to monitor power etc. I'm just asking it not to go into a zombie mode that never resets. And I'm asking for it as a configurable option.

I think the simplest implementation of this request is to have a mode (configurable in boot.py) that says "On brownout, shut down all FLASH access and things that can corrupt filesystems, etc. and then periodically check to see if the brownout is over. If power has been good for a reasonable time, trigger reset."

It's possible that "check to see if the brownout is over" is not possible. In that case, I'd have the code wait a period of time (30 seconds?) and then just reset. If the power is still low, it will boot into CP, trigger the brownout, wait 30 seconds and reset again. Perhaps a solid check in the boot process to do any power checks that are available before enabling the FLASH would be good (seems that they'd be good anyway).
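The "shut down all FLASH access" step could look roughly like the guard below: a flag set by the brownout handler and checked on every write until the next reset. This is a sketch under assumptions: `on_brownout_irq` and `flash_write_guarded` are hypothetical names, and "flash" is a plain RAM buffer so the logic can run on a host; a real port would put the check inside its flash driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set from the (hypothetical) brownout interrupt handler; only a reset clears it. */
static volatile bool brownout_active = false;

void on_brownout_irq(void) {
    brownout_active = true;
}

/* Hypothetical guarded write path. Returns false and writes nothing while a
 * brownout is in progress, or if the write would run past the end of flash. */
bool flash_write_guarded(uint8_t *flash, size_t flash_len, size_t offset,
                         const uint8_t *data, size_t len) {
    if (brownout_active) {
        return false; /* a write at low VDD could corrupt the filesystem */
    }
    if (offset > flash_len || len > flash_len - offset) {
        return false; /* out of range */
    }
    memcpy(flash + offset, data, len);
    return true;
}
```

Writing the range check as `len > flash_len - offset` (after confirming `offset <= flash_len`) avoids the overflow that `offset + len > flash_len` could hit with large values.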

The problem is that, as it is, CP MUST be manually reset after a battery gets discharged and then recharged (or power is plugged in). This is not a solar issue; solar is just my setting. Yes, it will eventually reset, but with a large battery (I'm using a 2500 mAh) and the device in safe mode, it will take a LONG time to draw down from the brownout state to the powered-off state... and by then, power will long since have been restored.

Are we on the same page?

tannewt · 2020-03-30T22:01:27Z

I think we're mostly on the same page. @dhalbert has a pending PR to the bootloader to validate the power on start as you suggest: https://github.com/adafruit/uf2-samdx1/pull/111/files#diff-803c5170888b8642f2a97e5e9423d399R181

I don't think we need a configuration setting for this though because everyone should want it. It makes no sense to start when power is unreliable.

The only other bit is to ensure that the brown out doesn't lead to a safe mode on start up. We could do this by writing a sentinel in RAM which will get wiped when power dips or by tracking the reset time in the backup domain and only safe moding for short blips.
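A sketch of the RAM-sentinel idea, under stated assumptions: the magic value is arbitrary, the word it is compared against is assumed to live in a RAM section the startup code does not zero (a linker detail not shown), and `brownout_reported` stands for however a port learns of the brownout (a reset-cause register on SAMD, or a flag the nRF brownout interrupt leaves behind). None of these names exist in CircuitPython.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic value. On hardware the word compared against it would
 * live in RAM that startup code does not zero (e.g. a ".noinit" section);
 * that linker placement is assumed, not shown. */
#define SAFE_MODE_SENTINEL 0x5AFE5AFEu

/* Call once, early in boot.
 * - Sentinel intact: RAM kept its contents, so the supply only blipped;
 *   keep the existing "brownout -> safe mode" behavior.
 * - Sentinel gone: RAM lost power, so the supply collapsed and came back;
 *   boot normally instead of into safe mode.
 * The sentinel is re-armed on every boot. */
bool brownout_should_enter_safe_mode(volatile uint32_t *sentinel,
                                     bool brownout_reported) {
    bool ram_survived = (*sentinel == SAFE_MODE_SENTINEL);
    *sentinel = SAFE_MODE_SENTINEL;
    return brownout_reported && ram_survived;
}
```

One caveat with this approach: RAM contents after a power loss are undefined rather than guaranteed zero, so a random match is possible in principle, though unlikely with a 32-bit magic value.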

maholli · 2020-04-21T17:52:32Z

I wanted to chime in and say we've encountered @ATMakersBill's failure mode countless times with students building solar and battery powered projects.

Maybe I can frame it in a different light:

Regardless of how you get there, recovering from safe mode requires user intervention.

I think we need an ability to dictate safe mode behavior without hard-coding temporary fixes into main.c (for example).

tannewt · 2020-04-21T18:13:53Z

@maholli That is a good way to put it! I just filed #2795 and #2796 related to more low power work. The latter also needs a way to provide a start reason to user code. That could help the user's safe mode code too.

ita1024 · 2020-06-13T12:05:47Z

My Trinkets are not even on battery but require too many manual resets, and adding more hardware is starting to look expensive. Given that there is no quick fix or option in CircuitPython yet, I am looking into a workaround in the C code.

Ideally, I would exit safe mode after, for example, 2 minutes. In the function wait_for_safe_mode_reset (supervisor/shared/safe_mode.c), would it be fine to call reset_cpu() after a few ticks, or would that reset the device into safe mode again?

Alternatively, would an immediate CPU reset be valid for my cases (never enter "safe mode")?

```diff
diff --git a/supervisor/shared/safe_mode.c b/supervisor/shared/safe_mode.c
index a167ab392..5a8ebd2d5 100644
--- a/supervisor/shared/safe_mode.c
+++ b/supervisor/shared/safe_mode.c
@@ -83,14 +83,14 @@ void safe_mode_on_next_reset(safe_mode_t reason) {
 
 // Don't inline this so it's easy to break on it from GDB.
 void __attribute__((noinline,)) reset_into_safe_mode(safe_mode_t reason) {
-    if (current_safe_mode > BROWNOUT && reason > BROWNOUT) {
-        while (true) {
-            // This very bad because it means running in safe mode didn't save us. Only ignore brownout
-            // because it may be due to a switch bouncing.
-        }
-    }
-
-    safe_mode_on_next_reset(reason);
+    //if (current_safe_mode > BROWNOUT && reason > BROWNOUT) {
+    //    while (true) {
+    //        // This very bad because it means running in safe mode didn't save us. Only ignore brownout
+    //        // because it may be due to a switch bouncing.
+    //    }
+    //}
+    //
+    //safe_mode_on_next_reset(reason);
     reset_cpu();
 }
```
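For the "exit safe mode after 2 minutes" variant, the piece that is easy to get wrong is the elapsed-time check, since a 32-bit millisecond tick counter wraps. A wrap-safe check is sketched below; the tick source and the loop shown in the comment are hypothetical, not the real `wait_for_safe_mode_reset()`. Whether the subsequent `reset_cpu()` comes back up in safe mode depends on whether the safe-mode reason was recorded for the next reset; in the original code above, `reset_into_safe_mode()` calls `safe_mode_on_next_reset(reason)` before `reset_cpu()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two minutes, as suggested in the comment above. */
#define SAFE_MODE_TIMEOUT_MS (2u * 60u * 1000u)

/* True once timeout_ms have elapsed since start_ms. Unsigned subtraction
 * keeps this correct when the 32-bit millisecond counter wraps around.
 *
 * Hypothetical use inside a safe-mode wait loop:
 *   uint32_t start = ticks_ms();   // assumed tick source
 *   while (!safe_mode_timeout_expired(start, ticks_ms(), SAFE_MODE_TIMEOUT_MS)) {
 *       // service USB, blink the status LED, ...
 *   }
 *   reset_cpu();
 */
bool safe_mode_timeout_expired(uint32_t start_ms, uint32_t now_ms,
                               uint32_t timeout_ms) {
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}
```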

tannewt · 2020-06-15T21:43:04Z

@dhalbert Can we close this? Didn't your bootloader changes fix this?

dhalbert · 2020-06-22T15:13:06Z

> @dhalbert Can we close this? Didn't your bootloader changes fix this?

The bootloader fixes were only for the SAMD bootloader, and weren't meant to cover this case, just the case where running at low voltage causes spurious flash writes.

If the power sags and then returns (the weak battery case), you'll still go into safe mode.

I did a little experimentation and added microcontroller.on_brownout(runmode), so that on brownout you can go into RunMode.SAFE_MODE (the default) or RunMode.NORMAL. I was storing that setting in RAM, but my experimentation shows that it's still too easy to get stuck in safe mode when the power comes back up, because RAM can get wiped. I think the proper solution is to store the state of microcontroller.on_brownout() in flash.
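Storing the setting in flash raises one practical question: what to do when the stored record is missing or damaged. A sketch of one answer, assuming a hypothetical record layout (magic word, value, and an inverted copy as a check) and falling back to the safe-mode default whenever the record is not valid. Erased flash reads back as all 0xFF bytes, so it never matches the magic. `microcontroller.on_brownout()` was an experiment, so nothing here reflects a shipped format.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the two choices in the experiment above; the encoding is hypothetical. */
typedef enum { ON_BROWNOUT_SAFE_MODE = 0, ON_BROWNOUT_NORMAL = 1 } on_brownout_t;

/* Hypothetical on-flash record. */
typedef struct {
    uint32_t magic; /* distinguishes a written record from erased flash */
    uint32_t mode;
    uint32_t check; /* ~mode, to catch a torn or partial write */
} brownout_setting_t;

#define BROWNOUT_SETTING_MAGIC 0xB0D5E77Eu

void brownout_setting_encode(brownout_setting_t *rec, on_brownout_t mode) {
    rec->magic = BROWNOUT_SETTING_MAGIC;
    rec->mode = (uint32_t)mode;
    rec->check = ~(uint32_t)mode;
}

/* Anything that is not a valid record (erased, torn write, unknown value)
 * falls back to the default, which keeps today's safe-mode behavior. */
on_brownout_t brownout_setting_decode(const uint8_t *flash_bytes) {
    brownout_setting_t rec;
    memcpy(&rec, flash_bytes, sizeof rec); /* avoid unaligned access */
    if (rec.magic != BROWNOUT_SETTING_MAGIC || rec.check != ~rec.mode ||
        rec.mode > (uint32_t)ON_BROWNOUT_NORMAL) {
        return ON_BROWNOUT_SAFE_MODE;
    }
    return (on_brownout_t)rec.mode;
}
```

Defaulting to safe mode on any decode failure keeps the over-current protection for boards that never opted out.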

tannewt · 2020-06-22T21:37:35Z

@dhalbert The way you fixed it though ensures that power is 3.3v or above right? Maybe all of our bootloaders should ensure that.

dhalbert · 2020-06-22T22:08:19Z

No, the bootloader just ensures that the power is above the brownout detection voltage, which is 2.7V. 2.8V is the maximum detection voltage on nRF52. 3.3V is probably too high, since it limits the battery life, and also the voltage after regulation may be lower.

Even ensuring a high voltage doesn't necessarily help. For instance, once the program starts up, it may start up devices that draw significant current (such as a wifi adapter or LEDs), and those could cause the voltage to sag. So always avoiding safe mode, if requested, is the right thing to do. A program can monitor the voltage and decide to wait for a higher voltage, reset and try again, etc.

tannewt · 2020-06-22T22:17:41Z

Whose responsibility is it to make sure the external SPI flash voltage is high enough? The ability of the CPU to run may not match the voltage requirements of external chips on the board.

deshipu · 2020-06-22T23:59:32Z

I suppose the minimal voltage could be configured in the bootloader's configuration, together with the display stuff. Then it can be different per board, depending on what components are built in on it.

dhalbert · 2020-06-23T02:01:47Z

The SPI flash chips generally have a minimum operating voltage of 2.7V.

The nRF52840 has a forced reset when VDD is below 1.7V. It has a comparator that can generate an interrupt when VDD is below a set value. The maximum such value is 2.8V.

The SAMD51 can set the BOD33 brownout level up to about 3V. The SAMD21 can be even a bit higher.
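The numbers above suggest a simple per-board rule, in the spirit of @deshipu's per-board setting: the brownout threshold has to be at least the highest minimum operating voltage among the board's parts, and the MCU's detector has to be able to go that high. A sketch of that check follows; the component table and function names are hypothetical, and the example values are the ones quoted in this thread (SPI flash 2.7V, nRF52840 forced reset at 1.7V, nRF52840 comparator maximum 2.8V).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum operating voltage of one part on a board, in millivolts. */
typedef struct {
    const char *part;
    uint32_t min_mv;
} component_t;

/* The board's threshold is the highest minimum among its parts: the MCU core
 * running is not enough if, say, the SPI flash needs more. */
uint32_t board_min_mv(const component_t *parts, size_t n) {
    uint32_t req = 0;
    for (size_t i = 0; i < n; i++) {
        if (parts[i].min_mv > req) {
            req = parts[i].min_mv;
        }
    }
    return req;
}

/* Can the MCU's own detector be set at or above the required level? */
bool detector_can_cover(uint32_t required_mv, uint32_t detector_max_mv) {
    return required_mv <= detector_max_mv;
}
```

With an nRF52840 (1700 mV) plus SPI flash (2700 mV), the requirement is 2700 mV, which the 2800 mV comparator can cover; a board without flash only needs the MCU's own minimum, which is the longer-battery-life case @deshipu describes below for the PewPews.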

The main issue, as I've mentioned, is that the battery voltage can be satisfactory at a light load to pass the voltage requirements, either in the bootloader or in CircuitPython. But once the program starts running, the battery voltage may dip due to increased load. Right now this triggers brownout protection and a safe mode reset. Once the board is in safe mode, the program does not run, and the board is stuck in safe mode while the battery continues to get charged, say by solar power. So the board can't exit safe mode, and nothing happens. This is the primary problem.

If instead the board simply reset into normal mode, then the program could run. In the worst case, the voltage would dip and the reset cycle would repeat over and over. If the charging rate exceeds the consumption rate, the reset cycle would eventually stop, and the program could run. A better approach would be for the program to check the voltage periodically and simply wait for a high-enough battery voltage before turning on devices that increase the current consumption.
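The "wait for a high-enough battery voltage before turning on devices" approach works best with two thresholds, so the sag caused by switching the load on doesn't immediately switch it off again. A sketch of the decision logic (which in practice would live in the user's program); `load_step` and the millivolt values in the usage note are illustrative assumptions, and the off threshold would need to sit above the brownout level so the load is shed before a reset happens.

```c
#include <stdint.h>

typedef enum { LOAD_OFF, LOAD_ON } load_state_t;

/* Hysteresis: enable the power-hungry load only at or above on_mv, and disable
 * it only if the voltage under load falls below off_mv (off_mv < on_mv). */
load_state_t load_step(load_state_t state, uint32_t vbat_mv,
                       uint32_t on_mv, uint32_t off_mv) {
    if (state == LOAD_OFF && vbat_mv >= on_mv) {
        return LOAD_ON;
    }
    if (state == LOAD_ON && vbat_mv < off_mv) {
        return LOAD_OFF;
    }
    return state; /* inside the band: keep doing what we were doing */
}
```

For example, with `on_mv = 3700` and `off_mv = 3500`, a reading of 3600 mV leaves an idle load off but keeps a running load on, so a modest sag under load does not cause on/off chatter.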

deshipu · 2020-07-08T08:20:36Z

I'm not sure if this helps in anything, but I have described my struggles with safe mode and power dipping here: https://hackaday.io/project/158981-kubik-m0/log/180416-safe-mode-problems

Any ideas would be appreciated.

ita1024 · 2020-07-08T10:20:46Z

@deshipu The safe_mode.c workaround mentioned above works well for me so far.

deshipu · 2020-07-08T11:01:22Z

@ita1024 I would rather change the BOD33 level in port.c to something lower, and maybe enable the hysteresis, especially since my boards don't have any other component with higher voltage requirements. However, I would like to publish my projects at some point and have them added to the CircuitPython repository, and that means I can't just hack the firmware.

Perhaps there could be an option for switching the minimal voltage per board?

dhalbert · 2020-07-08T14:04:32Z

@deshipu I put code in the UF2 bootloader to wait for 100ms after reaching the BOD33 level (2.7V). But I only did this on the SAMD51. I read your Hackaday post. Is that on SAMD51?
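A generic sketch of "wait 100ms after reaching the BOD33 level": VDD must stay at or above the threshold for the whole hold time, and any dip restarts the wait. This is not the UF2 bootloader's code; restarting on a dip is an assumption, as are the names `power_debounce_t` and `power_good_for`. The 2.7V and 100ms values come from the comment above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool stable;       /* currently in a run of good readings */
    uint32_t since_ms; /* when the current run started */
} power_debounce_t;

/* Feed one VDD sample with its timestamp. Returns true once VDD has stayed at
 * or above threshold_mv for at least hold_ms; any dip restarts the wait. */
bool power_good_for(power_debounce_t *d, uint32_t now_ms, uint32_t vdd_mv,
                    uint32_t threshold_mv, uint32_t hold_ms) {
    if (vdd_mv < threshold_mv) {
        d->stable = false;
        return false;
    }
    if (!d->stable) {
        d->stable = true;
        d->since_ms = now_ms;
    }
    return (uint32_t)(now_ms - d->since_ms) >= hold_ms;
}
```

Called as `power_good_for(&d, now, vdd, 2700, 100)` in a polling loop, a brief excursion above 2.7V on a slowly charging supply is not enough to start the application.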

deshipu · 2020-07-08T14:08:29Z

No, that's SAMD21, sorry for not being specific. I suppose a delay would work in this particular case.

I still think it would be nice to be able to modify the level per board — I could make the PewPews work on battery much longer that way, for example, since they don't use flash.

deshipu · 2020-07-08T18:03:18Z

@dhalbert I went ahead and created #3130 — let me know what you think.

J-wire · 2020-07-22T20:09:45Z

Hey everyone,

I am wondering if you guys have a timeline for resolving this issue. I have an M4 express that is getting stuck in safe mode and I am looking for solutions. Any recommendations, including breakout board solutions, would be super helpful.

dhalbert · 2020-12-08T13:39:15Z

I had another idea for easily signaling that you don't want safe mode on brownout, and that would be to simply add a file to CIRCUITPY that has a distinctive name we can check for. Something like:

SAFEMODE.OFF would turn off all resets into safe mode, including brownout. This might be sufficient.
BROWNOUT_SAFEMODE.OFF would just turn off brownout safe mode, etc.

This filename thing has the advantage of being immediately visible, and easily removable (by loading a CIRCUITPY eraser). It moves any such setting from being buried in the flash to being easily controllable.
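The flag-file proposal boils down to two checks at the point where the firmware would reset into safe mode. A sketch, assuming plain stdio for the existence check (on the device this would go through the CIRCUITPY filesystem layer) and hypothetical function names; the two filenames are the ones proposed above.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if the file can be opened for reading. */
bool flag_file_present(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return false;
    }
    fclose(f);
    return true;
}

/* SAFEMODE.OFF suppresses every safe-mode reset; BROWNOUT_SAFEMODE.OFF
 * suppresses only the brownout one. */
bool suppress_safe_mode(bool is_brownout, bool safemode_off,
                        bool brownout_safemode_off) {
    return safemode_off || (is_brownout && brownout_safemode_off);
}
```

Usage would be along the lines of `suppress_safe_mode(is_brownout, flag_file_present("SAFEMODE.OFF"), flag_file_present("BROWNOUT_SAFEMODE.OFF"))`, with the paths resolved against the root of CIRCUITPY.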

A more complicated suggestion is to have a safemode.py that is always run on startup, even when restarting in safe mode, which could examine microcontroller.cpu.reset_reason (a newly added feature) and do a programmatic reset to get out of safe mode, or otherwise disable it in some way.
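The decision a safemode.py-style handler would make can be sketched as a small policy function: only brownouts are auto-recovered, and only once the battery looks healthy again, which also keeps dhalbert's over-current case (anything else) waiting for a human. In practice this logic would be written in Python in the user's own script; the enums, the function name, and the `ok_mv` threshold here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the reset reasons discussed above. */
typedef enum { REASON_POWER_ON, REASON_BROWNOUT, REASON_USER_CODE_CRASH } boot_reason_t;

typedef enum {
    ACTION_STAY_IN_SAFE_MODE, /* needs user intervention, as today */
    ACTION_WAIT_AND_RECHECK,  /* brownout, but the battery is still low */
    ACTION_RESET              /* brownout, battery recovered: reset and run normally */
} safe_mode_action_t;

safe_mode_action_t safe_mode_policy(boot_reason_t reason, uint32_t vbat_mv,
                                    uint32_t ok_mv) {
    if (reason != REASON_BROWNOUT) {
        return ACTION_STAY_IN_SAFE_MODE;
    }
    return vbat_mv >= ok_mv ? ACTION_RESET : ACTION_WAIT_AND_RECHECK;
}
```

Keeping the policy in user code means a board driving a motor can choose to stay in safe mode on brownout, while a solar logger can choose to reset.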

Similar flag files are used elsewhere, for example on the Raspberry Pi, where creating a file called ssh on the boot drive enables ssh.

tannewt · 2020-12-08T18:17:08Z

I'd rather not have special files that indicate a setting. boot.py is really for settings.

I'm ok having a safemode.py though. We'd just have to caveat it with a bunch of warnings.

ATMakersBill · 2021-01-01T19:28:07Z

I like the idea of having a file that is run even when started in safe mode, @dhalbert. That would let me perform a test of the batteries and make a decision based on my actual situation. I also like that it puts the solution in Python rather than having to choose from two or three options written in C.

Just to flesh this out, would there be limitations on the code in safemode.py? For example, would the SD card still be read-only? Would it be run before sensors are active or anything like that?

Either way, I'd love this as a solution, and I volunteer to test anything you come up with.

Thanks
Bill

tannewt · 2021-01-05T22:41:22Z

I think safemode.py would be the only thing to run. It can reset the micro to escape safe mode.

RabidObeseMan · 2021-12-16T16:20:35Z

Are these solutions applicable on a circuit board express? I am running into the same issue but am not sure how to implement any of the solutions above :(

dhalbert · 2021-12-16T16:21:33Z

These solutions would work anywhere, but we have not implemented them yet. They require core changes to CircuitPython.

RabidObeseMan · 2021-12-16T17:26:49Z

Ah, gotcha. And I guess there are no workarounds at the moment?

dhalbert · 2021-12-16T17:29:21Z

That is right, sorry. You could look into using one of the TPL power switches to force a power cycle, or figure out some other way to hard reset or power-cycle the board.

dhalbert added this to the Long term milestone Mar 12, 2020

tannewt added bug power supervisor labels Apr 8, 2021

anecdata mentioned this issue Jan 31, 2022

Safe Mode: mechanism for user code to recover without manual intervention #5956

Closed

tannewt modified the milestones: Long term, 8.1.0 Jan 23, 2023

dhalbert self-assigned this Feb 10, 2023

dhalbert mentioned this issue Feb 13, 2023

Implement safemode.py #7577

Merged

dhalbert closed this as completed in #7577 Feb 16, 2023
