Crash into the HardFault_Handler. #7907

andywarburton · 2023-04-25T14:42:23Z

CircuitPython version

Adafruit CircuitPython 7.2.0 on 2022-02-24; FeatherS2 with ESP32S2
Board ID:unexpectedmaker_feathers2

Code/REPL

[13:52:08.944] Disconnected
[13:52:09.951] Warning: Could not open tty device (No such file or directory)
[13:52:09.951] Waiting for tty device..
[13:52:11.976] Connected
You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Crash into the HardFault_Handler.
Please file an issue with the contents of your CIRCUITPY drive at 
https://github.com/adafruit/circuitpython/issues

Press any key to enter the REPL. Use CTRL-D to reload.

Behavior

This code runs fine most of the time: it reads from the sensors, reports values to HomeAssistant via MQTT, and activates the servo. However, it has been crashing at what seem to be random intervals. To find out what was going on, I left it plugged in to my computer for a couple of days with the REPL running (via TIO). It finally crashed again with the error "CircuitPython core code crashed hard. Whoops!" and asked me to share the contents of my CIRCUITPY drive here (attached below, with credentials removed from secrets.py).

I don't believe it to be a memory issue, as I have been logging the free memory via gc.mem_free() to HomeAssistant along with the sensor values (it seemed like a good way to rule that out as a problem). Memory usage was relatively consistent over its last run, aside from a big dip overnight that I can't explain.
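For anyone wanting to check a memory log like this numerically rather than by eye, here is a minimal sketch. The helper names `free_memory` and `leak_rate` are hypothetical and not part of the attached code; `gc.mem_free()` exists on CircuitPython but not on desktop Python, so the first helper returns None there.

```python
import gc


def free_memory():
    """Return free heap bytes on CircuitPython, or None where gc.mem_free is missing."""
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free is not None else None


def leak_rate(samples):
    """Least-squares slope of free memory over time, in bytes per second.

    samples is a list of (seconds, free_bytes) pairs. A steadily negative
    slope over many hours hints at a leak; a flat slope does not.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0
```

On the board, each sample could be collected as `(time.monotonic(), free_memory())` just before publishing the sensor values.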

It wouldn't surprise me if it's something wrong with my code because I'm not an engineer and mostly bodge together other code from tutorials around the internet etc.

Description

No response

Additional information

My circuitpython drive contents with secrets removed:
CIRCUITPY.zip


andywarburton · 2023-04-25T14:43:34Z

Just realised I'm running an older version of CircuitPython... would it make sense to start by upgrading the version and libraries to latest?

tannewt · 2023-04-25T16:23:04Z

> Just realised I'm running an older version of CircuitPython... would it make sense to start by upgrading the version and libraries to latest?

Yes, please do.

andywarburton · 2023-04-25T17:16:43Z

CircuitPython and Libraries all updated to the latest version. New copy of my CIRCUITPY drive attached. I will monitor for a couple of days and see how it goes.
CIRCUITPY.zip

DJDevon3 · 2023-04-25T19:03:44Z

Helping beta test this one on the UM FeatherS2 with AndyWarburton.

Adafruit CircuitPython 8.0.5 on 2023-03-31; FeatherS2 with ESP32S2
Board ID:unexpectedmaker_feathers2

Closest I can come is AdafruitIO with a DPS310 temp sensor. Your motor driver, other specific sensors, and home assistant broker I can't replicate on my setup.

I had the same issue with the UM FeatherS3 (that issue is now closed). It was specific to the S2 & S3 chips running CircuitPython and happened with both Adafruit & UM boards in 7.3 and 8.x beta. It had something to do with Wifi triggering a soft reboot and crashing after an unspecified, random period of time. Without Wifi it would work fine.

The hard fault handler issue has been resolved to my knowledge on the S2 since 8.0.4 beta and on the S3 since 8.0.5 stable. This will be my first attempt at running a UM board again since the 8.0.1 beta. So far it's behaving very well with Wifi and running as expected.

If you are running anything prior to 8.0.5 stable release yes you would absolutely have experienced hard faults. Since it's been resolved on the Adafruit S3 I have high hopes the UM boards will now be usable on Wifi going forward after 8.0.5 stable release.

One of the downsides (if you can call it that) is the massive amount of RAM. If there is a RAM leak with a slowly updating script it could take days/weeks for it to crash. I don't expect that to happen but it's something to keep in the back of your mind.

CIRCUITPY.zip
Including my script loosely based on yours. It's the closest I can get as you have a very specific setup. Will run this in the background with you for a couple days. Updating every 10 seconds to AdafruitIO so if there is an issue it will crop up faster.

andywarburton · 2023-04-25T20:52:39Z

Thank you @DJDevon3 !

DJDevon3 · 2023-04-26T00:07:48Z

A hard fault after a couple of days would be nearly impossible for me to track down. I thought this was a couple of hours thing when I first read it. Someone would have to put a debugger on it and hope it crashes. So far I'm about 6 hours running perfectly fine. This can't be the same issue I was having with the S3 as I would have hard faulted about 100 times by now.

Not a single error let alone a hard fault. Looking good so far.

Can you provide a better timeline range when to expect a hard fault? More specifics could help and I realize random crashes are hard to judge. How long of a runtime would you expect/hope to surpass before calling this one closed? 4 days, 5 days, 6 days, a week, 2 weeks?

andywarburton · 2023-04-26T07:45:34Z

@DJDevon3 for me the Hard Fault was happening at around the 24-48 hour mark. My setup has now been running for about 12 hours without a problem but I'm hoping it'll just keep running "forever"

I will check in again probably on Friday (tomorrow is a national holiday here in The Netherlands and I'll be out and about).

DJDevon3 · 2023-04-26T09:03:43Z

My attempt failed at the 13 hour mark because, apparently, my script has no try/except to fail gracefully when the broker doesn't respond. This is my first time ever using MQTT so I don't have a snippet for that part. It's mostly using the default AdafruitIO example. CircuitPython really needs to include try/except error handling in all examples for anything MQTT or Requests related. It failed inside io.loop()

This is a failure in my script not the board.

Traceback (most recent call last):
  File "code.py", line 118, in <module>
  File "adafruit_io/adafruit_io.py", line 239, in loop
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 1002, in loop
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 683, in ping
MMQTTException: PINGRESP not returned from broker.

Added a try/except and also time.monotonic() timestamp

print("Monotonic: ", time.monotonic())
print("Monotonic Hours: ", time.monotonic()/60/60)

Monotonic:  51204.6
Monotonic Hours:  14.2235
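A small sketch of the same uptime stamp in a more readable form. `format_uptime` is a hypothetical helper name, not something from the script above; it only splits the value returned by time.monotonic() into days, hours, minutes, and seconds.

```python
import time


def format_uptime(seconds):
    """Turn a seconds count (such as time.monotonic()) into 'Nd HHh MMm SSs'."""
    seconds = int(seconds)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return "{}d {:02d}h {:02d}m {:02d}s".format(days, hours, minutes, secs)


print("Uptime:", format_uptime(time.monotonic()))
```

For the reading above, `format_uptime(51204.6)` gives `0d 14h 13m 24s`.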

Board has still been running for that amount of time. Since it didn't hard fault the clock keeps going. I'm still good to keep an eye on the total board run time. It's been hammering on the AdafruitIO feed every 10 seconds for 14 hours, so far about 5120 transactions... still looking good.

How often do you poll your broker?

tannewt · 2023-04-26T17:13:22Z

#7490 is open for better crash data capture and would help here it seems.

DJDevon3 · 2023-04-26T20:17:00Z

If we can get the UM FeatherS2 running for about 72 hours total I think we can close this and hand everything over to #7490. @tannewt tagging this issue means it's already referenced as additional data for their uses over in #7490.

I wasn't sure if the IDF issue was still a thing so I didn't bring it up. Best scenario for this one is to hit 72 hours (fingers crossed), close it, and hand our findings over to #7490.

7.3.3 in particular was hit really hard with Wifi issues on the S2 & S3. I'd recommend the support path be to tell everyone to upgrade to 8.0.5 stable (or newer) ASAP for all S2 & S3 variant boards running Wifi projects.

DJDevon3 · 2023-04-27T19:33:02Z

Apparently try/except around the io.loop() doesn't help it fail gracefully.

    # inside the main while True: loop
    try:
        io.loop()
    except (ValueError, RuntimeError) as e:
        print("MQTTException: \n", e)
        time.sleep(60)
        continue

Monotonic:  173387.0
Monotonic Hours:  48.1629
Publishing 86.0045 to DemoFeed.
Traceback (most recent call last):
  File "code.py", line 119, in <module>
  File "adafruit_io/adafruit_io.py", line 239, in loop
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 1002, in loop
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 683, in ping
MMQTTException: PINGRESP not returned from broker.

ctrl+d gets it right back on task and doesn't affect board run time (monotonic). Pretty sure that one is happening when I lose Wifi connection which is a regular occurrence for my entire home network not the board.

Suggestions on how to bulletproof the io.loop() with try/except would be appreciated.

Still going at 48 hours uptime regardless of the failures in my script.

dhalbert · 2023-04-27T20:14:52Z

I think MMQTTException is just not a subclass of ValueError or RuntimeError. So you want to catch more classes of exceptions than what you have now, or we should make MMQTTException be a subclass of RuntimeError.
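To illustrate the point, here is a self-contained sketch. The `MMQTTException` class below is a stand-in defined locally, assumed (not confirmed here) to derive directly from Exception like the library's class; `ping` and `caught_by` are hypothetical names used only for the demonstration.

```python
class MMQTTException(Exception):
    """Stand-in for the library's exception, assumed to derive from Exception."""


def ping():
    # Simulates the failure seen in the traceback.
    raise MMQTTException("PINGRESP not returned from broker.")


def caught_by(handler_types):
    """Return True if an except clause listing handler_types catches the error."""
    try:
        ping()
    except handler_types:
        return True
    except Exception:
        return False


print(caught_by((ValueError, RuntimeError)))  # False: neither is a parent class
print(caught_by(MMQTTException))              # True
```

An except clause only matches the listed classes and their subclasses, so sibling classes such as ValueError and RuntimeError let the exception propagate and stop the script.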

DJDevon3 · 2023-04-27T20:17:55Z

Yeah I copy pasted that from my weather station script that uses adafruit_requests. oops. Updated with:

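    # at the top of code.py
    from adafruit_minimqtt.adafruit_minimqtt import MMQTTException

    # inside the main while True: loop
    try:
        io.loop()
    except MMQTTException as e:
        print("MMQTTException: \n", e)
        time.sleep(300)
        continue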
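One possible way to generalise the pattern above is a retry wrapper with exponential backoff. This is only a sketch under stated assumptions: `run_forever`, `step`, `recover`, and the other parameters are hypothetical names, and nothing here is taken from adafruit_io or adafruit_minimqtt. The caller passes in whatever polling and reconnect callables their client provides.

```python
import time


def run_forever(step, recover, retry_on, base_delay=5, max_delay=300,
                sleep=time.sleep, max_failures=None):
    """Call step() repeatedly; on a listed exception, back off and call recover().

    step, recover: caller-supplied callables (hypothetical, e.g. a poll and a reconnect).
    retry_on: exception class or tuple of classes that should not stop the script.
    max_failures: give up after this many consecutive failures (None = never).
    Returns the number of consecutive failures when it gives up.
    """
    delay = base_delay
    failures = 0
    while max_failures is None or failures < max_failures:
        try:
            step()
            # Success: reset the backoff so the next outage starts small again.
            delay = base_delay
            failures = 0
        except retry_on as e:
            failures += 1
            print("Recoverable error:", repr(e))
            sleep(delay)
            delay = min(delay * 2, max_delay)
            try:
                recover()
            except retry_on as e2:
                # Recovery itself failed; the next step() will likely fail and retry.
                print("Recover failed:", repr(e2))
    return failures
```

In a script like the one above, `step` could be io.loop and `retry_on` could be MMQTTException, with `recover` set to whatever reconnect routine the client exposes. Adding OSError to `retry_on` may also be worth testing for Wifi dropouts, though that is an assumption rather than something verified in this thread.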

andywarburton · 2023-04-28T12:05:01Z

Good news, we're at four days now and all seems to be going well. The device rebooted a couple of times due to lost connections to the internet or Home Assistant (my fault for installing updates), but I haven't had a single HardFault since updating to 8.0.5.

DJDevon3 · 2023-04-28T19:08:16Z

UM FeatherS2 Wifi is finally stable and good to go. #7907 can be closed. @andywarburton

Monotonic Hours:  71.5934

Hammered AdafruitIO in 72 hours with over 25,000 transactions, not a single hard fault.
Tests: Wifi, AdafruitIO, MQTT, I2C

dhalbert · 2023-04-28T19:13:20Z

Great, thanks @andywarburton and @DJDevon3 for re-testing.

andywarburton added the bug label Apr 25, 2023

tannewt added crash needs retest esp32-s2 labels Apr 25, 2023

tannewt added this to the Long term milestone Apr 25, 2023

dhalbert closed this as completed Apr 28, 2023

This was referenced Apr 30, 2023

CircuitPython core code crashed hard. Whoops! #7925

Closed

MMQTTException Error Handling adafruit/Adafruit_CircuitPython_MiniMQTT#163

Open
