Crash into the HardFault_Handler. #7907
Comments
Just realised I'm running an older version of CircuitPython... would it make sense to start by upgrading the version and libraries to latest? |
Yes, please do. |
CircuitPython and Libraries all updated to the latest version. New copy of my CIRCUITPY drive attached. I will monitor for a couple of days and see how it goes. |
Helping beta test this one on the UM FeatherS2 with AndyWarburton.
Adafruit CircuitPython 8.0.5 on 2023-03-31; FeatherS2 with ESP32S2
Board ID:unexpectedmaker_feathers2
The closest I can come is AdafruitIO with a DPS310 temperature sensor. I can't replicate your motor driver, other specific sensors, or Home Assistant broker on my setup.
I had the same issue with the UM FeatherS3 (that issue is now closed). It was specific to the S2 & S3 chips running CircuitPython and happened with both Adafruit & UM boards in 7.3 and the 8.x betas. It had something to do with Wifi triggering a soft reboot and crashing after a random period of time. Without Wifi it would work fine. To my knowledge, the hard fault handler issue has been resolved on the S2 since the 8.0.4 beta and on the S3 since 8.0.5 stable.
This will be my first attempt at running a UM board again since the 8.0.1 beta. So far it's behaving very well with Wifi and running as expected. If you are running anything prior to the 8.0.5 stable release then yes, you would absolutely have experienced hard faults. Since it's been resolved on the Adafruit S3, I have high hopes the UM boards will be usable on Wifi from 8.0.5 stable onward.
One of the downsides (if you can call it that) is the massive amount of RAM. If a slowly updating script has a RAM leak, it could take days or weeks to crash. I don't expect that to happen but it's something to keep in the back of your mind.
CIRCUITPY.zip
Thank you @DJDevon3 ! |
A hard fault after a couple of days would be nearly impossible for me to track down. I thought this was a couple of hours thing when I first read it. Someone would have to put a debugger on it and hope it crashes. So far I'm about 6 hours running perfectly fine. This can't be the same issue I was having with the S3 as I would have hard faulted about 100 times by now. Not a single error let alone a hard fault. Looking good so far. Can you provide a better timeline range when to expect a hard fault? More specifics could help and I realize random crashes are hard to judge. How long of a runtime would you expect/hope to surpass before calling this one closed? 4 days, 5 days, 6 days, a week, 2 weeks? |
@DJDevon3 for me the Hard Fault was happening at around the 24-48 hour mark. My setup has now been running for about 12 hours without a problem, and I'm hoping it'll just keep running "forever". I will check in again probably on Friday (tomorrow is a national holiday here in The Netherlands and I'll be out and about).
My attempt failed at the 13 hour mark because, apparently, my script has no try/except to fail gracefully when the broker doesn't respond. This is my first time ever using MQTT, so I don't have a snippet for that part. It's mostly the default AdafruitIO example. CircuitPython really needs to include try/except error handling in all examples for anything MQTT or Requests related. It failed inside io.loop(). This is a failure in my script, not the board.
Traceback (most recent call last):
File "code.py", line 118, in <module>
File "adafruit_io/adafruit_io.py", line 239, in loop
File "adafruit_minimqtt/adafruit_minimqtt.py", line 1002, in loop
File "adafruit_minimqtt/adafruit_minimqtt.py", line 683, in ping
MMQTTException: PINGRESP not returned from broker.
Added a try/except and also a time.monotonic() timestamp:
print("Monotonic: ", time.monotonic())
print("Monotonic Hours: ", time.monotonic()/60/60)
Monotonic: 51204.6
Monotonic Hours: 14.2235
The board has still been running for that amount of time. Since it didn't hard fault, the clock keeps going, so I can still keep an eye on the total board run time. It's been hammering on the AdafruitIO feed every 10 seconds for 14 hours, so far about 5120 transactions... still looking good. How often do you poll your broker?
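The uptime arithmetic above (seconds from time.monotonic() divided by 60 twice) can be wrapped in a small helper. This is only a minimal sketch: `uptime_hours` and `format_uptime` are hypothetical names that do not come from CircuitPython or any library, and the only call assumed to exist is `time.monotonic()`.

```python
import time


def uptime_hours(seconds):
    """Convert a time.monotonic() reading in seconds to hours."""
    return seconds / 60 / 60


def format_uptime(seconds):
    """Render seconds of uptime as days, hours and minutes."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return "{}d {}h {}m".format(days, hours, minutes)


# Example: report how long the board has been running.
now = time.monotonic()
print("Monotonic Hours:", uptime_hours(now))
print("Uptime:", format_uptime(now))
```

Because the value comes from a clock that only restarts when the board resets, a reading that keeps growing confirms the board itself never hard faulted, even when the script stopped on an exception.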
#7490 is open for better crash data capture and would help here it seems. |
If we can get the UM FeatherS2 running for about 72 hours total, I think we can close this and hand everything over to #7490. @tannewt tagging this issue means it's already referenced as additional data over in #7490. I wasn't sure if the IDF issue was still a thing so I didn't bring it up. The best scenario for this one is to hit 72 hours (fingers crossed), close it, and hand our findings over to #7490. 7.3.3 in particular was hit really hard with Wifi issues on the S2 & S3. I recommend that the support path be to tell everyone running Wifi projects on S2 & S3 variant boards to upgrade to 8.0.5 stable (or newer) asap.
Apparently try/except around the io.loop() doesn't help it fail gracefully.
try:
    io.loop()
except (ValueError, RuntimeError) as e:
    print("MQTTException: \n", e)
    time.sleep(60)
    continue
Monotonic: 173387.0
Monotonic Hours: 48.1629
Publishing 86.0045 to DemoFeed.
Traceback (most recent call last):
File "code.py", line 119, in <module>
File "adafruit_io/adafruit_io.py", line 239, in loop
File "adafruit_minimqtt/adafruit_minimqtt.py", line 1002, in loop
File "adafruit_minimqtt/adafruit_minimqtt.py", line 683, in ping
MMQTTException: PINGRESP not returned from broker.
ctrl+d gets it right back on task and doesn't affect board run time (monotonic). Pretty sure that one is happening when I lose Wifi connection, which is a regular occurrence for my entire home network, not the board. Suggestions on how to bulletproof the io.loop() with try/except would be appreciated. Still going at 48 hours uptime regardless of the failures in my script.
I think |
Yeah I copy pasted that from my weather station script that uses adafruit_requests. oops. Updated with:
try:
    io.loop()
except (MMQTTException) as e:
    print("MMQTTException: \n", e)
    time.sleep(300)
    continue
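One way to make the loop call more forgiving is to catch the broker error, wait, and then attempt a reconnect before the next pass. The helper below is a hedged sketch, not the library's recommended pattern: `run_loop_once` and its parameters are hypothetical names, and the commented usage assumes the script already has an `io` client with `loop()` and `reconnect()` methods and has imported `MMQTTException`.

```python
import time


def run_loop_once(loop_fn, reconnect_fn, recoverable, delay_s=300, sleep_fn=time.sleep):
    """Call loop_fn once; return True on success, False after a recoverable error.

    recoverable is a tuple of exception classes treated as temporary
    (for example a lost Wifi link or a broker that missed a ping).
    Any other exception is not caught and propagates to the caller.
    """
    try:
        loop_fn()
        return True
    except recoverable as e:
        print("Recoverable error: \n", e)
        sleep_fn(delay_s)  # back off before touching the network again
        try:
            reconnect_fn()
        except recoverable as e2:
            # Still offline; the caller's main loop will try again later.
            print("Reconnect failed: \n", e2)
        return False


# Assumed usage inside the script's main while loop:
# while True:
#     if not run_loop_once(io.loop, io.reconnect, (MMQTTException, OSError)):
#         continue
#     ...publish sensor values...
```

Listing the exception classes explicitly keeps the mistake from the earlier snippet visible: an error type that is not in the tuple still stops the script with a traceback.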
Good news, we're at four days now and all seems to be going well. The device rebooted a couple of times due to lost connections to the internet or Home Assistant (my fault for installing updates) but I haven't had a single HardFault since updating to 8.0.5.
UM FeatherS2 Wifi is finally stable and good to go. #7907 can be closed. @andywarburton Monotonic Hours: 71.5934 Hammered AdafruitIO in 72 hours with over 25,000 transactions, not a single hard fault. |
Great, thanks @andywarburton and @DJDevon3 for re-testing. |
CircuitPython version
Code/REPL
Behavior
This code runs fine most of the time, successfully reading from sensors, reporting values to HomeAssistant via MQTT and activating the servo. However, it has been crashing at what seem to be random intervals. To find out what was going on, I left it plugged in to my computer for a couple of days with the REPL running (via TIO), and it finally crashed again with the error "CircuitPython core code crashed hard. Whoops!" and asked me to share the contents of my CIRCUITPY drive here (attached below with credentials removed from secrets.py).
I don't believe it to be a memory issue, as I have been logging the free memory via gc.mem_free() to HomeAssistant along with the sensor values (it seemed like a good way to rule that out as a problem), and memory usage has been relatively consistent over its last run (aside from the big dip overnight, I don't know why that is!).
It wouldn't surprise me if it's something wrong with my code because I'm not an engineer and mostly bodge together other code from tutorials around the internet etc.
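The free-memory logging described above might look something like the sketch below. It is an assumption about the approach, not the code from the attached drive: `free_memory` and `MemoryTrend` are hypothetical names, and `gc.mem_free()` only exists on CircuitPython and MicroPython, so the helper returns None elsewhere. Running `gc.collect()` before each reading means garbage that simply has not been collected yet is not counted as used memory, which may or may not explain a dip like the overnight one.

```python
import gc


def free_memory():
    """Return free heap bytes after a collection, or None if unsupported."""
    gc.collect()  # collect first so uncollected garbage does not skew the reading
    mem_free = getattr(gc, "mem_free", None)  # CircuitPython/MicroPython only
    return mem_free() if mem_free is not None else None


class MemoryTrend:
    """Track the lowest free-memory reading seen so far."""

    def __init__(self):
        self.lowest = None
        self.samples = 0

    def record(self, free_bytes):
        """Store one reading; None readings are ignored."""
        if free_bytes is None:
            return
        self.samples += 1
        if self.lowest is None or free_bytes < self.lowest:
            self.lowest = free_bytes


trend = MemoryTrend()
trend.record(free_memory())
print("Lowest free memory so far:", trend.lowest)
```

A lowest value that keeps falling over days would point toward a slow leak, while a flat value makes a memory cause less likely.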
Description
No response
Additional information
My circuitpython drive contents with secrets removed:
CIRCUITPY.zip