Skip to content

ESP32-S3 Ctrl-D caused hard crash after get message is sent to adafruit_requests object unless _free_sockets() is called #6502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RetiredWizard opened this issue Jun 19, 2022 · 12 comments · Fixed by #7291
Assignees
Milestone

Comments

@RetiredWizard
Copy link

RetiredWizard commented Jun 19, 2022

CircuitPython version

Adafruit CircuitPython 7.3.0-dirty on 2022-06-19; FeatherS3 with ESP32S3
Adafruit CircuitPython 7.3.0-dirty on 2022-06-19; ESP32-S3-DevKitC-1-N8R2 with ESP32S3
Adafruit CircuitPython 8.0.0-alpha.1-13-gd0953e8f8 on 2022-06-19; FeatherS3 with ESP32S3

Code/REPL

import ssl,wifi,socketpool
import adafruit_requests as requests
from secrets import secrets
wifi.radio.connect(secrets["ssid"], secrets["password"])
socket = socketpool.SocketPool(wifi.radio)
https = requests.Session(socket, ssl.create_default_context())
TEXT_URL = "https://finance.yahoo.com/quote/%5EIXIC"
response = https.get(TEXT_URL)

Behavior

The code runs fine, you can even continue to use the response object, the following instruction will print the next 32 bytes of the stream:

    print(next(iter(response.iter_content(chunk_size=32))))

but after the .get message has been sent, unless https._free_sockets() is called, the next Ctrl-D causes the board to crash into safe mode. On the Devkit, the following is displayed over the UART connection:

  ESP-ROM:esp32s3-20210327
  Build:Mar 27 2021
  rst:0xc (RTC_SW_CPU_RST),boot:0x2a (SPI_FAST_FLASH_BOOT)
  Saved PC:0x421093ba
  SPIWP:0xee
  mode:DIO, clock div:2
  load:0x3fcd0108,len:0x132c
  load:0x403b6000,len:0xb80
  load:0x403ba000,len:0x2ea8
  entry 0x403b6238

Description

No response

Additional information

No response

@tannewt tannewt added this to the 8.0.0 milestone Jun 21, 2022
@tannewt tannewt added the crash label Jun 21, 2022
@RetiredWizard
Copy link
Author

I downloaded 7.3.1 from circuitpython.org and this still seems to occur on both the Devkit and FeatherS3 boards.

I built an 8.0.0 FeatherS3 CP from the master Github bits and the issue still occurs, however when I build 8.0.0 from the master for the Devkit the problem seems to have gone away.

@RetiredWizard
Copy link
Author

On a hunch, I threw a .env file on the FeatherS3 running 8.0.0 and the crash no longer occurs when pressing ctrl-D. So apparently adding the wifi SSID and password via the .env prevents this crash from occurring.

@RetiredWizard
Copy link
Author

The output of the debug console from the crash:

 I (30610) pp: pp rom version: e7ae62f
 I (30610) net80211: net80211 rom version: e7ae62f
 I (30620) wifi:wifi driver task: 3fce5928, prio:23, stack:3584, core=0
 I (45289) system_api: Base MAC address is not set
 I (45289) system_api: read default base MAC address from EFUSE
 I (45299) wifi:wifi firmware version: 84ed08c
 I (45299) wifi:wifi certification version: v7.0
 I (45309) wifi:config NVS flash: disabled
 I (45309) wifi:config nano formating: disabled
 I (45309) wifi:Init data frame dynamic rx buffer num: 8
 I (45319) wifi:Init management frame dynamic rx buffer num: 8
 I (45319) wifi:Init management short buffer num: 32
 I (45329) wifi:Init dynamic tx buffer num: 16
 I (45329) wifi:Init tx cache buffer num: 16
 I (45339) wifi:Init static tx FG buffer num: 2
 I (45339) wifi:Init static rx buffer size: 1600
 I (45339) wifi:Init static rx buffer num: 4
 I (45349) wifi:Init dynamic rx buffer num: 8
 I (45349) wifi_init: rx ba win: 6
 I (45359) wifi_init: tcpip mbox: 32
 I (45359) wifi_init: udp mbox: 6
 I (45359) wifi_init: tcp mbox: 6
 I (45369) wifi_init: tcp tx win: 2880
 I (45369) wifi_init: tcp rx win: 2880
 I (45379) wifi_init: tcp mss: 1440
 I (45379) wifi_init: WiFi IRAM OP enabled
 I (45389) wifi_init: WiFi RX IRAM OP enabled
 I (45389) phy_init: phy_version 503,13653eb,Jun  1 2022,17:47:08
 E (45399) phy_init: esp_phy_load_cal_data_from_nvs: NVS has not been initialized. Call nvs_flash_init before starting WiFi/BT.
 W (45409) phy_init: failed to load RF calibration data (0x1101), falling back to full calibration
 I (45449) wifi:mode : sta (f4:12:fa:41:f8:84)
 I (45449) wifi:enable tsf
 W (45449) CP wifi: sta start
 I (47899) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
 I (48609) wifi:state: init -> auth (b0)
 I (48609) wifi:state: auth -> assoc (0)
 I (48639) wifi:state: assoc -> run (10)
 W (48669) wifi:<ba-add>idx:0 (ifx:0, 48:5d:36:d1:30:e0), tid:0, ssn:0, winSize:64
 I (48699) wifi:connected with twilightzone, aid = 20, channel 1, BW20, bssid = 48:5d:36:d1:30:e0
 I (48699) wifi:security: WPA2-PSK, phy: bgn, rssi: -71
 I (48699) wifi:pm start, type: 1

 I (48699) wifi:set rx beacon pti, rx_bcn_pti: 14, bcn_timeout: 14, mt_pti: 25000, mt_time: 10000
 W (48709) CP wifi: connected
 I (48719) wifi:BcnInt:102400, DTIM:1
 I (50779) esp_netif_handlers: sta ip: 10.0.0.190, mask: 255.255.255.0, gw: 10.0.0.1
 W (50779) CP wifi: got ip
 I (55599) esp-x509-crt-bundle: Certificate validated

 >>>>>>>>>>>>   CTRL - D  Pressed Here    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

 I (61349) wifi:state: run -> init (0)
 I (61349) wifi:pm stop, total sleep time: lu us / lu us

 W (61349) wifi:<ba-del>idx
 I (61349) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
 W (61359) CP wifi: disconnected
 W (61359) CP wifi: reason 8 0x08
 W (61359) CP wifi: sta stop
 I (61409) wifi:flush txq
 I (61409) wifi:stop sw txq
 I (61409) wifi:lmac stop hw txq
 I (61409) wifi:Deinit lldesc rx mblock:4
 I (61419) gpio: GPIO[0]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
 I (61419) gpio: GPIO[1]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
 I (61429) gpio: GPIO[2]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
                                                          ...
 I (61679) gpio: GPIO[48]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
 E (61729) I2S: i2s_driver_uninstall(2006): I2S port 0 has not installed
 E (61729) I2S: i2s_driver_uninstall(2006): I2S port 1 has not installed
 Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

 Core  1 register dump:
 PC      : 0x4003f3c8  PS      : 0x00060e30  A0      : 0x820c93cc  A1      : 0x3fcf5040
 A2      : 0x3fcb09cc  A3      : 0x00000000  A4      : 0x00060e20  A5      : 0x00060c23
 A6      : 0x02ce16fc  A7      : 0x00ffffff  A8      : 0x00000007  A9      : 0x3fcf5000
 A10     : 0x3fcb09cc  A11     : 0x3fca3ed8  A12     : 0x00000004  A13     : 0x00060e23
 A14     : 0x00060e23  A15     : 0x00000001  SAR     : 0x00000009  EXCCAUSE: 0x0000001c
 EXCVADDR: 0x00000130  LBEG    : 0x40056f5c  LEND    : 0x40056f72  LCOUNT  : 0xffffffff


 Backtrace:0x4003f3c5:0x3fcf5040 |<-CORRUPTED


 ELF file SHA256: 720af6d11f903799

 CPU halted.

@RetiredWizard
Copy link
Author

RetiredWizard commented Jul 8, 2022

From Espressif api-guides:

LoadProhibited,

These CPU exceptions happen when an application attempts to read from an invalid memory location. The address which has been read is found in the EXCVADDR register in the register dump. If this address is close to zero, it usually means that the application has attempted to access a member of a structure, but the pointer to the structure is NULL.

@RetiredWizard
Copy link
Author

Okay, so after looping back and looking at the https._free_sockets() module I realized that the proper way for the function to be called would be to use the response.close() method and adding that statement does prevent the crash which is at least a better option than calling the private _free_sockets() method.

@jepler
Copy link

jepler commented Nov 30, 2022

I can't reproduce this, but I'm not sure if I followed the steps as you intended. wifi is set up via wifi workflow.

Adafruit CircuitPython 8.0.0-beta.4-30-g070bb35b07-dirty on 2022-11-28; Adafruit Feather ESP32-S3 Reverse TFT with ESP32S3
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== import ssl,wifi,socketpool
=== import adafruit_requests as requests
=== socket = socketpool.SocketPool(wifi.radio)
=== https = requests.Session(socket, ssl.create_default_context())
=== TEXT_URL = "https://finance.yahoo.com/quote/%5EIXIC"
=== response = https.get(TEXT_URL)
=== 
>>> print(next(iter(response.iter_content(chunk_size=32))))
b'<!DOCTYPE html><html data-color-'
>>> 

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

I repeated the above a couplle of times and never got a hard fault.

@DJDevon3
Copy link

DJDevon3 commented Nov 30, 2022

I have a UM FeatherS3, QT Py S3, and Adafruit Feather S3. Is the devkit S3 specifically needed for this test or would one of the boards I have be ok? Also what version of CP should I test this with?

Is this a custom build that only OP can test since some are dirty?

@RetiredWizard
Copy link
Author

RetiredWizard commented Nov 30, 2022

@DJDevon3 You should be able to reproduce this on any of the S3 boards although I've only tested a few.

@jepler You do have to rename the .env file so the web workflow doesn't load. I've tested using both the old secrets file and manually typing the ssid/password into the wifi.radio.connect call on the following. (I can dig out more S3 boards if we need a bigger sample)

Adafruit CircuitPython 8.0.0-beta.4-35-gf637332de-dirty on 2022-11-18; FeatherS3 with ESP32S3
Adafruit CircuitPython 8.0.0-alpha.1-132-gdf1435ae9 on 2022-08-11; ESP32-S3-DevKitC-1-N8R2 with ESP32S3

The DevKitC got stuck in a boot loop somehow using the alpha.1 code so I reflashed with the latest nightly and also reproduced the crash to safe mode

Adafruit CircuitPython 8.0.0-beta.4-57-ge82a8bf8b on 2022-11-30; ESP32-S3-DevKitC-1-N8R2 with ESP32S3

@DJDevon3
Copy link

DJDevon3 commented Nov 30, 2022

Bug Confirmed

Likely related to 6791 first reported in August and by many others since then (keep an eye out for duplicate reports to merge)
This issue is present on USB/Serial not wifi workflow which is how it snuck past a lot of people.

import ssl
import wifi
import socketpool
import adafruit_requests as requests
from secrets import secrets
wifi.radio.connect(secrets["ssid"], secrets["password"])
socket = socketpool.SocketPool(wifi.radio)
https = requests.Session(socket, ssl.create_default_context())
TEXT_URL = "https://finance.yahoo.com/quote/%5EIXIC"
response = https.get(TEXT_URL)
print(response)
print("finished")

Adafruit CircuitPython 8.0.0-beta.4-58-gc8390a791 on 2022-11-30; Adafruit Feather ESP32S3 4MB Flash 2MB PSRAM with ESP32S3

<Response object at 0x3de0c450>
finished

Code done running.

Auto-reload is off.
Running in safe mode! Not running saved code.
(board unresponsive, reloaded serial terminal)

You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Crash into the HardFault_Handler.
Please file an issue with the contents of your CIRCUITPY drive at 
https://github.com/adafruit/circuitpython/issues

Press any key to enter the REPL. Use CTRL-D to reload.

Serial locks up, looks like some kind of hard reset bug. Hard reset works once. There's no errors just a freeze. Mu will save the file no problem but it doesn't trigger a reload. Ctrl+C/Ctrl+D unresponsive.

Adafruit CircuitPython 8.0.0-beta.4-58-gc8390a791 on 2022-11-30; Adafruit QT Py ESP32-S3 no psram with ESP32S3

Auto-reload is off.
Running in safe mode! Not running saved code.

You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Crash into the HardFault_Handler.
Please file an issue with the contents of your CIRCUITPY drive at 
https://github.com/adafruit/circuitpython/issues

Press any key to enter the REPL. Use CTRL-D to reload.

The QT Py S3 did one reload on save, didn't even get to the print statements and crashed. Board disconnected from USB and reloaded in safemode, hard fault crash. At one point, maybe in 7.3.3, I reported that the QT Py S3 wasn't affected by this bug. I just updated the Qt Py to latest nightly and it is now affected by it.

Adafruit CircuitPython 8.0.0-beta.4-58-gc8390a791 on 2022-11-30; FeatherS3 with ESP32S3

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
<Response object at 0x3d830170>
finished

Code done running.
--Board unresponsive, closed and reopened serial to this

You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Crash into the HardFault_Handler.
Please file an issue with the contents of your CIRCUITPY drive at 
https://github.com/adafruit/circuitpython/issues

Press any key to enter the REPL. Use CTRL-D to reload.

UM Feather S3 identical result as Adafruit Feather ESP32-S3. Disconnected/Reconnected USB, ctrl+c/ctrl+d unresponsive. Board frozen. Attempting to save file in Mu says it's saving but board seems unresponsive. Reloaded serial to find a hard fault message.

This happens (with the above script) on the 3 different S3's.
This is a particularly nasty one, perhaps the same one I've been chasing & reporting on my S3's for some time but I couldn't figure out the cause because the hard reset bug made getting to this one near impossible at the time. This one was hiding behind the hard reset bug. Now that the hard reset bug is out of the way this one is front and center affecting every S3 board. The best I could narrow it down to at the time was something to do with wifi,

I have not attempted to use response.close() which is a really nice workaround if true and perhaps we're all using the wrong methodology in using wifi if it's required? Still having an error message is better than a hard fault, hard fault is still a bug.

@jepler
Copy link

jepler commented Dec 1, 2022

I ran again with web workflow NOT enabled and confirmed the issue.

@jepler
Copy link

jepler commented Dec 1, 2022

I've traced the location of the crash into esp_tls_conn_destroy during gc_deinit, when finalisers of all objects are run. The underlying socket object has previously been closed, but that shouldn't matter.

void common_hal_ssl_sslsocket_close(ssl_sslsocket_obj_t *self) {
    common_hal_socketpool_socket_close(self->sock);
    esp_tls_conn_destroy(self->tls); // <-- inside this call, didn't trace further yet
    self->tls = NULL;
}

The pointers are ...

sslsocket close @0x3fcadf40 tls=0x3fcd5788

both seem to be inside the gc heap.

@jepler jepler self-assigned this Dec 2, 2022
jepler added a commit to jepler/circuitpython that referenced this issue Dec 2, 2022
A crash would occur if an SSL socket was not shut down before
`gc_deinit()`.

I do not fully understand the root cause, but some object deinitialization
/ deallocation prior to `gc_deinit` leaves the SSL object in an
inconsistent state.

Rather than resolve the root cause, instead ensure that the closing of
the user socket also closes the SSL socket.

Closes: adafruit#6502
@DJDevon3
Copy link

DJDevon3 commented Dec 2, 2022

This makes a lot of sense since others have reported it has a random memory leak like behavior and gc has no effect on it, even if gc is explicitly used manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants