TG1WDT_SYS_RESET Randomly, No Guru Meditation Error #1033

Closed
Vincrl opened this issue Jan 25, 2018 · 8 comments

@Vincrl

Vincrl commented Jan 25, 2018

Hardware:

Board: ESP32 Adafruit Feather Huzzah
Core Installation/update date: 25/jan/2018
IDE name: Arduino IDE
Flash Frequency: 80Mhz
Upload Speed: 921600

Description:

I have an embedded system that needs to analyze a continuous flow of UART data and transmit it at a fixed interval to a database over TCP using mbedTLS. To achieve that, I split the work across the cores: Core 0 handles the WiFi transmission of a JSON string over mbedTLS, and Core 1 simply reads packets from the UART line and writes them into a buffer. The WiFi task reads these values every 10 seconds and transmits them if the server requests them; if the server requests no data, it transmits a smaller packet instead.

So here is the problem: after booting, everything sets up fine. The ESP connects to the configured WiFi and starts its routine by transmitting. Then, randomly (sometimes after 3 transmissions to the server, other times after 50), it reboots without giving any panic reason or Guru Meditation error (verbose logging is enabled). It looks like this:

[I][HttpsHandler.cpp:25] process(): ----- Processing Https Request -----
[D][HttpsHandler.cpp:52] sendToDB(): Starting connection to server for update...
[V][ssl_client.cpp:48] start_ssl_client(): Free heap before TLS 107060
[V][ssl_client.cpp:50] start_ssl_client(): Starting socket
[V][ssl_client.cpp:86] start_ssl_client(): Seeding the random number generator
[V][ssl_client.cpp:95] start_ssl_client(): Setting up the SSL/TLS structure...
[V][ssl_client.cpp:108] start_ssl_client(): Loading CA cert
[V][ssl_client.cpp:143] start_ssl_client(): Setting hostname for TLS session...
[V][ssl_client.cpp:158] start_ssl_client(): Performing the SSL/TLS handshake...
[V][ssl_client.cpp:177] start_ssl_client(): Verifying peer X.509 certificate...
[V][ssl_client.cpp:186] start_ssl_client(): Certificate verified.
[V][ssl_client.cpp:201] start_ssl_client(): Free heap after TLS 65124
[D][HttpsHandler.cpp:58] sendToDB(): Connected to server!
[V][ssl_client.cpp:240] send_ssl_data(): Writing HTTP request...
[V][ssl_client.cpp:240] send_ssl_data(): Writing HTTP request...
[V][ssl_client.cpp:209] stop_ssl_socket(): Cleaning SSL connection.
[I][HttpsHandler.cpp:37] process(): ----- End of Https Request -----
[I][HttpsHandler.cpp:25] process(): ----- Processing Https Request -----
[D][HttpsHandler.cpp:52] sendToDB(): Starting connection to server for update...
[V][ssl_client.cpp:48] start_ssl_client(): Free heap before TLS 106520
[V][ssl_client.cpp:50] start_ssl_client(): Starting socket
[V][ssl_client.cpp:86] start_ssl_client(): Seeding the random number generator
ets Jun  8 2016 00:22:57

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:956
load:0x40078000,len:0
load:0x40078000,len:13076
entry 0x40078a58
[I][wifiModule.ino:46] setup(): <--Booting WiFi Module-->

It does not always crash at "Seeding the random number generator"; sometimes it crashes here:

[V][ssl_client.cpp:48] start_ssl_client(): Free heap before TLS 104556
[V][ssl_client.cpp:50] start_ssl_client(): Starting socket
[V][ssl_client.cpp:86] start_ssl_client(): Seeding the random number generator
[V][ssl_client.cpp:95] start_ssl_client(): Setting up the SSL/TLS structure...
[V][ssl_client.cpp:108] start_ssl_client(): Loading CA cert
[V][ssl_client.cpp:143] start_ssl_client(): Setting hostname for TLS session...
[V][ssl_client.cpp:158] start_ssl_client(): Performing the SSL/TLS handshake...
ets Jun  8 2016 00:22:57

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee

I used to have a much more stable build, but it recently started acting up and I can't find the cause.

Sketch:

Sadly, I am not allowed to disclose the code at the moment, but I'll try to post a working example with the same crash.

What I tried:

  • Reduce the packet size (to free more RAM).
  • Verify the power supply is sufficient (2.1 A, 5 V USB source).
  • Change the HTTPS request interval (10 s or 1 min, same result).
  • Disable All Watchdogs in sdkconfig file:
	Line 209: CONFIG_INT_WDT=
	Line 210: CONFIG_INT_WDT_TIMEOUT_MS=300
	Line 211: CONFIG_INT_WDT_CHECK_CPU1=
	Line 212: CONFIG_TASK_WDT=
	Line 213: CONFIG_TASK_WDT_PANIC=
	Line 214: CONFIG_TASK_WDT_TIMEOUT_S=5
	Line 215: CONFIG_TASK_WDT_CHECK_IDLE_TASK_CPU0=
	Line 216: CONFIG_TASK_WDT_CHECK_IDLE_TASK_CPU1=

I am open to any suggestions you may have. Thank you.

@stickbreaker
Contributor

@Vincrl You definitely have a watchdog timeout issue. From Read the Docs:

By default, the task watchdog watches the idle tasks. The usual cause of idle tasks not feeding the watchdog is a higher-priority process looping without yielding to the lower-priority processes, and can be an indicator of badly-written code that spinloops on a peripheral or a task that is stuck in an infinite loop.

Other task can elect to be watched by the task watchdog by calling esp_task_wdt_feed(). Calling this routine for the first time will register the task to the task watchdog; calling it subsequent times will feed the watchdog. If a task does not want to be watched anymore (e.g. because it is finished and will call vTaskDelete() on itself), it needs to call esp_task_wdt_delete().

The task watchdog is built around the hardware watchdog in timer group 0. If this watchdog for some reason cannot execute the interrupt handler that prints the task data (e.g. because IRAM is overwritten by garbage or interrupts are disabled entirely) it will hard-reset the SOC.

I think the last sentence is what you are seeing: If this watchdog for some reason cannot execute the interrupt handler that prints the task data (e.g. because IRAM is overwritten by garbage or interrupts are disabled entirely) it will hard-reset the SOC.

Chuck.

@Vincrl
Author

Vincrl commented Jan 25, 2018

Thanks for the quick response Chuck.

I will make sure the watchdog is fed, but the weird part is this: if I replace the work on my UART core (this core has one task that constantly reads, writes, and fills the vectors of values) with a for(;;){} (a CPU-hogging loop doing nothing, not even feeding the watchdog) or a for(;;){delay(1000);}, then the single task on the WiFi core runs perfectly without rebooting.

I realised that I am writing to a vector from the UART core and reading from that vector in the WiFi core, as follows:

WiFi Core <--- [Data Vector] <--- Uart Core.

The data vector wraps around on itself (a kind of circular buffer). But I added no semaphore or mutex to protect access to that data, considering it is only written by one task and read by one other task. Could that cause the problem?

Thank You I hope I make sense in my sentences.
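
For illustration only, here is a minimal sketch of what a one-writer, one-reader circular buffer needs in order to be safe across two cores. It is not the code from this issue (which was not posted), and `SpscRing`, `push`, and `pop` are hypothetical names. The sketch assumes fixed storage that never reallocates (unlike a growing std::vector) and uses atomic indices, so the reader only ever sees slots the writer has finished filling.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer / single-consumer ring buffer (illustration only).
// Storage is a fixed-size array, so it never moves while the other core reads it.
// One slot is always left empty, so the usable capacity is N - 1.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "ring needs at least two slots");

public:
    // Called only by the writer task (for example, the UART task).
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // full: report it, never overwrite unread data
        }
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }

    // Called only by the reader task (for example, the WiFi task).
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot the writer fills
    std::atomic<std::size_t> tail_{0};  // next slot the reader consumes
};
```

With plain (non-atomic) indices, or with a container that can reallocate on insertion, the same "one writer, one reader" arrangement is a data race, even though no two tasks ever write the same element.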

@stickbreaker
Contributor

@Vincrl In your vector code (an interrupt vector?), are there any while() loops that could hang? An ISR always needs to be deterministic. It must not have any condition under which it waits for some other task or event to complete. It should be short and succinct.

Are you processing your UART data inside the interrupt code? You should just move the data to a buffer and mark the buffer ready to process; the foreground loop should do the actual processing. If you try to send data out the UART inside your UART receive interrupt code, you can create a stall waiting for the bits to trickle out. The input ISR stalls waiting for the output UART to finish, then another input interrupt occurs, interrupting the stalled ISR that is waiting on the UART (the Serial object), which uses spinlocks to single-thread data blocks. Explosion!

Have you elevated the priority of your code? If your code has a higher priority and is always ready, it will never allow a lower-priority (idle) loop to execute. The WiFi code is piggy: it has elevated priority and loops on conditions, waiting for hardware events to occur.

Chuck.

@Vincrl
Author

Vincrl commented Jan 26, 2018

@stickbreaker Thanks again for the quick answer.

Sorry, by vector I mean a C++ std::vector, so an array, in short. Also, my UART isn't driven by an ISR; it runs in its own task, alone on its core, always polling the UART RX and sending whenever it needs to. I understand your answer, but no task on my UART core blocks anything, considering I can run it without any problem when I remove all the mbedtls (HTTPS request) code. So if my ESP is communicating over UART on one core and answering clients from the WiFi task on the other core (local requests at its local address, like a server), everything is all right. As soon as I add the HTTPS requests to a server, the reboots appear. Also, if the UART core is deactivated or sits in a for(;;){} loop, and the WiFi core answers local clients and also makes HTTPS requests to a server (with mbedtls), the device runs fine too (it just ran for 20 h).

So, by your logic, the mbedtls and UART tasks would share a resource on which they lock each other up, causing the WDT to trigger. Considering the only shared resource in my code is the std::vector, it should be the source of my WDT reset, but I do not protect it with any spinlock, mutex, or semaphore...

Also, shouldn't the WDT print a message like this?

Task watchdog got triggered. The following tasks did not feed the watchdog in time:
Tasks currently running:
CPU 0: WiFiTask
CPU 1: UartTask

I will try running the system with the std::vector accessed from only one side at a time, never both, and keep you up to date. If this still triggers the WDT, it would mean a resource not created by me is the cause of this deadlock, interlock, or eternal wait.

Vince.

P.S. Really Appreciate the help by the way.

@stickbreaker
Contributor

@Vincrl Are you accessing the same std::vector object from multiple tasks simultaneously? I do not know how thread-safe the standard objects are; I would assume they are not thread-safe. There are dedicated inter-task communication mechanisms supported by the underlying operating system, FreeRTOS. I would recommend you read through the FreeRTOS documentation (8.2 is the version currently included in the ESP32 environment).

Chuck.
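
As a rough sketch of the kind of protected hand-off being suggested, the following wraps a std::vector so that every access goes through one mutex. It is written in portable C++ with std::mutex so it can be compiled anywhere; on the ESP32 the same shape would more typically be built on a FreeRTOS mutex or queue. `SharedSamples`, `add`, and `takeAll` are hypothetical names and are not taken from the original sketch, which was never posted.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical wrapper (illustration only): the vector is private, so the
// only way to touch it is through methods that hold the mutex.
class SharedSamples {
public:
    // Writer side (for example, the UART task): append one decoded value.
    void add(std::uint8_t value) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.push_back(value);
    }

    // Reader side (for example, the WiFi task): take everything collected
    // so far and leave the shared vector empty. The lock is held only for
    // the swap, not while the caller later serializes or sends the data.
    std::vector<std::uint8_t> takeAll() {
        std::vector<std::uint8_t> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(samples_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<std::uint8_t> samples_;
};
```

Swapping the contents out under the lock keeps the critical section short, so a slow TLS handshake on the reader side would not stall the task that is filling the buffer.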

@Vincrl
Author

Vincrl commented Jan 26, 2018

@stickbreaker It seems that's exactly what is happening. I removed the access to the vector from the HTTPS task and it seems to have fixed the issue. I still need to redo that part with a safe procedure, though, but I'm pretty sure that was the reason for my reboots.

I don't know why the system reboots without a Guru Meditation error, though, or without a message saying the reset was due to the WDT. I still have to understand that part, but I believe safeguards and a protocol for inter-core variable access are needed here and will lead me to a solution.

I will confirm whether it fixes the issue.

Thanks Chuck.

@stickbreaker
Contributor

@Vincrl Sounds like you are on the path to success. Good luck.

Chuck.

@Vincrl Vincrl closed this as completed Jan 26, 2018
@ssilverman

In the interest of more information, I'll add that I was seeing this until I increased the task's stack size. That seemed to fix the problem for me.
