TimeOut recovery #8
@stickbreaker You may be waiting a while on this one. I got the reset that I provided on line 498 of the code (https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/esp32-hal-i2c.c) directly from the Espressif folks, as it is the only known way to recover the I2C peripheral on the current chip.
@lonerzzz Does this reset work for you? When I try it with my error case above, it does not clear the error; I have to manually press reset (EN). The only way I have been able to recover is by wiring two more pins through diodes and bit banging the bus.
Chuck.
The reset does work on the ESP32 side, but it does not cover all situations: there is a class of errors, where slaves hold the lines down, that requires more than a state machine reset. One option would be to have the reset disable the HW I2C and its connection to the SDA and SCL pins, bit bang the clock, reattach SDA and SCL to the pins, and restore the HW I2C. I too had extra connections to the SDA and SCL pins, but they are not needed since the pins can be used as general GPIO.
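The detach, bit-bang, reattach sequence proposed above could look roughly like this. It is a minimal sketch, not code from esp32-hal-i2c.c: BusHooks and recoverBus are hypothetical, and the hooks stand in for whatever GPIO and peripheral-routing calls the HAL would actually use. Clocking stops as soon as SDA is released, with at most nine pulses (the bus-clear procedure from the I2C specification), and a STOP is generated before the pins are handed back. On hardware, each edge would also need a delay of about half an SCL period.

```cpp
#include <cassert>
#include <functional>

// Hypothetical pin-level hooks; on real hardware these would wrap GPIO calls
// made while the I2C peripheral is detached from the pins.
struct BusHooks {
    std::function<void()> detachPeripheral;  // route SDA/SCL away from the HW I2C
    std::function<void(bool)> setScl;        // true = release (open-drain high)
    std::function<void(bool)> setSda;        // true = release (open-drain high)
    std::function<bool()> readSda;           // current SDA level
    std::function<void()> attachPeripheral;  // give the pins back to the HW I2C
};

// Returns the number of SCL pulses issued, or -1 if SDA is still held low
// after nine pulses.
int recoverBus(const BusHooks& bus) {
    bus.detachPeripheral();
    bus.setSda(true);                  // release SDA so the slave can let go of it
    int pulses = 0;
    while (!bus.readSda() && pulses < 9) {
        bus.setScl(false);             // each pulse lets the slave shift out one bit
        bus.setScl(true);
        ++pulses;
    }
    const bool released = bus.readSda();
    if (released) {
        // STOP: SDA low while SCL is low, raise SCL, then raise SDA.
        bus.setScl(false);
        bus.setSda(false);
        bus.setScl(true);
        bus.setSda(true);
    }
    bus.attachPeripheral();
    return released ? pulses : -1;
}
```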
@lonerzzz Yeah, I'm thinking about a revision that would reduce the entry points to …
Chuck.
Sorry, I'm just not getting updates, not even in my spam folder. Here are my thoughts. A handle at the HAL layer would be useful to allow concurrent I2C channels, especially since the ESP32 supports multiple I2C channels. I wouldn't go so far as to have multiple tasks feeding the queues, though; that could get complicated to debug and support. To me, independent handles provide the best balance: the application author can fairly easily set up tasks that each own one of the I2C handles.
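A handle-per-bus design in the spirit of this suggestion might look like the minimal sketch below. I2cHandle, openBus and closeBus are hypothetical names, not the esp32-hal-i2c.c API. The ESP32 has two I2C controllers, so the registry has two slots, and because a bus that is already open cannot be opened again, each task ends up owning exactly one handle instead of several tasks feeding one queue.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-controller state; a real HAL would also keep the
// command queue, interrupt handle and a lock here.
struct I2cHandle {
    uint8_t busNum;
    int sdaPin;
    int sclPin;
    uint32_t frequency;
};

constexpr int kNumBuses = 2;  // the ESP32 has two I2C controllers
static std::array<I2cHandle, kNumBuses> gHandles{};
static std::array<bool, kNumBuses> gInUse{};

// Returns nullptr if the bus number is invalid or the bus is already owned.
I2cHandle* openBus(uint8_t busNum, int sda, int scl, uint32_t frequency) {
    if (busNum >= kNumBuses || gInUse[busNum]) return nullptr;
    gInUse[busNum] = true;
    gHandles[busNum] = I2cHandle{busNum, sda, scl, frequency};
    return &gHandles[busNum];
}

// Releases ownership so another task may open the bus.
void closeBus(I2cHandle* h) {
    if (h != nullptr) gInUse[h->busNum] = false;
}
```

The registry itself is not thread-safe; a real implementation would open handles during setup or guard the registry with a lock.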
@lonerzzz Thanks, I'll think through that. Another timeout testing question: I am trying to understand the state machine and how it responds to the timeout. With the results you have seen, there are two possible exit paths from the timeout: … I wonder if you could change your test to use this sequence:

    Wire.requestFrom(ID,1,false);
    Wire.requestFrom(ID,1);

This would create a command[] list of: … It will create a different interrupt pattern if the timeout recovery continues through the command[] list. Currently the timeout recovery never passes I2C_ERROR_TIMEOUT back to the app.
Chuck.
@stickbreaker Here is what the specified sequence of requestFrom calls dumps out. The sequence repeats.

    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x689ac, ed=0x6917c, =2000, max=2000 error=1
    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x693d1, ed=0x69ba1, =2000, max=2000 error=1
    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6a07f, ed=0x6a84f, =2000, max=2000 error=1
    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6a8d8, ed=0x6b0a8, =2000, max=2000 error=1
    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6b109, ed=0x6b8d9, =2000, max=2000 error=1
    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6bb2f, ed=0x6c2ff, =2000, max=2000 error=1
    [E][esp32-hal-i2c.c:1130] i2cProcQueue(): Gross Timeout Dead st=0x6c7dc, ed=0x6cfac, =2000, max=2000 error=1
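In every entry of this dump, ed - st is exactly 0x7d0 = 2000, the configured timeout (for example, 0x6917c - 0x689ac = 0x7d0), so each call appears to have waited out the full timeout. A small helper for checking this across a longer dump, assuming the st=0x..., ed=0x... layout shown and that both values are millisecond timestamps (the log itself does not label the unit); timeoutElapsed is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Extracts st/ed from a "Gross Timeout Dead st=0x..., ed=0x..." log line.
// Returns ed - st, or -1 if the line does not contain that layout.
long timeoutElapsed(const char* line) {
    const char* p = std::strstr(line, "st=");
    if (p == nullptr) return -1;
    unsigned long st = 0, ed = 0;
    // %lx accepts the optional 0x prefix, like strtoul with base 16.
    if (std::sscanf(p, "st=%lx, ed=%lx", &st, &ed) != 2) return -1;
    return static_cast<long>(ed - st);
}
```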
@lonerzzz Thanks for the data. It looks like your sensor really hangs the I2C bus for a LONG time. I don't think any recovery attempt would be correct; the delay is unavoidable. Would you want to add some other mechanism to monitor this pause? For example:

    count = Wire.requestFrom(0x0d, 1);
    if (Wire.lastError() == I2C_ERROR_TIMEOUT) {
      while (!digitalRead(SCL));          // hang until the slave releases SCL
      count = Wire.requestFrom(0x0d, 1);  // re-issue the READ that timed out
    }
    while (Wire.available()) {
      // do something with the data
    }

If you increase the timeout to the 10-second range, it will probably act differently. But every time it times out, it shuts off the I2C SM (ctr.trans_start=0). To increase the timeout:

    Wire.setTimeOut(10000);
    count = Wire.requestFrom(0x0d, 1);  // the 10-second wait
    if (count == 1) {
      // do something with the data
    }
    else {
      // something bad happened
    }

These dumps don't show what I was hoping. There was no timeout recovery; the SM just paused. The bus was no longer busy when the successful READ occurred at 0x6c7dc. This does show that the SM recovered from the SLAVE stretching SCL, but transactions were lost/aborted while the SLAVE controlled SCL. Should I change the code to return I2C_ERROR_BUSY if status_reg.bus_busy is asserted when the timeout expires? My sample code would change to this:

    Wire.setTimeOut(2000);
    count = Wire.requestFrom(0x0d, 1);
    if (Wire.lastError() == I2C_ERROR_BUSY) {
      while (!digitalRead(SCL));          // hang until the slave releases SCL
      count = Wire.requestFrom(0x0d, 1);  // re-issue the READ that timed out
    }
    while (Wire.available()) {
      // do something with the data
    }

Chuck.
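The while (!digitalRead(SCL)); loops in the samples above spin forever if the slave never lets go. A bounded wait is one alternative. This is a minimal sketch in which readScl and nowMs are hypothetical hooks (on an ESP32 they might wrap digitalRead(SCL) and millis()), and waitForSclRelease is not part of the Wire library. Unsigned subtraction keeps the deadline check correct across a millis() rollover.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Waits until SCL reads high or maxWaitMs elapses.
// Returns true if the slave released SCL in time, false if it is still stuck.
bool waitForSclRelease(const std::function<bool()>& readScl,
                       const std::function<uint32_t()>& nowMs,
                       uint32_t maxWaitMs) {
    const uint32_t start = nowMs();
    while (!readScl()) {
        // Unsigned arithmetic: correct even if the millisecond counter wraps.
        if (nowMs() - start >= maxWaitMs) return false;
    }
    return true;
}
```

A caller could then re-issue the READ only when waitForSclRelease returns true, and otherwise report the bus as stuck instead of hanging.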
@stickbreaker Sorry if there is confusion here. The sensor isn't holding the bus for the durations you see in the messages. It is getting confused by the test sequence you wanted to see, so those times also include its reset and recovery afterwards. Under normal operation it responds in under 2 seconds and recovers in less than 4 seconds. Also, the sensor has a polling mechanism as its other mode of operation; that is what I normally use, to avoid monopolizing the bus, since there are multiple sensors on the same bus. As for the response codes, returning busy in preference to timeout makes sense to me, because it can indicate something more relevant than the timeout if the SM is in an unexpected state.
@lonerzzz Ok,
Chuck.
@stickbreaker That is what I was thinking, so that we don't mask a state machine error.
@lonerzzz OK, I'll do that. The test condition will be the movement of data through the state machine: if nothing moved, it is Bus_Busy; otherwise it is just a TimeOut.
Chuck.
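The rule just described (no movement through the state machine means Bus_Busy, partial movement means TimeOut) can be written as a small decision function. This is a sketch under assumptions: I2cResult, TransferProgress and classifyTimeout are hypothetical, and the byte counters stand in for whatever progress information the HAL actually tracks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result codes mirroring the names used in this thread.
enum class I2cResult { Ok, Timeout, Busy };

// Snapshot of how far the state machine got before the timeout fired.
struct TransferProgress {
    size_t bytesAtStart;    // bytes moved when the transaction began
    size_t bytesAtTimeout;  // bytes moved when the timeout expired
    bool   completed;       // transaction reached its STOP
};

// Nothing moved: report BUSY (the SM never got going).
// Something moved but the transfer did not finish: report TIMEOUT.
I2cResult classifyTimeout(const TransferProgress& p) {
    if (p.completed) return I2cResult::Ok;
    if (p.bytesAtTimeout == p.bytesAtStart) return I2cResult::Busy;
    return I2cResult::Timeout;
}
```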
Late to the party here, but I just started running into I2C problems when writing to a 128x32 OLED display. In a simple execution environment all is well, but once I have several FreeRTOS tasks running, I occasionally run into problems. It starts with an "i2cWrite(): Ack Error!" followed by an endless stream of "i2cWrite(): Busy Timeout" messages. So far the only way I have found to recover is a hardware reset. My reading of the discussion above is that most of the issues being worked on concern reading I2C data, but I thought I'd add a comment and see what thoughts you two have on this issue. I know nothing about the inner workings of the I2C protocol, and I'll likely first try switching to an SPI-based display before digging into the details of I2C. Plus, at the moment I have no way to reliably reproduce the problem except by waiting for something to eventually go wrong.
@tferrin Are you using the main arduino-esp32 or my version? Based on your description, try using my Release V0.1.2. Master mode operations (reading/writing) are similar; the differences do not contribute anything to the timeout problems. Most I2C operations are composed of both write and read operations.
Chuck.
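As an illustration of a composed operation, a typical register read is a WRITE of the register number followed by a repeated START and a READ; with the Arduino Wire API this is usually beginTransmission(), write(reg), endTransmission(false), then requestFrom(). The sketch below only lists the resulting bus sequence as text tokens; registerReadSequence is a hypothetical helper and any register number passed to it is arbitrary. For the 7-bit address 0x0d used earlier in this thread, the address byte is 0x1A for the WRITE and 0x1B for the READ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Lists the bus events of a "read register" operation:
// START, addr+W, reg, repeated START, addr+R, data bytes, STOP.
std::vector<std::string> registerReadSequence(uint8_t addr7, uint8_t reg, int count) {
    auto hex = [](unsigned v) {
        char buf[8];
        std::snprintf(buf, sizeof buf, "0x%02X", v);
        return std::string(buf);
    };
    std::vector<std::string> seq;
    seq.push_back("START");
    seq.push_back(hex((addr7 << 1) | 0));  // address + WRITE bit
    seq.push_back(hex(reg));                // register pointer
    seq.push_back("RESTART");              // repeated START: bus is not released
    seq.push_back(hex((addr7 << 1) | 1));  // address + READ bit
    for (int i = 0; i < count; ++i)        // master NACKs the last byte it wants
        seq.push_back(i + 1 < count ? "DATA+ACK" : "DATA+NACK");
    seq.push_back("STOP");
    return seq;
}
```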
I'll be trying out your V0.1.2 release this evening. I had already decided to do so after my earlier post, but had other obligations that needed attention. I do have WiFi running, and also a high-priority task that reads data from a network of sensors via a low-power radio link, so there is ample opportunity for task preemption. I will post my results.
Also, just read this... So I need to review how I've done my task allocation.
Good news to report using the V0.1.2 code: the Busy Timeouts are gone, replaced by TimeoutRecovery messages. Now the display never stops responding and the output is never garbled. (See attached log output.)
--tom
I've edited my previous comment to remove the reference to exceptions happening with the V0.1.2 code. I caused those by making a stupid programming error.
Things are even better today! It turns out all the Timeout Recoveries noted in my post above were caused by the interaction between my I2C display and the OneWire library, no doubt because of the portENTER_CRITICAL/portEXIT_CRITICAL calls it uses. By using a mutex semaphore around the writes to the display and the reads from the OneWire temperature probes, I eliminated all the remaining I2C problems I was having. Thanks, @stickbreaker, for the great improvements you made to OneWire and esp32-hal-i2c.c! @me-no-dev, these changes really need to be incorporated into the main repository. I'm pretty convinced that the minor incompatibility caused by the new API will hardly be noticed. Among all the device libraries I keep on my system, grep did not find any that use writeTransaction(...,false) or requestFrom(...,false). I'm not saying they don't exist, but I suspect any incompatibility downside will be far outweighed by the stability gains.
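The fix described here, a single mutex taken around both the display writes and the OneWire reads, can be sketched as follows. It uses std::mutex so that it runs on a desktop; on the ESP32 the same pattern would use a FreeRTOS mutex semaphore (xSemaphoreCreateMutex(), xSemaphoreTake(), xSemaphoreGive()). displayWrite and oneWireRead are hypothetical stand-ins that only record whether their critical sections ever overlap.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

std::mutex busGuard;                   // one lock shared by both bus users
std::atomic<int> active{0};            // guarded sections currently running
std::atomic<bool> overlapSeen{false};  // set if two sections ever ran at once

// Body of a critical section; records any overlap it observes.
void criticalWork() {
    if (++active > 1) overlapSeen = true;
    std::this_thread::yield();         // give the other task a chance to interleave
    --active;
}

// Hypothetical stand-ins for an I2C display update and a OneWire probe read.
void displayWrite() { std::lock_guard<std::mutex> lock(busGuard); criticalWork(); }
void oneWireRead()  { std::lock_guard<std::mutex> lock(busGuard); criticalWork(); }

void displayTask(int updates) { for (int i = 0; i < updates; ++i) displayWrite(); }
void sensorTask(int reads)    { for (int i = 0; i < reads; ++i) oneWireRead(); }
```

Without the lock_guard lines, nothing prevents the two sections from interleaving; with them, each task waits for the other to finish before touching its device.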
Adding a reference to a related thread on the ESP-IDF site: espressif/esp-idf#1503
I think I have a fix for this!
Chuck.
stickbreaker, thanks for your work on this I2C issue! I'm not sure if this is the best place to post, but the main "trunk" of this repository fixed the I2C issues I was experiencing. I have a custom board (based on the Adafruit ESP32 Feather) with around 12 I2C devices communicating with various extenders and I2C multiplexers, and your code really seemed to fix the stalls in the I2C communication I was experiencing. Good work!! Hopefully your modifications can be incorporated into the main Espressif ESP32 repo. Thanks again!
@thaanstad Do you have a list of the I2C devices you are using?
@thaanstad I'm pleased it is working for you. Do you use the original …? I haven't received any feedback from people using these block transmission functions. I'm curious whether anyone uses them, and whether they think they should be included in the main branch when this is merged. I added them because I wanted to test the hardware capabilities. I use a 24LC512 as a simple FAT file system, and the Arduino 32-byte buffers were a pain to work around.
Chuck.
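The buffer problem is easy to see with a page-aware write splitter. This is a minimal sketch, assuming a 32-byte Wire buffer of which 2 bytes go to the 24LC512's 16-bit memory address, and the 24LC512's 128-byte write page (a page write that runs past the end of a page wraps around within that page, so chunks must stop at the boundary). Chunk and planWrites are hypothetical, not part of Wire or of the block transmission functions mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Chunk {
    uint16_t memAddr;  // EEPROM address where this chunk starts
    size_t   offset;   // offset into the caller's data
    size_t   length;   // data bytes in this I2C transaction
};

constexpr size_t kWireBuffer = 32;   // Arduino-style Wire buffer
constexpr size_t kAddrBytes  = 2;    // 24LC512 takes a 16-bit memory address
constexpr size_t kPageSize   = 128;  // 24LC512 page-write size

// Splits a write so no chunk overflows the Wire buffer or crosses a page.
std::vector<Chunk> planWrites(uint16_t startAddr, size_t len) {
    std::vector<Chunk> chunks;
    size_t offset = 0;
    uint32_t addr = startAddr;
    while (offset < len) {
        size_t toPageEnd = kPageSize - (addr % kPageSize);
        size_t n = std::min({kWireBuffer - kAddrBytes, toPageEnd, len - offset});
        chunks.push_back({static_cast<uint16_t>(addr), offset, n});
        addr += n;
        offset += n;
    }
    return chunks;
}
```

After each chunk the EEPROM runs an internal write cycle, so the next transaction has to wait for it to finish (commonly by acknowledge polling).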
When I click on that link, GitHub returns 404 Page Not Found.
@tferrin I merged it into my main branch; it is no longer a separate branch.
Chuck.
The main code in the repo (https://github.com/espressif/arduino-esp32) as of the morning of 3/15/2018 was having problems communicating with the I2C chips on my custom boards (based on the Adafruit ESP32 Feather). The I2C communications would run fine for 30 seconds to 2 minutes or so and then stop for no apparent reason. The main code would continue to run, and I would just see garbage data from the I2C sensors, or the I2C-controlled IO would stop responding. I looked at the clock voltage during one of these hangs and did not see any change from 3.3V, which I assume indicates the clock stopped. I think there was some error in the I2C communication, after which the ESP32 would stop communicating and never resume.

After updating to stickbreaker's repo, the communications have been working without interruption for over 1 hour (and still going). I have seen 3 I2C miscommunications that showed false data from some of the sensors, but these did not stop the I2C communications, and the USB serial printout continued as normal with good data after each misread. When there were hangups before, the MCU would continuously spit out garbage data (shown via USB serial), and I assume it was not re-engaging I2C comms after a hangup or miscommunication. I don't remember whether the data was the same or not (all 0x00 or 0xFF).

FYI, there were no hardware changes during the testing of these repos, and I haven't touched the circuitry (or connection wires) during the I2C communications evaluation. After reading the espressif/ESP32 repo issues about the ESP32 MCU hanging during I2C communications, I assumed this was the problem with my circuit and have only made software modifications to try to resolve it. I am sure the improvement in I2C communications is due to the modifications in stickbreaker's repo and not to a hardware change.
@thaanstad My current main branch is the most up-to-date version. No one has 'yet' started yelling at me that it breaks their stuff. I recommend you use it (the main branch is currently at release V0.2.0). Have you been bitten by my …? I also use MCP23017s to interface to HD44780 LCDs and HD66717 LCDs. I have found that using Non-Sequential mode allows fast updates. I have my LCDs wired with 4 bits of PortA for the CTRL signals and all 8 bits of PortB as data. I can then send a continuous stream of data to fill the 4x20 one line at a time. Each of the 20 display characters takes 4 bytes of I2C data: [data, ctrl(write,EN), data, ctrl(write,!EN)]. I have been able to update my 4x20 LCD at over 2,000 cps; it just grays out the screen. I like those MCP23008s and MCP23017s.
Chuck.
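The 4-bytes-per-character stream described here can be sketched as a buffer builder. This is a minimal sketch under stated assumptions: the bit positions of RS, RW and EN on PortA are hypothetical (the comment only says 4 bits of PortA carry the control signals), buildLineStream is a hypothetical helper, and how the MCP23017 alternates between the PortB and PortA registers in non-sequential mode is left to the device configuration rather than modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical PortA bit assignments for the HD44780 control lines.
constexpr uint8_t kRS = 1 << 0;  // register select: 1 = character data
constexpr uint8_t kRW = 1 << 1;  // read/write: 0 = write
constexpr uint8_t kEN = 1 << 2;  // enable: the HD44780 latches on the falling edge

// Builds the I2C payload for one line of text, 4 bytes per character:
// [data, ctrl(write,EN), data, ctrl(write,!EN)], data to PortB, ctrl to PortA.
std::vector<uint8_t> buildLineStream(const std::string& text) {
    const uint8_t ctrlWrite = static_cast<uint8_t>(kRS & ~kRW);  // RS=1, RW=0
    std::vector<uint8_t> out;
    out.reserve(text.size() * 4);
    for (unsigned char ch : text) {
        out.push_back(ch);                                     // character on PortB
        out.push_back(static_cast<uint8_t>(ctrlWrite | kEN));  // raise EN
        out.push_back(ch);                                     // keep data stable
        out.push_back(ctrlWrite);                              // drop EN: latch it
    }
    return out;
}
```

For scale, over 2,000 characters per second means more than 8,000 payload bytes per second, or roughly 72,000 bit times once each byte's ACK bit is counted, before START and address overhead.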
This is great, Chuck. On the latest sync to the Arduino core, using an earlier version of your 4 files, the timing issues had started occurring again. Today I found the discussion between you and ESP32DE that fixed that by setting the SDA and SCL pins high before setting the pin mode. So I put your change into my environment, and it failed with the same timing issues. I then changed the pin mode to OUTPUT_OPEN_DRAIN, and all the issues went away. I can reproduce this easily in my environment. I then pulled down your latest changes from your master branch, and it all works fine now, even though you are using OPEN_DRAIN for the pin mode. So I am not sure why, but I thought I would let you know. For reference, I am attaching the 4 earlier files I have that work with OUTPUT_OPEN_DRAIN but not with OPEN_DRAIN. Pretty weird, to say the least! But you put so much effort into this that I thought I would let you know.
@ifrew Thanks for the accolades. OPEN_DRAIN is required because the I2C bus is electrically an open-drain bus. When the peripheral is attached, it reconfigures the output drivers on the GPIOs behind the scenes.
Chuck.
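The open-drain behaviour can be modelled in a few lines: each device can only pull a line low or release it, and the pull-up resistor supplies the high level, so the line is a wired-AND of all drivers. This is a conceptual sketch, not ESP32 code (Drive and lineLevel are hypothetical names). It shows why a slave holding SDA or SCL low cannot be overridden by the master, and why a push-pull output driving high on an I2C pin would fight any device pulling low.

```cpp
#include <cassert>
#include <vector>

// One device's output stage on an open-drain line: pull low or let go.
enum class Drive { Release, PullLow };

// Wired-AND: the pull-up keeps the line high unless at least one device
// pulls it low. No device can actively drive it high.
bool lineLevel(const std::vector<Drive>& drivers) {
    for (Drive d : drivers)
        if (d == Drive::PullLow) return false;
    return true;  // only the pull-up resistor is acting
}
```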
I'm going to close this issue; I believe the timeout cascade has been solved.
TimeOut, Arbitration, and Bus Busy errors caused by hardware faults are difficult to recover from.
The current Wire.reset() cannot successfully reset the hardware. This can be tested by initializing the Wire library with Wire.begin() and then grounding SDA. This simulates a START -> STOP, which the I2C standard considers a void statement and does not recommend. But reality cannot be ignored; it happens. The bus will be clear (SCL and SDA high with no activity), but the SM will fall into an irrecoverable state of BUS_BUSY, TIMEOUT, or perhaps ARBITRATION.
The only successful recovery is a HARDWARE reset operation. Or, if a second set of GPIO pins is dedicated, a manually bit-banged START, 9 clocks, STOP can recover the SM. Does anyone have a successful software recovery method?
Chuck.
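The recovery waveform described above (START, 9 clocks, STOP) can be written out as a sequence of pin states. This is a minimal sketch, assuming open-drain pins where true means released (high); PinState and recoveryWaveform are hypothetical, delays between steps are omitted, and on hardware the states would be driven through GPIOs while the I2C peripheral is detached. SDA is released during the nine clocks so a slave that is stuck mid-byte can shift out its remaining bits and let go of the line.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One step of the bit-banged waveform: {SDA, SCL}; true = released (high).
using PinState = std::pair<bool, bool>;

// START, nine SCL clocks with SDA released, then STOP.
std::vector<PinState> recoveryWaveform() {
    std::vector<PinState> w;
    w.push_back({true, true});     // idle: both lines released
    w.push_back({false, true});    // START: SDA falls while SCL is high
    w.push_back({false, false});   // pull SCL low
    w.push_back({true, false});    // release SDA before clocking
    for (int i = 0; i < 9; ++i) {
        w.push_back({true, true});   // SCL high
        w.push_back({true, false});  // SCL low
    }
    w.push_back({false, false});   // SDA low while SCL is low
    w.push_back({false, true});    // SCL high
    w.push_back({true, true});     // STOP: SDA rises while SCL is high
    return w;
}
```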