Skip to content

Application exception output to serial (over USB) appears truncated due to buffering issue #1639

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kevinjwalters opened this issue Mar 12, 2019 · 39 comments
Assignees
Milestone

Comments

@kevinjwalters
Copy link

My near-perfect code has just spat an exception. I'm running 4.0.0-beta.3-4-g186e31591-dirty which 4.0.0 beta 3 with a minor tweak. The exception has half-appeared on my console (PuTTY on Windows) from the CPX over USB:

Press any key to enter the REPL. Use CTRL-D to reload.soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Shared couter PWM failure I - trying in reverse order
Shared couter PWM failure II - soft/hard reset suggested
Ready to play
Traceback (most recent call last):
  File "code.py", line

There's a single space after line - the absence of the rest of the text on the line makes it very apparent it's truncated. When I press the enter key and the rest appears leaving it looking like this:

Press any key to enter the REPL. Use CTRL-D to reload.soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Shared couter PWM failure I - trying in reverse order
Shared couter PWM failure II - soft/hard reset suggested
Ready to play
Traceback (most recent call last):
  File "code.py", line 199, in <module>
TypeError: list indices must be integers, not tuple



Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 4.0.0-beta.3-4-g186e31591-dirty on 2019-03-11; Adafruit CircuitPlayground Express with samd21g18

I've never noticed that before and I use this setup a lot. It appears fully reproducible for this particular code/error combo. Are there differences in the way 4.0.0 vs previous versions writes to serial or how it reports exceptions?

@kevinjwalters kevinjwalters changed the title Exception output to serial (over USB) appears truncated due to buffering issue Application exception output to serial (over USB) appears truncated due to buffering issue Mar 12, 2019
@tannewt tannewt added this to the 4.0.0 - Bluetooth milestone Mar 12, 2019
@tannewt
Copy link
Member

tannewt commented Mar 12, 2019

4.x uses a different USB stack than 3.x so it's likely a bug in TinyUSB. Maybe @hathach will take a look.

@kevinjwalters
Copy link
Author

For what it's worth, just noticed this occurs with 4.0.0 beta 5 too. Only about 1 in 20 of my errors does this, there's something data (or timing?) dependent about this.

@notro
Copy link

notro commented Mar 20, 2019

It's probably the same problem I have run into #1342 (comment)

@kevinjwalters
Copy link
Author

FYI, In my case the data will pop out if you tap return (i.e. send some data in oppposite direction). The terminal session is not disconnected.

@hathach hathach self-assigned this Mar 21, 2019
@tannewt
Copy link
Member

tannewt commented Mar 21, 2019

I wasn't able to reproduce this. @kevinjwalters can you please post everything needed to reproduce this? Thanks!

I tried this code because it prints long strings:

import time

for i in range(100):
    print("h"*i)

for x in range(4):
    print("wowo")


print("hellow")

print("woowza")

print("blef")

time.sleep(0.1)


for i in range(100, 500):
    print(("h",)*i)

@kevinjwalters
Copy link
Author

I've only noticed it on certain exception output, not to say it doesn't occur on others, of course.

My first example got zapped when I encountered the #1643 bug :( I may have preserved another, let me go digging.

@kevinjwalters
Copy link
Author

This did it on the 3rd attempt. Each time I was using up arrow/return to recall and execute the previous command in PuTTY. It's not fully reproducible though:

>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij
>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij
>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij

@kevinjwalters
Copy link
Author

kevinjwalters commented Mar 21, 2019

Here's one showing where it stops (until any key (tried return, tried c) is pressed which then nudges the rest of it out):

>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError

It's also possible this is triggered by some very subtle behaviour across USB to terminal...

@kevinjwalters
Copy link
Author

I'd suggest leaving this until I can get a more reproducible case.

@tannewt tannewt modified the milestones: 4.0.0 - Bluetooth, Long term Mar 26, 2019
@tannewt
Copy link
Member

tannewt commented Mar 26, 2019

Ok, moved to Long Term and will bump it back up when we can reproduce it.

@kevinjwalters
Copy link
Author

kevinjwalters commented Apr 17, 2019

Not exhaustive but I've tried and so far failed to reproduce this with a brief bit of fiddling on (on 4.0.0 beta 7):

  • variable length text in a ValueError exception after a few random seconds
  • same exception as in ticket by using a tuple for list index at line 199

One other thing that's struck me is the code I was running was using USB MIDI so this could be related in some way to other traffic flowing over USB to the CPX. That might have some triggering effect here?

@kevinjwalters
Copy link
Author

kevinjwalters commented Apr 17, 2019

Acutally I just re-ran one of my examples from comments on 4.0.0 beta 7 and it did it on the second raise on REPL:

>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij
>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError

and if i press a key, tried capital Z (I normally just tap enter), it all pops out:

>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij
>>> Z

@kevinjwalters
Copy link
Author

kevinjwalters commented Apr 17, 2019

On another CPX I got 3 failures out of 20 when cutting and pasting that exception and manually hitting (PuTTY on Windows, 115200 8n1 xon/xoff - disabling xon/xoff makes no difference).

Pasting the problematic raise line complete with the newline also triggers it at same frequency.

@kevinjwalters
Copy link
Author

kevinjwalters commented Apr 19, 2019

@tannewt @hathach In light of other, possibly sporadic issues relating to USB on boards running CircuitPython (#1749 #1782), perhaps this is worth digging into more? If there's a general USB comms problem where data is very occasionally not be sent when it is expected that could break a lot of protocols in mysterious ways.

Can you ask a few of the usual suspects to cut and paste that raise line into REPL 15 times carefully to see if they can reproduce this ideally on Windows? They need to check each time visually that output has popped out in full before pasting the next one in.

@kevinjwalters
Copy link
Author

I just tested an GEMMA M0 on 3.1.2 and that was perfect for 20 executions of raise AttributeError("abcdefghij" * 12) at REPL. The first execution of that after upgrade to 4.0.0 rc1 did the stuck buffer thing:

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 4.0.0-rc.1 on 2019-04-23; Adafruit Gemma M0 with samd21e18
>>>
>>>
>>> raise AttributeError("abcdefghij" * 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError

I did 20 of those, 12 of them got stuck, an unusually high number. I do also have 4 other CPX boards plugged in at the moment which is atypical. I am not using any USB hubs, btw.

@kevinjwalters
Copy link
Author

Gemma M0 exhibits the problem with that REPL raise with CPX boards not plugged in too, just tried it.

@kevinjwalters
Copy link
Author

raise AttributeError("abcdefghij" * 12) on my second attempt on REPL on new PyPortal (4.0.0-rc.3) reproduced bug, same place as the previous examples.

@kevinjwalters
Copy link
Author

kevinjwalters commented May 20, 2019

I've got a wireshark capture of this if that's of any use? No great revelations based on my inexpert (at USB) glance. When it behaves normally it shows a 208 byte response which I think is 27 header + 181 data. When it doesn't work there's an 80 byte response and then a 157 when user hits a key, as I hit enter an extra CRLF is appended explaining the 183 = 80-27+157-27.

Two other observerations, the exception appears in full on the PyPortal's screen. I also reproduced it a second time and left it for 20 minutes and it still didn't pop out so it seems to be likley it's in tx buffer and will not be released by a timeout, just receiving data.

@dhalbert
Copy link
Collaborator

dhalbert commented May 20, 2019

Diagnosing:

supervisor/shared/serial.c calls tu_cdc_n_write(), which calls tu_fifo_write_n().

Note that in ib/tinyusb/src/class/cdc/cdc_device.c, there's this commented-out code:

#if 0 // TODO issue with circuitpython's REPL
  // flush if queue more than endpoint size
  if ( tu_fifo_count(&_cdcd_itf[itf].tx_ff) >= CFG_TUD_CDC_EPSIZE )
  {
    tud_cdc_n_write_flush(itf);
  }

CPy should probably wait to write more characters until there's room in the fifo, but maybe tinyusb needs an API call to return how much space is available in the fifo.

@dhalbert dhalbert modified the milestones: Long term, 4.x.x - Bug fixes May 20, 2019
@kevinjwalters
Copy link
Author

kevinjwalters commented May 21, 2019

In addition to observations on Discord about locking not extending to all the read operations on the values in the fifo that can change (perhaps relying on atomic locks and/or typical style of use).

I see another very minor issue which will probably never get exercised and might well be harmelss even if it was, there's a drop in int precision for bufsize, comes in unsigned 32bit but then used 16bit:

$ fgrep -B 2 tu_fifo_write_n ./lib/tinyusb/src/class/cdc/cdc_device.c
uint32_t tud_cdc_n_write(uint8_t itf, void const* buffer, uint32_t bufsize)
{
  uint16_t ret = tu_fifo_write_n(&_cdcd_itf[itf].tx_ff, buffer, bufsize);
$ fgrep -h tu_fifo_write_n ./lib/tinyusb/src/common/tusb_fifo.{h,c}
uint16_t tu_fifo_write_n (tu_fifo_t* f, void const * p_data, uint16_t count);
uint16_t tu_fifo_write_n (tu_fifo_t* f, const void * p_data, uint16_t count)

On the overwrite theme, that feature is set to false for CDC:

$ fgrep -r tu_fifo_config ./lib/tinyusb/src/class/*/*.c
./lib/tinyusb/src/class/cdc/cdc_device.c:    tu_fifo_config(&p_cdc->rx_ff, p_cdc->rx_ff_buf, CFG_TUD_CDC_RX_BUFSIZE, 1, true);
./lib/tinyusb/src/class/cdc/cdc_device.c:    tu_fifo_config(&p_cdc->tx_ff, p_cdc->tx_ff_buf, CFG_TUD_CDC_TX_BUFSIZE, 1, false);
./lib/tinyusb/src/class/cdc/cdc_device.c:    tu_fifo_config_mutex(&p_cdc->rx_ff, osal_mutex_create(&p_cdc->rx_ff_mutex));
./lib/tinyusb/src/class/cdc/cdc_device.c:    tu_fifo_config_mutex(&p_cdc->tx_ff, osal_mutex_create(&p_cdc->tx_ff_mutex));
./lib/tinyusb/src/class/midi/midi_device.c:    tu_fifo_config(&midi->rx_ff, midi->rx_ff_buf, CFG_TUD_MIDI_RX_BUFSIZE, 1, true);
./lib/tinyusb/src/class/midi/midi_device.c:    tu_fifo_config(&midi->tx_ff, midi->tx_ff_buf, CFG_TUD_MIDI_TX_BUFSIZE, 1, true);
./lib/tinyusb/src/class/midi/midi_device.c:    tu_fifo_config_mutex(&midi->rx_ff, osal_mutex_create(&midi->rx_ff_mutex));
./lib/tinyusb/src/class/midi/midi_device.c:    tu_fifo_config_mutex(&midi->tx_ff, osal_mutex_create(&midi->tx_ff_mutex));

From https://github.com/adafruit/circuitpython/blob/master/ports/atmel-samd/Makefile#L88 it looks like M0 boards set that output buffer to be 128 and M4 to 256, the latter is big enough to hold the output I can trigger this bug with on PyPortal.

My original interest was largely inspired by understanding if the same phenomena would apply to file system operations over USB as those could be more critical. It's looking like the code in question is all very specific to serial writes?

@hathach
Copy link
Member

hathach commented May 21, 2019

thank you very much for putting effort to look at the issue. The cdc fifo read/write doesn't need mutex protection, since all cdc_read/write are all invoked in main's thread, never within ISR. The mcu never switch the task while calling those API. Mutex is only needed in RTOS environment where preemption is possible.

indeed the uint32_t cast to uint16_t is inconsistent but it shouldn't be an issue since the fifo size is less likely larger than 64KB ( 128/256 in this case).

I am in the middle of usb work for Arduino, I will try to reproduce and fix this issue when possible :)

@kevinjwalters
Copy link
Author

kevinjwalters commented May 21, 2019

If this only affects serial writes then I would say it's not overly important to address quickly.

I find it easy to reproduce on REPL and if there's a good way to debug a board where you can see what function is being used that would give away what's going on. For REPL if you keep trying to reproduce it then the frequency of occurrence dramatically drops, it's almost as if it goes into hiding!

@hathach
Copy link
Member

hathach commented May 21, 2019

thanks @kevinjwalters , we will still need to look at it to see if we could fix it easily enough.

@kevinjwalters
Copy link
Author

Just noticed this phenomena on a PyGamer running 4.0.2

@kevinjwalters
Copy link
Author

Also happens on the Circuit Playground Bluefruit running 5.0.0 beta 0

@hathach
Copy link
Member

hathach commented Apr 16, 2020

I could reproduce the issue with detail instruction from @DavePutz, will try to troubleshoot it and post update here.

This should return
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError:
abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij

When it fails it only returns
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError

The rest of the output only prints after you press any key.

@kevinjwalters
Copy link
Author

@hathach That's good news. I've seen this happen on a CLUE board running either 5.0.0 or 5.1.0 recently but only a few times.

@kevinjwalters
Copy link
Author

BTW, there's another issue with problems writing to the console. It doesn't sound like it's directly related to this and may well be outside of tiny usb component but worth having a quick look at #2686

@hathach
Copy link
Member

hathach commented Apr 16, 2020

BTW, there's another issue with problems writing to the console. It doesn't sound like it's directly related to this and may well be outside of tiny usb component but worth having a quick look at #2686

Thanks for the link, it seems to be a different bug. Let's just fix this one first to see that could fix the other as well.

@hathach
Copy link
Member

hathach commented Apr 17, 2020

@kevinjwalters @DavePutz can you download 2 file of cdc driver below replace (after removed the .txt) to lib/tinyusb/src/class/cdc . I tested here and it seems to be fine. Once confirmed I will make an PR to update tinyusb submodule. It may take a bit of time since there are ~500 commits comparing with current one used by cpy. Though we need to update anyway for the esp32-s2 support.

cdc_device.c.txt
cdc_device.h.txt

@DavePutz
Copy link
Collaborator

DavePutz commented Apr 17, 2020 via email

@tannewt
Copy link
Member

tannewt commented Apr 17, 2020

@hathach I'm starting to look at ESP32-S2 today again and will likely do the TinyUSB update as part of that.

@hathach
Copy link
Member

hathach commented Apr 17, 2020

@hathach I'm starting to look at ESP32-S2 today again and will likely do the TinyUSB update as part of that.

That is great, don't forget you also need to define the USBIRQ and call the tud_init_handler() on EVERY ports as well

@hathach
Copy link
Member

hathach commented Apr 17, 2020

Hi, The fix looks good to me. I ran 100 tests on a CircuitPlayground Express with no errors. Thanks! -Dave

I am glad that fix the issue, if anyone is curious about the bug, It is fixed by PR by majbthrd in this PR to usb stack here hathach/tinyusb#260

@kevinjwalters
Copy link
Author

@hathach Thanks for the fix. It'll take me a while to look at this. I've never compiled up CircuitPython and I've got a feeling it requires a few later version of things, like gcc, than I currently have.

@hathach
Copy link
Member

hathach commented Apr 17, 2020

@hathach Thanks for the fix. It'll take me a while to look at this. I've never compiled up CircuitPython and I've got a feeling it requires a few later version of things, like gcc, than I currently have.

Which board you are testing with, I can upload the uf2

@kevinjwalters
Copy link
Author

I'm mainly using the Adafruit CLUE at the moment and I've seen it twice while fiddling around on REPL, I also have Circuit Playground Bluefruit and Circuit Playground Express which I used fairly frequently. I can test on any of them.

@hathach
Copy link
Member

hathach commented Apr 17, 2020

@kevinjwalters Here you, I uploaded all 3 uf2 you mentioned

clue-firmware.uf2.txt
cpb-firmware.uf2.txt
cpx-firmware.uf2.txt

@tannewt tannewt modified the milestones: 5.x.x - Bug Fixes, Long term Jul 13, 2020
@tannewt
Copy link
Member

tannewt commented Aug 11, 2021

This was fixed by hathach/tinyusb#260

@tannewt tannewt closed this as completed Aug 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants