CircuitPython 4.x hang on PyGamer after about 10 minutes of while loop playing stereo audio #2005
Powered on PyGamer today, left same code running for 2 mins, then copied a file off and that triggered a restart. That hung within about 10 mins too. Same serial output, the last line is only on the serial over USB not the PyGamer screen. My printed line of text per loop is a touch longer than the PyGamer screen width so each line wraps onto a new screen line. |
For completeness, I did one without anything attached to the feather connector. Second run hung after 20 mins. I tend to leave it running on my desk and glance at PyGamer or tty window occasionally to see if it has stopped scrolling. |
BTW, is there any kind of watchdog reset in CP? |
Nope, nothing that would cause that. There has been an issue of microcontroller fuses being set by accident that enable the watchdog timer, but that has a max of 32 seconds. |
I was thinking more of something that might save the situation and give a form of auto recovery. |
@kevinjwalters Do you have a jlink? We could get you one. Just email me. |
I got this to hang twice, but only after running for hours. Getting permission to post code here. Backtrace (4.1.0-rc.1): EDIT: Notice the bad pointer for
EDIT: Inside dma.c,
|
Another interesting tidbit would be the sercom registers which should be clocking the dma. |
The values "optimized out" will be presumably sitting in registers? What's the |
The values might have been in registers or sometimes on the stack, but at the point in the code where the values are no longer needed, the compiler will reuse those locations, so they are not preserved and would get overwritten for other uses. I'd have to look at the machine code to see if they're accidentally still available. But I can usually go back in the stack and see what was passed down. |
I'm wondering if #1992 could be related to this in some way? I'd imagine the circuitpython interpreter has little interest in the 16bit values and yet they have a profound effect on triggering that bug. Stranger still, I believe when it happens a "fix" is to append a single extra 16bit value. It's very visual on x-y oscilloscope output as the looping image starts perfect and then distorts as x-y go out of sync. They then periodically come back into sync for a moment and this repeats. |
I compiled with |
I'm going to watch/listen to the CircuitPython Deep Dive, but where is the stack for the CP interpreter? Is there any guarantee that it can't clash with other users of the SRAM, or any way to detect if it does? I ask because the data buffer + length ends up in SRCADDR and that's 0x2002fd40, which is fairly close (704 bytes) to the end of the (M4's 192k of) SRAM at 0x2002ffff. |
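For anyone checking the arithmetic in the comment above, here is a small sketch of the distance calculation. The SRAM base address of 0x20000000 is an assumption (it is consistent with 192k of SRAM ending at 0x2002ffff); the SRCADDR value is the one quoted above, and `bytes_below_sram_end` is just an illustrative helper name.

```python
# Addresses taken from the comment above; the SRAM base is an assumption.
SRAM_BASE = 0x20000000            # assumed start of SRAM
SRAM_SIZE = 192 * 1024            # 192k of SRAM on the M4 part discussed
SRAM_END = SRAM_BASE + SRAM_SIZE  # first address past the end of SRAM

SRCADDR = 0x2002FD40              # value quoted for the DMA source address


def bytes_below_sram_end(addr: int) -> int:
    """Return how many bytes lie between addr and the end of SRAM."""
    if not SRAM_BASE <= addr <= SRAM_END:
        raise ValueError("address is outside SRAM")
    return SRAM_END - addr


print(hex(SRAM_END - 1))              # 0x2002ffff, the last valid SRAM address
print(bytes_below_sram_end(SRCADDR))  # 704
```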
Yes, we have stack checking: https://github.com/adafruit/circuitpython/blob/master/py/stackctrl.c#L55. The VM interpreter does a check at opportune times, and there's a 1kB safety area. Most routines don't allocate large chunks of stuff on the stack. |
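To make the idea of a stack check with a safety area concrete, here is a simplified Python model. It is not the C code in `stackctrl.c`; the stack top address, the 24 KiB stack size, and the function names are assumptions chosen for illustration. Only the 1kB safety area comes from the comment above.

```python
STACK_TOP = 0x20030000   # assumed: the stack starts at the end of SRAM and grows down
STACK_SIZE = 24 * 1024   # assumed stack size for this sketch
SAFETY_AREA = 1024       # the 1kB safety area mentioned above


def stack_usage(stack_pointer: int) -> int:
    """Bytes of stack in use for a downward-growing stack."""
    return STACK_TOP - stack_pointer


def stack_check(stack_pointer: int) -> None:
    """Raise once usage reaches the limit, leaving the safety area unused."""
    limit = STACK_SIZE - SAFETY_AREA
    if stack_usage(stack_pointer) >= limit:
        raise RecursionError("stack limit reached")


stack_check(STACK_TOP - 1000)            # well inside the limit, no error
try:
    stack_check(STACK_TOP - 23 * 1024)   # exactly at the limit
except RecursionError as exc:
    print("caught:", exc)
```

The point of the safety area is that a check which only runs "at opportune times" can still be overshot by whatever is called between two checks, so the limit is set below the real end of the stack.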
There's two variable length arrays (presumably on the stack?) at the end of this highlighted piece of code in I've not yet figured out exactly what the code is doing but Is it worth checking some I see there's a I think I'm also getting confused between the python stack and the C stack, I was more interested in latter. https://github.com/adafruit/circuitpython/blob/master/ports/atmel-samd/mpconfigport.h#L63-L64 suggests the C stack is at least 24k which sounds substantial for what it's used for and far larger than the 4096 SAMD21 equivalent which seems to work without problem. BTW, what's the |
One other observation, |
Thank you for auditing this code! @tannewt Take a look at these issues:
I set a breakpoint in the section of the code and am running your test program. There appears to be clipping going on, so it never actually hits this code.
I think we could just call
They are on the same stack, though there are compile options to use a separate stack or to use the heap (
That is a good point, and should be fixed. Again, in this particular case, it appears that code is not reached when your test program is running. |
Posting the test code: |
So, yes it looks like we should check we have enough stack left.
Yup, the volatile is debugging leftovers. I'll likely rework the begin_transaction stuff with the e-paper work I'm starting. As it is now, the first check causes a chip select blip that we don't need. |
@tannewt did you see any issues with the DMA descriptors? The above issues are interesting, but I don't think that code gets called anyway in this case, where it hangs waiting for the DMA to finish. |
I doubt the dma_descriptors are corrupted by the stack because they are on the other side of the heap. However, it's worth looking to see what is placed next to it in memory. It is weird that DRE is high on the Sercom since it should trigger a DMA burst to fill it. This could happen if the DMA wasn't ready when DRE was triggered I think. I wonder if it's related to the DMA issues @ladyada and @PaintYourDragon saw with the nintendo emulator and eyeball code. |
we think it only happens when you have two running DMA tasks. it was extremely sporadic and hard to repro so i patch-fixed it by detecting the 'DMA lockup' and kicking https://github.com/adafruit/nofrendo_arcada/blob/master/nofrendo_arcada.ino#L136 |
#1908 has also gone down the path of DMA investigation. |
Also getting this, when I have sound + display.
|
this could be related to the DMA bugs that @PaintYourDragon has bumped into |
Might be related to a couple of things I’ve encountered with the Monster M4sk eyes in Arduino. If it’s the “multiple active DMA channels, one or more with linked descriptors” issue mentioned in the errata above: I don’t have a workaround for this other than changing the design of the code, if possible, to avoid linked descriptors. Had to do this in the eyes, which un-did a small optimization there but oh well, at least it's solid now. A different problem I’ve encountered involves a DMA-transfer-complete callback occasionally and randomly not being invoked as it should. The workaround for this was knowing (very roughly) how long the DMA operation should take…and then, if the code stalls for significantly longer than this in a particular spot, disabling and re-enabling the DMA channel. This CAN cause frames to drop but it gets back on track pretty quickly. If the latter sounds familiar, code for the M4_Eyes project is here: |
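A rough sketch of the second workaround (time out, then disable and re-enable the channel) is below. This is not the M4_Eyes or nofrendo code; `wait_for_transfer`, `is_done`, `restart_channel`, the `slack` factor, and the `FakeChannel` stand-in are all hypothetical names invented for illustration, and the simulated clock exists only so the example can run on a desktop.

```python
import time


def wait_for_transfer(is_done, restart_channel, expected_s, slack=4.0,
                      clock=time.monotonic, max_restarts=3):
    """Poll is_done(); if the wait exceeds expected_s * slack, restart the channel.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    deadline = clock() + expected_s * slack
    while not is_done():
        if clock() >= deadline:
            if restarts >= max_restarts:
                raise RuntimeError("DMA transfer did not complete")
            restart_channel()   # hypothetical: disable, then re-enable the channel
            restarts += 1
            deadline = clock() + expected_s * slack
    return restarts


class FakeChannel:
    """Simulated channel that stalls until it has been restarted once."""

    def __init__(self):
        self.restarted = False
        self.now = 0.0

    def clock(self):
        self.now += 0.001       # each call advances simulated time by 1 ms
        return self.now

    def is_done(self):
        return self.restarted

    def restart(self):
        self.restarted = True


chan = FakeChannel()
print(wait_for_transfer(chan.is_done, chan.restart,
                        expected_s=0.01, clock=chan.clock))  # 1
```

The `slack` multiplier reflects the "significantly longer than this" wording: the expected duration is only known very roughly, so the timeout is set well above it to avoid restarting a transfer that is merely slow.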
I believe this was fixed by adafruit/samd-peripherals#29. Please feel free to re-open if it reproduces on current CircuitPython. (The fix was probably first in CircuitPython 5.0.0, btw) |
I've got a CircuitPython hang that I've seen on 4.1.0 rc0 and rc1 if I leave some code running. It takes about ten minutes on a PyGamer to hang; it's not predictable but happens on about one in four executions. The serial connection (Windows 8.1 desktop) becomes disconnected, probably with no close (tapping a key in the terminal window makes it realise the PyGamer has gone away), and the CIRCUITPY drive becomes inaccessible. I left the PyGamer screen illuminated and it shows almost the same printed line as I see on serial out. I say almost because I get one extra line on the USB serial compared with the screen, which surprises me a little. If I reset, it doesn't go into safe mode. I tried 4.0.2 and it didn't seem to do it, but because this occurs infrequently I can't say for sure whether 4.0.2 is OK or not.
The output to DAC stops changing too.
I've got GND/A0/A1 from Feather style connector on the back connected to a 'scope. I also have the standard small speaker attached but it's not enabled by the code. I wouldn't expect that to affect things.
I have not seen any MemoryError exceptions during development. It seems unlikely that memory is being exceeded, and certainly no exception is seen on the output. I don't have a JLink debugger set up, so I am a bit limited in gathering more data on this.
Will supply code later.