file system issues on pca10059 #1654

Closed
jerryneedell opened this issue Mar 17, 2019 · 33 comments · Fixed by #1661

@jerryneedell
Collaborator

jerryneedell commented Mar 17, 2019

following up on #1649
running this on pca10059 -- test code from @uhrheber


Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 4.0.0-beta.4-2-ga10e4fe21 on 2019-03-17; PCA10059 nRF52840 Dongle with nRF52840
>>> 
>>> 
from adafruit_ble.uart import UARTServer
import board, digitalio, pulseio
from adafruit_bluefruit_connect.packet import Packet
from adafruit_bluefruit_connect.color_packet import ColorPacket

ledr = pulseio.PWMOut(board.LED2_R, frequency=5000, duty_cycle=65535)
ledg = pulseio.PWMOut(board.LED2_G, frequency=5000, duty_cycle=65535)
ledb = pulseio.PWMOut(board.LED2_B, frequency=5000, duty_cycle=65535)

ledy = digitalio.DigitalInOut(board.LED1)
ledy.direction = digitalio.Direction.OUTPUT
ledy.value = True

uart_server = UARTServer()

def rgbled(colors):
    ledr.duty_cycle = (255 - colors[0]) << 8
    ledg.duty_cycle = (255 - colors[1]) << 8
    ledb.duty_cycle = (255 - colors[2]) << 8

while True:
    # Advertise when not connected.
    uart_server.start_advertising()
    ledy.value = False
    while not uart_server.connected:
        pass

    while uart_server.connected:
        ledy.value = True
        packet = Packet.from_stream(uart_server)
        if isinstance(packet, ColorPacket):
            print(packet.color)
            rgbled(packet.color)

so far - no issues

@jerryneedell
Collaborator Author

hmm -- BUT -- I am finding that sometimes files copied to the PCA10059 do not show up on the PCA10059 FS.... Recopying them sometimes works -- I have not seen the FS wiped ... yet.
I'll try this on the other 52840 boards

@uhrheber

@jerryneedell Please try to copy data to the drive, while the code runs.
Ideally, use F3 (http://oss.digirati.com.br/f3/) or h2testw (https://fightflashfraud.wordpress.com/2008/11/24/h2testw-gold-standard-in-detecting-fake-capacity-flash/).

Both are meant to write random test data to USB sticks, read it back and test for modified/overwritten parts.
I get modified data when writing to the drive while the code runs.

@jerryneedell
Collaborator Author

jerryneedell commented Mar 17, 2019

OK -- I will try to replicate your test setup.
As noted above, I have been copying files to it and there are clearly issues on the pca10059. I have not been able to replicate them on the other boards yet.

It may take a while for me to read up on f3 and be brave enough to execute it ;-)

@dhalbert or @tannewt can you comment on whether or not you expect CP to be able to handle the kind of stress tests that this is performing?

For now I'll stick to simple file copies.

@uhrheber

F3 just uses a single task to fill the drive with test data, and after that reads the data back and compares it.
I wouldn't call that a stress test but a function test.
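
For readers who have not used F3, the write-then-verify idea can be sketched in a few lines of host-side Python. This is only an illustration of the principle, not how F3 itself is implemented: the helper names (`block_for`, `write_test_file`, `verify_test_file`), the 512-byte block size, and the SHA-256 based test pattern are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def block_for(index, size=512):
    """Build a deterministic pseudo-random block derived from its index."""
    out = b""
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{index}:{counter}".encode()).digest()
        counter += 1
    return out[:size]


def write_test_file(path, blocks, size=512):
    """Write `blocks` test blocks to `path` and flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block_for(i, size))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to write its cache out


def verify_test_file(path, blocks, size=512):
    """Return the indices of blocks that differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(size) != block_for(i, size):
                bad.append(i)
    return bad


# Demo against a temporary directory; point `target` at the mounted drive instead.
target = tempfile.mkdtemp()
test_path = os.path.join(target, "verify.bin")
write_test_file(test_path, blocks=64)
bad_blocks = verify_test_file(test_path, blocks=64)
print("bad blocks:", bad_blocks)
```

The `blocks` argument bounds the size of the test file, which helps on a drive with only a few hundred kB. A read-back done immediately after the write may be served from the host's cache, so unplugging and replugging the board before calling `verify_test_file` gives a more meaningful result.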

A stress test would be dozens of concurrent random reads and writes at the same time.
I don't necessarily expect CP to handle that well.

But: On a Windows system with a background virus scanner installed, it may well happen that the virus scanner reads some files, while CP executes code, and the user reads/writes the same or other files.

@dhalbert
Collaborator

@uhrheber comments copied from #1643:

That didn't do it.
I merged your pull request [ #1649 ], compiled and flashed it to a pca10059, then I copied my BLE colourpicker code + libs to the drive.
The code started to run, but stopped by itself after about 20 seconds, without me doing anything.
After unplugging and replugging, the drive was wiped.


I tested the pca10059 with a simple main.py, that doesn't use any external libs.
It runs stable. I can even torture the drive with f3write. No errors. The problems seem to be linked to bluetooth here.

With the pca10056, it's completely different.
It runs the bluetooth code without an error.
When I write test data to the drive with f3, while the code is running, the test data shows errors afterwards.
When I write test data with stopped code, I get no errors.

It seems that there are still some timing problems.


@jerryneedell Would you mind trying this code?
It needs the libraries adafruit_ble and adafruit_bluefruit_connect. It's written for the pca10059, so you'll have to adapt the LED port pins.

from adafruit_ble.uart import UARTServer
import board, digitalio, pulseio
from adafruit_bluefruit_connect.packet import Packet
from adafruit_bluefruit_connect.color_packet import ColorPacket

ledr = pulseio.PWMOut(board.LED2_R, frequency=5000, duty_cycle=65535)
ledg = pulseio.PWMOut(board.LED2_G, frequency=5000, duty_cycle=65535)
ledb = pulseio.PWMOut(board.LED2_B, frequency=5000, duty_cycle=65535)

ledy = digitalio.DigitalInOut(board.LED1)
ledy.direction = digitalio.Direction.OUTPUT
ledy.value = True

uart_server = UARTServer()

def rgbled(colors):
    ledr.duty_cycle = (255 - colors[0]) << 8
    ledg.duty_cycle = (255 - colors[1]) << 8
    ledb.duty_cycle = (255 - colors[2]) << 8

while True:
    # Advertise when not connected.
    uart_server.start_advertising()
    ledy.value = False
    while not uart_server.connected:
        pass

    while uart_server.connected:
        ledy.value = True
        packet = Packet.from_stream(uart_server)
        if isinstance(packet, ColorPacket):
            print(packet.color)
            rgbled(packet.color)

This code wipes itself on the pca10059, but runs on the pca10056, so I guess it'll also run on the feather_nrf52840.

@dhalbert dhalbert self-assigned this Mar 18, 2019
@dhalbert
Collaborator

I am working on a fix for this.

@dhalbert
Collaborator

@jerryneedell @uhrheber I have a test uf2 for you to try. This fixes a whole bunch of problems with using the internal flash as CIRCUITPY: there was more than one underlying issue.

My test was to run the ble_uart_echo_test.py example in Adafruit_CircuitPython_BLE, connect to the peripheral from the Bluefruit app, and then copy files (as large as 3.7 kB) to CIRCUITPY. It used to crash, and now it seems to work. The UART echo works before and after the file copying.

pca10059-sd-write-2019-03-18.uf2.zip

@uhrheber

@dhalbert Unfortunately, I forgot the boards with UF2 bootloaders at home, and don't have a JLink at work. I only have boards with the Nordic DFU bootloader at hand. Would you mind sending me the .hex file?
If you're reading this after waking up, then I'm most likely already at home, and have access to boards with UF2.

@jerryneedell
Collaborator Author

I tried this on a pca10059.
With Beta5 I did not consistently have the issue with the FS getting wiped, but it was having problems.
ran ble_uart_echo_test - connected from desktop
copied ~5K byte file to CIRCUITPY
system kept running but when I did a soft reboot and tried to restart ble_uart_echo_test
it could not properly read the script from the FS. The ble_uart test script file was damaged, but the FS was still functional. Note: this is not the file I copied over.

loaded new .uf2 and repeated and it seems to work normally
unable to replicate the above issue

@jerryneedell
Collaborator Author

jerryneedell commented Mar 18, 2019

ah good -- repeated the test with Beta 5 and after file copy and soft reboot - the FS was corrupted!


Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 4.0.0-beta.5 on 2019-03-17; PCA10059 nRF52840 Dongle with nRF52840
>>>
>>> import ble_uart
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: no module named 'ble_uart'
>>> import os
>>> os.listdir()
['\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00
many more \x00

Does not happen with new build
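
A quick way to spot this kind of damage from the REPL is to look for directory entries that contain NUL characters. The helper below is only a sketch: `suspicious_names` is a hypothetical name, and the check covers just the symptom seen in the listing above, not filesystem corruption in general.

```python
def suspicious_names(names):
    """Return directory entries that are empty or contain NUL characters."""
    return [n for n in names if n == "" or "\x00" in n]


# One entry as it appeared in the listing above, plus one healthy name.
listing = ["\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00", "code.py"]
bad = suspicious_names(listing)
print(len(bad), "suspicious entries out of", len(listing))
```

On the board itself, the same function could be called as `suspicious_names(os.listdir())` after `import os`.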

Note: On one occasion, after reloading the new .uf2 and then erasing and reloading the filesystem, I noticed that I successfully transferred adafruit_ble/ and adafruit_bluefruit_connect/ to CIRCUITPY. Then I copied ble_uart_echo_test.py, but when I went to run it, the file was not found and it did not appear in a directory listing. I recopied it and it worked normally.
This was obviously with nothing running on the pca10059.
I have experienced this before so it may be yet another FS issue that is lurking.
So far, with the new .uf2, I have not corrupted the FS

@dhalbert
Collaborator

dhalbert commented Mar 18, 2019

@uhrheber here is the hex:
pca10059-sd-write-2019-03-18.hex.zip

I am on Eastern US time (Boston and NY).

@dhalbert
Collaborator

@jerryneedell I thought of another possible issue with flushing the filesystem properly which might explain the missing file, and I'll check on that this morning.

@uhrheber

uhrheber commented Mar 18, 2019

@dhalbert You're the best. Did you even sleep last night?

The new firmware is much more stable, but still has some quirks.
While my bluetooth colourpicker code was running (without bluetooth connection), I copied various libs to the libs folder on the virtual drive.
When there was less than 50kB free, the drive vanished, and CP entered safe mode.
When I then unplugged and replugged the stick, the program came up again, and the drive letter appeared, but I couldn't delete any of the libs I had copied to the drive. I had to reformat the drive.

This happened once. When I tried the same again, CP entered safe mode as well, but after replugging, I could delete files and copy new ones. But again, after free space was below 50kB, CP entered safe mode.

During all those tests, the original main.py and corresponding libs remained unaltered, and still worked.
(Except when I erased the drive, of course.)

@dhalbert
Collaborator

dhalbert commented Mar 18, 2019

@uhrheber I do sleep a regular amount. 😴 😃 I work from home so it's easy to work late and check on things in the morning.

The "running out of space" bug is probably separate. It may be a flaw in the FAT filesystem code.

@dhalbert
Collaborator

Based on this testing, I'll merge these changes for now, and make sure we have an issue for "filling up the filesystem". Thank you both.

@dhalbert dhalbert added this to the 4.0.0 - Bluetooth milestone Mar 18, 2019
@dhalbert
Collaborator

Fixed at least partially by #1661.

@uhrheber

uhrheber commented Mar 18, 2019

I compiled the latest master for pca10059.
Ran some bluetooth test code, and used f3 to test the drive.
Result:

                  SECTORS      ok/corrupted/changed/overwritten
Validating file 1.h2w ...     405/        1/      2/      0

  Data OK: 202.50 KB (405 sectors)
Data LOST: 1.50 KB (3 sectors)
	       Corrupted: 512.00 Byte (1 sectors)
	Slightly changed: 1.00 KB (2 sectors)
	     Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 330.73 KB/s

The code ran without problems, though.
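
To put the report in proportion, the sector counts can be converted to sizes and a loss percentage. This is just the arithmetic behind the f3 summary, using the 512-byte sector size that the report itself shows; the variable names are mine.

```python
SECTOR_BYTES = 512  # sector size shown in the f3 report above

# Sector counts taken from the "Validating file 1.h2w" line.
ok, corrupted, changed, overwritten = 405, 1, 2, 0

lost = corrupted + changed + overwritten
total = ok + lost

ok_kb = ok * SECTOR_BYTES / 1024
lost_kb = lost * SECTOR_BYTES / 1024
lost_pct = 100 * lost / total

print(f"OK: {ok_kb:.2f} KB, lost: {lost_kb:.2f} KB ({lost_pct:.2f}% of {total} sectors)")
# prints "OK: 202.50 KB, lost: 1.50 KB (0.74% of 408 sectors)"
```

So the damage is under one percent of the test file, which may explain why the running code was not affected in this run.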

@uhrheber

I can also see that CP4 writes to flash much more slowly than CP3.
On an ItsyBitsy M4 Express (yeah, I now have one, just arrived) I get (using f3write):
CP3: Average writing speed: 104.00 KB/s Average reading speed: 875.80 KB/s
CP4: Average writing speed: 48.73 KB/s Average reading speed: 947.08 KB/s

pca10056:
CP4: Average writing speed: 17.39 KB/s Average reading speed: 486.17 KB/s

pca10059:
CP4: Writing speed not available Average reading speed: 412.01 KB/s

@uhrheber

I just had a wiped drive again on a pca10059, running the latest master firmware.
I had a program and libs on the drive, copied over the main.py with a different one, and the program didn't start. After unplugging and replugging, the drive was wiped clean.

@dhalbert
Collaborator

Which operating system are you using, and how long did you wait after it didn't restart? It sounds like you caught it in the middle before the write was complete.

I do have further ideas about making the caching safer and will get you some test UF2's later.

Could you also show how you set up and ran the f3 test, just for reference? Thanks.

@uhrheber

uhrheber commented Mar 19, 2019

In this case, the operating system was Windows 10 Enterprise 1709, because this is what I have at work.
At home, I mostly use Linux Mint 19.1.

After copying over the main.py, I waited quite a while, because when it didn't run, I first started mu to check what was going on, but couldn't get a serial connection, so I decided to replug it.
After that, the drive was wiped clean, but I could get a serial connection.

About f3: It's a Linux tool that most distributions should have in their repositories. On Debian-based distros, it can be installed with sudo apt install f3.

To write test data, you use f3write path, to read and test the data you use f3read path.
F3 is normally used to test USB sticks and SSD's for damaged areas and fake capacity.

There's a similar windows tool called h2testw, but it refuses to write to the pca10059's drive, because it thinks there's not enough space. It can be used for boards with external flash, though.

@uhrheber

I freshly flashed a pca10059, put some libs and example code on it, and let it run for some time, replacing the main.py occasionally.
So far I couldn't reproduce the error.

@dhalbert
Collaborator

Your experience two comments up is familiar. Windows in particular, and Linux to a lesser extent, do not write all the data and metadata to the drive immediately when you write a file. For Windows, this can take up to 90 seconds; for Linux, it's 10-20 seconds or so. Windows has a long delay only with FAT12 filesystems. See https://superuser.com/questions/1197897/windows-delays-writing-fat-table-on-small-usb-drive-despite-quick-removal/ for more information.

If you copied main.py to CIRCUITPY via drag-and-drop, or using an editor that does not do a filesystem flush, then it's important to "Eject" the drive after the write, or sync on Linux. For more information, see https://learn.adafruit.com/welcome-to-circuitpython/creating-and-editing-code#1-use-an-editor-that-writes-out-the-file-completely-when-you-save-it-7-13, including a few paragraphs above this link, and below. A number of editors (including NOTEPAD and Notepad++) don't flush the file.
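
As a rough illustration of what "flushing" means when a script on the host writes the file itself, here is a minimal sketch in host-side Python. `write_and_flush` is a hypothetical helper and the destination path is a placeholder; `flush()` and `os.fsync()` are standard Python calls.

```python
import os
import tempfile


def write_and_flush(path, text):
    """Write text to path and ask the OS to commit it to the device."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()             # empty Python's own buffer
        os.fsync(f.fileno())  # ask the OS to write its cached data out


# Demo in a temporary directory; on a real setup `dest` would be on CIRCUITPY.
dest = os.path.join(tempfile.mkdtemp(), "main.py")
write_and_flush(dest, "print('hello')\n")
```

This only covers the file that the script wrote. Ejecting the drive, or running sync on Linux, is still the safer habit, since the operating system may delay other filesystem updates.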

@uhrheber

I did eject the stick before unplugging it. And the code didn't run when I copied the new main.py over the older, so I guess the error happened at the moment of writing.

On Linux, I disabled all write caching for circuitpython drives.

@dhalbert
Collaborator

dhalbert commented Mar 20, 2019

@uhrheber I installed f3, and it talks about writing gigabyte-sized files. There don't seem to be parameters to write smaller files. Do you just run it and have it fill up the filesystem (presumably failing in some way when it reaches the limit)?

@uhrheber

@dhalbert Just start f3; it'll write until the drive is full. It seems to have a fixed block size, though, because it usually leaves a few kB free.

@tannewt
Member

tannewt commented Mar 27, 2019

Is this still an issue?

@dhalbert
Collaborator

We need to test this again. I don't think we've fixed anything that would necessarily fix this.

@tannewt
Member

tannewt commented Apr 17, 2019

I'm not going to block 4.0 on this because most boards use external flash. Moving it to 4.x

@tannewt tannewt modified the milestones: 4.0.0 - Bluetooth, 4.x Apr 17, 2019
@dhalbert
Collaborator

I'm wondering if #1870 might have fixed this.

@rdagger

rdagger commented Sep 11, 2019

I’m having similar issues with the PCA10059 connected to a Raspberry Pi 3. I copied adafruit_ble and adafruit_bluefruit_connect libs to the PCA10059. They use 75.1K of storage. I then loaded a simple Bluetooth test:

from adafruit_ble.uart_server import UARTServer
from adafruit_bluefruit_connect.packet import Packet
from adafruit_bluefruit_connect.button_packet import ButtonPacket

uart_server = UARTServer()

while True:
    # Advertise when not connected.
    uart_server.start_advertising()

    print('waiting for connection...')
    while not uart_server.connected:
        pass

    print('connection established.')
    while uart_server.connected:
        packet = Packet.from_stream(uart_server)
        if isinstance(packet, ButtonPacket):
            print(packet.button, packet.pressed)

It worked once, but on the 2nd try I lost connection and the PCA10059 file system was wiped. I tried several times and the file system kept getting wiped.
I swapped in a brand new sealed PCA10059 and loaded Adafruit CircuitPython 5.0.0-alpha.2-55-g89fed709a on 2019-09-10. I ran the same tests and it worked. Unfortunately, on the 3rd try I lost connection. I rebooted and the libs folder was converted to a file of -1 bytes.

@dhalbert
Collaborator

I'm working on specifying the flash layout on nRF boards in one central place instead of spread around, and I'm seeing that the size of the internal filesystem may not be specified exactly correctly. Will follow up again when I know for sure one way or the other.

@dhalbert
Collaborator

dhalbert commented Feb 3, 2020

We've fixed a number of bugs related to internal filesystem stuff on nRF52840, so closing this. Let's open a new one if the problem reappears.
