-
Notifications
You must be signed in to change notification settings - Fork 53
Logging issue : a lot of files and logs drops #469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Idem on RTK Facet with firmware 3.2 |
@Stefal - We found a fixed a particularly bad bug on May 18th that was causing issues with corrupt log file names and unexpected resets. This bug was introduce on Feb 28 so v3.1 should have been good, while v3.2 and v3.3 have this bug. Can you share some of the sections of the log where the RTK Fix is lost? Can you tell if the break is due to a system reset or does the system simply lose RTK Fix and then re-gain it? |
I will wait for a new release with the various fixes you have mentionned, and redo some tests. I think that parts of my problems are related to the 5hz gnss frequency that the bluetooth stream can't handle. I forgot that log to file and bluetooth settings are the same, contrary to my old logger in which the stream are independant (1 or 2hz for bluetooth, and 5hz or more with many ubx messages for log to SD card). |
Test with the 3.4RC (RTK Firmware: v3.4-May 25 2023) :
Result:
I will share the logs with you in private (email) |
Serial bytes are all being read correctly even when the file size is not increasing. This would seem to indicate the validation parser is not passing data on to the SD logging routine. I would expect the parser to throw a few sentences out across multiple hours. The fact it's throwing out multiple seconds worth of data is wrong. |
tldr; The release candidate should have some major improvements. Please load v3.5RC when possible and report back. As usual, thanks for reporting this and sorry for the data corruptions! To replicate this issue:
Above: This is how the system works. The ESP32 takes in NMEA from the ZED transmits that NMEA to the phone over BT. Everything works fine, there are no NMEA or UBX sentence corruptions (ie, no data is dropped). Above: If we start transmitting RTCM/NTRIP corrections from the phone, back to the ESP32, and back to the ZED, we begin to see bytes dropped (from the ZED-ESP NMEA communication). Above: Interestingly, if we suppress the transmission of serial from the ESP32 to the ZED (commenting out lines in the Task.ino btReadTask code), there is no data corruption. It's therefore presumed that the BT stack is not interfering with the ESP32 UART2 (it's not an interrupt issue). Above: Additionally, I verified the ZED was correctly outputting serial at all times. It is. The ESP32 is dropping characters out of the UART. Things we’ve tried:
Nothing above decreased the number of corrupted sentences coming from the ZED when the ESP32 is writing data. What is especially telling is the reduction of the UART speed. If we are slowing down the UART speed, all interrupts and tasks have a lot more time to operate before data is lost, yet we still lose data. I believe there is a very low level issue with the esp32-idf that causes the UART FIFO to misbehave when we are simultaneously reading and writing to it. I am not able to figure it out at this time. However, there is a simple fix: Above: We can push the RTCM over I2C rather than UART, thus avoiding a serial2.write() while we are doing a serial2.read(). Originally I was seeing corruptions every ~10s or so. With the fix in place, I have not seen a corrupted message at 10Hz, with 10 messages turned on, in 15 minutes. There is still work to do:
|
Thanks Nathan, This RC is uploaded on my express. I will do a quick test tomorrow evening. |
Could you swap RTCM and UBX/NMEA ?
|
Following your investigations, I made new tests. RTK Express Dynamic : automotive Rover static on balcony -No connection : Integrity OK. For the last two files, I also checked the age of differentials: It is still unstable, like in: #359 |
I was able to replicate Stefal's report: RTK lock would be lost after a brief amount of time (less than 10 minutes). @PaulZC pointed out the flaw in my solution. We can't push single bytes to the ZED over I2C. Additionally, we were not servicing the Bluetooth buffer before characters were dropped leading to 'RX Full! Discarding X bytes' errors (BT RX buffer is 512 bytes, I2C comm was blocking). A buffer has been implemented. I've had a solid RTK lock for the last hour. Sorry for the hassle! Please give v3.5 RC June 19th a try. |
How did you feed the RTCM back to the ZED without a Bluetooth connection? |
I don't understand. I always had a bluetooth connection. Edit : My logger is a small fork of the one from @PaulZC. I have added a bluetooth module connected on the ZED UART2 |
Ah, ok, thanks! I was just looking at the photos and was confused. Can you walk me through the steps you use to process the ubx files through rtklib? |
Rtklib (only rtkplot) is only used to display the trace and get % stats for single/float/fix:
I can open the same pos file in josm, the desktop app to edit OpenStreetMap. I like this way because I use it everyday for Osm contribution, and for all the aerial image sources available : |
Excellent! Thank you. I captured a route in my car but did not have the extra msgs turned on. I'll capture another tomorrow. |
This morning I spent some times with U-Center on other ubx messages, like MON-MSGPP, MON-COMMS, RXM-COR, RXM-RTCM.
I'm too busy to go deeper, but it looks like an interesting point. |
Thank you Stefal! This is very helpful. |
Below is a ~10 minute drive with a baseline of 11.5km (7 miles) using an RTK Express, running v3.5 (release version), and a u-blox magnetic mount antenna. Seems fine to me. Additionally, your comments about RTCM msg/min had me go looking. Above is a 5 minute capture via u-center RXM-RTCM. While the ZED should have received 336 messages, a few of the sentence types (1124 in particular) are getting corrupted via I2C (~10% worst case). I would not expect a loss of 10% of a particular message to cause the ZED to drop to RTK Float. I've seen the ZED maintain RTK fix for nearly 30 seconds without RTCM so a lost message, very periodically should not be causing your issue. The mystery continues! I cannot yet explain why you're dropping so much RTCM via Bluetooth but we'll keep working on it. I think there's a few more spots I can improve. |
The base station you've used seems to not send many msm messages. It could explain why we got so different results. Here are the results with my own base station and its default settings, after a 5 minutes capture. 1129 vs 2710 messages, that's a big difference ! If you want to test, you can use my own base (mount point EMONIE on caster.centipede.fr:2101) |
Ah, well that is a lot! That would explain it. The Bluetooth connection is getting overwhelmed. I think I've got a solution for you. I just pushed v3.6RC. Please install and then use the debug menu to increase the SPP RX buffer to 2048 (shown below): I was able to get ~5 minutes of RTCM from your mount point without loosing anything. As a bit of history (and notes to myself): Prior to Aug 17, 2022 the firmware had a larger SPP RX buffer of 2048 bytes. So since version v2.4, the firmware has been using a smaller 512 byte Bluetooth SPP RX buffer. If you've been using this mount point for the last year, I suspect this Bluetooth overrun issue has been causing problems for you for some time. My apologies. I did not know there were mount points out there with this much data being transmitted. |
Excellent! Thanks for testing. I'll close this once we do a full release. |
Thanks for working through this issue with us. I'm glad we got it fixed! This has been added to the latest v3.6 release. |
Subject of the issue
We are using some RTK Express to log data when we walk or drive.
Instead of one file for each trip, we have a lot of files, drops inside the files, and too much float or single fix.
Your workbench
Steps to reproduce
Expected behavior
Actual behavior
some example:
The text was updated successfully, but these errors were encountered: