Skip to content

Flawed saveEventually architecture (does not execute saves in order) #427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rdbayer opened this issue Oct 16, 2015 · 5 comments
Closed
Labels
type:feature New feature or improvement of existing feature

Comments

@rdbayer
Copy link

rdbayer commented Oct 16, 2015

I know this is more high-level than a bug report typically should be, but I am convinced that there is a fundamental flaw in how the saveEventually() mechanic works which cascades into a bunch of issues.

Root Cause(s)

  • I would expect that saveEventually() calls should synchronously add operations to a queue, and then that queue in the background executes the operations in their original order where one operation must finish before the next. I have not inspected the open source code, but there is ample evidence that this is not in fact how it is implemented.
  • Note: the original order does not need to be respected IF there is no overlapping parent/child relationships in the graphs of the objects being saved, but if this optimization is being attempted to be utilized, there must be a bug in the parent/child traversal checking algorithm for the issues below to occur, plus it seems error-prone because there would be circumstances where appropriate parent/child data has not been included in a query, and therefore it is not possible to know there is a parent/child relationship even though it should be factored in.

Resultant Issues

  1. Deadlocks can happen in what should be benign places. While Richard Ross's commit da2df65 did fix some deadlock issues (thank you!), I have sporadically still seen deadlocks strike. One notable time, the stack trace showed that the two threads deadlocking were callbacks from basically sequential X.saveEventually() and Y.saveEventually() calls. This strikes me as bizarre because as I discussed above, I would expect these calls to do nothing more than synchronously adding objects to a queue, and logically code for Y saving itself should not be executing before X has completely finished saving itself. (Again if there is logic which is only meant to allow this when it's "safe", clearly that logic is broken, because here it wasn't safe).
  2. Data sometimes does not get persisted to the backend. Another issue I've seen is that when calling X.saveEventually() then Y.saveEventually(), even though the changes are reflected in local storage, the changes to Y never get persisted to the database. I've tried putting the app into the background, restarting it multiple times, etc., to increase the chances of the save happening, but it just never does. Note that unlike the deadlocks that are sporadic, for this issue I DO have a project that can reproduce this with 100% consistency that I am happy to share with Parse employees. My guess is that this is due to some broken logic detecting parent/child relationships which prevents the second save from persisting because the object being saved is involved in the first save (i.e. object X has a pointer to object Y).
  3. This is much more minor than issues Update coveralls integration. #1 and Combine podspecs, using platform-specific settings #2 but is just further evidence of the architectural issue. As noted in https://developers.facebook.com/bugs/781962138589388/, if you call X.removeObject(a, "array"), X.saveEventually(), X.addObject(b, "array"), and X.saveEventually(), Parse cannot handle it. This again points at the fact that these operations should be queued but then executed in order, which is not happening. This may seem innocuous because you can wait for the result, but it actually defeats the entire point of the saveEventually() architecture, which is that you can fire-and-forget and have a responsive app instead of always waiting for saves to finish asynchronously before allowing continued user engagement.

Resultant Impact
I don't want to be melodramatic, but these issues have a catastrophic impact on our app. Places in our UI where we want users to be able to select multiple options in rapid succession can cause deadlock and/or missing data (i.e. does not get persisted to backend). Our only workaround right now is to show a Saving indicator as we sit there and wait for a saveInBackground() call to return, and this makes a terrible user experience, and severely discourages them from performing multiple actions in quick succession and has a big impact on our engagement and retention. We're an early startup at small scale now, but we very quickly need to scale up to a large number of users, and with these major issues we simply cannot keep realistically using Parse.

I know this is a thorny and difficult piece of code to diagnose/debug/re-architect, but the current way it's architected just does not work and to a large degree defeats the whole purpose of the saveEventually() architecture, which is to fire and forget and not wait for callbacks. Without this working, one cannot really build a responsive app that is acceptable in the modern app world if you have to constantly wait for background saves to finish before allowing the user to safely progress. Please prioritize this highly and give it the due attention it deserves!

@richardjrossiii
Copy link
Contributor

Related: #359, #32.

Currently, the order of save eventually commands is persisted on a per-object basis (every PFObject has an internal _eventuallyTaskQueue that is respected for operations).

As much as I'd like to expand that to allow for atomic operations based on object graph, I don't think that's a feasible goal. There's no good algorithm we currently have that can detect if two objects are in a relationship with each other (see: collectDirtyChildren: and friends), so the only solution for that would be to put it in its own queue.

If memory serves, point number 3 is actually not possible for a different reason - every operation you perform on an object creates an instance of PFFieldOperation, and attempts to merge it with the previous operations. As we don't have a way to merge an 'add' and 'remove' operation together on the back-end in a single operation (see the REST API for Updating Objects), and thus we can't support that flow in SDK.

FWIW, when the initial bug on Facebook developers came in (https://developers.facebook.com/bugs/667066373397539/), we did have a discussion on whether or not our back-end could be updated to support it, and we decided at that time there weren't enough people with that use-case to go ahead and attempt to support it right now.

There are certainly architectural issues with LDS, and we have been having serious internal discussions on the best way to move forward and attempt to fix them, but seeing as backwards compatibility is a huge thing to have to maintain, some of these implementations may take a while to ever be seen.

We have also discussed eventually depreciating saveEventually mostly for the reasons you have outlined above - it really doesn't work how most people would expect. I wish I had a better solution for you, although one might be to build up your own version of the eventually without respect to operations (which seems to be the primary issue you're running into), which can guarantee proper ordering of operations.

@rdbayer
Copy link
Author

rdbayer commented Oct 16, 2015

Thanks for the explanation Richard. Would it be remotely feasible to consider having a global task queue instead of on a per-object basis? I'm obviously not an expert on this, but my sense is that this would solve a lot of issues. In theory it sacrifices a bit of performance because save requests aren't being parallelized that theoretically could be, but conceptually this feels to me like it greatly simplifies and reduces the risk of deadlocks, parent/child dirtying, etc. (Personally, I would advocate that far ahead of actually taking the object graph into account)

@RickDT
Copy link

RickDT commented Oct 16, 2015

FWIW, we had a test repo for problem 2 and it was fixed in 1.8.4 (#150). This was a very simple test case though, so I'm curious if yours is still exhibiting this problem with the latest SDK @rdbayer?

@richardjrossiii
Copy link
Contributor

@rdbayer A global task queue is certainly one option to consider. I'm not certain that it is the best route to go with right now, but I will bring it up with the rest of the team to get their thoughts.

@stale
Copy link

stale bot commented Sep 19, 2018

This issue has been automatically marked as stale because it has not had recent activity. If you believe it should stay open, please let us know! As always, we encourage contributions, check out the Contributing Guide

@stale stale bot added the wontfix label Sep 19, 2018
@stale stale bot closed this as completed Sep 26, 2018
@mtrezza mtrezza added type:feature New feature or improvement of existing feature and removed type:improvement labels Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:feature New feature or improvement of existing feature
Projects
None yet
Development

No branches or pull requests

4 participants