-
-
Notifications
You must be signed in to change notification settings - Fork 4.8k
Endpoints for importing and exporting data #3046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
flovilmart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I'm not sure that importation should live as part of the web server. For me that seems to be a command line tool that you run on a powerful enough machine.
The export and import methods won't handle anything any large collections, as we already see problems with push notifications sent to 100k users. Given that the data is stringified, buffered and zipped, you need at least 4x the memory required to to make that work, just for a single collection.
package.json
Outdated
| "dependencies": { | ||
| "adm-zip": "0.4.7", | ||
| "bcryptjs": "2.3.0", | ||
| "bluebird": "3.4.6", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we want bluebird as a dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've removed it, but I think we should think about including it in the project. Take a look in the code the additional lines we had to include in order to manage the promises concurrency (probably not as effective as Bluebird does).
src/ParseServer.js
Outdated
| maxUploadSize: maxUploadSize | ||
| })); | ||
|
|
||
| api.use(ImportRouter.getRouter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those should be protected by masterKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/Routers/ExportRouter.js
Outdated
| export class ExportRouter extends PromiseRouter { | ||
|
|
||
| handleExportProgress(req) { | ||
| return Promise.resolve({ response: 100 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We let to implement it later, but it is implemented now.
src/Routers/ExportRouter.js
Outdated
|
|
||
|
|
||
| rest.find(req.config, req.auth, data.name, data.where) | ||
| .then((results) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loading a full collection in memory is likely to crash the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We streamed everything and it is not crashing anymore. We tested with a 10M entries class.
src/Routers/ImportRelationRouter.js
Outdated
| res.status(200); | ||
| res.json({response: 'We are importing your data. You will be notified by e-mail once it is completed.'}); | ||
| } | ||
| bluebird.map(restObjects, importRestObject, {concurrency: 100}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code should not be duplicated, only the outcome of the import changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored.
src/Routers/ImportRouter.js
Outdated
| res.status(200); | ||
| res.json({response: 'We are importing your data. You will be notified by e-mail once it is completed.'}); | ||
| } | ||
| bluebird.map(restObjects, importRestObject, { concurrency: 100 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same file now and not duplicated anymore.
| delete restObject.createdAt; | ||
| } | ||
| if (restObject.updatedAt) { | ||
| delete restObject.updatedAt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why deleting those keys? In case of import we don't want to update them right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same behavior parse.com had and in my opinion that's the correct one. If you are importing new rows today, I expect those rows with createdAt and updatedAt set to today. Parse.com (and also Parse Server) always ignore insert/update of those columns. We've only tried to reproduce the same behavior here.
src/Routers/ImportRouter.js
Outdated
| } | ||
| if (restObject.objectId) { | ||
| return rest | ||
| .update(req.config, req.auth, req.params.className, restObject.objectId, restObject) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a internal upset that just does that, probably use it instead of handling 2 paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @flovilmart I've checked the rest and RestWrite classes and they don't implement the upsert, we only have access to it directly from the DatabaseController, but on the import we need things that are only implemented on the RestWrite class such as the call to the expandFilesForExistingObjects() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the rest will also make all triggers, hooks, etc, be reached. That's the idea.
|
@davimacedo updated the pull request - view changes |
|
@davimacedo updated the pull request - view changes |
9dc61c8 to
18680e1
Compare
|
@davimacedo updated the pull request - view changes |
|
@davimacedo updated the pull request - view changes |
|
@davimacedo you probably need a rebase as well as running |
|
Sure thing... we will do it as soon as we have fixed all reviews you asked. |
|
@davimacedo updated the pull request - view changes |
87ec12f to
7f30de6
Compare
|
@davimacedo updated the pull request - view changes |
7f30de6 to
86fffad
Compare
|
@davimacedo updated the pull request - view changes |
|
@davimacedo updated the pull request - view changes |
|
We've enforced harder rules for the coding style, please rebase on master. |
|
@davimacedo updated the pull request - view changes |
e010a13 to
ae6f46f
Compare
|
@davimacedo updated the pull request - view changes |
ae6f46f to
9350a18
Compare
|
@davimacedo updated the pull request - view changes |
9350a18 to
efd6930
Compare
|
@davimacedo updated the pull request - view changes |
efd6930 to
c6b784c
Compare
|
@davimacedo updated the pull request - view changes |
|
Hi, @flovilmart |
|
Everything was also rebased and the lint is passing :) |
|
This should be prioritized with the recent hacks on MongoDB databases |
|
@acegreen people should have DB backups, not CSV dumps. |
|
I would really appreciate this, as well. We have customer service people that need to export records for credit card disputes, and the Parse.com behavior is way better/safer than trying to get them into mongo scripting. |
|
Not sure if that export would fit your bill as it the entirety of the database that will be exported |
|
I haven't actually looked at the contents of the PR, I was assuming it was similar to the export feature in Parse.com, that would let you filter a class and then export those results. Maybe that's in the related PR in parse-dashboard (I thought, being by the same author, the two were separate but necessary implementations of the same feature). If that's not the case, my bad! |
|
It will not export the entire database, but an entire class or an entire relation. I have it running in my servers and it is actually running pretty good so far. |
|
@davimacedo updated the pull request - view changes |
acinader
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this looks really good to me. sorry a lot of my comments are silly formatting nits, but that's just how i read :).
i have one or two actually substantive questions....
src/Routers/ExportRouter.js
Outdated
| jsonFileStream.write(',\n'); | ||
| } | ||
|
|
||
| jsonFileStream.write(JSON.stringify(data.results, null, 2).substr(1).slice(0,-1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after , slice(0, -1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| jsonFileStream.write('{\n"results" : [\n'); | ||
|
|
||
| const findPromise = data.name.indexOf('_Join') === 0 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more super nit, but could format this ternary more consistently/nicer??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better now?
src/Routers/ExportRouter.js
Outdated
| : rest.find(req.config, req.auth, data.name, data.where, { count: true, limit: 0 }); | ||
|
|
||
| return findPromise | ||
| .then((result) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/Routers/ExportRouter.js
Outdated
| .then((fileData) => { | ||
|
|
||
| return emailControllerAdapter.sendMail({ | ||
| text: `We have successfully exported your data from the class ${req.body.name}.\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who's we? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved! :)
| subject: 'Export failed' | ||
| }); | ||
| }) | ||
| .then(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be an always? will this get called if an error condition happens and there's a catch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it will always execute
src/Routers/ImportRouter.js
Outdated
| subject: 'Import failed' | ||
| }); | ||
| } else { | ||
| throw new Error('Internal server error: ' + error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you use string templates everywhere else...better than concatenation... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| expressRouter() { | ||
| const router = express.Router(); | ||
| const upload = multer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| res.header('Access-Control-Allow-Origin', '*'); | ||
| res.header('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS'); | ||
| res.header('Access-Control-Allow-Headers', 'X-Parse-Master-Key, X-Parse-REST-API-Key, X-Parse-Javascript-Key, X-Parse-Application-Id, X-Parse-Client-Version, X-Parse-Session-Token, X-Requested-With, X-Parse-Revocable-Session, Content-Type'); | ||
| res.header('Access-Control-Allow-Headers', 'X-Parse-Master-Key, X-Parse-REST-API-Key, X-Parse-Javascript-Key, X-Parse-Application-Id, X-Parse-Client-Version, X-Parse-Session-Token, X-Requested-With, X-Parse-Revocable-Session, Content-Type, X-CSRF-Token'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wish i knew enough to know if this is as conservative as possible??...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will bring additional vulnerability. Maybe some has a different opinion?
cause i hadn't fully read all the comments
|
the busted test may/probably is because i change |
|
@davimacedo could you clarify if this exports an entire class, or a class WITH any filters you've applied in the dashboard (ie: the Parse.com behavior)? |
|
@davimacedo I tried to find reviewers for this pull request and wanted to ping them to take another look. However, based on the blame information for the files in this pull request I couldn't find any reviewers. This sometimes happens when the files in the pull request are new or don't exist on master anymore. Is this pull request still relevant? If yes could you please rebase? In case you know who has context on this code feel free to mention them in a comment (one person is fine). Thanks for reading and hope you will continue contributing to the project. |
|
^ Meant to write back, but to answer my previous question, this (and the corresponding parse-dashboard) PRs only export a full class. I made some updates to support passing through the filters to the exports (parse-dashboard: LearnWithHomer/parse-dashboard@8d68daa, parse-server: LearnWithHomer@6c3ccb1), and it's working good for us at replicating the Parse.com behavior. |
|
Any reason this is being held up? I recently put together a tool that automates importing data, but having the ability to manually set objectIds (as I believe this PR adds) would make it a lot better. |
7efba55 to
d7087c1
Compare
Codecov Report
@@ Coverage Diff @@
## master #3046 +/- ##
==========================================
+ Coverage 90.14% 90.15% +<.01%
==========================================
Files 114 116 +2
Lines 7550 7726 +176
==========================================
+ Hits 6806 6965 +159
- Misses 744 761 +17
Continue to review full report at Codecov.
|
d7087c1 to
330a195
Compare
|
Guys... Sorry about the delay with this PR. @jordanhaven Thanks for the contribution. I've just incorporated it in the PR. @flovilmart and @acinader I've just done the requested changes and rebased to the master. Can you guys check again please? |
|
@davimacedo thanks for incorporating! Can you do the same for the parse-dashboard changes? ETA: By that I mean passing the filters through as in this commit:LearnWithHomer/parse-dashboard@8d68daa Thanks! |
|
@jordanhaven Just done. Thanks! |
|
+1 thanks for this! |
|
Hey Folks, Does anyone know what is happening with this please? |
|
bump :) would be a really cool feature |
|
@davimacedo Any news about this? |
|
It’s unlikely at this point as the main memory usage concerns are not addressed. This belongs to a standalone tool. I’m not against exposing a simpler API to ParseServer so one can import objects based on a stream representation (with the objectId preserved) and export them with a streaming API. |
|
@flovilmart CSV is a general purpose format and we can import it in many applications like BI tools, Data analysing, ... Backup/Export from parse rather than mongo dump is a very brilliant feature and can useful for many people like me specially if we can call it from parse-dashboard (related parse-dashboard PR that blocked and waiting this PR: parse-community/parse-dashboard#585). /CC: @davimacedo |
|
I agree with @flovilmart If the format file is what you worry about, to my knowledge you can export Mongo into CSV and vice-versa. (haven't tried it) |
|
@mnlbox it is likely that this feature, as it is implemented now, blocks completely the server for an undetermined period of time, at import or export. It is also possible that this feature doesn’t work if the process takes more than 30s (the usual request timeout). For those reasons and the one explained above, we’ll keep this one closed. |
Endpoints for:
a) Import Data into a class - it receives a file in the same format of parse.com and backwards the data that is exported;
b) Import Relation Data into a class relation - it receives a file in the same format of parse.com and backwards the data that is exported;
c) Export Data from a class - It returns a zipped .json file just like parse.com