Conversation

@davimacedo
Member

Endpoints for:
a) Import data into a class - it receives a file in the same format that parse.com exports and imports the data back;
b) Import relation data into a class relation - it receives a file in the same format that parse.com exports and imports the data back;
c) Export data from a class - it returns a zipped .json file, just like parse.com

Contributor

@flovilmart flovilmart left a comment

Overall, I'm not sure that import should live as part of the web server. To me it seems like a command line tool that you run on a powerful enough machine.

The export and import methods won't handle any large collections, as we already see problems with push notifications sent to 100k users. Given that the data is stringified, buffered and zipped, you need at least 4x the memory required to make that work, just for a single collection.

package.json Outdated
"dependencies": {
"adm-zip": "0.4.7",
"bcryptjs": "2.3.0",
"bluebird": "3.4.6",
Contributor

Not sure we want bluebird as a dependency

Member Author

We've removed it, but I think we should consider including it in the project. Take a look in the code at the additional lines we had to include in order to manage promise concurrency (probably not as effective as Bluebird is).
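For context, here is a minimal sketch (a hypothetical helper, not the PR's actual code) of what mapping over items with bounded promise concurrency looks like without Bluebird, similar to what `bluebird.map(items, fn, { concurrency })` provides:

```javascript
// Map `mapper` over `items`, running at most `concurrency` promises at once.
function mapWithConcurrency(items, mapper, concurrency) {
  const results = new Array(items.length);
  let next = 0;
  function worker() {
    if (next >= items.length) {
      return Promise.resolve();
    }
    const index = next++;
    return Promise.resolve(mapper(items[index]))
      .then((result) => {
        results[index] = result;
        return worker(); // pick up the next pending item
      });
  }
  // Start up to `concurrency` workers; each chains onto the next item.
  const workers = [];
  for (let i = 0; i < Math.min(concurrency, items.length); i++) {
    workers.push(worker());
  }
  return Promise.all(workers).then(() => results);
}
```

This is roughly the bookkeeping that Bluebird hides behind a single option, which is the trade-off being discussed.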

maxUploadSize: maxUploadSize
}));

api.use(ImportRouter.getRouter());
Contributor

those should be protected by masterKey

Member Author

Done.

export class ExportRouter extends PromiseRouter {

handleExportProgress(req) {
return Promise.resolve({ response: 100 });
Contributor

not implemented?

Member Author

We had left it to be implemented later, but it is implemented now.



rest.find(req.config, req.auth, data.name, data.where)
.then((results) => {
Contributor

loading a full collection in memory is likely to crash the server.

Member Author

We now stream everything and it is not crashing anymore. We tested with a class of 10M entries.
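A rough sketch (with hypothetical `fetchPage`/`writeChunk` functions, not the PR's actual code) of the streaming approach described here: fetch the class one page at a time and hand each page to a writer, so the full collection never sits in memory at once.

```javascript
// Export a class page by page instead of loading it all into memory.
function exportInPages(fetchPage, writeChunk, pageSize) {
  function nextPage(skip) {
    return fetchPage(skip, pageSize).then((results) => {
      if (results.length === 0) {
        return; // no more rows: the export is complete
      }
      writeChunk(results);
      return nextPage(skip + results.length);
    });
  }
  return nextPage(0);
}
```

Peak memory then scales with the page size rather than the collection size, which is why a 10M-entry class becomes feasible.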

res.status(200);
res.json({response: 'We are importing your data. You will be notified by e-mail once it is completed.'});
}
bluebird.map(restObjects, importRestObject, {concurrency: 100})
Contributor

this code should not be duplicated, only the outcome of the import changes.

Member Author

Refactored.

res.status(200);
res.json({response: 'We are importing your data. You will be notified by e-mail once it is completed.'});
}
bluebird.map(restObjects, importRestObject, { concurrency: 100 })
Contributor

duplicated code.

Member Author

It's in the same file now and not duplicated anymore.

delete restObject.createdAt;
}
if (restObject.updatedAt) {
delete restObject.updatedAt;
Contributor

why delete those keys? In case of import we don't want to update them, right?

Member Author

This is the same behavior parse.com had, and in my opinion it's the correct one. If you are importing new rows today, you expect those rows to have createdAt and updatedAt set to today. Parse.com (and also Parse Server) always ignores inserts/updates of those columns. We've only tried to reproduce the same behavior here.
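A minimal sketch of the normalization being discussed: drop the timestamp fields from each imported row so the server assigns fresh values, as parse.com did. The helper name is illustrative, not from the PR.

```javascript
// Return a copy of the rest object without server-managed timestamps.
function stripTimestamps(restObject) {
  const copy = Object.assign({}, restObject);
  delete copy.createdAt;
  delete copy.updatedAt;
  return copy;
}
```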

}
if (restObject.objectId) {
return rest
.update(req.config, req.auth, req.params.className, restObject.objectId, restObject)
Contributor

we have an internal upsert that does just that; probably use it instead of handling 2 paths.

hey @flovilmart I've checked the rest and RestWrite classes and they don't implement the upsert; we only have access to it directly from the DatabaseController, but for the import we need things that are only implemented in the RestWrite class, such as the call to the expandFilesForExistingObjects() method.

Member Author

Going through the rest layer also makes all triggers, hooks, etc. get reached. That's the idea.
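A sketch of the two-path import decision discussed here: route through the REST layer (so triggers and hooks still fire), updating when the row carries an objectId and creating otherwise. `rest` stands in for parse-server's rest module; the exact signature is illustrative.

```javascript
// Import one exported row: update if it has an objectId, create otherwise.
function importRestObject(rest, config, auth, className, restObject) {
  if (restObject.objectId) {
    // Preserve the exported objectId by updating the existing row.
    return rest.update(config, auth, className, restObject.objectId, restObject);
  }
  return rest.create(config, auth, className, restObject);
}
```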

@facebook-github-bot

@davimacedo updated the pull request - view changes

@flovilmart
Contributor

@davimacedo you probably need a rebase as well as running npm run lint before pushing :)

@davimacedo
Member Author

Sure thing... we will do it as soon as we have addressed all the review comments you made.

@facebook-github-bot

@davimacedo updated the pull request - view changes

@flovilmart
Contributor

We've enforced harder rules for the coding style, please rebase on master.

@facebook-github-bot

@davimacedo updated the pull request - view changes

@davimacedo
Member Author

Hi, @flovilmart
I understand your concern about memory, but people want an easy, friendly way to do this from the dashboard. If we run it outside Parse Server we will also lose the ability to reach hooks, triggers, etc. For this kind of feature, maybe we could recommend that users run the dashboard connected to a Parse Server running on a powerful enough machine (maybe on the user's localhost). But this is only a comment. As I commented before, we don't think memory will crash here.
Can you please check again?

@davimacedo
Member Author

Everything was also rebased and the lint is passing :)

@acegreen

acegreen commented Jan 8, 2017

This should be prioritized given the recent hacks on MongoDB databases

@flovilmart
Contributor

@acegreen people should have DB backups, not CSV dumps.

@jordanhaven

I would really appreciate this, as well. We have customer service people that need to export records for credit card disputes, and the Parse.com behavior is way better/safer than trying to get them into mongo scripting.

@flovilmart
Contributor

Not sure that export would fit the bill, as it's the entirety of the database that will be exported

@jordanhaven

I haven't actually looked at the contents of the PR; I was assuming it was similar to the export feature on Parse.com, which would let you filter a class and then export those results. Maybe that's in the related PR in parse-dashboard (I thought, being by the same author, the two were separate but necessary implementations of the same feature). If that's not the case, my bad!

@davimacedo
Member Author

It will not export the entire database, but an entire class or an entire relation. I have it running on my servers and it is actually working pretty well so far.

@facebook-github-bot

@davimacedo updated the pull request - view changes

acinader
acinader previously approved these changes Jan 12, 2017
Contributor

@acinader acinader left a comment

so this looks really good to me. sorry a lot of my comments are silly formatting nits, but that's just how i read :).

i have one or two actually substantive questions....

jsonFileStream.write(',\n');
}

jsonFileStream.write(JSON.stringify(data.results, null, 2).substr(1).slice(0,-1));
Contributor

nit: space after the comma: slice(0, -1)

Member Author

done
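For readers puzzling over the line under review: `JSON.stringify` of a page of results produces a bracketed array, and `.substr(1).slice(0, -1)` strips the surrounding brackets so successive pages can be appended into a single "results" array already opened on the stream. A small illustration (standalone, not the PR's code):

```javascript
const page = [{ a: 1 }, { b: 2 }];
// Strip the leading "[" and trailing "]" so only the elements remain.
const chunk = JSON.stringify(page, null, 2).substr(1).slice(0, -1);
// Re-wrapping in brackets recovers the original page.
const roundTrip = JSON.parse('[' + chunk + ']');
```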


jsonFileStream.write('{\n"results" : [\n');

const findPromise = data.name.indexOf('_Join') === 0 ?
Contributor

another super nit, but could we format this ternary more consistently/nicer?

Member Author

Is it better now?

: rest.find(req.config, req.auth, data.name, data.where, { count: true, limit: 0 });

return findPromise
.then((result) => {
Contributor

indent.

Member Author

done

.then((fileData) => {

return emailControllerAdapter.sendMail({
text: `We have successfully exported your data from the class ${req.body.name}.\n
Contributor

who's we? :)

Member Author

Improved! :)

subject: 'Export failed'
});
})
.then(() => {
Contributor

should this be an always? will this get called if an error condition happens and there's a catch?

Member Author

Now it will always execute
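One way to guarantee a step like the notification email always runs, whether the preceding promise chain succeeded or failed, without `Promise.prototype.finally` (not yet widely available at the time). The helper name is illustrative, not from the PR:

```javascript
// Run `cleanup` after `promise` settles, preserving the original outcome.
function always(promise, cleanup) {
  return promise.then(
    (value) => Promise.resolve(cleanup()).then(() => value),
    (error) => Promise.resolve(cleanup()).then(() => { throw error; })
  );
}
```

Passing both a fulfillment and a rejection handler to `.then` is what makes the cleanup run on the error path too, which is the concern raised in the review comment.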

subject: 'Import failed'
});
} else {
throw new Error('Internal server error: ' + error);
Contributor

you use string templates everywhere else...better than concatenation... :)

Member Author

Done


expressRouter() {
const router = express.Router();
const upload = multer();
Contributor

👍

res.header('Access-Control-Allow-Origin', '*');
res.header('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS');
res.header('Access-Control-Allow-Headers', 'X-Parse-Master-Key, X-Parse-REST-API-Key, X-Parse-Javascript-Key, X-Parse-Application-Id, X-Parse-Client-Version, X-Parse-Session-Token, X-Requested-With, X-Parse-Revocable-Session, Content-Type');
res.header('Access-Control-Allow-Headers', 'X-Parse-Master-Key, X-Parse-REST-API-Key, X-Parse-Javascript-Key, X-Parse-Application-Id, X-Parse-Client-Version, X-Parse-Session-Token, X-Requested-With, X-Parse-Revocable-Session, Content-Type, X-CSRF-Token');
Contributor

i wish i knew enough to know if this is as conservative as possible??...

Member Author

I don't think it introduces any additional vulnerability. Maybe someone has a different opinion?

@acinader acinader dismissed their stale review January 12, 2017 02:56

cause i hadn't fully read all the comments

@acinader
Contributor

the busted test may be/probably is because i changed archiver from "archiver": "^1.2.0" to "archiver": "1.3.0" so it'll play well with greenkeeper.io

@jordanhaven

@davimacedo could you clarify if this exports an entire class, or a class WITH any filters you've applied in the dashboard (ie: the Parse.com behavior)?

@facebook-github-bot

@davimacedo I tried to find reviewers for this pull request and wanted to ping them to take another look. However, based on the blame information for the files in this pull request I couldn't find any reviewers. This sometimes happens when the files in the pull request are new or don't exist on master anymore. Is this pull request still relevant? If yes could you please rebase? In case you know who has context on this code feel free to mention them in a comment (one person is fine). Thanks for reading and hope you will continue contributing to the project.

@jordanhaven

^ Meant to write back, but to answer my previous question: this PR (and the corresponding parse-dashboard one) only exports a full class. I made some updates to support passing the filters through to the exports (parse-dashboard: LearnWithHomer/parse-dashboard@8d68daa, parse-server: LearnWithHomer@6c3ccb1), and it's working well for us at replicating the Parse.com behavior.

@AmrKafina

Any reason this is being held up? I recently put together a tool that automates importing data, but having the ability to manually set objectIds (as I believe this PR adds) would make it a lot better.

@codecov

codecov bot commented May 24, 2017

Codecov Report

Merging #3046 into master will increase coverage by <.01%.
The diff coverage is 86.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3046      +/-   ##
==========================================
+ Coverage   90.14%   90.15%   +<.01%     
==========================================
  Files         114      116       +2     
  Lines        7550     7726     +176     
==========================================
+ Hits         6806     6965     +159     
- Misses        744      761      +17
Impacted Files Coverage Δ
src/Routers/FeaturesRouter.js 100% <ø> (ø) ⬆️
src/ParseServer.js 88.75% <100%> (+0.21%) ⬆️
src/RestWrite.js 93.28% <100%> (+0.4%) ⬆️
src/middlewares.js 97.26% <100%> (ø) ⬆️
src/Controllers/SchemaController.js 97.04% <100%> (ø) ⬆️
src/Config.js 94.92% <100%> (+0.03%) ⬆️
src/rest.js 98.5% <100%> (+1.49%) ⬆️
src/Routers/ImportRouter.js 78.82% <78.82%> (ø)
src/Routers/ExportRouter.js 91.86% <91.86%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f11ad5...330a195. Read the comment docs.

@davimacedo
Member Author

Guys... Sorry about the delay with this PR.

@jordanhaven Thanks for the contribution. I've just incorporated it in the PR.

@flovilmart and @acinader I've just done the requested changes and rebased to the master. Can you guys check again please?

@jordanhaven

jordanhaven commented May 24, 2017

@davimacedo thanks for incorporating! Can you do the same for the parse-dashboard changes?

ETA: By that I mean passing the filters through as in this commit: LearnWithHomer/parse-dashboard@8d68daa

Thanks!

@davimacedo
Member Author

@jordanhaven Just done. Thanks!

@nitrag

nitrag commented May 25, 2017

+1 thanks for this!

@tutchmedia

Hey folks, does anyone know what is happening with this, please?

@jpgupta

jpgupta commented Feb 14, 2018

bump :) would be a really cool feature

@mnlbox

mnlbox commented Jul 25, 2018

@davimacedo Any news about this?
Any plan to merge this PR?

@flovilmart
Contributor

It’s unlikely at this point, as the main memory usage concerns are not addressed. This belongs in a standalone tool.

I’m not against exposing a simpler API in ParseServer so one can import objects based on a stream representation (with the objectId preserved) and export them with a streaming API.
As for exposing such endpoints: now that everyone has access to the mongodb / Postgres, I don’t see how CSV is a valid backup / export solution, as you can already backup/export right from your DB.

@flovilmart flovilmart closed this Jul 25, 2018
@mnlbox

mnlbox commented Jul 30, 2018

@flovilmart CSV is a general-purpose format and we can import it into many applications like BI tools, data analysis, ...
I think this PR implemented a very useful feature and suggest you reopen it.

Backup/export from Parse rather than a mongo dump is a very brilliant feature and can be useful for many people like me, especially if we can call it from parse-dashboard (related parse-dashboard PR that is blocked waiting on this PR: parse-community/parse-dashboard#585).

/CC: @davimacedo

@georgesjamous
Contributor

I agree with @flovilmart
I think that importing and exporting the data should be performed on the raw data stored in the database. Since no additional restrictions or checks need to be done server side, it makes sense not to involve Parse in this process. You can take snapshots of your database as you go, as a backup for a production case.

If the file format is what you're worried about, to my knowledge you can export Mongo into CSV and vice-versa (haven't tried it):
mongoexport is a utility that produces a JSON or CSV export of data stored in a MongoDB instance.
See the mongoexport docs and a relevant post on Stack Overflow.

@flovilmart
Contributor

@mnlbox it is likely that this feature, as it is implemented now, completely blocks the server for an undetermined period of time at import or export. It is also possible that this feature doesn’t work if the process takes more than 30s (the usual request timeout).

For those reasons and the one explained above, we’ll keep this one closed.

