Add S3OptimizedUploadStorage #128
Codecov Report
@@ Coverage Diff @@
## master #128 +/- ##
==========================================
+ Coverage 99.38% 99.43% +0.04%
==========================================
Files 7 8 +1
Lines 162 176 +14
==========================================
+ Hits 161 175 +14
Misses 1 1
Continue to review full report at Codecov.
Ok, fixed the issues with black and pydocstyle. The failures from CI/pytest and bandit don't seem to be caused by any of my changes. But I'm having a hard time adding tests/coverage for this, as I don't know how to emulate the initial upload and then the save/copy call on MockS3. I'll keep looking into it, but I'd be really glad to get some pointers.
@codingjoe ok, I now have some tests covering all my added code. For some reason the tests don't run through, though, and the coverage doesn't get updated here on GitHub. I'll leave it for now until I get your input.
I'll look into it next week, since I'm on vacation right now. But thanks in advance for all the hard work.
No worries, take your time and enjoy your vacation! :)
Hi @drakon,
Thanks a lot. This looks like a good start. I left you a couple of comments, but this will require a couple of rounds to really nail the implementation. Don't be discouraged, though.
It would be great if you could rebase your branch, since I fixed the CI suite.
Hi Joe, yeah, sure. Thanks for the comments! I'll go through them and address them in the PR. I needed some time to set everything up in the first place, so I haven't spent enough focused time on the actual changes yet. Will do!
Hi @codingjoe, this took me a while, as I was engaged with other parts/projects. But I've finally improved the PR and commented above on the changes I made. Hope it's good now. :)
OK, LGTM, let's give this a go!
It's released in 5.4.0 🎉
Awesome, thanks! :)
@drakon, we tried switching to the new backend, but saw some errors. I believe you are returning the wrong value. You should be returning
That strikes me as odd, as this is a part I didn't change, and it is consistent with the behaviour of the default implementation (current version):
Secondly, the
So, the only real difference should be confined to this line:
Can you provide more detail about the error you receive, or the keys/names/paths used?
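For context, the copy-based save path under discussion can be sketched roughly like this. This is an illustration only: the function and parameter names are made up, and `bucket` stands in for any boto3-style `Bucket` object; it is not the library's actual code.

```python
def optimized_save(bucket, name, content):
    """Rough sketch of the copy-based save path being discussed.

    All names here are illustrative assumptions, not the actual API.
    """
    # The optimization only applies when ``content`` already wraps an
    # existing S3 object (boto3 exposes it as ``content.obj`` with a key).
    if not hasattr(content, "obj") or not hasattr(content.obj, "key"):
        raise TypeError("Expected an existing S3 object, got a plain file")
    # Server-side copy within S3: no download/re-upload round trip.
    bucket.Object(name).copy({"Bucket": bucket.name, "Key": content.obj.key})
    # Return the stored name, matching the default backend's contract.
    return name
```

The return value at the end is exactly the point being debated above: the caller records whatever this function returns as the file's name, so it has to match what the default backend would have returned.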
@amureki will give you an update about that. I also thought this might have been a race condition, but S3 should be consistent. Maybe the errors we saw were also related to another issue. @amureki has more details when he is back from vacation next week.
@drakon hey Christian! First, thank you for this PR, this is a nice improvement. 👍 I investigated a bit more here and found out that the errors (
Now, I wonder if we can tweak this custom backend to handle such cases and check if
Hi @amureki, no worries. :) Hmm, I see. Well, the initial idea was that this storage only optimises uploads where you already know you have an S3 file. That's also why there's an error when that's not the case (the assumption is that
From what I understand, you would want to do uploads from other sources as well with this optimised storage? It should certainly be doable, but I was a bit hesitant (also given Joe's comment) to silently fall back to an un-optimised path when the condition is not met, for 2 reasons:
Having said that, it should be straightforward to implement: instead of throwing an error, we could just use lines 32 & 33. Unfortunately, I don't even have a use case where I can test this in a real-world scenario. And I'd probably recommend changing the check to something that is both necessary and sufficient to determine whether content is really an S3 file, meaning it identifies all S3 files as such but excludes everything else. But in that case one could also add this directly to django-storages. To be clear: in that case it's just about convenience; there would be no optimization compared to the normal backend (it would literally be the same code).
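The fallback variant being weighed here could be sketched as follows. Again a hypothetical illustration: `plain_upload` stands in for the default, un-optimised upload path, and none of the names are the library's real API.

```python
def save_with_fallback(storage, name, content):
    """Sketch of the discussed fallback idea (all names are made up).

    Copy server-side when ``content`` already wraps an S3 object;
    otherwise silently fall back to a plain upload, which is what the
    default (un-optimised) backend does anyway.
    """
    if hasattr(content, "obj") and hasattr(content.obj, "key"):
        # Optimised path: server-side copy within the bucket.
        storage.bucket.Object(name).copy(
            {"Bucket": storage.bucket.name, "Key": content.obj.key}
        )
    else:
        # Un-optimised path: stream the file up as usual. Per the caveat
        # above, the hasattr check is necessary but not sufficient to
        # prove ``content`` really is an S3 file.
        storage.plain_upload(name, content)
    return name
```

This illustrates both concerns from the discussion: the silent fallback hides the distinction between the optimised and un-optimised cases, and the duck-typing check can misclassify objects that merely look like S3 files.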
@drakon thank you for digging into this issue with us. :) I agree that more changes to this storage backend might do more harm than good. Instead, we could just point certain fields at the custom backend (or introduce a custom field with the storage backend plugged in). So, I think we are good here. Thank you again for the help and for the initial contribution!
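The per-field wiring mentioned above could look something like this. The import path is an assumption based on the `storages_optimized.py` file name used in this PR, and `Document` is a made-up example model; this is project wiring, not the library's documented usage.

```python
# Sketch: attach the optimized backend to individual fields instead of
# making it the project-wide DEFAULT_FILE_STORAGE, so uploads from
# non-S3 sources elsewhere in the project keep working as before.
from django.db import models
from s3file.storages_optimized import S3OptimizedUploadStorage


class Document(models.Model):
    # Only this field uses the copy-based upload path; all other fields
    # keep the project's default storage.
    file = models.FileField(storage=S3OptimizedUploadStorage())
```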
My pleasure. Thanks for providing this library in the first place. :) Yes, I think that's probably the easiest way. If there is growing need and more people want this, one could think about getting it into django-storages directly as an optimization for such cases, since it would be an efficiency gain for others too. But at this point I'm not even sure it's a real issue outside of this library.
Hi Joe
As mentioned, I've now added the code to your library. I tested it locally with my setup and my project, and it worked as before for me.
The only thing I'm not that happy about is that I had to put it into another file, "storages_optimized.py": when I tried it in "storages.py", I got import issues because of the import of "storages.backends.s3boto3". I found some workarounds, but they seemed rather risky compared to just using another file name. But if you have a better idea, please do change it.
I also added some comments to the readme.
I'm not sure if it needs more notes of caution, as I don't know all the cases that won't work now (for example, it's probably not a good idea to use this as the DEFAULT_FILE_STORAGE in Django, since file uploads/saves from sources other than S3 probably won't work as expected).
Looking forward to your inputs. And thanks for your patience. ;-)
Best,
Christian