-
Notifications
You must be signed in to change notification settings - Fork 166
Description
The GCS HTTP protocol -- but not the Python API -- has the ability to set ifGenerationMatch when creating a storage object:
Makes the operation conditional on whether the object's current generation matches the given value. Setting to 0 makes the operation succeed only if there is a live version of the object.
Why it's useful: With this feature, the client could create a directory placeholder entry (a 0-byte object with a name ending in '/') very efficiently like this:
blob = bucket.blob('path/to/my/subdirectory/')
blob.upload_from_string(b'', if_generation_match=0)
That one round trip creates the directory placeholder entry if it doesn't already exist. The alternatives are to first make a round trip to check if the entry exists or else to let the bucket accumulate identical placeholder entries (esp. for top level directories) by blindly creating them. [Or does GCS check if an uploaded object matches the current generation and optimize that case? -- Nope.]
Why that matters: Directory placeholders speed up gcsfuse by an order of magnitude. Without the placeholders, you have to use gcsfuse in --implicit-dirs mode, and such a mount is frustratingly slow for interactive work. E.g. it takes several seconds just to list a tiny directory containing 2 files. With the placeholders, you can run gcsfuse without --implicit-dirs, and that mount lists directories in a tenth of a second or two.
Proposal: I could create a Pull Request adding this feature if you like, with either the specific if_generation_match query parameter or a way to pass in additional query parameters.
Another alternative is recommend that callers do something like subclass Blob to override _add_query_parameters() to add the if_generation_match=0 name-value pair. That's ugly and fragile.
Is there a way to do this that I'm missing? Are there better alternatives?