-
Notifications
You must be signed in to change notification settings - Fork 199
Fragment list consolidation for tiledb cloud arrays #5059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
677f049 to
f776d91
Compare
| throw_if_not_ok(fragment_consolidator->consolidate_fragments( | ||
| array_name, encryption_type, encryption_key, key_length, fragment_uris)); | ||
| // Consolidate | ||
| auto fragment_consolidator = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed dynamic_cast that should be avoided when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not do that now. If we're going to do this, let's change all the places that use Consolidator::create for something better and consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shouldn't we do that immediate improvement? This is the only occurrence of a dynamic_cast in consolidation code, and the reason is that consolidate_fragments is a method of FragmentConsolidator and not of its parent class Consolidator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing offline, I am reverting this change, although I think it's improving the current situation, to get this long standing PR finally unblocked and merged.
| // Check if array exists | ||
| ObjectType obj_type; | ||
| throw_if_not_ok( | ||
| object_type(storage_manager->resources(), array_uri, &obj_type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved this in the else case so that it's not executed for remote arrays and cause 2 extra REST requests (is_group and is_array)
f776d91 to
124ce98
Compare
124ce98 to
4e092ea
Compare
6b273c4 to
155a23c
Compare
| array_consolidation_request_builder->initFragments( | ||
| fragment_uris->size()); | ||
| for (size_t i = 0; i < fragment_uris->size(); i++) { | ||
| fragment_list_builder.set(i, (*fragment_uris)[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those need to become relative to cross the wire, otherwise it's a security concern. Use serialize_array_uri_to_relative and deserialize_array_uri_to_absolute utility functions
155a23c to
3ab70bd
Compare
|
@KiterLuc we should discuss before merging this in case we first need to address this one first: https://app.shortcut.com/tiledb-inc/story/52302/increase-number-of-max-fragments-in-tiledb-fragment-list-consolidation |
| fragment_uris.reserve(fragment_reader.size()); | ||
| for (const auto& fragment_uri : fragment_reader) { | ||
| fragment_uris.emplace_back(fragment_uri); | ||
| fragment_uris.emplace_back(deserialize_array_uri_to_absolute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REST CI is failing because the path normalization here is not happening on REST since go-capnp handles deserializing the request from generated source code. So we are serializing absolute URIs as relative URIs from core client-side, but REST does not call deserialize_consolidation_request and the switch back to absolute never happens.
We could add a tiledb_handle_array_consolidation_request much like tiledb_handle_load_array_schema_request that would just call deserialize_consolidation_request, but we'll need a TileDB-Go binding and subsequent REST PR to update the route to use it.
Another option without the handler and Go binding is to handle a second type of relative URIs in FragmentConsolidator::consolidate_fragments that is relative to the array_uri/ directory. The relative URIs handled in consolidate_fragments linked above is currently relative to the array_uri/__fragments directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about reconstructing the absolute URIs in HandleConsolidateArrayRequest on REST Server side by prepeding the array URI to arrayConsolidationRequest.Fragments before calling ConsolidateFragments ? Would that be possible? Looks like the least "invasive" solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened this to track, we discussed and this is a feasible REST-Server side solution: https://app.shortcut.com/tiledb-inc/story/52679/fix-fragment-list-consolidation-handler-to-convert-relative-uris-to-absolute
Let's get this reviewed in the meantime when you have some spare time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done and I merged this branch to fix conflicts. I'll watch for CI failures but LGTM 👍
Co-authored-by: Shaun M Reed <[email protected]>
a4c8c71 to
65c1812
Compare
|
@KiterLuc this is now ready for merging. |
[sc-48774]
This PR implements the needed changes on Core side for supporting fragment list cloud consolidation for
tiledb://arrays.In order for this feature to work end-to-end changes are also needed on the REST server side to deserialize the capnp request and call the right method (
consolidate_fragments) for actually doing the consolidation.TYPE: FEATURE
DESC: Fragment list consolidation for tiledb cloud arrays