Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,14 @@ std::string getProperFilePathFromMetadataInfo(std::string_view data_path, std::s
}
else
{
throw ::DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Expected to find '{}' in data path: '{}'", common_path, data_path);
/// Data files can have different path
pos = data_path.find("://");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method docs say:

// For example, if the full blob path is s3://bucket/table_name/data/00000-1-1234567890.avro, the function will return table_name/data/00000-1-1234567890.avro

Which means the bucket shall not be included in the return value as far as I understand.

As far as I understand your exception handling,you skip the initial "://" and then copy everything after the next "/". But what if you are using Path-style URIs (i.e, https://s3.region-code.amazonaws.com/bucket-name/key-name)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to remove bucket

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same S3/URI comment here

if (pos == std::string::npos)
throw ::DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Unexpected data path: '{}'", data_path);
pos = data_path.find("/", pos + 3);
if (pos == std::string::npos)
throw ::DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Unexpected data path: '{}'", data_path);
return std::string(data_path.substr(pos));
}
}

Expand Down
Loading