SELECT fails on pre-signed URLs due to CORS errors in DuckDB-Wasm #1852

@coji

What happens?

When executing a SELECT statement in DuckDB-Wasm on a data source accessed via a pre-signed URL (especially those created for GET requests), the operation fails due to CORS errors. This prevents querying data stored in locations that require pre-signed URLs for access.

To Reproduce

  1. Use the following pre-signed URL for a Parquet file (valid for 7 days from 2024-09-14):
    https://91ff95bcb91fbfa1b1c5c356262b1fe4.r2.cloudflarestorage.com/techtalk/world_populations.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0d9126cf0fed3ae3c00f20ceb2bb97c3%2F20240914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20240914T091120Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=1bddf8fcc77e83aa20ffa827e771cea7310af373354af06c5ac58f2e181f0182
    
  2. In DuckDB-Wasm or at shell.duckdb.org, attempt to execute a SELECT statement on this data source using the pre-signed URL.

Example SQL query:

SELECT * FROM parquet_scan('https://91ff95bcb91fbfa1b1c5c356262b1fe4.r2.cloudflarestorage.com/techtalk/world_populations.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0d9126cf0fed3ae3c00f20ceb2bb97c3%2F20240914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20240914T091120Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=1bddf8fcc77e83aa20ffa827e771cea7310af373354af06c5ac58f2e181f0182') LIMIT 10;
  3. Observe that the query fails due to a CORS error, and the data is not accessible.

Note: I tried this query on shell.duckdb.org, and it failed to access the data.

Additional context:
The current behavior seems to be:

  1. DuckDB-Wasm attempts a HEAD request on the pre-signed URL.
  2. The HEAD request fails with a CORS error.
  3. xhr.send(null) throws an exception, and nothing catches it.
  4. The code for performing a range GET request is never reached.
  5. The SELECT statement fails, unable to access the data.
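
To check steps 1 and 2 outside of DuckDB-Wasm, something like the following could be run against the same URL. This is only a diagnostic sketch: `compareMethods` and `ProbeFetch` are hypothetical names, not part of DuckDB-Wasm, and it uses a fetch-style function instead of the synchronous XHR that DuckDB-Wasm appears to use.

```typescript
// Hypothetical diagnostic helper; not part of DuckDB-Wasm.
// Minimal shape of a fetch-like function, so a fake can be injected for testing.
type ProbeFetch = (
  url: string,
  init: { method: string; headers?: Record<string, string> },
) => Promise<{ status: number }>;

type Outcome = { method: string; result: string };

// Issues a HEAD and a one-byte range GET against the same URL and
// records whether each one returned a status or threw.
async function compareMethods(url: string, fetchImpl: ProbeFetch): Promise<Outcome[]> {
  const attempts: { method: string; headers?: Record<string, string> }[] = [
    { method: "HEAD" },
    { method: "GET", headers: { Range: "bytes=0-0" } },
  ];
  const outcomes: Outcome[] = [];
  for (const attempt of attempts) {
    try {
      const response = await fetchImpl(url, attempt);
      outcomes.push({ method: attempt.method, result: `HTTP ${response.status}` });
    } catch (error) {
      // In a browser, a CORS rejection surfaces here as a thrown TypeError.
      outcomes.push({ method: attempt.method, result: `threw: ${String(error)}` });
    }
  }
  return outcomes;
}
```

In a browser console the global `fetch` can be passed as `fetchImpl`. If HEAD throws while the range GET returns a status, that would match the behavior described above.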

This behavior was observed both in a local DuckDB-Wasm implementation and on shell.duckdb.org.

Importantly, the bucket's CORS policy is set according to the documentation:

[
  {
    "AllowedOrigins": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD"
    ],
    "AllowedHeaders": [
      "*"
    ],
    "ExposeHeaders": [
      "*"
    ],
    "MaxAgeSeconds": 3000
  }
]

Despite this CORS policy allowing both GET and HEAD methods from any origin, the issue persists. This suggests that the problem might be related to how DuckDB-Wasm handles the pre-signed URLs rather than the bucket's CORS configuration.

A possible solution might be to skip the HEAD request for pre-signed URLs or implement exception handling to proceed with the range GET request even if the HEAD request fails.
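
A rough sketch of the second option (catch the HEAD failure, then fall back to a range GET) is below. It is not DuckDB-Wasm's actual code: `probeFileSize`, `sizeFromContentRange` and `SizeProbeFetch` are hypothetical names, and a fetch-style function stands in for the XHR call. It also assumes the server answers a range request with 206 and a `Content-Range` header that the CORS policy exposes, which the `ExposeHeaders` setting above should allow.

```typescript
// Hypothetical fallback sketch; not DuckDB-Wasm's implementation.
type SizeProbeFetch = (
  url: string,
  init: { method: string; headers?: Record<string, string> },
) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}>;

// Extracts the total size from a header such as "bytes 0-0/12345".
// Returns null when the header is missing or the total is unknown ("*").
function sizeFromContentRange(value: string | null): number | null {
  if (value === null) return null;
  const match = /^bytes \d+-\d+\/(\d+)$/.exec(value.trim());
  return match ? Number(match[1]) : null;
}

// Tries HEAD first; if it throws or is rejected, falls back to a
// one-byte range GET instead of letting the exception propagate.
async function probeFileSize(url: string, fetchImpl: SizeProbeFetch): Promise<number | null> {
  try {
    const head = await fetchImpl(url, { method: "HEAD" });
    if (head.ok) {
      const length = head.headers.get("Content-Length");
      if (length !== null) return Number(length);
    }
  } catch {
    // CORS or network failure on HEAD: continue with the range GET below.
  }
  const ranged = await fetchImpl(url, { method: "GET", headers: { Range: "bytes=0-0" } });
  if (ranged.status === 206) {
    return sizeFromContentRange(ranged.headers.get("Content-Range"));
  }
  return null;
}
```

With this shape, a failing HEAD costs one extra request but the file size can still be determined, so the later range reads would not be blocked.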

Browser/Environment:

Chrome 128.0.6613.138

Device:

M2 MacBook Air

DuckDB-Wasm Version:

1.28.1-dev278.0

DuckDB-Wasm Deployment:

shell.duckdb.org

Full Name:

Koji Mizoguchi

Affiliation:

TechTalk Inc.
