Skip to content

Conversation

danpoletaev
Copy link
Contributor

This PR is part of Issue #19363, which updates KeyValueStore.getPublicUrl(recordKey) to generate signed links using HMAC.

This PR:

  • creates signature and appends it to the public URL of the record in KV store

P.S. Before merging, we need to wait for the release of Crawlee, so we can update version of Crawlee.

P.P.S It's hard to test the changes, since master branch of SDK and master branch of Crawlee are out of sync. Crawlee has some breaking changes in version 6.0, which are not yet addressed in master branch of SDK.

Previous PR that adds storageObject to Crawlee Python is here

Same PR in SDK JS is here

@danpoletaev danpoletaev self-assigned this Feb 19, 2025
@github-actions github-actions bot added this to the 109th sprint - Platform team milestone Feb 19, 2025
@github-actions github-actions bot added t-core-services Issues with this label are in the ownership of the core services team. tested Temporary label used only programatically for some analytics. labels Feb 19, 2025
public_api_url = self._api_public_base_url
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) # type: ignore[attr-defined]
Copy link
Contributor Author

@danpoletaev danpoletaev Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried that url_signing_secret_key isn't being properly transformed from camelCase? In the API response, it shows up as urlSigningSecretKey

@vdusek
Copy link
Contributor

vdusek commented Feb 24, 2025

@danpoletaev sorry, we've been asked to review, but the CI is broken, could you please fix it first?

@vdusek vdusek requested review from Pijukatel and removed request for vdusek February 25, 2025 15:20
@danpoletaev
Copy link
Contributor Author

danpoletaev commented Feb 25, 2025

@danpoletaev sorry, we've been asked to review, but the CI is broken, could you please fix it first?

I'm afraid that I won't be able to fix CI before the release of Crawlee 0.6 version and updating SDK to deal with breaking changes, but I'll not merge it before that

return input_data


CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably use predefined standard string constants: string.digits+string.ascii_letters
https://docs.python.org/3/library/string.html#string-constants

CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def encode_base62(num: int) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method alone is perfect for writing unit test for. Could you please add one parametrized test with couple of input values.

https://docs.pytest.org/en/stable/how-to/parametrize.html#pytest-mark-parametrize

public_api_url = self._api_public_base_url
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) # type: ignore[attr-defined]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about the ignore. I guess it is complaining about self.storage_object in such case we should not ignore it as it could raise an exception in cases where self does not have storage_object

I guess you can remove it and wait for your other change in crawlee to be merged and that should silence the warning.

assert char in 'abcdefghijklmnopqrstuvwxyzABCEDFGHIJKLMNOPQRSTUVWXYZ0123456789'


# Check if the method is compatible with js version of the same method in:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what is the comment telling the reader. Could you rephrase it please. Currently it looks like a remainder to the author of the change to do something

# This test uses the same secret key and message as in JS tests.
secret_key = 'hmac-same-secret-key'
message = 'hmac-same-message-to-be-authenticated'
for _ in range(5):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why stress testing it? Is there some non-determinism involved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no non-determinism involved, but I'd rather be sure, that for same input we get same output.
Refactored test to avoid "stress testing"

@B4nan
Copy link
Member

B4nan commented Mar 4, 2025

@vdusek please hold the SDK release till this one is merged, it was blocked by crawlee release (which we will first need to do once apify/crawlee-python#993 is merged)

Comment on lines 95 to 99
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None)
if url_signing_secret_key:
public_url += f'?signature={create_hmac_signature(url_signing_secret_key, key)}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use yarl.URL to build the URL? It is already a dependency of Crawlee.

public_api_url = self._api_public_base_url
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, there is no guarantee if there is an attribute called url_signing_secret_key on the storage object that it will actually be what you expect. Could you use an isinstance check instead to make sure that the storage_object is of the correct type?


assert record_url == f'{public_api_url}/v2/key-value-stores/{default_store_id}/records/dummy'
record_key = 'dummy'
record_url = await cast(KeyValueStoreClient, store._resource_client).get_public_url(record_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I don't think we need to access KeyValueStoreClient._resource_client anymore - just KeyValueStore.get_public_url should be enough.

@danpoletaev danpoletaev requested a review from janbuchar March 7, 2025 14:37
Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vdusek vdusek changed the title feat: sign public url feat: add signing of public URL Mar 7, 2025
@vdusek vdusek merged commit a865461 into master Mar 7, 2025
28 checks passed
@vdusek vdusek deleted the feat/sign-public-url branch March 7, 2025 15:58
@fnesveda fnesveda added the validated Issues that are resolved and their solutions fulfill the acceptance criteria. label Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-core-services Issues with this label are in the ownership of the core services team. tested Temporary label used only programatically for some analytics. validated Issues that are resolved and their solutions fulfill the acceptance criteria.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants