feat: Use same `client_key` for `Actor` created `request_queue` and improve its metadata estimation #552

Pijukatel · 2025-08-19T11:22:04Z

Description

When a RequestQueue is created from the Actor on the platform, or with force_cloud=True, the client_key should be set to the run_id. This ensures that:
- Every API call made by the same RequestQueue instance through ApifyRequestQueueClient uses the same client_key, so the metadata reports had_multiple_clients=False.
- Multiple RequestQueue instances created by Actor.open_request_queue() on the platform share the same client_key, so the metadata reports had_multiple_clients=False.
- On the platform, because the client_key is set to the run_id, it stays the same for a resurrected or migrated run, so the metadata reports had_multiple_clients=False.
A more reliable had_multiple_clients allows better estimation of the RequestQueue metadata:
- When had_multiple_clients=False, the local estimation of the metadata can be trusted.
- When had_multiple_clients=True, the local estimation is no longer valid on its own, but in some cases it can still improve the result by being ahead of the delayed API update of the metadata. Therefore the API-based metadata is fused with the local estimation to produce the best estimate available.
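
To illustrate the fusion idea, here is a minimal sketch and not the code from this PR. `QueueCounts` and `fuse_counts` are hypothetical names, and the rule itself (take the larger counter, OR the flag) is an assumption that only holds if both counters never decrease.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueCounts:
    """Hypothetical stand-in for the count fields of the queue metadata."""

    total_request_count: int
    handled_request_count: int
    had_multiple_clients: bool

    @property
    def pending_request_count(self) -> int:
        return self.total_request_count - self.handled_request_count


def fuse_counts(api: QueueCounts, local: QueueCounts) -> QueueCounts:
    """Combine possibly delayed API counts with the local estimate.

    Assumption: both counters only ever grow, so the larger value is the fresher one.
    """
    return QueueCounts(
        total_request_count=max(api.total_request_count, local.total_request_count),
        handled_request_count=max(api.handled_request_count, local.handled_request_count),
        had_multiple_clients=api.had_multiple_clients or local.had_multiple_clients,
    )


# The API lags behind the local producer (3 vs 5 requests added), but it already
# knows about a request that was handled by another client.
api_view = QueueCounts(total_request_count=3, handled_request_count=1, had_multiple_clients=True)
local_view = QueueCounts(total_request_count=5, handled_request_count=0, had_multiple_clients=False)
fused = fuse_counts(api_view, local_view)
print(fused.total_request_count, fused.handled_request_count, fused.pending_request_count)
```

In this example the fused view takes the total from the local side and the handled count from the API side, which is the "ahead of the delayed API update" case described above.
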
The ApifyRequestQueueClient init now initializes properly from the full metadata, which makes the metadata more reliable after migration or resurrection, or when an existing RequestQueue is used.
During _list_head, whenever the API is actually called, the had_multiple_clients value in the response is used to update the local estimation of this value. This is a cheap way of learning whether another client exists without making an extra API call. _list_head is called frequently enough to keep the local estimation of had_multiple_clients reasonably accurate.
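
A rough sketch of the _list_head idea, under assumptions: `LocalClientTracker` and `update_from_list_head` are hypothetical names, and the response is assumed to be a dict carrying a `hadMultipleClients` key (the key the tests read from the `get()` response). The flag is treated as sticky, so once another client has been seen it never goes back to False.

```python
from typing import Any


class LocalClientTracker:
    """Hypothetical helper that keeps a local estimate of `had_multiple_clients`."""

    def __init__(self, had_multiple_clients: bool = False) -> None:
        self.had_multiple_clients = had_multiple_clients

    def update_from_list_head(self, response: dict[str, Any]) -> None:
        # Reuse a response we already paid for; no extra API call is made here.
        seen_in_response = bool(response.get('hadMultipleClients', False))
        # Once another client has been seen, the flag stays True.
        self.had_multiple_clients = self.had_multiple_clients or seen_in_response


tracker = LocalClientTracker()
history = []
for response in (
    {'items': [], 'hadMultipleClients': False},
    {'items': [], 'hadMultipleClients': True},
    {'items': []},  # a response without the key must not reset the flag
):
    tracker.update_from_list_head(response)
    history.append(tracker.had_multiple_clients)
print(history)
```
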

Issues

Closes: Actor should use same client_key for all ApifyRequestQueue client calls #536

Testing

New tests added

Seems to be some problem on the platform?

Platform acknowledged it is a bug

Pijukatel · 2025-08-19T11:46:08Z

As of now, the test_request_queue_not_had_multiple_clients_platform_resurrection will fail due to an already acknowledged bug on the platform, but that should be fixed in the near future.

In develop branch: https://github.com/apify/apify-worker/pull/1470

vdusek · 2025-08-20T11:35:01Z

tests/integration/conftest.py

+@pytest.fixture(autouse=True)
+def set_token(apify_token: str, monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setenv(ApifyEnvVars.TOKEN, apify_token)


Why do we need this now? How did it work before without it?

vdusek · 2025-08-20T11:37:32Z

tests/integration/test_actor_request_queue.py

+async def test_request_queue_enhanced_metadata(
+    request_queue_force_cloud: RequestQueue,
+    apify_client_async: ApifyClientAsync,
+) -> None:
+    """Test metadata tracking.
+
+    Multiple clients scenarios are not guaranteed to give correct results without delay. But at least multiple clients,
+    single producer, should be reliable on the producer side."""
+
+    for i in range(1, 10):
+        await request_queue_force_cloud.add_request(Request.from_url(f'http://example.com/{i}'))
+        # Reliable information as the API response is enhanced with local metadata estimation.
+        assert (await request_queue_force_cloud.get_metadata()).total_request_count == i
+
+    # Accessed with client created explicitly with `client_key=None` should appear as distinct client
+    api_client = apify_client_async.request_queue(request_queue_id=request_queue_force_cloud.id, client_key=None)
+    await api_client.list_head()
+
+    # The presence of another non-producing client should not affect the metadata
+    for i in range(10, 20):
+        await request_queue_force_cloud.add_request(Request.from_url(f'http://example.com/{i}'))
+        # Reliable information as the API response is enhanced with local metadata estimation.
+        assert (await request_queue_force_cloud.get_metadata()).total_request_count == i
+
+
+async def test_request_queue_metadata_another_client(
+    request_queue_force_cloud: RequestQueue,
+    apify_client_async: ApifyClientAsync,
+) -> None:
+    """Test metadata tracking. The delayed metadata should be reliable even when changed by another client."""
+    api_client = apify_client_async.request_queue(request_queue_id=request_queue_force_cloud.id, client_key=None)
+    await api_client.add_request(Request.from_url('http://example.com/1').model_dump(by_alias=True, exclude={'id'}))
+
+    # Wait to be sure that the API has updated the global metadata
+    await asyncio.sleep(10)
+
+    assert (await request_queue_force_cloud.get_metadata()).total_request_count == 1
+
+
+async def test_request_queue_had_multiple_clients_local(
+    request_queue_force_cloud: RequestQueue,
+    apify_client_async: ApifyClientAsync,
+) -> None:
+    """Test that `RequestQueue` correctly detects multiple clients.
+
+    Clients created with different `client_key` should appear as distinct clients."""
+    await request_queue_force_cloud.fetch_next_request()
+
+    # Accessed with client created explicitly with `client_key=None` should appear as distinct client
+    api_client = apify_client_async.request_queue(request_queue_id=request_queue_force_cloud.id, client_key=None)
+    await api_client.list_head()
+
+    # Check that it is correctly in the RequestQueueClient metadata
+    assert (await request_queue_force_cloud.get_metadata()).had_multiple_clients is True
+
+    # Check that it is correctly in the API
+    api_response = await api_client.get()
+    assert api_response
+    assert api_response['hadMultipleClients'] is True
+
+
+async def test_request_queue_not_had_multiple_clients_local(
+    request_queue_force_cloud: RequestQueue, apify_client_async: ApifyClientAsync
+) -> None:
+    """Test that same `RequestQueue` created from Actor does not act as multiple clients."""
+
+    # Two calls to API to create situation where different `client_key` can set `had_multiple_clients` to True
+    await request_queue_force_cloud.fetch_next_request()
+    await request_queue_force_cloud.fetch_next_request()
+
+    # Check that it is correctly in the RequestQueueClient metadata
+    assert (await request_queue_force_cloud.get_metadata()).had_multiple_clients is False
+
+    # Check that it is correctly in the API
+    api_client = apify_client_async.request_queue(request_queue_id=request_queue_force_cloud.id)
+    api_response = await api_client.get()
+    assert api_response
+    assert api_response['hadMultipleClients'] is False


Should these be here? We have test_actor_request_queue.py and test_request_queue.py btw.

vdusek · 2025-08-20T11:47:44Z

src/apify/storage_clients/_apify/_request_queue_client.py

+    async def _get_metadata(self) -> RequestQueueMetadata:
+        """Try to get cached metadata first. If multiple clients, fuse with global metadata."""
+        if self._metadata.had_multiple_clients:
+            return await self.get_metadata()
+        # Get local estimation (will not include changes done by another client)
+        return self._metadata
+
    @override
    async def get_metadata(self) -> RequestQueueMetadata:
-        total_count = self._initial_total_count + self._assumed_total_count
-        handled_count = self._initial_handled_count + self._assumed_handled_count
-        pending_count = total_count - handled_count
-
+        """Get metadata about the request queue."""
+        response = await self._api_client.get()
+        if response is None:
+            raise ValueError('Failed to fetch request queue metadata from the API.')


It should be better explained why we have _get_metadata and get_metadata now.
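
One possible reading of the split, sketched with hypothetical stand-ins (`FakeApi`, `QueueClientSketch`, and a reduced `Metadata`): the public `get_metadata` always asks the API and fuses the result with the local estimate, while the internal `_get_metadata` skips the API call as long as only one client is known. The `max`/`or` fusion is an assumption for illustration, not the code under review.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Metadata:
    total_request_count: int = 0
    had_multiple_clients: bool = False


class FakeApi:
    """Hypothetical in-memory stand-in for the platform API; counts its calls."""

    def __init__(self) -> None:
        self.calls = 0
        self.remote = Metadata()

    async def get(self) -> Metadata:
        self.calls += 1
        return self.remote


class QueueClientSketch:
    def __init__(self, api: FakeApi) -> None:
        self._api = api
        self._metadata = Metadata()

    async def get_metadata(self) -> Metadata:
        """Public path: always ask the API, then fuse with the local estimate."""
        remote = await self._api.get()
        return Metadata(
            total_request_count=max(remote.total_request_count, self._metadata.total_request_count),
            had_multiple_clients=remote.had_multiple_clients or self._metadata.had_multiple_clients,
        )

    async def _get_metadata(self) -> Metadata:
        """Internal path: skip the API call while only one client is known."""
        if self._metadata.had_multiple_clients:
            return await self.get_metadata()
        return self._metadata


async def demo() -> tuple[int, int]:
    api = FakeApi()
    client = QueueClientSketch(api)
    await client._get_metadata()  # single client: served from the local estimate
    calls_single = api.calls
    client._metadata.had_multiple_clients = True
    await client._get_metadata()  # multiple clients: goes through the API
    return calls_single, api.calls


result = asyncio.run(demo())
print(result)
```
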

Pijukatel added 4 commits August 16, 2025 13:32

Draft, unique key enables better metadata handling

b159d3e

Add more tests

301e294

Add resurrection test

281c67f

Seems to be some problem on the platform?

Remove internal debug logs from the test

efd9e53

Platform acknowledged it is a bug

github-actions bot assigned Pijukatel Aug 19, 2025

github-actions bot added this to the 121st sprint - Tooling team milestone Aug 19, 2025

github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Aug 19, 2025

Properly update generate_unique_resource_name

2b2cc30

Pijukatel requested review from vdusek and janbuchar August 19, 2025 11:50

vdusek requested changes Aug 20, 2025
