[Data] Fixed `max_task_in_flight_per_actor` to be defined by `max_concurrency` set by default #57772

alexeykudinkin · 2025-10-16T00:21:15Z

Description

Revisited how the default for max_tasks_in_flight_per_actor is determined:

Previously: it was statically set to 4
Now: it's determined as max_concurrency * DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR
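As a rough illustration of the new default (a sketch, not the actual Ray code): the helper name `resolve_max_tasks_in_flight_per_actor`, its `override` parameter, and the precedence of the override are assumptions. The `factor` argument stands in for DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR, whose value is not stated in this description.

```python
from typing import Optional


def resolve_max_tasks_in_flight_per_actor(
    max_concurrency: int,
    factor: int,
    override: Optional[int] = None,
) -> int:
    """Hypothetical helper: pick the per-actor in-flight task limit.

    `factor` stands in for
    DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR; its real
    value is not given in the PR description, so callers pass it in.
    """
    if override is not None:
        # Assumption: an explicitly configured value wins over the default.
        return override
    # New behavior: the limit scales with the actor's concurrency.
    return max_concurrency * factor
```

With a static limit of 4, an actor configured with `max_concurrency=8` could never have enough tasks in flight to fill its execution slots. A multiplicative default avoids that, because the limit grows along with concurrency.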

Also cleaned up the validation inside the ActorPool that verifies whether the provided configuration is valid, and added a test asserting that a warning is logged for an invalid configuration.

gemini-code-assist

Code Review

This pull request updates the default calculation for max_task_in_flight_per_actor to be based on max_concurrency, which is a sensible improvement. The changes also include better validation for the actor pool configuration and corresponding tests. The implementation looks solid. I have a couple of suggestions to improve the documentation and test code quality.

gemini-code-assist · 2025-10-16T00:22:09Z

python/ray/data/_internal/actor_autoscaler/default_actor_autoscaler.py

+        """Validate autoscaling configuration.
+
+        Args:
+            autoscaling_config: The autoscaling configuration to validate.
+            max_actor_concurrency: The maximum concurrency per actor.
+            max_tasks_in_flight_per_actor: The maximum tasks in flight per actor.
+        """


The docstring for this function appears to be incorrect. The Args section lists autoscaling_config, max_actor_concurrency, and max_tasks_in_flight_per_actor, but these are not parameters of the function. The function's actual parameters are actor_pool and op. Please update the docstring to accurately describe the function's parameters.

        """Validate autoscaling configuration for an actor pool.

        Args:
            actor_pool: The actor pool to validate.
            op: The physical operator that owns the actor pool.
        """

gemini-code-assist · 2025-10-16T00:22:10Z

python/ray/data/tests/test_autoscaler.py

+def test_autoscaling_config_validation_warnings(
+    ray_start_10_cpus_shared, restore_data_context
+):
+    """Test that validation warnings are emitted when actor pool config won't allow scaling up."""
+    from unittest.mock import patch
+
+    class SimpleMapper:
+        """Simple callable class for testing autoscaling validation."""
+
+        def __call__(self, row):
+            # Map operates on rows which are dicts
+            return {"value": row["id"] * 2}
+
+    # Test #1: Invalid config (should warn)
+    #   - max_tasks_in_flight / max_concurrency == 1
+    #   - Default upscaling threshold (200%)
+    with patch(
+        "ray.data._internal.actor_autoscaler.default_actor_autoscaler.logger.warning"
+    ) as mock_warning:
+        ds = ray.data.range(2, override_num_blocks=2).map_batches(
+            SimpleMapper,
+            compute=ray.data.ActorPoolStrategy(
+                max_tasks_in_flight_per_actor=1,
+            ),
+            max_concurrency=1,
+        )
+        # Take just one item to minimize execution time
+        ds.take_all()
+
+    # Check that warning was called with expected message
+    wanr_log_args_str = str(mock_warning.call_args_list)
+    expected_message = (
+        "⚠️  Actor Pool configuration of the "
+        "ActorPoolMapOperator[MapBatches(SimpleMapper)] will not allow it to scale up: "
+        "configured utilization threshold (200.0%) couldn't be reached with "
+        "configured max_concurrency=1 and max_tasks_in_flight_per_actor=1 "
+        "(max utilization will be max_tasks_in_flight_per_actor / max_concurrency = 100%)"
+    )
+
+    assert expected_message in wanr_log_args_str
+
+    # Test #2: Provided config is valid (no warnings)
+    #   - max_tasks_in_flight / max_concurrency == 2 (default)
+    #   - Default upscaling threshold (200%)
+    with patch(
+        "ray.data._internal.actor_autoscaler.default_actor_autoscaler.logger.warning"
+    ) as mock_warning:
+        ds = ray.data.range(2, override_num_blocks=2).map_batches(
+            SimpleMapper,
+            compute=ray.data.ActorPoolStrategy(
+                max_tasks_in_flight_per_actor=2,
+            ),
+            max_concurrency=1,
+        )
+        ds.take_all()
+
+    # Check that this warning hasn't been emitted
+    wanr_log_args_str = str(mock_warning.call_args_list)
+    expected_message = (
+        "⚠️  Actor Pool configuration of the "
+        "ActorPoolMapOperator[MapBatches(SimpleMapper)] will not allow it to scale up: "
+    )
+
+    assert expected_message not in wanr_log_args_str
+
+    # Test #3: Default config is valid (no warnings)
+    #   - max_tasks_in_flight / max_concurrency == 4 (default)
+    #   - Default upscaling threshold (200%)
+    with patch(
+        "ray.data._internal.actor_autoscaler.default_actor_autoscaler.logger.warning"
+    ) as mock_warning:
+        ds = ray.data.range(2, override_num_blocks=2).map_batches(
+            SimpleMapper, compute=ray.data.ActorPoolStrategy()
+        )
+        ds.take_all()
+
+    # Check that this warning hasn't been emitted
+    wanr_log_args_str = str(mock_warning.call_args_list)
+    expected_message = (
+        "⚠️  Actor Pool configuration of the "
+        "ActorPoolMapOperator[MapBatches(SimpleMapper)] will not allow it to scale up: "
+    )
+
+    assert expected_message not in wanr_log_args_str
+
+


This test function contains three separate test cases that are very similar in structure. To reduce code duplication and make the test more maintainable and readable, consider refactoring it to use pytest.mark.parametrize.

Additionally, there is a recurring typo in the variable name wanr_log_args_str, which should be warn_log_args_str.

Here is an example of how you could refactor this test:

@pytest.mark.parametrize(
    "actor_pool_strategy_kwargs, max_concurrency, should_warn",
    [
        # Test #1: Invalid config (should warn)
        ({"max_tasks_in_flight_per_actor": 1}, 1, True),
        # Test #2: Provided config is valid (no warnings)
        ({"max_tasks_in_flight_per_actor": 2}, 1, False),
        # Test #3: Default config is valid (no warnings)
        ({}, None, False),
    ],
)
def test_autoscaling_config_validation_warnings(
    ray_start_10_cpus_shared,
    restore_data_context,
    actor_pool_strategy_kwargs,
    max_concurrency,
    should_warn,
):
    # ... test logic ...
    with patch(...) as mock_warning:
        # ...

    warn_log_args_str = str(mock_warning.call_args_list)
    warning_message_prefix = (
        "⚠️  Actor Pool configuration of the "
        "ActorPoolMapOperator[MapBatches(SimpleMapper)] will not allow it to scale up:"
    )

    if should_warn:
        assert warning_message_prefix in warn_log_args_str
    else:
        assert warning_message_prefix not in warn_log_args_str
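The assertion pattern used here (patch `logger.warning`, then search the stringified `call_args_list`) can be tried without Ray. The sketch below uses only the standard library. The `validate` and `warned` helpers and the logger name are hypothetical stand-ins, not part of the PR.

```python
import logging
from unittest.mock import patch

logger = logging.getLogger("example")


def validate(max_tasks_in_flight: int, max_concurrency: int, threshold: float) -> None:
    """Hypothetical validator: warn when the threshold cannot be reached."""
    max_utilization = max_tasks_in_flight / max_concurrency
    if max_utilization < threshold:
        logger.warning(
            f"will not allow it to scale up: max utilization is {max_utilization:.0%}"
        )


def warned(max_tasks_in_flight: int, max_concurrency: int) -> bool:
    # Patch the `warning` method on this logger, similar to how the PR's
    # test patches the autoscaler module's logger.
    with patch.object(logger, "warning") as mock_warning:
        validate(max_tasks_in_flight, max_concurrency, threshold=2.0)
    # The recorded calls stringify with their arguments, so a substring
    # check on the message prefix is enough.
    return "will not allow it to scale up" in str(mock_warning.call_args_list)
```

Matching on a message prefix rather than the full text keeps the parametrized cases independent of the exact numbers interpolated into the warning.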

cursor · 2025-10-16T00:25:07Z

python/ray/data/_internal/actor_autoscaler/default_actor_autoscaler.py

-            == actor_pool.max_tasks_in_flight_per_actor()
-            and self._actor_pool_scaling_up_threshold > 1.0
+            max_tasks_in_flight_per_actor / max_concurrency
+            < self._actor_pool_scaling_up_threshold


Bug: Autoscaling Validation Fails on Zero Concurrency

The autoscaling configuration validation can raise a ZeroDivisionError when max_concurrency is 0. The logic divides by max_concurrency without a zero-check. While max_concurrency=0 is an invalid configuration, users can set it, causing the validation to crash instead of handling this edge case gracefully.
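One possible way to handle this edge case is sketched below. It is not the fix adopted in the PR. The function name `can_reach_scale_up_threshold` and the choice to log and return `False` for non-positive concurrency are assumptions; only the ratio comparison comes from the diff above.

```python
import logging

logger = logging.getLogger(__name__)


def can_reach_scale_up_threshold(
    max_tasks_in_flight_per_actor: int,
    max_concurrency: int,
    scaling_up_threshold: float,
) -> bool:
    """Hypothetical stand-in for the validation check in the diff."""
    if max_concurrency <= 0:
        # Guard: report the invalid value instead of dividing by zero.
        logger.warning(
            "Invalid max_concurrency=%d; skipping validation", max_concurrency
        )
        return False
    # Highest utilization a single actor can report under this config.
    max_utilization = max_tasks_in_flight_per_actor / max_concurrency
    # The pool can scale up only if that utilization reaches the threshold.
    return max_utilization >= scaling_up_threshold
```

With a threshold of 2.0 (200%), this returns `False` for `max_tasks_in_flight_per_actor=1, max_concurrency=1` and `True` for `max_tasks_in_flight_per_actor=2, max_concurrency=1`, in line with the warning and no-warning cases in the test above.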

srinathk10

LGTM. Thanks!

alexeykudinkin requested a review from a team as a code owner October 16, 2025 00:21

alexeykudinkin added the `go` label (add ONLY when ready to merge, run all tests) Oct 16, 2025

gemini-code-assist bot reviewed Oct 16, 2025

alexeykudinkin added 8 commits October 15, 2025 17:22

Made DataContext.max_tasks_in_flight_per_actor an override config;

a2f08ac

By default set max_tasks_in_flight_per_actor to be multiple of max_concurrency Signed-off-by: Alexey Kudinkin <[email protected]>

Revisited _validate_actor_pool_autoscaling_config

50ce8e7

Signed-off-by: Alexey Kudinkin <[email protected]>

Fixed typo

65e002b

Signed-off-by: Alexey Kudinkin <[email protected]>

Added test

f81ba3c

Signed-off-by: Alexey Kudinkin <[email protected]>

Cleaned up test

160021a

Signed-off-by: Alexey Kudinkin <[email protected]>

Tidying up

cd385f4

Signed-off-by: Alexey Kudinkin <[email protected]>

Fixing test

f2791a6

Signed-off-by: Alexey Kudinkin <[email protected]>

lint

27b3412

Signed-off-by: Alexey Kudinkin <[email protected]>

alexeykudinkin force-pushed the ak/act-pl-tsk-pnd-fix branch from 124ef15 to 27b3412 Compare October 16, 2025 00:23

cursor bot reviewed Oct 16, 2025

ray-gardener bot added the `data` label (Ray Data-related issues) Oct 16, 2025

srinathk10 approved these changes Oct 16, 2025
