Description
Summary
The Download operator currently appears as "URIDownloader" in progress bars. It should instead appear as "Download(uri_column_name)" so that it is clear which column is being downloaded.
Current Behavior
When using the Download operator, progress bars display:
URIDownloader
Desired Behavior
Progress bars should display:
Download(uri_column_name)
where uri_column_name is the user-specified column being downloaded.
Motivation
The current "URIDownloader" name:
- Doesn't clearly communicate what's happening in the operation
- Makes it harder to debug or understand which column is being processed
- Doesn't follow the naming pattern of other operators that show their parameters
The proposed change would:
- Make progress bars more informative and self-documenting
- Help users quickly identify which column is being downloaded
- Improve the debugging experience when multiple Download operations are running
- Align with naming conventions used by other Ray Data operators (e.g., Map(fn_name))
Example
If downloading from column "image_url", the progress bar would show:
Download(image_url)
If there are multiple URIs specified, the progress bar could show:
Download(image_url, thumbnail_url)
(Note that specifying multiple URIs isn't exposed as a user-facing API yet, but it should still be supported for internal correctness.)
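The single-column and multi-column formats above could be produced by one small formatting function. The following is a minimal sketch, not code from Ray Data: the helper name format_download_op_name and its uri_column_names parameter are hypothetical and are used only to illustrate the proposed naming.

```python
from typing import Sequence, Union


def format_download_op_name(uri_column_names: Union[str, Sequence[str]]) -> str:
    """Build a progress-bar name such as 'Download(image_url)'.

    Hypothetical helper for illustration; it is not part of Ray Data.
    """
    # Accept a single column name as well as a sequence of names, so that
    # a bare string is not split into individual characters by join().
    if isinstance(uri_column_names, str):
        uri_column_names = [uri_column_names]
    if not uri_column_names:
        raise ValueError("at least one URI column name is required")
    return f"Download({', '.join(uri_column_names)})"


if __name__ == "__main__":
    print(format_download_op_name("image_url"))
    print(format_download_op_name(["image_url", "thumbnail_url"]))
```

Joining with ", " keeps the single-column case identical to the proposed Download(image_url) format while still covering the internal multi-URI case.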
Implementation Sketch
Files to Modify
Primary file: /python/ray/data/_internal/planner/plan_download_op.py
Changes Required
- Update the download operator name (line 131 in plan_download_op.py):
  # Current code (line 131):
  name="URIDownloader",
  # Should be changed to:
  name=f"Download({op.uri_column_name})",
- Optionally update the partition operator name (line 89 in plan_download_op.py) for consistency:
  # Current code (line 89):
  name="URIPartitioner",
  # Could be changed to:
  name=f"Partition({uri_column_name})",
Code Context
The relevant section in plan_download_op.py (lines 127-134):
download_map_operator = MapOperator.create(
download_map_transformer,
partition_map_operator if partition_map_operator else input_physical_dag,
data_context,
name="URIDownloader", # <- Change this line
compute_strategy=download_compute,
ray_remote_args=ray_remote_args,
)
Testing
Tests should be added or updated in /python/ray/data/tests/test_download_expression.py to verify that the progress bar displays the operator name in the correct format.
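One possible shape for such a check is sketched below. It assumes the test has already collected the operator names shown in the progress bars into a list of strings; how those names are obtained from a dataset is deliberately left out, and the helper name check_download_op_names is hypothetical.

```python
from typing import Iterable, Sequence


def check_download_op_names(
    operator_names: Iterable[str], uri_columns: Sequence[str]
) -> None:
    """Assert that the new name is present and the legacy one is gone.

    Hypothetical test helper; `operator_names` is assumed to be the list of
    names that the progress bars would display for the executed plan.
    """
    names = list(operator_names)
    expected = f"Download({', '.join(uri_columns)})"
    assert expected in names, f"{expected!r} not found in {names!r}"
    assert "URIDownloader" not in names, "legacy name 'URIDownloader' still present"


if __name__ == "__main__":
    # Example with made-up operator names.
    check_download_op_names(["ReadParquet", "Download(image_url)"], ["image_url"])
```

Checking for the absence of "URIDownloader" as well as the presence of the new name guards against the old name surviving in a code path the test did not anticipate.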