
[Data] Improve Download operator display name in progress bars #57732

@bveeramani

Summary

The Download operator currently shows up as "URIDownloader" in progress bars. It should instead show "Download(uri_column_name)" so that it is clear which column is being downloaded.

Current Behavior

When using the Download operator, progress bars display:

URIDownloader

Desired Behavior

Progress bars should display:

Download(uri_column_name)

Where uri_column_name is the user-specified column being downloaded.

Motivation

The current "URIDownloader" name:

  • Doesn't clearly communicate what's happening in the operation
  • Makes it harder to debug or understand which column is being processed
  • Doesn't follow the naming pattern of other operators that show their parameters

The proposed change would:

  • Make progress bars more informative and self-documenting
  • Help users quickly identify which column is being downloaded
  • Improve the debugging experience when multiple Download operations are running
  • Align with naming conventions used by other Ray Data operators (e.g., Map(fn_name))

Example

If downloading from column "image_url", the progress bar would show:

Download(image_url)

If there are multiple URIs specified, the progress bar could show:

Download(image_url, thumbnail_url)

(Note that specifying multiple URI columns isn't yet exposed as a user-facing API, but the implementation should still support it for internal correctness.)
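The naming rule in the examples above can be sketched as a small formatting helper. This is only an illustration: `download_op_name` and its `uri_column_names` parameter are hypothetical names, not part of Ray's API, and the sketch assumes the column names are available as a sequence of strings.

```python
from typing import Sequence


def download_op_name(uri_column_names: Sequence[str]) -> str:
    """Build the progress-bar name for a Download operator.

    Hypothetical helper for illustration; not an existing Ray function.
    """
    if not uri_column_names:
        raise ValueError("at least one URI column is required")
    # Join with ", " so one column gives Download(a) and several give Download(a, b).
    return f"Download({', '.join(uri_column_names)})"


print(download_op_name(["image_url"]))                   # Download(image_url)
print(download_op_name(["image_url", "thumbnail_url"]))  # Download(image_url, thumbnail_url)
```

Handling the list case from the start means the single-column name is just the one-element case, so no separate code path is needed if a multi-column API is exposed later.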

Implementation Sketch

Files to Modify

Primary file: /python/ray/data/_internal/planner/plan_download_op.py

Changes Required

  1. Update the download operator name (Line 131 in plan_download_op.py):

    # Current code (line 131):
    name="URIDownloader",
    
    # Should be changed to:
    name=f"Download({op.uri_column_name})",

  2. Optionally update the partition operator name (Line 89 in plan_download_op.py) for consistency:

    # Current code (line 89):
    name="URIPartitioner",
    
    # Could be changed to:
    name=f"Partition({uri_column_name})",

Code Context

The relevant section in plan_download_op.py (lines 127-134):

download_map_operator = MapOperator.create(
    download_map_transformer,
    partition_map_operator if partition_map_operator else input_physical_dag,
    data_context,
    name="URIDownloader",  # <- Change this line
    compute_strategy=download_compute,
    ray_remote_args=ray_remote_args,
)

Testing

Add or update tests in /python/ray/data/tests/test_download_expression.py to verify that the progress bar displays the operator name in the correct format.
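One possible shape for the assertion such a test would make is sketched below. How the operator name is read back from a running dataset is not specified in this issue, so the sketch takes the name as a plain string. `assert_download_name` and `LEGACY_NAMES` are hypothetical names introduced for illustration.

```python
from typing import Sequence

# Names the issue proposes to replace; a regression test can guard against them.
LEGACY_NAMES = {"URIDownloader", "URIPartitioner"}


def assert_download_name(actual: str, uri_columns: Sequence[str]) -> None:
    """Check that an operator name follows the Download(col, ...) format.

    Hypothetical test helper; `actual` stands in for the name a real test
    would read from the executed plan or progress bar.
    """
    expected = f"Download({', '.join(uri_columns)})"
    assert actual not in LEGACY_NAMES, f"legacy name still in use: {actual!r}"
    assert actual == expected, f"expected {expected!r}, got {actual!r}"


# Stand-in values; a real test would obtain these from Ray Data.
assert_download_name("Download(image_url)", ["image_url"])
assert_download_name(
    "Download(image_url, thumbnail_url)", ["image_url", "thumbnail_url"]
)
```

Covering both the single-column and multi-column cases keeps the test aligned with the internal-correctness requirement noted in the Example section.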

Metadata

Assignees

No one assigned

Labels

  • data (Ray Data-related issues)
  • good-first-issue (Great starter issue for someone just starting to contribute to Ray)
