
@ada-ggf25

Fix kernel mapping bug when multiple devices exist (Issue #42451)

Fixes #42451

Summary

Fixes issue #42451, where add_to_mapping() would overwrite existing device entries when kernel_mapping contained multiple devices for the
same layer (e.g., both "cuda" and "rocm").

Problem

When kernel_mapping contains multiple devices for the same layer, calling add_to_mapping() repeatedly overwrote the previous device's
entry instead of adding to it. This happened because the function replaced compatible_mapping[layer_name] entirely on every call.

Example of the bug:

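```python
from transformers.utils.kernel_config import add_to_mapping
from kernels import Mode  # assumed import location of the Mode enum

compatible_mapping = {}
# Before fix: the second call overwrites the first
add_to_mapping("RMSNorm", "cuda", "repo:layer", Mode.INFERENCE, compatible_mapping)
add_to_mapping("RMSNorm", "rocm", "repo:layer", Mode.INFERENCE, compatible_mapping)
# Result: only "rocm" exists; "cuda" was lost
```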
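To see why the second call wins, here is a minimal standalone sketch of the pre-fix assignment pattern. It assumes, based on the description above, that the old code rebuilt compatible_mapping[layer_name] as a fresh nested dict on every call. `add_to_mapping_old` is a hypothetical name, a plain string stands in for the LayerRepository object the real function stores, and "inference" stands in for Mode.INFERENCE.

```python
from typing import Any, Dict


def add_to_mapping_old(layer_name: str, device: str, repo_name: str, mode: str,
                       compatible_mapping: Dict[str, Any]) -> None:
    # Hypothetical reconstruction of the buggy pattern: the whole per-layer
    # dict is replaced, discarding any device added by an earlier call.
    compatible_mapping[layer_name] = {device: {mode: repo_name}}


mapping: Dict[str, Any] = {}
add_to_mapping_old("RMSNorm", "cuda", "repo:layer", "inference", mapping)
add_to_mapping_old("RMSNorm", "rocm", "repo:layer", "inference", mapping)
print(mapping)  # {'RMSNorm': {'rocm': {'inference': 'repo:layer'}}}
```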

Solution

Updated add_to_mapping() to check whether the layer_name and device entries already exist before writing to them. The function now:

  1. Initialises the layer_name entry if it doesn't exist
  2. Initialises the device entry if it doesn't exist
  3. Sets the mode entry (overwriting any existing entry for the same mode, which is intended)

This ensures all devices are preserved in the compatible_mapping structure.
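The three steps above can be sketched as follows. This is a standalone approximation, not the patched code in kernel_config.py: `add_to_mapping_fixed` is a hypothetical name, a plain string stands in for the LayerRepository object, and "inference" stands in for Mode.INFERENCE.

```python
from typing import Any, Dict


def add_to_mapping_fixed(layer_name: str, device: str, repo_name: str, mode: str,
                         compatible_mapping: Dict[str, Any]) -> None:
    # 1. Initialise the layer_name entry only if it does not exist yet.
    if layer_name not in compatible_mapping:
        compatible_mapping[layer_name] = {}
    # 2. Initialise the device entry only if it does not exist yet.
    if device not in compatible_mapping[layer_name]:
        compatible_mapping[layer_name][device] = {}
    # 3. Assign the mode entry; an existing entry for the same mode is replaced.
    compatible_mapping[layer_name][device][mode] = repo_name


mapping: Dict[str, Any] = {}
add_to_mapping_fixed("RMSNorm", "cuda", "repo:layer", "inference", mapping)
add_to_mapping_fixed("RMSNorm", "rocm", "repo:layer", "inference", mapping)
print(sorted(mapping["RMSNorm"]))  # ['cuda', 'rocm']
```

Only the innermost assignment can replace data, so earlier devices and modes for the same layer survive later calls.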

Changes

  • Modified: src/transformers/utils/kernel_config.py

    • Updated add_to_mapping() function to preserve existing device entries
  • Added: tests/kernels/test_kernels.py

    • test_add_to_mapping_multiple_devices: Verifies multiple devices are preserved
    • test_add_to_mapping_single_device: Ensures backward compatibility
    • test_add_to_mapping_multiple_modes: Verifies multiple modes work correctly
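The intent of the three new tests can be sketched in pytest style. This is not the code in tests/kernels/test_kernels.py: it exercises a local, hypothetical stand-in for the patched function, with strings in place of LayerRepository objects and Mode values, so it runs without transformers or kernels installed.

```python
from typing import Any, Dict


def add_to_mapping(layer_name: str, device: str, repo_name: str, mode: str,
                   compatible_mapping: Dict[str, Any]) -> None:
    # Local stand-in for the patched function: create missing levels, set the leaf.
    compatible_mapping.setdefault(layer_name, {}).setdefault(device, {})[mode] = repo_name


def test_add_to_mapping_multiple_devices() -> None:
    mapping: Dict[str, Any] = {}
    add_to_mapping("RMSNorm", "cuda", "repo-a:layer", "inference", mapping)
    add_to_mapping("RMSNorm", "rocm", "repo-b:layer", "inference", mapping)
    # Both devices must survive the second call.
    assert set(mapping["RMSNorm"]) == {"cuda", "rocm"}
    assert mapping["RMSNorm"]["cuda"]["inference"] == "repo-a:layer"


def test_add_to_mapping_single_device() -> None:
    mapping: Dict[str, Any] = {}
    add_to_mapping("RMSNorm", "cuda", "repo:layer", "inference", mapping)
    # A single call produces exactly one layer -> device -> mode entry.
    assert mapping == {"RMSNorm": {"cuda": {"inference": "repo:layer"}}}


def test_add_to_mapping_multiple_modes() -> None:
    mapping: Dict[str, Any] = {}
    add_to_mapping("RMSNorm", "cuda", "repo:layer", "inference", mapping)
    add_to_mapping("RMSNorm", "cuda", "repo:layer", "training", mapping)
    # Both modes must coexist under the same device.
    assert set(mapping["RMSNorm"]["cuda"]) == {"inference", "training"}
```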

Testing

All tests pass:

  • test_add_to_mapping_multiple_devices - PASSED
  • test_add_to_mapping_multiple_modes - PASSED
  • test_add_to_mapping_single_device - PASSED
  • All existing tests in TestKernelUtilities - PASSED

Checklist

  • Code follows the project's style guidelines
  • Tests added/updated and passing
  • Documentation not needed (internal function)
  • No breaking changes (backward compatible)


Previously, the add_to_mapping function would overwrite existing layer_name
and device entries in the compatible_mapping dictionary when adding new
entries. This could cause loss of existing kernel mappings.

The fix ensures that:
- Layer name entries are preserved if they already exist
- Device entries within layer names are preserved if they already exist
- Only mode entries can be overwritten (which is the intended behaviour)

This makes the function more robust when building compatible mappings
incrementally, preventing accidental data loss.
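The incremental use case can be sketched as a loop over a per-layer, per-device config. The shape of `kernel_mapping` here ({layer: {device: "repo_id:layer_name"}}) and the repository names are assumptions for illustration. `add_to_mapping` is a local stand-in built with dict.setdefault, which behaves like explicit "create only if missing" checks, and "inference" stands in for Mode.INFERENCE.

```python
from typing import Any, Dict


def add_to_mapping(layer_name: str, device: str, repo_name: str, mode: str,
                   compatible_mapping: Dict[str, Any]) -> None:
    # Stand-in for the patched function: only the innermost mode entry is assigned.
    compatible_mapping.setdefault(layer_name, {}).setdefault(device, {})[mode] = repo_name


# Hypothetical config: one layer with kernels for two devices.
kernel_mapping = {
    "RMSNorm": {
        "cuda": "org/cuda-kernels:RMSNorm",
        "rocm": "org/rocm-kernels:RMSNorm",
    },
}

compatible_mapping: Dict[str, Any] = {}
for layer_name, per_device in kernel_mapping.items():
    for device, repo_name in per_device.items():
        add_to_mapping(layer_name, device, repo_name, "inference", compatible_mapping)

# Re-adding an existing (layer, device, mode) triple replaces only that leaf,
# which is the intended overwrite behaviour.
add_to_mapping("RMSNorm", "cuda", "org/other-kernels:RMSNorm", "inference", compatible_mapping)
```

After the loop both devices are present, and the final call changes only the cuda/inference entry while leaving rocm untouched.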

Add comprehensive test coverage for the add_to_mapping function to ensure
it correctly handles multiple devices and modes without overwriting existing
entries in the compatible_mapping dictionary.

The new tests verify:
- Multiple devices can be added for the same layer_name without overwriting
  each other (test_add_to_mapping_multiple_devices)
- Single device mappings are created correctly (test_add_to_mapping_single_device)
- Multiple modes can be added for the same device without overwriting
  each other (test_add_to_mapping_multiple_modes)

These tests complement the fix in kernel_config.py that prevents
overwriting existing mappings when building compatible mappings incrementally.

Apply code formatting improvements to the add_to_mapping test functions:
- Break long function calls across multiple lines to comply with line
  length guidelines
- Add noqa comments to suppress line length warnings for docstrings
- Fix test assertions to use the correct private attribute _repo_id
  instead of the non-existent public repo_id attribute

This ensures the tests follow the project's code style guidelines and
correctly verify the LayerRepository object properties.

Reorganise imports to group the transformers.utils imports together, moving
the add_to_mapping import next to the other utils imports instead of between
the hub_kernels and masking_utils imports.

Consolidate add_to_mapping function calls back to single lines as they
fit within the line length limits, improving code readability.

Remove trailing whitespace from blank lines in the add_to_mapping function
to comply with code style guidelines and ensure consistent formatting.

Development

Successfully merging this pull request may close these issues.

[kernels] When multiple device exist in kernel_mapping, calling add_to_mapping() will repeatedly overwrite the repo_name.
