@mart-r mart-r commented Jul 7, 2025

Fix some multiprocessing issues.

The things I've noticed / fixed:

  • When batching based on the number of characters, the order of the index and the text was getting mixed up
  • When using addons (e.g. MetaCAT) with multiprocessing, the addon data paths weren't set
    • This is because the paths are set on init and depend on the tokenizer
    • And because of how the data is pickled for multiprocessing, that step is omitted
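
To illustrate the first issue, here is a minimal sketch of character-based batching that keeps each `(index, text)` pair in a fixed order. It is not the code from this PR: `batch_by_chars`, `max_chars`, and the sample documents are hypothetical names chosen for the example. The point is that the size check has to measure the text (the second element), so unpacking the pair in the wrong order would batch on the length of the index instead.

```python
from typing import Iterable, Iterator, List, Tuple


def batch_by_chars(
    items: Iterable[Tuple[str, str]], max_chars: int
) -> Iterator[List[Tuple[str, str]]]:
    """Group (index, text) pairs into batches of at most max_chars characters.

    Hypothetical helper: a single text longer than max_chars still gets
    a batch of its own rather than being dropped.
    """
    batch: List[Tuple[str, str]] = []
    size = 0
    for index, text in items:  # unpack as (index, text), never (text, index)
        if batch and size + len(text) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append((index, text))  # keep the same order on the way out
        size += len(text)  # size is measured on the text, not the index
    if batch:
        yield batch


docs = [("doc1", "short"), ("doc2", "a much longer text"), ("doc3", "mid-size")]
batches = list(batch_by_chars(docs, max_chars=30))
```

Each batch is a list of `(index, text)` tuples, so whatever consumes the batches can map results back to the original document identifiers.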

This PR attempts to fix both issues.
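
The second issue can be sketched in general terms as follows. This is an assumption-laden illustration, not the actual fix: `REGISTERED_PATHS`, `Addon`, and `_register_paths` are hypothetical stand-ins. The sketch assumes the init-time setup has a side effect that lives outside the object's own attributes, so it does not travel with the pickle. Since unpickling does not call `__init__`, one way to restore such setup is to repeat it in `__setstate__`.

```python
import pickle

# Hypothetical process-local registry, standing in for any state that is
# set up during __init__ but stored outside the pickled object.
REGISTERED_PATHS = set()


class Addon:
    """Hypothetical addon whose init registers a data path as a side effect."""

    def __init__(self, name):
        self.name = name
        self._register_paths()

    def _register_paths(self):
        REGISTERED_PATHS.add("addon/" + self.name)

    def __setstate__(self, state):
        # pickle restores attributes without calling __init__, so the
        # registration has to be repeated explicitly here.
        self.__dict__.update(state)
        self._register_paths()


addon = Addon("meta_cat")
payload = pickle.dumps(addon)

# Simulate a fresh worker process, where the registry starts out empty.
REGISTERED_PATHS.clear()
restored = pickle.loads(payload)
```

Without the `__setstate__` override, `restored` would still carry its `name`, but the registry in the simulated worker would stay empty.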

@mart-r mart-r merged commit 01ad04c into main Jul 7, 2025
13 checks passed
@mart-r mart-r deleted the CU-8699py5m0-multiprocessing-issues branch July 7, 2025 18:20