Describe the bug
When using audiofolder or imagefolder with directories that represent splits (train/test) rather than class labels, a spurious label column is created.
Example: https://huggingface.co/datasets/datasets-examples/doc-audio-4
from datasets import load_dataset
ds = load_dataset("datasets-examples/doc-audio-4")
print(ds["train"].features)
This shows a 'label' column with ClassLabel(names=['test', 'train']), which is incorrect.

## Root cause
In folder_based_builder.py, the labels set is accumulated across ALL splits (line 77). When directories are train/ and test/:
- labels = {"train", "test"} → len(labels) > 1 → add_labels = True
- A spurious label column is created with the split names as class labels
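
The accumulation described above can be reproduced in isolation. This is only a minimal sketch, not the actual code in folder_based_builder.py: `collect_labels` and the `files_per_split` layout are hypothetical names, and the sketch assumes the label is taken from each file's parent directory name.

```python
import os

def collect_labels(files_per_split):
    """Hypothetical stand-in for the accumulation step: gather the parent
    directory name of every file, across all splits, into a single set."""
    labels = set()
    for split_files in files_per_split.values():
        for path in split_files:
            labels.add(os.path.basename(os.path.dirname(path)))
    return labels

# Layout where the directories are splits, not classes (illustrative paths).
files_per_split = {
    "train": ["train/a.wav", "train/b.wav"],
    "test": ["test/c.wav"],
}

labels = collect_labels(files_per_split)
add_labels = len(labels) > 1  # two "labels" are seen, so labels get added
print(sorted(labels), add_labels)
```

Within any single split there is only one parent directory, so the condition becomes true only because the set is shared across splits.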
Expected behavior
No label column should be added when directory names match split names.
Proposed fix
Skip label inference when inferred labels match split names.
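
One way the check could look, as a rough sketch rather than a patch: `should_add_labels` and its arguments are hypothetical and do not exist in the library, and "match" is interpreted here as "every inferred label is also a split name", which is an assumption.

```python
def should_add_labels(labels, split_names):
    """Hypothetical guard: infer a label column only when there are at least
    two distinct labels and they are not just the split directory names."""
    labels = set(labels)
    # Assumption: a label set fully covered by split names means the
    # directories denote splits, not classes.
    if labels and labels <= set(split_names):
        return False
    return len(labels) > 1

print(should_add_labels({"train", "test"}, ["train", "test"]))  # split dirs only
print(should_add_labels({"cat", "dog"}, ["train", "test"]))     # real class dirs
```

A trade-off of this approach is that a dataset whose class directories are genuinely named after splits would no longer get labels inferred automatically.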
cc @lhoestq