check_estimators check_estimators_pickle fails on Prescott architecture #27754

Closed
DerWeh opened this issue Nov 8, 2023 · 3 comments
Labels
Bug Needs Triage Issue requires triage

Comments

@DerWeh
Contributor

DerWeh commented Nov 8, 2023

Describe the bug

On machines with the Prescott architecture, tests using check_estimator unexpectedly fail due to #23994, which forces aligned=True. To my mind, this makes little sense: alignment only applies to plain arrays, so an estimator cannot be aligned.

I have no idea what the intent of #23994 is. Maybe replacing

if has_prescott_openblas:
    aligned = True

by

if isinstance(data, np.ndarray) and data.flags.aligned and has_prescott_openblas:
    aligned = True

is sufficient to fix this? But as I said, I don't really know what the point of that PR was.
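
To illustrate the proposed condition, here is a minimal sketch. The helper name should_force_aligned and the DummyEstimator class are hypothetical and do not exist in scikit-learn; the sketch only shows that the guard would skip anything that is not a NumPy array.

```python
import numpy as np


def should_force_aligned(data, has_prescott_openblas):
    """Hypothetical helper: force alignment only for aligned NumPy arrays."""
    return (
        isinstance(data, np.ndarray)
        and data.flags.aligned
        and has_prescott_openblas
    )


class DummyEstimator:
    """Stand-in for a fitted estimator; not a real scikit-learn class."""


# A plain array satisfies the guard, an estimator-like object does not.
array_result = should_force_aligned(np.zeros(4), has_prescott_openblas=True)
estimator_result = should_force_aligned(DummyEstimator(), has_prescott_openblas=True)
print(array_result, estimator_result)
```

With such a guard, an estimator passed to create_memmap_backed_data would presumably keep whatever value of aligned the caller chose instead of being forced onto the aligned code path.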

Steps/Code to Reproduce

from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.estimator_checks import check_estimator

for estimator, check in check_estimator(RandomForestClassifier(), generate_only=True):
    if "check_estimators_pickle" not in check.func.__name__:
        continue
    check(estimator)

Expected Results

Passes

Actual Results

On the Prescott architecture, this results in the following error:

ValueError                                Traceback (most recent call last)
Cell In[2], line 13
      9 if "check_estimators_pickle" not in check.func.__name__:
     11      continue
---> 13 check(estimator)

File ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/sklearn/utils/_testing.py:156, in _IgnoreWarnings.__call__.<locals>.wrapper(*args, **kwargs)
    154 with warnings.catch_warnings():
    155     warnings.simplefilter("ignore", self.category)
--> 156     return fn(*args, **kwargs)

File ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/sklearn/utils/estimator_checks.py:2007, in check_estimators_pickle(name, estimator_orig, readonly_memmap)
   2004 estimator.fit(X, y)
   2006 if readonly_memmap:
-> 2007     unpickled_estimator = create_memmap_backed_data(estimator)
   2008 else:
   2009     # pickle and unpickle!
   2010     pickled_estimator = pickle.dumps(estimator)

File ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/sklearn/utils/_testing.py:510, in create_memmap_backed_data(data, mmap_mode, return_folder, aligned)
    507     aligned = True
    509 if aligned:
--> 510     memmap_backed_data = _create_aligned_memmap_backed_arrays(
    511         data, mmap_mode, temp_folder
    512     )
    513 else:
    514     filename = op.join(temp_folder, "data.pkl")

File ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/sklearn/utils/_testing.py:476, in _create_aligned_memmap_backed_arrays(data, mmap_mode, folder)
    466 if isinstance(data, Sequence) and all(
    467     isinstance(each, np.ndarray) for each in data
    468 ):
    469     return [
    470         _create_memmap_backed_array(
    471             array, op.join(folder, f"data{index}.dat"), mmap_mode
    472         )
    473         for index, array in enumerate(data)
    474     ]
--> 476 raise ValueError(
    477     "When creating aligned memmap-backed arrays, input must be a single array or a"
    478     " sequence of arrays"
    479 )

ValueError: When creating aligned memmap-backed arrays, input must be a single array or a sequence of arrays
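
The traceback suggests that the aligned code path only accepts a single array or a sequence of arrays, so a fitted estimator falls through to the final raise. The following is a simplified sketch of that dispatch, not the actual scikit-learn code: classify_for_aligned_memmap is a hypothetical name, and the single-array branch is an assumption inferred from the error message.

```python
from collections.abc import Sequence

import numpy as np


def classify_for_aligned_memmap(data):
    """Simplified stand-in for the type dispatch visible in the traceback."""
    # Assumed branch: a single array is accepted as-is.
    if isinstance(data, np.ndarray):
        return "single array"
    # Branch shown in the traceback: a sequence made only of arrays.
    if isinstance(data, Sequence) and all(
        isinstance(each, np.ndarray) for each in data
    ):
        return "sequence of arrays"
    # Anything else, such as an estimator object, is rejected.
    raise ValueError(
        "When creating aligned memmap-backed arrays, input must be a single array or a"
        " sequence of arrays"
    )
```

Passing any object that is neither an array nor a sequence of arrays, such as object(), raises the same ValueError as above.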

Versions

System:
    python: 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
executable: ~/micromamba/envs/RandomForestClassifier/bin/python3.10
   machine: Linux-5.15.0-88-generic-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.3.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.1
        scipy: 1.11.3
       Cython: 3.0.5
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 2
         prefix: libgomp
       filepath: ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Prescott

       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: ~/micromamba/envs/RandomForestClassifier/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Prescott
@DerWeh DerWeh added Bug Needs Triage Issue requires triage labels Nov 8, 2023
@glemaitre
Member

Are you sure that you have a Prescott architecture? It is an old Intel Pentium D.
If this is not the case then you probably have the same issue as reported in #27613.

It should be solved by #27614, which is merged into main, and on the OpenBLAS side in the linked issue: OpenMathLib/OpenBLAS#4268

@glemaitre
Member

@DerWeh You can install the nightly build to check that we really solved the problem. But I am almost sure this is the case.

@DerWeh
Contributor Author

DerWeh commented Nov 9, 2023

I forgot to drop the filter for open issues, so I overlooked #27614...

@glemaitre Sorry, I don't know how to check the architecture. But sklearn.utils.fixes.threadpool_info shows "Prescott". The machine in question is actually a CI server, so I have no access to install the nightly build.
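
For reference, a small sketch of how the reported architecture can be read out of the threadpool information. The helper names openblas_architectures and report_local_openblas are hypothetical; the sketch assumes the threadpoolctl package is installed and that each entry is a dict with the keys shown in the Versions section above.

```python
def openblas_architectures(modules_info):
    """Return the architecture reported by each OpenBLAS entry."""
    return [
        info.get("architecture", "unknown")
        for info in modules_info
        if info.get("internal_api") == "openblas"
    ]


def report_local_openblas():
    """Print the architectures for the current process (needs threadpoolctl)."""
    import numpy  # noqa: F401  (loads the BLAS library so it can be detected)
    from threadpoolctl import threadpool_info

    print(openblas_architectures(threadpool_info()))
```

Calling report_local_openblas() prints what OpenBLAS detected, which is not necessarily the actual CPU model; on Linux, a tool such as lscpu shows the hardware itself.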

However, the error can be reproduced on any architecture by monkey patching sklearn.utils.fixes.threadpool_info. The following code fails on any architecture:

# BEGIN: monkey patch architecture
import sklearn.utils.fixes


threadpool_info = sklearn.utils.fixes.threadpool_info


def fake_prescott():
    """Pretend that every OpenBLAS library reports the Prescott architecture."""
    result = threadpool_info()
    for dict_ in result:
        if dict_["internal_api"] == "openblas":
            dict_["architecture"] = "Prescott"
    return result


sklearn.utils.fixes.threadpool_info = fake_prescott
# END: monkey patch architecture


from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.estimator_checks import check_estimator


for estimator, check in check_estimator(RandomForestClassifier(), generate_only=True):
    if "check_estimators_pickle" not in check.func.__name__:
        continue
    check(estimator)

But as you say, on the nightly build it passes. Sorry for the trouble.

@DerWeh DerWeh closed this as completed Nov 9, 2023