Description
The whole discussion around to_array is quite tricky; see #294 and #307. One big difficulty is that for some libraries it can stay lazy (e.g. Dask has a lazy array), whereas for others it can't (polars LazyFrame doesn't have a to_numpy attribute).
Maybe we can temporarily park it, and try to address the (arguably) more important issue of what to do about:
df: DataFrame
features = []
for column_name in df.column_names:
    if df.col(column_name).std() > 0:
        features.append(column_name)
return features
Because, as far as I can tell, this call is problematic for all libraries other than purely eager ones. Even Dask, which was mentioned in #294 as an example of a library that can stay lazy in to_array, raises in the call above rather than doing any implicit computation.
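To make the failure mode concrete, here's a minimal pure-Python sketch of why the loop above can't stay lazy. LazyScalar is a hypothetical stand-in I made up for illustration, not any library's real API, and the exact exception a given library raises may differ. The point is only that the comparison can stay lazy, but `if` needs a concrete True/False immediately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LazyScalar:
    """Hypothetical stand-in for a lazy library's scalar; not a real API."""

    thunk: Callable[[], float]  # deferred computation, not run until asked

    def __gt__(self, other: float) -> "LazyScalar":
        # The comparison itself can stay lazy: it only builds a new deferred value.
        return LazyScalar(lambda: self.thunk() > other)

    def __bool__(self) -> bool:
        # `if` asks for a concrete bool right now, which would force computation.
        raise TypeError("truth value of a lazy scalar requires computing it")


def select_features(stds: dict[str, LazyScalar]) -> list[str]:
    """Same shape as the loop above; `stds` maps column name -> lazy std."""
    features = []
    for column_name, std in stds.items():
        if std > 0:  # Python calls __bool__ on the lazy comparison result
            features.append(column_name)
    return features


stds = {"a": LazyScalar(lambda: 1.5), "b": LazyScalar(lambda: 0.0)}
try:
    select_features(stds)
    raised = False
except TypeError:
    raised = True
```

So `std > 0` on its own is fine; it's the `if` around it that has nowhere to go for a library that refuses to compute implicitly.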
So... what do we do here? Maybe let's try resolving this one, and then return to to_array?
I'll hold off on making suggestions this time; let's let the discussion roll.