implement normalize_token #3378

Closed
dcherian opened this issue Oct 7, 2019 · 2 comments · Fixed by #3446

Comments

@dcherian
Contributor

dcherian commented Oct 7, 2019

See #3276 (comment)

@dcherian dcherian mentioned this issue Oct 7, 2019
4 tasks
@dcherian
Contributor Author

How should this be implemented?

@crusaderky
Contributor

crusaderky commented Oct 12, 2019

https://docs.dask.org/en/latest/custom-collections.html#implementing-deterministic-hashing

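# Imports assumed for a standalone snippet; inside xarray the classes are already in scope
from dask.base import normalize_token
from xarray import DataArray, Dataset, IndexVariable, Variable

@normalize_token.register(Dataset)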
def tokenize_dataset(ds):
    return Dataset, ds._variables, ds._coord_names, ds._attrs

@normalize_token.register(DataArray)
def tokenize_dataarray(da):
    return DataArray, da._variable, da._coords, da._name

# Note: the @singledispatch for IndexVariable must be defined before the one for Variable
@normalize_token.register(IndexVariable)
def tokenize_indexvariable(v):
    # Don't waste time converting pd.Index to np.ndarray
    return IndexVariable, v._dims, v._data.array, v._attrs

@normalize_token.register(Variable)
def tokenize_variable(v):
    # Note: it's v.data, not v._data, in order to cope with the 
    # wrappers around NetCDF and the like
    return Variable, v._dims, v.data, v._attrs

You'll need to write a dummy normalize_token for when dask is not installed.

Unit tests:

  • running tokenize() twice on the same object returns the same result
  • changing the content of a data_var (or the variable, for DataArray) changes the output
  • changing the content of a coord changes the output
  • changing attrs, name, or dimension names changes the output
  • whether a variable is a data_var or a coord changes the output
  • dask arrays aren't computed
  • non-numpy, non-dask NEP18 data is not converted to numpy
  • works with xarray's fancy wrappers around NetCDF and the like
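Several of these checks share one pattern: the token must be stable for an unchanged object and must differ for each modified copy. A minimal sketch of that pattern follows. The helper `check_token_contract` is a hypothetical name, not part of dask or xarray, and the usage section assumes an xarray version in which `dask.base.tokenize` already handles `Dataset` deterministically. It does not cover the "dask arrays aren't computed", NEP18, or NetCDF-wrapper cases.

```python
def check_token_contract(tokenize, base, changed_variants):
    """Assert that tokenize is deterministic on `base` and that every
    variant in `changed_variants` (label -> object) gets a different token."""
    token = tokenize(base)
    assert tokenize(base) == token, "tokenize() must be deterministic"
    for label, variant in changed_variants.items():
        assert tokenize(variant) != token, f"{label} should change the token"
    return token


try:
    import numpy as np
    import xarray as xr
    from dask.base import tokenize
except ImportError:
    tokenize = None  # the usage below only runs where the libraries exist

if tokenize is not None:
    base = xr.Dataset(
        {"a": ("x", np.arange(3))},
        coords={"x": [10, 20, 30]},
        attrs={"units": "m"},
    )
    variants = {
        "data_var content": base.assign(a=("x", np.array([9, 1, 2]))),
        "coord content": base.assign_coords(x=[10, 20, 31]),
        "attrs": base.assign_attrs(units="km"),
        "dimension name": base.rename({"x": "y"}),
        "data_var vs coord": base.set_coords("a"),
    }
    check_token_contract(tokenize, base, variants)
```

In a real test suite each variant would more likely be its own test case (for example via `pytest.mark.parametrize`) so that a failure names the exact property that broke.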

@dcherian dcherian mentioned this issue Oct 25, 2019
12 tasks