Lazy apply #122
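```python
# One block per dimension, spanning the full extent of the array.
blockdims = tuple((d,) for d in shape)

name = next(names)
# A single output key (name, 0, ..., 0) whose task concatenates every input
# block into one array and then applies func to the result.
dsk = {(name,) + (0,) * len(shape): (func, (rec_concatenate, (concrete, x._keys())))}
```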
I didn't realize that you could compose functions in dask graphs by nesting tuples, but I guess that makes sense.
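Would another option be to do something like the following here?

```python
# Rechunk into a single block that covers the whole array.
consolidated = x.reblock(blockshape=x.shape)
# Reference that block by its full key (name, 0, ..., 0), not the bare array name.
dsk = {(name,) + (0,) * len(shape): (func, (consolidated.name,) + (0,) * len(shape))}
return Array(merge(dsk, consolidated.dask), name, blockdims=blockdims, dtype=dtype)
```

That seems slightly preferable to using the private `_keys` method.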
I should show you how far I got on lazy_apply for xray.Dataset objects. It's messier, because I need to store a function that creates an xray.Dataset in the dask graph.
@shoyer I've removed
@mrocklin That's fine by me. This is more valuable to me as an example anyway, because it doesn't quite do what I need (which will require multiple dask arrays as input). I don't think I'll be able to create a lazy_apply which works on
OK, if this isn't directly usable, then my preference is not to merge it. If I can do something usable, let me know. We could do a lazy_apply that takes many input arrays if that's better.
A version that took many input arrays and returned many output arrays would be directly usable. I wrote a version of this function that acts directly on a lazy xray dataset as input. That works in terms of not blowing up my memory, but the "dask within a dask" approach is obviously not ideal; it kind of cripples the scheduler.
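One plausible shape for the many-inputs, many-outputs version is sketched below, again on a toy, pure-Python graph model rather than real dask arrays. All names here (`lazy_apply_many`, `concat_blocks`, `toy_get`, `stats`, the `"apply-many"` token) are hypothetical. The assumed design: one shared task concatenates every input and calls `func` once, and each output gets its own key that selects one element of the returned tuple with `operator.getitem`. Because every input block stays an ordinary key in a single graph, the scheduler can see all of the dependencies instead of a nested graph hidden inside one task.

```python
import operator


def toy_get(dsk, key):
    """Toy evaluator for dask-style graphs. It does not cache results,
    so a shared key is recomputed for every output that depends on it."""
    def ev(expr):
        if isinstance(expr, tuple) and expr and callable(expr[0]):
            return expr[0](*(ev(a) for a in expr[1:]))  # task
        if isinstance(expr, list):
            return [ev(e) for e in expr]
        try:
            if expr in dsk:
                return ev(dsk[expr])  # reference to another key
        except TypeError:  # unhashable literal
            pass
        return expr  # literal
    return ev(dsk[key])


def concat_blocks(blocks):
    """Join 1-D blocks (plain lists in this toy) end to end."""
    return [value for block in blocks for value in block]


def lazy_apply_many(dsk, func, input_keys, out_names, token="apply-many"):
    """Hypothetical multi-input, multi-output lazy apply (toy sketch).

    input_keys: one list of block keys per input array.
    func: called once with one concatenated input per array; must return a
          tuple with one result per entry in out_names.
    Returns a new graph; `dsk` itself is left unmodified.
    """
    graph = dict(dsk)
    shared = (token,)
    # A single task concatenates every input and calls func once.
    graph[shared] = (func, *[(concat_blocks, list(keys)) for keys in input_keys])
    # Each output gets its own key that selects one element of func's result.
    for i, name in enumerate(out_names):
        graph[(name, 0)] = (operator.getitem, shared, i)
    return graph


def stats(a, b):
    """Example func: two inputs in, two outputs out."""
    return (sum(a) + sum(b), len(a) + len(b))


dsk = {("a", 0): [1, 2], ("a", 1): [3], ("b", 0): [10, 20]}
g = lazy_apply_many(dsk, stats, [[("a", 0), ("a", 1)], [("b", 0)]], ["total", "count"])
print(toy_get(g, ("total", 0)))  # 36
print(toy_get(g, ("count", 0)))  # 5
```

Splitting a tuple-valued task into separate output keys with `operator.getitem` keeps `func` to a single call in a real scheduler that caches shared keys; the toy evaluator above does not cache, so it recomputes the shared task per output.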
Stale. Closing. Please reopen on renewed interest.
Add a function that lazily applies a function to a materialized dask array. This came up in conversation with @shoyer.
I'm not sure how I feel about this yet. I'm afraid that people will assume it does something intelligent and misuse it. I kind of want to hide this somewhere. Thoughts?