

@mrocklin (Member) commented Apr 1, 2015

Add a function that lazily applies a function to a fully materialized dask array (all of its blocks concatenated into one). This came up in conversation with @shoyer.

I'm not sure how I feel about this yet. I'm afraid that people will assume it does something intelligent and misuse it. I kind of want to hide this somewhere. Thoughts?

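```python
# Diff excerpt; shape, names, func, x, rec_concatenate and concrete come from the surrounding code
blockdims = tuple((d,) for d in shape)  # one output block spanning the full shape

name = next(names)
# Single output key (name, 0, ..., 0): concatenate every block of x, then call func on it
dsk = {(name,) + (0,) * len(shape): (func, (rec_concatenate, (concrete, x._keys())))}
```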
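To make the graph above concrete, here is a minimal, dependency-free sketch that mimics it for a 1-D array stored as a few blocks. The helpers `get`, `concatenate` and `lazy_apply_1d` are hypothetical stand-ins, not dask's API; `get` follows the classic dask graph convention that a task is a tuple whose first element is callable, and that arguments which are keys are replaced by their computed values. It also shows why the function is blunt: every block is pulled into a single task, so there is no blockwise parallelism and the whole array has to fit in memory at once.

```python
from itertools import chain


def istask(x):
    """A task is a tuple whose first element is callable (dask's convention)."""
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def get(graph, key, cache=None):
    """Toy recursive evaluator for a dask-style graph (hypothetical helper)."""
    cache = {} if cache is None else cache

    def evaluate(arg):
        if istask(arg):  # call the function on its evaluated arguments
            func, *args = arg
            return func(*(evaluate(a) for a in args))
        if isinstance(arg, list):  # lists of keys/tasks are evaluated element-wise
            return [evaluate(a) for a in arg]
        if isinstance(arg, (str, tuple)) and arg in graph:  # a key: compute it
            return get(graph, arg, cache)
        return arg  # anything else is a literal

    if key not in cache:
        cache[key] = evaluate(graph[key])
    return cache[key]


def concatenate(blocks):
    """Stand-in for rec_concatenate in the 1-D case: join blocks into one list."""
    return list(chain.from_iterable(blocks))


def lazy_apply_1d(func, x_name, nblocks, out_name):
    """Hypothetical 1-D analogue of the PR's graph: one output block that
    concatenates every input block and then calls func on the result."""
    block_keys = [(x_name, i) for i in range(nblocks)]
    return {(out_name, 0): (func, (concatenate, block_keys))}


# A 1-D "array" of five elements stored as three blocks.
blocks = {("x", 0): [1, 2], ("x", 1): [3, 4], ("x", 2): [5]}
graph = {**blocks, **lazy_apply_1d(sum, "x", 3, "y")}
print(get(graph, ("y", 0)))  # 15
```

The output graph has exactly one key, `("y", 0)`, no matter how many input blocks there are; that single task is the whole computation.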

I didn't realize that you could compose functions in dask graphs by nesting tuples, but I guess that makes sense.
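The nesting works because, in the classic dask graph format, any argument of a task may itself be a task, and it is evaluated before the outer function is called. The toy evaluator below (a hypothetical helper, not dask's scheduler) checks that a nested task gives the same answer as spelling the intermediate step out as its own key.

```python
from operator import add, mul


def istask(x):
    """A task is a tuple whose first element is callable."""
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def get(graph, key):
    """Toy evaluator: nested tasks run inner-first; string keys are looked up."""
    def evaluate(arg):
        if istask(arg):
            func, *args = arg
            return func(*(evaluate(a) for a in args))
        if isinstance(arg, str) and arg in graph:
            return evaluate(graph[arg])
        return arg

    return evaluate(graph[key])


# (x + 1) * 2 composed inside a single task ...
nested = {"x": 3, "out": (mul, (add, "x", 1), 2)}
# ... and the same computation with the intermediate as its own key.
flat = {"x": 3, "tmp": (add, "x", 1), "out": (mul, "tmp", 2)}

print(get(nested, "out"), get(flat, "out"))  # 8 8
```

The trade-off is that in the nested form the intermediate `x + 1` is not a key, so a scheduler cannot run it as a separate task, share it with other tasks, or release it independently.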

Would another option be to do something like the following here?

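```python
consolidated = x.reblock(blockshape=x.shape)  # collapse x into a single block
# That block's key is (consolidated.name, 0, ..., 0); the bare name is not a graph key
dsk = {(name,) + (0,) * len(shape): (func, (consolidated.name,) + (0,) * len(x.shape))}
return Array(merge(dsk, consolidated.dask), name, blockdims=blockdims, dtype=dtype)
```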

That seems slightly preferable to using the private _keys method.
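One detail worth checking in a snippet like this: dask array blocks are stored under tuple keys of the form `(name, i, j, ...)`, so after consolidating into a single block the only key is `(name, 0, ..., 0)`, and the bare `name` string is not a key in the graph. In the classic dask graph format a non-key argument is passed through as a literal, so a task written as `(func, consolidated.name)` would hand `func` the string rather than the data. The helper `single_block_key` below is a hypothetical illustration, and the toy dict stands in for `consolidated.dask`.

```python
def single_block_key(name, ndim):
    """Key of the one block of an ndim-dimensional array stored as a single chunk."""
    return (name,) + (0,) * ndim


# Stand-in for consolidated.dask: a 2-D array held in one block.
graph = {("consolidated", 0, 0): [[1, 2], [3, 4]]}

key = single_block_key("consolidated", 2)
print(key)                      # ('consolidated', 0, 0)
print(key in graph)             # True
print("consolidated" in graph)  # False: the bare name is not a key
```

Note that the number of zeros must match the input array's dimensionality, which need not equal the length of the output `shape`.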

@shoyer (Member) commented Apr 1, 2015

I should show you how far I got on lazy_apply for xray.Dataset objects. It's messier, because I need to store a function that creates an xray.Dataset in the dask graph.

@mrocklin (Member, Author) commented Apr 1, 2015

@shoyer I've removed lazy_apply from the main namespace, so you'll need to reach it as da.core.lazy_apply. Are you OK with this? Any general thoughts on this kind of thing? This is ready to merge from my perspective.

@shoyer (Member) commented Apr 1, 2015

@mrocklin That's fine by me. This is more valuable to me as an example anyway, because it doesn't quite do what I need (which will require multiple dask arrays as input). I don't think I'll be able to create a lazy_apply that works on xray.Dataset objects without actually manipulating the dask graph in xray (which is fine).

@mrocklin (Member, Author) commented Apr 1, 2015

OK, if this isn't directly usable, then my preference is not to merge it. If I can make something usable, let me know. We could write a lazy_apply that takes many input arrays if that's better.

@shoyer (Member) commented Apr 1, 2015

A version that took many input arrays and returned many output arrays would be directly usable.

I wrote a version of this function that takes a lazy xray dataset directly as input. It works in the sense that it doesn't blow up my memory, but the "dask within a dask" approach is obviously not ideal: it kind of cripples the scheduler.
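A multi-input, multi-output variant along the lines requested here could keep a single task that sees every input whole and then expose each output under its own key with `operator.getitem`. The sketch below is hypothetical: `lazy_apply_many`, the key naming scheme, and the toy `get` evaluator are assumptions for illustration, using 1-D inputs stored as lists of blocks rather than real dask arrays.

```python
from operator import getitem


def istask(x):
    """A task is a tuple whose first element is callable."""
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def get(graph, key, cache):
    """Toy evaluator; the shared cache makes the combined task run only once."""
    def evaluate(arg):
        if istask(arg):
            func, *args = arg
            return func(*(evaluate(a) for a in args))
        if isinstance(arg, list):
            return [evaluate(a) for a in arg]
        if isinstance(arg, (str, tuple)) and arg in graph:
            return get(graph, arg, cache)
        return arg

    if key not in cache:
        cache[key] = evaluate(graph[key])
    return cache[key]


def concat(blocks):
    """Join the blocks of a 1-D input into one list."""
    return [v for block in blocks for v in block]


def lazy_apply_many(func, inputs, n_outputs, out_prefix):
    """Hypothetical: inputs is a list of (name, nblocks) pairs. Builds one task
    calling func on every input concatenated, plus one getitem task per output."""
    combined = (out_prefix + "-apply", 0)
    args = [(concat, [(name, i) for i in range(nblocks)]) for name, nblocks in inputs]
    graph = {combined: (func, *args)}
    for j in range(n_outputs):
        graph[(f"{out_prefix}-{j}", 0)] = (getitem, combined, j)
    return graph


calls = []


def f(a, b):
    """Example func with two inputs and two outputs; records each call."""
    calls.append(1)
    return [x + y for x, y in zip(a, b)], sum(a) + sum(b)


blocks = {("a", 0): [1, 2], ("a", 1): [3], ("b", 0): [10, 20, 30]}
graph = {**blocks, **lazy_apply_many(f, [("a", 2), ("b", 1)], 2, "out")}
cache = {}
print(get(graph, ("out-0", 0), cache))  # [11, 22, 33]
print(get(graph, ("out-1", 0), cache))  # 66
print(len(calls))                       # 1
```

Keeping each output under its own key lets downstream work depend on just one of them, while the memoized combined task means `func` runs once. Like the single-input version, it still concatenates each input into one block, so it adds no parallelism.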

@mrocklin (Member, Author) commented May 9, 2015

Stale. Closing. Please reopen on renewed interest.

@mrocklin closed this May 9, 2015
@mrocklin deleted the lazy-apply branch January 3, 2019 17:42