Lazy apply #122

mrocklin · 2015-04-01T05:00:40Z

Add a function to lazily apply a function to a materialized dask array. This came up in conversation with @shoyer.

I'm not sure how I feel about this yet. I'm afraid that people will assume it does something intelligent and misuse it. I kind of want to hide this somewhere. Thoughts?
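The following is a minimal sketch of what lazily applying a function to a materialized array involves. It assumes a dask graph is a plain dict that maps keys to tasks, where a task is a tuple whose first element is a callable. The toy `get` scheduler, the list-valued 1-D blocks, and the `concat` and `lazy_apply_1d` helpers are hypothetical stand-ins, not dask's API. Adding the task computes nothing. `func` runs once, on the whole concatenated array, and only when the output key is requested. Because every block is gathered into that one task, nothing is parallelized and the full array has to fit in memory.

```python
import itertools


def _eval(dsk, expr, cache):
    """Evaluate a task, a list of expressions, a key, or a literal."""
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        func, *args = expr
        return func(*[_eval(dsk, a, cache) for a in args])
    if isinstance(expr, list):
        return [_eval(dsk, a, cache) for a in expr]
    try:
        if expr in dsk:
            return get(dsk, expr, cache)
    except TypeError:  # unhashable literal, e.g. a dict
        pass
    return expr


def get(dsk, key, cache=None):
    """Toy scheduler: compute `key`, memoizing every result in `cache`."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _eval(dsk, dsk[key], cache)
    return cache[key]


def concat(blocks):
    """Hypothetical stand-in for rec_concatenate on 1-D list blocks."""
    return list(itertools.chain.from_iterable(blocks))


def lazy_apply_1d(func, name, nblocks, out_name):
    """Hypothetical helper: one task that materializes every block of
    `name` and then calls `func` on the whole array."""
    keys = [(name, i) for i in range(nblocks)]
    return {(out_name, 0): (func, (concat, keys))}


calls = []


def total(xs):
    calls.append(len(xs))  # record that func ran and how large its input was
    return sum(xs)


dsk = {("x", 0): [1, 2], ("x", 1): [3, 4], ("x", 2): [5]}
dsk.update(lazy_apply_1d(total, "x", 3, "y"))
assert calls == []           # building the graph ran nothing
result = get(dsk, ("y", 0))  # 15; total received all 5 elements at once
```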

shoyer · 2015-04-01T05:57:43Z

dask/array/core.py

+    blockdims = tuple((d,) for d in shape)
+
+    name = next(names)
+    dsk = {(name,) + (0,) * len(shape): (func, (rec_concatenate, (concrete, x._keys())))}


I didn't realize that you could compose functions in dask graphs by nesting tuples, but I guess that makes sense.
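Nested tuples compose because an evaluator that finds a task inside a task's arguments evaluates the inner one first. The toy evaluator below illustrates this under the same convention (a task is a tuple whose first element is callable). `get`, `inc`, and `double` are hypothetical names, not dask's scheduler. The nested form computes the same value as a flat graph with a named intermediate. The nested form differs in one way: the inner task gets no key of its own, so other tasks cannot share its result.

```python
def _eval(dsk, expr, cache):
    """Evaluate a task, a list of expressions, a key, or a literal."""
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        func, *args = expr
        # Arguments are evaluated first, so nested tasks compose.
        return func(*[_eval(dsk, a, cache) for a in args])
    if isinstance(expr, list):
        return [_eval(dsk, a, cache) for a in expr]
    try:
        if expr in dsk:
            return get(dsk, expr, cache)
    except TypeError:  # unhashable literal
        pass
    return expr


def get(dsk, key, cache=None):
    """Toy scheduler: compute `key`, memoizing results in `cache`."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _eval(dsk, dsk[key], cache)
    return cache[key]


def inc(x):
    return x + 1


def double(x):
    return 2 * x


# Nested: (inc, "a") is itself a task, so "b" computes double(inc(a))
# without naming the intermediate result.
nested = {"a": 10, "b": (double, (inc, "a"))}

# Flat: the same computation with the intermediate stored under "tmp".
flat = {"a": 10, "tmp": (inc, "a"), "b": (double, "tmp")}

print(get(nested, "b"), get(flat, "b"))  # 22 22
```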

Would another option be to do something like the following here?

    consolidated = x.reblock(blockshape=x.shape)
    dsk = {(name,) + (0,) * len(shape): (func, (consolidated.name,) + (0,) * len(shape))}
    return Array(merge(dsk, consolidated.dask), name, blockdims=blockdims, dtype=dtype)

That seems slightly preferable to using the private _keys method.
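The consolidate-then-apply shape can be sketched without dask. The sketch assumes a graph is a dict from keys to tasks and uses a toy `get` scheduler. Plain lists stand in for NumPy blocks. The `"x-single"` name and the `concat` helper are hypothetical, and the sketch does not call `reblock`. The apply task depends on one named chunk instead of enumerating every input key itself. It has to refer to that chunk by its tuple key, because a dask array stores even a single chunk under a key like `(name, 0)` rather than under the bare name.

```python
import itertools


def _eval(dsk, expr, cache):
    """Evaluate a task, a list of expressions, a key, or a literal."""
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        func, *args = expr
        return func(*[_eval(dsk, a, cache) for a in args])
    if isinstance(expr, list):
        return [_eval(dsk, a, cache) for a in expr]
    try:
        if expr in dsk:
            return get(dsk, expr, cache)
    except TypeError:  # unhashable literal
        pass
    return expr


def get(dsk, key, cache=None):
    """Toy scheduler: compute `key`, memoizing results in `cache`."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _eval(dsk, dsk[key], cache)
    return cache[key]


def concat(blocks):
    """Join 1-D blocks (plain lists) into one list."""
    return list(itertools.chain.from_iterable(blocks))


# A toy three-block 1-D array named "x".
x_dsk = {("x", 0): [3, 1], ("x", 1): [4, 1], ("x", 2): [5]}

# Step 1, standing in for rechunking "x" into a single block: the
# consolidated array's only chunk lives under the tuple key ("x-single", 0).
consolidated = {("x-single", 0): (concat, [("x", i) for i in range(3)])}

# Step 2: the applied task names that chunk by its full tuple key. In this
# toy scheduler a bare "x-single" matches no key, so it would reach
# `sorted` as a literal string instead of the data.
applied = {("y", 0): (sorted, ("x-single", 0))}

dsk = {**x_dsk, **consolidated, **applied}
result = get(dsk, ("y", 0))  # [1, 1, 3, 4, 5]
```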

shoyer · 2015-04-01T06:01:51Z

I should show you how far I got on lazy_apply for xray.Dataset objects. It's messier, because I need to store a function that creates an xray.Dataset in the dask.

mrocklin · 2015-04-01T15:22:54Z

@shoyer I've removed lazy_apply from the main namespace, so you'll need to dive more deeply into da.core.lazy_apply. Are you ok with this? Thoughts generally on this kind of thing? This is ready to merge from my perspective.

shoyer · 2015-04-01T17:06:33Z

@mrocklin That's fine by me. This is more valuable to me as an example, anyways, because it doesn't quite do what I need (which will require multiple dask arrays as input). I don't think I'll be able to create a lazy_apply which works on xray.Dataset objects without actually manipulating the dask in xray (which is fine).

mrocklin · 2015-04-01T17:28:46Z

OK, if this isn't directly usable then my preference is not to merge it. If there's a version I could write that would be usable, let me know. We could do a lazy_apply that took many input arrays if that's better.

shoyer · 2015-04-01T18:25:45Z

A version that took many input arrays and returned many output arrays would be directly usable.
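One possible shape for a multi-input, multi-output version keeps a single task that sees every materialized input. It then splits that task's result with one `getitem` task per output. The sketch below uses a toy `get` scheduler. `lazy_apply_many`, the `"apply-..."` key naming, and the convention that `func` returns one value per output are all assumptions for illustration, not dask's API. Sharing one cache across the two `get` calls means the joined task runs only once.

```python
import itertools
from operator import getitem


def _eval(dsk, expr, cache):
    """Evaluate a task, a list of expressions, a key, or a literal."""
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        func, *args = expr
        return func(*[_eval(dsk, a, cache) for a in args])
    if isinstance(expr, list):
        return [_eval(dsk, a, cache) for a in expr]
    try:
        if expr in dsk:
            return get(dsk, expr, cache)
    except TypeError:  # unhashable literal
        pass
    return expr


def get(dsk, key, cache=None):
    """Toy scheduler: compute `key`, memoizing results in `cache`."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _eval(dsk, dsk[key], cache)
    return cache[key]


def concat(blocks):
    """Join 1-D blocks (plain lists) into one list."""
    return list(itertools.chain.from_iterable(blocks))


def lazy_apply_many(func, inputs, out_names):
    """Hypothetical multi-input, multi-output lazy apply.

    `inputs` is a list of (name, nblocks) pairs for 1-D arrays. A single
    task materializes every input and calls `func` once; `func` must
    return one value per output name. Each output then gets a small
    getitem task that picks its entry out of that shared result.
    """
    joined = "apply-" + "-".join(out_names)
    args = [(concat, [(name, i) for i in range(n)]) for name, n in inputs]
    dsk = {joined: (func,) + tuple(args)}
    for i, out in enumerate(out_names):
        dsk[(out, 0)] = (getitem, joined, i)
    return dsk


def add_and_subtract(a, b):
    return ([p + q for p, q in zip(a, b)], [p - q for p, q in zip(a, b)])


dsk = {("a", 0): [1, 2], ("a", 1): [3], ("b", 0): [10], ("b", 1): [20, 30]}
dsk.update(lazy_apply_many(add_and_subtract, [("a", 2), ("b", 2)], ["s", "d"]))

cache = {}  # shared cache, so the joined task runs only once
total = get(dsk, ("s", 0), cache)  # [11, 22, 33]
diff = get(dsk, ("d", 0), cache)   # [-9, -18, -27]
```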

I wrote a version of this function that acts directly on a lazy xray dataset as input. That works in terms of not blowing up my memory, but the "dask within a dask" approach is obviously not ideal -- it kind of cripples the scheduler.
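A toy scheduler can show in miniature why nesting hurts. All names here are hypothetical, and this is not how xray or dask implement it. When a whole inner graph is stored as the argument of one task, the outer scheduler sees a single opaque task. It cannot order, parallelize, or release the inner steps. Merging the inner graph into the outer one gives the same answer while exposing every step as its own key.

```python
def _eval(dsk, expr, cache):
    """Evaluate a task, a list of expressions, a key, or a literal."""
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        func, *args = expr
        return func(*[_eval(dsk, a, cache) for a in args])
    if isinstance(expr, list):
        return [_eval(dsk, a, cache) for a in expr]
    try:
        if expr in dsk:
            return get(dsk, expr, cache)
    except TypeError:  # unhashable literal, e.g. a whole inner graph
        pass
    return expr


def get(dsk, key, cache=None):
    """Toy scheduler: compute `key`, memoizing results in `cache`."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _eval(dsk, dsk[key], cache)
    return cache[key]


def inc(x):
    return x + 1


def double(x):
    return 2 * x


# Inner graph: three dependent steps producing 1, 2, 3.
inner = {("z", 0): 1, ("z", 1): (inc, ("z", 0)), ("z", 2): (inc, ("z", 1))}

# "Dask within a dask": the whole inner graph is an argument of one task,
# so the outer graph has a single key and the inner steps are invisible.
nested = {"out": (double, (get, inner, ("z", 2)))}

# Merged: the inner tasks become ordinary keys of the outer graph.
merged = {**inner, "out": (double, ("z", 2))}

assert get(nested, "out") == get(merged, "out") == 6
print(len(nested), len(merged))  # 1 4
```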

mrocklin · 2015-05-09T01:11:16Z

Stale. Closing. Please reopen on renewed interest.

mrocklin added 2 commits March 31, 2015 21:59

move concrete to utils (6373059)
lazy_apply function (ab48e70)

shoyer reviewed Apr 1, 2015

mrocklin added 2 commits April 1, 2015 08:09

skip doctest (5220b28)
remove lazy_apply from da namespace (e75d8f0)
lazy-apply support multiple inputs (d30c1c7)

shoyer mentioned this pull request Apr 17, 2015

Checklist for releasing a version of xray with dask support pydata/xarray#394 (Closed, 8 tasks)

mrocklin closed this May 9, 2015

mrocklin deleted the lazy-apply branch January 3, 2019 17:42
