Commit 6207ee1: Enable @coiled.function. (#97)

1 parent 23f4758

14 files changed: +227 -88 lines changed

docs/source/coiled.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# coiled

```{caution}
Currently, the coiled backend can only be used if your workflow code is organized in a
package due to how pytask imports your code and dask serializes task functions
([issue](https://github.com/dask/distributed/issues/8607)).
```

[coiled](https://www.coiled.io/) is a product built on top of dask that eases the
deployment of your workflow to many cloud providers like AWS, GCP, and Azure.

Note that coiled is a paid service. They offer a
[free monthly tier](https://www.coiled.io/pricing) where you only need to pay the costs
for your cloud provider, and you can get started without a credit card.

They provide the following benefits, which are especially helpful to people who are not
familiar with cloud providers or remote computing.

- coiled manages your resources by spawning workers if you need them and shutting them
  down if they are idle.
- [Synchronization](https://docs.coiled.io/user_guide/software/sync.html) of your local
  environment to remote workers.
- [Adaptive scaling](https://docs.dask.org/en/latest/adaptive.html) if your workflow
  takes a long time to finish.

There are two ways to use coiled with pytask and pytask-parallel:

1. Run individual tasks in the cloud.
1. Run your whole workflow in the cloud.

Both approaches are explained below after the setup.

## Setup

Follow coiled's
[short four-step process](https://docs.coiled.io/user_guide/setup/index.html) to set up
your local environment and configure your cloud provider.

## Running individual tasks

In most projects, there are just a couple of tasks that require a lot of resources and
that you would like to run in a virtual machine in the cloud.

With coiled's
[serverless functions](https://docs.coiled.io/user_guide/usage/functions/index.html),
you can define the hardware and software environment for your task. Just decorate the
task function with a {func}`@coiled.function <coiled.function>` decorator.

```{literalinclude} ../../docs_src/coiled/coiled_functions.py
```

To execute the workflow, you need to turn on parallelization by requesting two or more
workers or specifying one of the parallel backends. Otherwise, the decorated task is
run locally.

```console
pytask -n 2
pytask --parallel-backend loky
```
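
If you prefer configuration over CLI options, the same settings can live in your
configuration file. A minimal sketch, assuming the standard `[tool.pytask.ini_options]`
table in `pyproject.toml` that the quickstart uses:

```toml
[tool.pytask.ini_options]
n_workers = 2
parallel_backend = "loky"
```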

When you apply the {func}`@task <pytask.task>` decorator to the task, make sure the
{func}`@coiled.function <coiled.function>` decorator is applied first, i.e., is closer
to the function. Otherwise, it will be ignored. Add more arguments to the decorator to
configure the hardware and software environment.

```{literalinclude} ../../docs_src/coiled/coiled_functions_task.py
```
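
For reference, the ordering looks like this; a minimal sketch of the included example
file (the full version with hardware arguments appears later in this commit):

```python
import coiled
from pytask import task


@task  # Outer decorator, applied last.
@coiled.function()  # Inner decorator, applied first and closest to the function.
def task_example() -> None: ...
```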

```{seealso}
Serverless functions are explained more thoroughly in
[coiled's guide](https://docs.coiled.io/user_guide/usage/functions/index.html).
```

(coiled-clusters)=

## Running a cluster

It is also possible to launch a cluster and run each task in a worker provided by
coiled. Usually, this is not necessary and you are better off using coiled's serverless
functions.

If you want to launch a cluster managed by coiled, register a function that builds an
executor using {class}`coiled.Cluster`.

```python
from concurrent.futures import Executor

import coiled

from pytask_parallel import ParallelBackend
from pytask_parallel import registry


def _build_coiled_executor(n_workers: int) -> Executor:
    # Launch the cluster and expose its dask client as an executor.
    return coiled.Cluster(n_workers=n_workers).get_client().get_executor()


registry.register_parallel_backend(ParallelBackend.CUSTOM, _build_coiled_executor)
```
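
This works because `coiled.Cluster.get_client()` returns a dask client whose
`get_executor()` method exposes a {class}`concurrent.futures.Executor`-compatible
interface, which is why the result can be registered like any other parallel backend.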

Then, execute your workflow with

```console
pytask --parallel-backend custom
```

docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@
     "coiled": ("https://docs.coiled.io/", None),
     "dask": ("https://docs.dask.org/en/stable/", None),
     "distributed": ("https://distributed.dask.org/en/stable/", None),
+    "pytask": ("https://pytask-dev.readthedocs.io/en/stable/", None),
     "python": ("https://docs.python.org/3.10", None),
 }

docs/source/custom_executors.md

Lines changed: 8 additions & 32 deletions
@@ -15,43 +15,19 @@ In some cases, adding a new backend can be as easy as registering a builder func
 that receives some arguments (currently only `n_workers`) and returns the instantiated
 executor.

-```python
-from concurrent.futures import Executor
-from my_project.executor import CustomExecutor
-
-from pytask_parallel import ParallelBackend, registry
-
-
-def build_custom_executor(n_workers: int) -> Executor:
-    return CustomExecutor(max_workers=n_workers)
-
-
-registry.register_parallel_backend(ParallelBackend.CUSTOM, build_custom_executor)
+```{literalinclude} ../../docs_src/custom_executors.py
 ```

+Given the {class}`pytask_parallel.WorkerType`, pytask applies automatic wrappers around
+the task function to collect tracebacks, capture stdout/stderr, and the like. The
+`remote` keyword allows pytask to handle local paths automatically for remote clusters.
+
 Now, build the project requesting your custom backend.

 ```console
 pytask --parallel-backend custom
 ```

-Realistically, it is not the only necessary adjustment for a nice user experience. There
-are two other important things. pytask-parallel does not implement them by default since
-it seems more tightly coupled to your backend.
-
-1. A wrapper for the executed function that captures warnings, catches exceptions and
-   saves products of the task (within the child process!).
-
-   As an example, see
-   [`def _execute_task()`](https://github.com/pytask-dev/pytask-parallel/blob/c441dbb75fa6ab3ab17d8ad5061840c802dc1c41/src/pytask_parallel/processes.py#L91-L155)
-   that does all that for the processes and loky backend.
-
-1. To apply the wrapper, you need to write a custom hook implementation for
-   `def pytask_execute_task()`. See
-   [`def pytask_execute_task()`](https://github.com/pytask-dev/pytask-parallel/blob/c441dbb75fa6ab3ab17d8ad5061840c802dc1c41/src/pytask_parallel/processes.py#L41-L65)
-   for an example. Use the
-   [`hook_module`](https://pytask-dev.readthedocs.io/en/stable/how_to_guides/extending_pytask.html#using-hook-module-and-hook-module)
-   configuration value to register your implementation.
-
-Another example of an implementation can be found as a
-[test](https://github.com/pytask-dev/pytask-parallel/blob/c441dbb75fa6ab3ab17d8ad5061840c802dc1c41/tests/test_backends.py#L35-L78).
+```{important}
+pytask applies automatic wrappers around the task function, so you do not need to
+implement them in your custom executor.
+```

docs/source/dask.md

Lines changed: 9 additions & 40 deletions
@@ -81,48 +81,17 @@ You can find more information in the documentation for
 [`dask.distributed`](https://distributed.dask.org/en/stable/).
 ```

-## Remote - Using cloud providers with coiled
+## Remote
+
+You can learn how to deploy your tasks to a remote dask cluster in [this
+guide](https://docs.dask.org/en/stable/deploying.html). They recommend using coiled for
+deployment to cloud providers.

 [coiled](https://www.coiled.io/) is a product built on top of dask that eases the
 deployment of your workflow to many cloud providers like AWS, GCP, and Azure.

-They offer a [free monthly tier](https://www.coiled.io/pricing) where you only
-need to pay the costs for your cloud provider and you can get started without a credit
-card.
-
-Furthermore, they offer the following benefits which are especially helpful to people
-who are not familiar with cloud providers or remote computing.
-
-- A [four step short process](https://docs.coiled.io/user_guide/setup/index.html) to set
-  up your local environment and configure your cloud provider.
-- coiled manages your resources by spawning workers if you need them and shutting them
-  down if they are idle.
-- Synchronization of your local environment to remote workers.
-
-So, how can you run your pytask workflow on a cloud infrastructure with coiled?
-
-1. Follow their [guide on getting
-   started](https://docs.coiled.io/user_guide/setup/index.html) by creating a coiled
-   account and syncing it with your cloud provider.
-
-1. Register a function that builds an executor using {class}`coiled.Cluster`.
-
-   ```python
-   import coiled
-   from pytask_parallel import ParallelBackend
-   from pytask_parallel import registry
-   from concurrent.futures import Executor
-
-
-   def _build_coiled_executor(n_workers: int) -> Executor:
-       return coiled.Cluster(n_workers=n_workers).get_client().get_executor()
-
-
-   registry.register_parallel_backend(ParallelBackend.CUSTOM, _build_coiled_executor)
-   ```
-
-1. Execute your workflow with
+If you want to run the tasks in your project on a cluster managed by coiled, read
+{ref}`this guide <coiled-clusters>`.

-   ```console
-   pytask --parallel-backend custom
-   ```
+Otherwise, follow the instructions in [dask's
+guide](https://docs.dask.org/en/stable/deploying.html).

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ pytask-parallel allows to execute workflows defined with
 maxdepth: 1
 ---
 quickstart
+coiled
 dask
 custom_executors
 developers_guide

docs/source/quickstart.md

Lines changed: 23 additions & 13 deletions
@@ -18,7 +18,9 @@ $ conda install -c conda-forge pytask-parallel
 When the plugin is only installed and pytask executed, the tasks are not run in
 parallel.

-For parallelization with the default backend [loky](https://loky.readthedocs.io/), you need to launch multiple workers.
+For parallelization with the default backend [loky](https://loky.readthedocs.io/), you
+need to launch multiple workers or specify the parallel backend explicitly. Here is how
+you launch multiple workers.

 `````{tab-set}
 ````{tab-item} CLI
@@ -45,8 +47,8 @@ n_workers = "auto"
 ````
 `````

-To use a different backend, pass the `--parallel-backend` option. The following command
-will execute the workflow with one worker and the loky backend.
+To specify the parallel backend, pass the `--parallel-backend` option. The following
+command will execute the workflow with one worker and the loky backend.

 `````{tab-set}
 ````{tab-item} CLI
@@ -72,23 +74,32 @@ parallel_backend = "loky"
 It is not possible to combine parallelization with debugging. That is why `--pdb` or
 `--trace` deactivate parallelization.

-If you parallelize the execution of your tasks using two or more workers, do not use
-`breakpoint()` or `import pdb; pdb.set_trace()` since both will cause exceptions.
+If you parallelize the execution of your tasks, do not use `breakpoint()` or
+`import pdb; pdb.set_trace()` since both will cause exceptions.
 ```

 ### loky

 There are multiple backends available. The default is the backend provided by loky which
-aims to be a more robust implementation of {class}`~multiprocessing.pool.Pool` and in
+is a more robust implementation of {class}`~multiprocessing.pool.Pool` and
 {class}`~concurrent.futures.ProcessPoolExecutor`.

 ```console
 pytask --parallel-backend loky
 ```

-As it spawns workers in new processes to run the tasks, it is especially suited for
-CPU-bound tasks. ([Here](https://stackoverflow.com/a/868577/7523785) is an
-explanation of what CPU- or IO-bound means.)
+A parallel backend with processes is especially suited for CPU-bound tasks as it spawns
+workers in new processes to run the tasks.
+([Here](https://stackoverflow.com/a/868577/7523785) is an explanation of what CPU- or
+IO-bound means.)
+
+### coiled
+
+pytask-parallel integrates with coiled, allowing you to run tasks in virtual machines of
+AWS, Azure, and GCP. You can decide whether to run only some selected tasks or the whole
+project in the cloud.
+
+Read more about coiled in [this guide](coiled.md).

 ### `concurrent.futures`

@@ -141,11 +152,10 @@ Capturing warnings is not thread-safe. Therefore, warnings cannot be captured re
 when tasks are parallelized with `--parallel-backend threads`.
 ```

-### dask + coiled
+### dask

-dask and coiled together provide the option to execute your workflow on cloud providers
-like AWS, GCP or Azure. Check out the [dedicated guide](dask.md) if you are interested
-in that.
+dask allows you to run your workflows on many different kinds of clusters like cloud
+clusters and traditional HPC.

 Using the default mode, dask will spawn multiple local workers to process the tasks.

docs_src/coiled/coiled_functions.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
import coiled


@coiled.function()
def task_example() -> None:
    pass
docs_src/coiled/coiled_functions_task.py

Lines changed: 12 additions & 0 deletions

@@ -0,0 +1,12 @@
import coiled
from pytask import task


@task
@coiled.function(
    region="eu-central-1",  # Run the task close to you.
    memory="512 GB",  # Use a lot of memory.
    cpu=128,  # Use a lot of CPU.
    vm_type="p3.2xlarge",  # Run a GPU instance.
)
def task_example() -> None: ...

docs_src/custom_executors.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
from concurrent.futures import Executor

from my_project.executor import CustomExecutor
from pytask_parallel import ParallelBackend
from pytask_parallel import WorkerType
from pytask_parallel import registry


def build_custom_executor(n_workers: int) -> Executor:
    return CustomExecutor(
        max_workers=n_workers, worker_type=WorkerType.PROCESSES, remote=False
    )


registry.register_parallel_backend(ParallelBackend.CUSTOM, build_custom_executor)
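
Here, `my_project.executor.CustomExecutor` is a placeholder for your own class. A
minimal sketch of what such a class could look like, assuming it only needs to satisfy
the `concurrent.futures.Executor` interface and accept the keywords shown above:

```python
from concurrent.futures import ProcessPoolExecutor

from pytask_parallel import WorkerType


class CustomExecutor(ProcessPoolExecutor):
    """An illustrative executor that delegates to ProcessPoolExecutor.

    worker_type and remote are only stored here; the assumption is that
    pytask-parallel consumes them through its wrappers, not the executor.
    """

    def __init__(
        self, max_workers: int, worker_type: WorkerType, remote: bool
    ) -> None:
        super().__init__(max_workers=max_workers)
        self.worker_type = worker_type
        self.remote = remote
```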

src/pytask_parallel/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -3,6 +3,7 @@
 from __future__ import annotations

 from pytask_parallel.backends import ParallelBackend
+from pytask_parallel.backends import WorkerType
 from pytask_parallel.backends import registry

 try:
@@ -13,4 +14,4 @@
 __version__ = "unknown"


-__all__ = ["ParallelBackend", "__version__", "registry"]
+__all__ = ["ParallelBackend", "__version__", "registry", "WorkerType"]

src/pytask_parallel/config.py

Lines changed: 1 addition & 1 deletion
@@ -47,5 +47,5 @@ def pytask_post_parse(config: dict[str, Any]) -> None:
         return

     # Register parallel execute and logging hook.
-    config["pm"].register(logging)
     config["pm"].register(execute)
+    config["pm"].register(logging)
