Commit aedb942

Sync local paths in remote environments with the same OS. (#101)

1 parent e84554b, commit aedb942

18 files changed: +324 -146 lines changed

.github/workflows/main.yml

Lines changed: 17 additions & 17 deletions

```diff
@@ -18,20 +18,20 @@ on:
 
 jobs:
 
-  build-package:
-    name: Build & verify package
+  run-type-checking:
+
+    name: Run tests for type-checking
     runs-on: ubuntu-latest
 
     steps:
       - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
         with:
-          fetch-depth: 0
-
-      - uses: hynek/build-and-inspect-python-package@v2
-        id: baipp
-
-    outputs:
-      python-versions: ${{ steps.baipp.outputs.supported_python_classifiers_json_array }}
+          python-version-file: .python-version
+          allow-prereleases: true
+          cache: pip
+      - run: pip install tox-uv
+      - run: tox -e typing
 
   run-tests:
 
@@ -59,16 +59,16 @@ jobs:
         shell: bash -l {0}
         run: tox -e test -- tests -m "unit or (not integration and not end_to_end)" --cov=src --cov=tests --cov-report=xml
 
-      - name: Upload coverage report for unit tests and doctests.
-        if: runner.os == 'Linux' && matrix.python-version == '3.10'
-        shell: bash -l {0}
-        run: bash <(curl -s https://codecov.io/bash) -F unit -c
+      - name: Upload unit test coverage reports to Codecov with GitHub Action
+        uses: codecov/codecov-action@v4
+        with:
+          flags: unit
 
       - name: Run end-to-end tests.
         shell: bash -l {0}
         run: tox -e test -- tests -m end_to_end --cov=src --cov=tests --cov-report=xml
 
-      - name: Upload coverage reports of end-to-end tests.
-        if: runner.os == 'Linux' && matrix.python-version == '3.10'
-        shell: bash -l {0}
-        run: bash <(curl -s https://codecov.io/bash) -F end_to_end -c
+      - name: Upload end_to_end test coverage reports to Codecov with GitHub Action
+        uses: codecov/codecov-action@v4
+        with:
+          flags: end_to_end
```

.pre-commit-config.yaml

Lines changed: 12 additions & 29 deletions

```diff
@@ -14,7 +14,7 @@ repos:
       - id: no-commit-to-branch
         args: [--branch, main]
   - repo: https://github.com/pre-commit/pygrep-hooks
-    rev: v1.10.0  # Use the ref you want to point at
+    rev: v1.10.0
     hooks:
       - id: python-check-blanket-noqa
       - id: python-check-mock-methods
@@ -44,39 +44,22 @@ repos:
           mdformat-black,
           mdformat-pyproject,
         ]
+        args: [--wrap, "88"]
         files: (docs/.)
-  # Conflicts with admonitions.
-  # - repo: https://github.com/executablebooks/mdformat
-  #   rev: 0.7.17
-  #   hooks:
-  #     - id: mdformat
-  #       additional_dependencies: [
-  #         mdformat-gfm,
-  #         mdformat-black,
-  #       ]
-  #       args: [--wrap, "88"]
+  - repo: https://github.com/executablebooks/mdformat
+    rev: 0.7.17
+    hooks:
+      - id: mdformat
+        additional_dependencies: [
+          mdformat-gfm,
+          mdformat-black,
+        ]
+        args: [--wrap, "88"]
+        files: (README\.md)
   - repo: https://github.com/codespell-project/codespell
     rev: v2.2.6
     hooks:
       - id: codespell
-  - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: 'v1.10.0'
-    hooks:
-      - id: mypy
-        args: [
-          --no-strict-optional,
-          --ignore-missing-imports,
-        ]
-        additional_dependencies: [
-          attrs,
-          cloudpickle,
-          loky,
-          "git+https://github.com/pytask-dev/pytask.git@main",
-          rich,
-          types-click,
-          types-setuptools,
-        ]
-        pass_filenames: false
   - repo: meta
     hooks:
       - id: check-hooks-apply
```

docs/source/changes.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -23,6 +23,8 @@ releases are available on [PyPI](https://pypi.org/project/pytask-parallel) and
 interactions with adaptive scaling. {issue}`98` does handle the resulting issues: no
 strong adherence to priorities, no pending status.
 - {pull}`100` adds project management with rye.
+- {pull}`101` adds syncing for local paths as dependencies or products in remote
+  environments with the same OS.
 
 ## 0.4.1 - 2024-01-12
 
```

docs/source/coiled.md

Lines changed: 8 additions & 0 deletions

````diff
@@ -66,6 +66,14 @@ configure the hardware and software environment.
 ```{literalinclude} ../../docs_src/coiled/coiled_functions_task.py
 ```
 
+By default, {func}`@coiled.function <coiled.function>`
+[scales adaptively](https://docs.coiled.io/user_guide/usage/functions/index.html#adaptive-scaling)
+to the workload. It means that coiled infers from the number of submitted tasks and
+previous runtimes, how many additional remote workers it should deploy to handle the
+workload. It provides a convenient mechanism to scale without intervention. Also,
+workers launched by {func}`@coiled.function <coiled.function>` will shutdown quicker
+than a cluster.
+
 ```{seealso}
 Serverless functions are more thoroughly explained in
 [coiled's guide](https://docs.coiled.io/user_guide/usage/functions/index.html).
````

docs/source/dask.md

Lines changed: 5 additions & 5 deletions

```diff
@@ -83,15 +83,15 @@ You can find more information in the documentation for
 
 ## Remote
 
-You can learn how to deploy your tasks to a remote dask cluster in [this
-guide](https://docs.dask.org/en/stable/deploying.html). They recommend to use coiled for
-deployment to cloud providers.
+You can learn how to deploy your tasks to a remote dask cluster in
+[this guide](https://docs.dask.org/en/stable/deploying.html). They recommend to use
+coiled for deployment to cloud providers.
 
 [coiled](https://www.coiled.io/) is a product built on top of dask that eases the
 deployment of your workflow to many cloud providers like AWS, GCP, and Azure.
 
 If you want to run the tasks in your project on a cluster managed by coiled read
 {ref}`this guide <coiled-clusters>`.
 
-Otherwise, follow the instructions in [dask's
-guide](https://docs.dask.org/en/stable/deploying.html).
+Otherwise, follow the instructions in
+[dask's guide](https://docs.dask.org/en/stable/deploying.html).
```

docs/source/developers_guide.md

Lines changed: 3 additions & 4 deletions

```diff
@@ -1,10 +1,9 @@
 # Developer's Guide
 
 `pytask-parallel` does not call the `pytask_execute_task_protocol` hook
-specification/entry-point because `pytask_execute_task_setup` and
-`pytask_execute_task` need to be separated from `pytask_execute_task_teardown`. Thus,
-plugins that change this hook specification may not interact well with the
-parallelization.
+specification/entry-point because `pytask_execute_task_setup` and `pytask_execute_task`
+need to be separated from `pytask_execute_task_teardown`. Thus, plugins that change this
+hook specification may not interact well with the parallelization.
 
 Two PRs for CPython try to re-enable setting custom reducers which should have been
 working but does not. Here are the references.
```

docs/source/index.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -26,6 +26,7 @@ quickstart
 coiled
 dask
 custom_executors
+remote_backends
 developers_guide
 changes
 On Github <https://github.com/pytask-dev/pytask-parallel>
```

docs/source/remote_backends.md

Lines changed: 47 additions & 0 deletions

```diff
@@ -0,0 +1,47 @@
+# Remote backends
+
+There are a couple of things you need to know when using backends that launch workers
+remotely, meaning not on your machine.
+
+## Cross-platform support
+
+Issue: {issue}`102`.
+
+Currently, it is not possible to run tasks in a remote environment that has a different
+OS than your local system. The reason is that when pytask sends data to the remote
+worker, the data contains path objects, {class}`pathlib.WindowsPath` or
+{class}`pathlib.PosixPath`, which cannot be unpickled on a different system.
+
+In general, remote machines are Unix machines which means people running Unix systems
+themselves like Linux and MacOS should have no problems.
+
+Windows users on the other hand should use the
+[WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/about)
+to run their projects.
+
+## Local files
+
+Avoid using local files with remote backends and use storages like S3 for dependencies
+and products. The reason is that every local file needs to be send to the remote workers
+and when your internet connection is slow you will face a hefty penalty on runtime.
+
+## Local paths
+
+In most projects you are using local paths to refer to dependencies and products of your
+tasks. This becomes an interesting problem with remote workers since your local files
+are not necessarily available in the remote machine.
+
+pytask-parallel does its best to sync files before the execution to the worker, so you
+can run your tasks locally and remotely without changing a thing.
+
+In case you create a file on the remote machine, the product will be synced back to your
+local machine as well.
+
+It is still necessary to know that the remote paths are temporary files that share the
+same file extension as the local file, but the name and path will be different. Do not
+rely on them.
+
+Another way to circumvent the problem is to first define a local task that stores all
+your necessary files in a remote storage like S3. In the remaining tasks, you can then
+use paths pointing to the bucket instead of the local machine. See the
+[guide on remote files](https://tinyurl.com/pytask-remote) for more explanations.
```
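The cross-platform limitation described in the new `remote_backends.md` page comes down to how pathlib's concrete classes are tied to the running OS. A small standard-library sketch (hypothetical file paths) using the pure-path classes, which can be created and pickled anywhere, illustrates the difference:

```python
import pickle
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths carry only the path semantics, not any OS functionality,
# so they can be constructed and pickled on any system.
win = PureWindowsPath(r"C:\projects\data\file.csv")
posix = PurePosixPath("/home/user/projects/data/file.csv")

# The pickle round-trip works because unpickling only rebuilds the
# pure-path object.
assert pickle.loads(pickle.dumps(win)) == win

# The concrete classes (pathlib.WindowsPath / pathlib.PosixPath) embed
# OS-specific behavior, which is why a pickled concrete Windows path
# cannot be rebuilt on a Linux worker, as the page explains.
print(type(win).__name__, win.parts)
print(type(posix).__name__, posix.parts)
```

This is only an illustration of the pickling issue, not pytask-parallel's actual syncing mechanism.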

pyproject.toml

Lines changed: 5 additions & 12 deletions

```diff
@@ -55,21 +55,12 @@ content-type = "text/markdown"
 text = "MIT"
 
 [project.urls]
-Homepage = "https://github.com/pytask-dev/pytask-parallel"
-Changelog = "https://github.com/pytask-dev/pytask-parallel/blob/main/CHANGES.md"
-Documentation = "https://github.com/pytask-dev/pytask-parallel"
+Homepage = "https://pytask-parallel.readthedocs.io/"
+Changelog = "https://pytask-parallel.readthedocs.io/en/latest/changes.html"
+Documentation = "https://pytask-parallel.readthedocs.io/"
 Github = "https://github.com/pytask-dev/pytask-parallel"
 Tracker = "https://github.com/pytask-dev/pytask-parallel/issues"
 
-[tool.setuptools]
-include-package-data = true
-zip-safe = false
-platforms = ["any"]
-license-files = ["LICENSE"]
-
-[tool.check-manifest]
-ignore = ["src/pytask_parallel/_version.py"]
-
 [project.entry-points.pytask]
 pytask_parallel = "pytask_parallel.plugin"
 
@@ -113,6 +104,7 @@ disallow_untyped_defs = true
 no_implicit_optional = true
 warn_redundant_casts = true
 warn_unused_ignores = true
+ignore_missing_imports = true
 
 [[tool.mypy.overrides]]
 module = "tests.*"
@@ -127,6 +119,7 @@ unsafe-fixes = true
 [tool.ruff.lint]
 extend-ignore = [
     "ANN101",  # type annotating self
+    "ANN102",  # type annotating cls
     "ANN401",  # flake8-annotate typing.Any
     "COM812",  # Comply with ruff-format.
     "ISC001",  # Comply with ruff-format.
```

src/pytask_parallel/backends.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -46,10 +46,11 @@ def submit(  # type: ignore[override]
 
 def _get_dask_executor(n_workers: int) -> Executor:
     """Get an executor from a dask client."""
-    _rich_traceback_omit = True
+    _rich_traceback_guard = True
    from pytask import import_optional_dependency
 
     distributed = import_optional_dependency("distributed")
+    assert distributed  # noqa: S101
     try:
         client = distributed.Client.current()
     except ValueError:
```
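The `assert distributed` added here is a common way to narrow an optional import for the type checker, since `import_optional_dependency` can return `None`. A stand-alone sketch of the pattern (using a hypothetical `import_optional` helper, not pytask's actual function):

```python
from __future__ import annotations

import importlib
from types import ModuleType


def import_optional(name: str) -> ModuleType | None:
    """Return the module if it is installed, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


mod = import_optional("math")
# The assert narrows `ModuleType | None` to `ModuleType` for mypy,
# mirroring the `assert distributed  # noqa: S101` in this commit.
assert mod is not None
print(mod.sqrt(16.0))  # 4.0
```

Without the assert, attribute access on `mod` would be flagged because the checker cannot rule out `None`.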

src/pytask_parallel/config.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -27,7 +27,7 @@ def pytask_parse_config(config: dict[str, Any]) -> None:
         raise ValueError(msg) from None
 
     if config["n_workers"] == "auto":
-        config["n_workers"] = max(os.cpu_count() - 1, 1)
+        config["n_workers"] = max(os.cpu_count() - 1, 1)  # type: ignore[operator]
 
     # If more than one worker is used, and no backend is set, use loky.
     if config["n_workers"] > 1 and config["parallel_backend"] == ParallelBackend.NONE:
```
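The `# type: ignore[operator]` is needed because `os.cpu_count()` is typed as `int | None`. A defensive variant (a sketch with a hypothetical helper name, not the project's code) avoids the ignore by handling the `None` case explicitly:

```python
from __future__ import annotations

import os


def resolve_n_workers(n_workers: int | str) -> int:
    """Resolve 'auto' to cpu_count - 1, keeping at least one worker."""
    if n_workers == "auto":
        # os.cpu_count() may return None on platforms where the count is
        # undetermined, so fall back to 1 before reserving one core for
        # the scheduling process.
        return max((os.cpu_count() or 1) - 1, 1)
    return int(n_workers)


print(resolve_n_workers("auto"))
print(resolve_n_workers(4))  # 4
```

The commit instead keeps the concise expression and silences mypy, which is a reasonable choice when `cpu_count()` is known to return an `int` on the supported platforms.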

src/pytask_parallel/execute.py

Lines changed: 25 additions & 9 deletions

```diff
@@ -8,6 +8,7 @@
 from typing import Any
 
 import cloudpickle
+from _pytask.node_protocols import PPathNode
 from attrs import define
 from attrs import field
 from pytask import ExecutionReport
@@ -24,9 +25,10 @@
 
 from pytask_parallel.backends import WorkerType
 from pytask_parallel.backends import registry
+from pytask_parallel.typing import CarryOverPath
+from pytask_parallel.typing import is_coiled_function
 from pytask_parallel.utils import create_kwargs_for_task
 from pytask_parallel.utils import get_module
-from pytask_parallel.utils import is_coiled_function
 from pytask_parallel.utils import parse_future_result
 
 if TYPE_CHECKING:
@@ -222,7 +224,7 @@ def pytask_execute_task(session: Session, task: PTask) -> Future[WrapperResult]:
         from pytask_parallel.wrappers import wrap_task_in_thread
 
         return session.config["_parallel_executor"].submit(
-            wrap_task_in_thread, task=task, **kwargs
+            wrap_task_in_thread, task=task, remote=False, **kwargs
         )
     msg = f"Unknown worker type {worker_type}"
     raise ValueError(msg)
@@ -235,19 +237,33 @@ def pytask_unconfigure() -> None:
 
 
 def _update_carry_over_products(
-    task: PTask, carry_over_products: PyTree[PythonNode | None] | None
+    task: PTask, carry_over_products: PyTree[CarryOverPath | PythonNode | None] | None
 ) -> None:
-    """Update products carry over from a another process or remote worker."""
+    """Update products carry over from a another process or remote worker.
 
-    def _update_carry_over_node(x: PNode, y: PythonNode | None) -> PNode:
-        if y:
+    The python node can be a regular one passing the value to another python node.
+
+    In other instances the python holds a string or bytes from a RemotePathNode.
+
+    """
+
+    def _update_carry_over_node(
+        x: PNode, y: CarryOverPath | PythonNode | None
+    ) -> PNode:
+        if y is None:
+            return x
+        if isinstance(x, PPathNode) and isinstance(y, CarryOverPath):
+            x.path.write_bytes(y.content)
+            return x
+        if isinstance(y, PythonNode):
             x.save(y.load())
-            return x
+            return x
+        raise NotImplementedError
 
-    structure_python_nodes = tree_structure(carry_over_products)
+    structure_carry_over_products = tree_structure(carry_over_products)
     structure_produces = tree_structure(task.produces)
     # strict must be false when none is leaf.
-    if structure_produces.is_prefix(structure_python_nodes, strict=False):
+    if structure_produces.is_prefix(structure_carry_over_products, strict=False):
         task.produces = tree_map(
             _update_carry_over_node, task.produces, carry_over_products
         )  # type: ignore[assignment]
```
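The three-way dispatch introduced in `_update_carry_over_node` can be illustrated with a self-contained sketch. The `CarryOverPath`, `PythonNode`, and `PathNode` classes below are simplified stand-ins for the pytask types, not the real implementations:

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CarryOverPath:
    """Bytes of a file produced on a remote worker, synced back locally."""

    content: bytes


@dataclass
class PythonNode:
    """Simplified in-memory node with pytask-like load/save methods."""

    value: object = None

    def load(self) -> object:
        return self.value

    def save(self, value: object) -> None:
        self.value = value


@dataclass
class PathNode:
    """Simplified stand-in for a node satisfying the PPathNode protocol."""

    path: Path


def update_carry_over_node(x, y):
    # Mirrors the three cases in the commit: nothing to carry over,
    # file content synced back from the remote machine, or a plain value
    # passed between Python nodes.
    if y is None:
        return x
    if isinstance(x, PathNode) and isinstance(y, CarryOverPath):
        x.path.write_bytes(y.content)
        return x
    if isinstance(y, PythonNode):
        x.save(y.load())
        return x
    raise NotImplementedError


# Case 1: a value carried over between Python nodes.
target = PythonNode()
update_carry_over_node(target, PythonNode(value=42))
print(target.value)  # 42

# Case 2: file content from a remote worker written to the local path.
with tempfile.TemporaryDirectory() as tmp:
    node = PathNode(path=Path(tmp) / "product.txt")
    update_carry_over_node(node, CarryOverPath(content=b"result"))
    print(node.path.read_bytes())  # b'result'
```

In the real code the dispatch runs inside `tree_map` over `task.produces`, so each product node is updated in place with whatever the worker sent back.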
