
Commit 22de753

Merge branch 'main' into pre-commit-ci-update-config
2 parents 79afc6a + 9e996a5 commit 22de753

73 files changed, +1866 -143 lines changed

Some content is hidden


.gitignore

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ _generated
 *.egg-info
 .eggs
 
-.pytask.sqlite3
+.pytask
 
 build
 dist

.pre-commit-config.yaml

Lines changed: 2 additions & 0 deletions
@@ -51,6 +51,7 @@ repos:
 rev: v0.1.3
 hooks:
 - id: ruff
+args: [--unsafe-fixes]
 - repo: https://github.com/dosisod/refurb
 rev: v1.22.1
 hooks:
@@ -114,6 +115,7 @@ repos:
 docs/source/tutorials/repeating_tasks_with_different_inputs.md|
 docs/source/tutorials/selecting_tasks.md|
 docs/source/tutorials/set_up_a_project.md|
+docs/source/tutorials/using_a_data_catalog.md|
 docs/source/tutorials/write_a_task.md
 )$
 - repo: https://github.com/nbQA-dev/nbQA

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+<div class="termy">
+
+```console
+
+$ pytask
+──────────────────────────── Start pytask session ────────────────────────────
+Platform: win32 -- Python <span style="color: var(--termynal-blue)">3.10.0</span>, pytask <span style="color: var(--termynal-blue)">0.4.0</span>, pluggy <span style="color: var(--termynal-blue)">1.0.0</span>
+Root: C:\Users\pytask-dev\git\my_project
+Collected <span style="color: var(--termynal-blue)">2</span> task.
+
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
+┃ Task ┃ Outcome ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
+│ <span class="termynal-dim">task_data_preparation.py::</span>task_create_random_data │ <span class="termynal-success">.</span> │
+│ <span class="termynal-dim">task_plot_data.py::</span>task_plot_data │ <span class="termynal-success">.</span> │
+└───────────────────────────────────────────────────┴─────────┘
+
+<span class="termynal-dim">──────────────────────────────────────────────────────────────────────────────</span>
+<span class="termynal-success">╭───────────</span> <span style="font-weight: bold;">Summary</span> <span class="termynal-success">────────────╮</span>
+<span class="termynal-success">│</span> <span style="font-weight: bold;"> 2 Collected tasks </span> <span class="termynal-success">│</span>
+<span class="termynal-success">│</span> <span class="termynal-success-textonly"> 2 Succeeded (100.0%) </span> <span class="termynal-success">│</span>
+<span class="termynal-success">╰────────────────────────────────╯</span>
+<span class="termynal-success">───────────────────────── Succeeded in 0.06 seconds ──────────────────────────</span>
+```
+
+</div>

docs/source/changes.md

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
 - {pull}`463` raise error when a task function is not defined inside the loop body.
 - {pull}`464` improves pinned dependencies.
 - {pull}`465` adds test to ensure internal tracebacks are removed by reports.
+- {pull}`466` implements hashing for files instead of modification timestamps.
+- {pull}`470` moves `.pytask.sqlite3` to `.pytask`.
+- {pull}`472` adds `is_product` to {meth}`PNode.load`.
 
 ## 0.4.1 - 2023-10-11
 

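PR 466 switches change detection for files from modification timestamps to content hashes. The sketch below illustrates the difference under stated assumptions: SHA-256 over the raw file bytes, and hypothetical helper names `content_hash` and `mtime_fingerprint`. It is not pytask's `hash_path` implementation.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 2**16) -> str:
    """Return the SHA-256 digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files never need to fit into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mtime_fingerprint(path: Path) -> float:
    """Return the modification timestamp, the signal used before this change."""
    return path.stat().st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.csv"
        path.write_text("a,b\n1,2\n")
        old_hash, old_mtime = content_hash(path), mtime_fingerprint(path)

        # "Touch" the file: the timestamp moves, the bytes stay the same.
        os.utime(path, (old_mtime + 10, old_mtime + 10))

        print(content_hash(path) == old_hash)  # True: content unchanged
        print(mtime_fingerprint(path) == old_mtime)  # False: timestamp changed
```

Only the hash-based check ignores a mere touch of the file, which avoids needless re-execution; the trade-off is that every byte has to be read to compute the fingerprint.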
docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@
 
 intersphinx_mapping = {
 "click": ("https://click.palletsprojects.com/en/8.0.x/", None),
+"deepdiff": ("https://zepworks.com/deepdiff/current/", None),
 "networkx": ("https://networkx.org/documentation/stable", None),
 "pandas": ("https://pandas.pydata.org/docs", None),
 "pluggy": ("https://pluggy.readthedocs.io/en/latest", None),

docs/source/how_to_guides/bp_scaling_tasks.md

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,8 @@ my_project
 
 ├───setup.py
 
-├───.pytask.sqlite3
+├───.pytask
+│ └────...
 
 └───bld
 ```

docs/source/how_to_guides/functional_interface.ipynb

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/source/how_to_guides/hashing_inputs_of_tasks.md

Lines changed: 2 additions & 2 deletions
@@ -62,10 +62,10 @@ from interpreter session to interpreter session for security reasons (see
 ```
 
 {class}`list` and {class}`dict` are not hashable by default. Luckily, there are
-libraries who provide this functionality like `deepdiff`. We can use them to pass a
+libraries that provide this functionality like {mod}`deepdiff`. We can use them to pass a
 function to the {class}`~pytask.PythonNode` that generates a stable hash.
 
-First, install `deepdiff`.
+First, install {mod}`deepdiff`.
 
 ```console
 $ pip install deepdiff

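For `list` and `dict` inputs, a stable hash only needs a deterministic serialization. The sketch below uses only the standard library (`json` with sorted keys plus `hashlib`) as a swapped-in alternative to `deepdiff`; the name `stable_hash` is hypothetical, and it only handles JSON-serializable values. A function of this shape is the kind of callable the guide hands to the `PythonNode`.

```python
import hashlib
import json
from typing import Any


def stable_hash(value: Any) -> str:
    """Hash JSON-serializable data identically across interpreter sessions."""
    # sort_keys removes dict-ordering effects; compact separators remove whitespace.
    serialized = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Unlike the built-in hash() of strings, this does not depend on PYTHONHASHSEED.
first = stable_hash({"b": [1, 2], "a": 1})
second = stable_hash({"a": 1, "b": [1, 2]})
print(first == second)  # True
```

One caveat of this design: JSON has no tuple type, so `(1, 2)` and `[1, 2]` produce the same hash. A dedicated library such as `deepdiff` distinguishes more types.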
docs/source/how_to_guides/index.md

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ hashing_inputs_of_tasks
 using_task_returns
 writing_custom_nodes
 how_to_write_a_plugin
+the_data_catalog
 ```
 
 ## Best Practice Guides

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# The `DataCatalog` - Revisited
+
+An introduction to the data catalog can be found in the
+[tutorial](../tutorials/using_a_data_catalog.md).
+
+This guide explains some details that were left out of the tutorial.
+
+## Changing the default node
+
+The data catalog uses the {class}`~pytask.PickleNode` by default to serialize any kind
+of Python object. You can use any other node that follows the {protocol}`~pytask.PNode`
+protocol and register it when creating the data catalog.
+
+For example, use the {class}`~pytask.PythonNode` as the default.
+
+```python
+from pytask import DataCatalog, PythonNode
+
+
+data_catalog = DataCatalog(default_node=PythonNode)
+```
+
+Or, learn to write your own node by reading {doc}`writing_custom_nodes`.
+
+Here is an example of a `PickleNode` that uses `cloudpickle` instead of the standard
+`pickle` module.
+
+```{literalinclude} ../../../docs_src/how_to_guides/the_data_catalog.py
+```
+
+## Changing the name and the default path
+
+By default, the data catalogs store their data in a directory `.pytask/data_catalogs`.
+If you use a `pyproject.toml` with a `[tool.pytask.ini_options]` section, then the
+`.pytask` folder is in the same folder as the configuration file.
+
+The default name for a catalog is `"default"` and so you will find its data in
+`.pytask/data_catalogs/default`. If you assign a different name like
+`"data_management"`, you will find the data in `.pytask/data_catalogs/data_management`.
+
+```python
+data_catalog = DataCatalog(name="data_management")
+```
+
+You can also change the path where the data catalogs will be stored by changing the
+`path` attribute. Here, we store the data catalog's data in a `.data` folder next to
+the module where the data catalog is defined.
+
+```python
+from pathlib import Path
+
+
+data_catalog = DataCatalog(path=Path(__file__).parent / ".data")
+```
+
+## Multiple data catalogs
+
+You can use multiple data catalogs when you want to separate your datasets across
+multiple catalogs or when you want to use the same names multiple times (although it is
+not recommended!).
+
+Make sure you assign different names to the data catalogs so that their data is stored
+in different directories.
+
+```python
+# Stored in .pytask/data_catalogs/a
+data_catalog_a = DataCatalog(name="a")
+
+# Stored in .pytask/data_catalogs/b
+data_catalog_b = DataCatalog(name="b")
+```
+
+Or, use different paths as explained above.

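The default storage layout described in the new guide can be summarized in a few lines. This is a sketch that assumes only what the guide states: data lives in `.pytask/data_catalogs/<name>` inside the folder that holds the configuration file, and `"default"` is the default name. The helper `catalog_directory` is hypothetical and not part of pytask's API; it does not model a custom `path`.

```python
from pathlib import PurePosixPath


def catalog_directory(root: PurePosixPath, name: str = "default") -> PurePosixPath:
    """Return where a catalog called `name` keeps its data (hypothetical helper).

    `root` is the folder containing pyproject.toml, where `.pytask` is created.
    """
    return root / ".pytask" / "data_catalogs" / name


root = PurePosixPath("/home/user/my_project")
print(catalog_directory(root))  # /home/user/my_project/.pytask/data_catalogs/default
print(catalog_directory(root, "a"))  # /home/user/my_project/.pytask/data_catalogs/a
```

Distinct names map to distinct directories, which is why the guide asks for unique names when several catalogs live in one project.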
docs/source/how_to_guides/writing_custom_nodes.md

Lines changed: 22 additions & 4 deletions
@@ -20,9 +20,6 @@ to inputs and outputs and call {func}`pandas.read_pickle` and
 To remove IO operations from the task and delegate them to pytask, we will write a
 `PickleNode` that automatically loads and stores Python objects.
 
-We will also use the feature explained in {doc}`using_task_returns` to define products
-of the task function via the function's return value.
-
 And we pass the value to `df` via {obj}`Annotated` to preserve the type hint.
 
 The result will be the following task.
@@ -37,12 +34,28 @@ The result will be the following task.
 
 :::
 
+:::{tab-item} Python 3.10+ & Return
+:sync: python310plus
+
+```{literalinclude} ../../../docs_src/how_to_guides/writing_custom_nodes_example_2_py310_return.py
+```
+
+:::
+
 :::{tab-item} Python 3.8+
 :sync: python38plus
 
 ```{literalinclude} ../../../docs_src/how_to_guides/writing_custom_nodes_example_2_py38.py
 ```
 
+:::
+
+:::{tab-item} Python 3.8+ & Return
+:sync: python38plus
+
+```{literalinclude} ../../../docs_src/how_to_guides/writing_custom_nodes_example_2_py38_return.py
+```
+
 :::
 ::::
 
@@ -97,7 +110,12 @@ Here are some explanations.
 the value changes, pytask knows it needs to regenerate the workflow. We can use
 the timestamp of when the node was last modified.
 - pytask calls {meth}`PickleNode.load` when it collects the values of function arguments
-to run the function. In our example, we read the file and unpickle the data.
+to run the function. The argument `is_product` signals that the node is loaded as a
+product with a {class}`~pytask.Product` annotation or via `produces`.
+
+When the node is loaded as a dependency, we want to inject the value of the pickle
+file. In the other case, the node returns itself so users can call
+{meth}`PickleNode.save` themselves.
 - {meth}`PickleNode.save` is called when a task function returns and allows to save the
 return values.
 

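The `is_product` branch in `load` described above can be sketched without pytask. The class below is hypothetical (`SketchPickleNode`), wraps a single pickle file, and mirrors only the `load`/`save` behavior the guide describes; it omits the change-detection part and is not pytask's `PickleNode`.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class SketchPickleNode:
    """Hypothetical node that stores one Python object in a pickle file."""

    path: Path

    def load(self, is_product: bool = False) -> Any:
        # Loaded as a product: hand back the node so the task can call save().
        if is_product:
            return self
        # Loaded as a dependency: inject the unpickled value.
        with self.path.open("rb") as f:
            return pickle.load(f)

    def save(self, value: Any) -> None:
        # Pickle the value, for example a task's return value.
        with self.path.open("wb") as f:
            pickle.dump(value, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        node = SketchPickleNode(Path(tmp) / "data.pkl")
        product = node.load(is_product=True)  # the node itself
        product.save({"x": [1, 2, 3]})
        print(node.load())  # {'x': [1, 2, 3]}
```

Returning the node itself for products keeps the decision of when and what to write inside the task, while dependencies arrive already deserialized.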
docs/source/reference_guides/api.md

Lines changed: 25 additions & 2 deletions
@@ -33,7 +33,7 @@ To write to the terminal, use pytask's console.
 pytask uses marks to attach additional information to task functions which is processed
 by the host or by plugins. The following marks are available by default.
 
-### Marks
+### Built-in marks
 
 ```{eval-rst}
 .. function:: pytask.mark.depends_on(objects: Any | Iterable[Any] | dict[Any, Any])
@@ -236,7 +236,8 @@ The remaining exceptions convey specific errors.
 
 ```{eval-rst}
 .. autoclass:: pytask.Session
-
+.. autoclass:: pytask.DataCatalog
+:members:
 ```
 
 ## Protocols
@@ -262,7 +263,11 @@ Nodes are the interface for different kinds of dependencies or products.
 
 ```{eval-rst}
 .. autoclass:: pytask.PathNode
+:members: load, save
+.. autoclass:: pytask.PickleNode
+:members: load, save
 .. autoclass:: pytask.PythonNode
+:members: load, save
 ```
 
 To parse dependencies and products from nodes, use the following functions.
@@ -338,6 +343,13 @@ outcome.
 .. autofunction:: pytask.count_outcomes
 ```
 
+## Path utilities
+
+```{eval-rst}
+.. autofunction:: pytask.path.import_path
+.. autofunction:: pytask.path.hash_path
+```
+
 ## Programmatic Interfaces
 
 ```{eval-rst}
@@ -355,6 +367,17 @@ There are some classes to handle different kinds of reports.
 .. autoclass:: pytask.DagReport
 ```
 
+## Tree utilities
+
+```{eval-rst}
+.. autofunction:: pytask.tree_util.PyTree
+.. autofunction:: pytask.tree_util.tree_flatten_with_path
+.. autofunction:: pytask.tree_util.tree_leaves
+.. autofunction:: pytask.tree_util.tree_map
+.. autofunction:: pytask.tree_util.tree_map_with_path
+.. autofunction:: pytask.tree_util.tree_structure
+```
+
 ## Typing
 
 ```{eval-rst}

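The new tree utilities operate on pytrees: nested containers whose non-container entries are leaves. The pure-Python sketch below shows what `tree_leaves` and `tree_map` compute conceptually. It assumes that only dicts, lists, and plain tuples count as containers and that dict entries are visited in sorted key order; both functions are hypothetical stand-ins, not pytask's implementation, which may register more container types and treat `None` differently.

```python
from typing import Any, Callable, List


def tree_leaves(tree: Any) -> List[Any]:
    """Return the leaves of a nested dict/list/tuple structure, left to right."""
    if isinstance(tree, dict):
        # Sorted keys make the order independent of insertion order (assumption).
        return [leaf for key in sorted(tree) for leaf in tree_leaves(tree[key])]
    if isinstance(tree, (list, tuple)):
        return [leaf for item in tree for leaf in tree_leaves(item)]
    return [tree]


def tree_map(func: Callable[[Any], Any], tree: Any) -> Any:
    """Apply `func` to every leaf while keeping the container structure."""
    if isinstance(tree, dict):
        return {key: tree_map(func, value) for key, value in tree.items()}
    if isinstance(tree, tuple):
        return tuple(tree_map(func, item) for item in tree)
    if isinstance(tree, list):
        return [tree_map(func, item) for item in tree]
    return func(tree)


products = {"plot": "plot.png", "data": ["a.csv", ("b.csv", "c.csv")]}
print(tree_leaves(products))  # ['a.csv', 'b.csv', 'c.csv', 'plot.png']
print(tree_map(str.upper, products))
```

This is why nested dependencies and products can be declared freely: utilities like these flatten any nesting to a list of leaves and rebuild the same shape afterwards.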
docs/source/reference_guides/configuration.md

Lines changed: 2 additions & 2 deletions
@@ -46,12 +46,12 @@ are welcome to also support macOS.
 
 pytask uses a database to keep track of tasks, products, and dependencies over runs. By
 default, it will create an SQLite database in the project's root directory called
-`.pytask.sqlite3`. If you want to use a different name or a different dialect
+`.pytask/pytask.sqlite3`. If you want to use a different name or a different dialect
 [supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls),
 use either {option}`pytask build --database-url` or `database_url` in the config.
 
 ```toml
-database_url = "sqlite:///.pytask.sqlite3"
+database_url = "sqlite:///.pytask/pytask.sqlite3"
 ```
 
 Relative paths for SQLite databases are interpreted as either relative to the

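The new `database_url` follows SQLAlchemy's SQLite URL format: `sqlite:///` plus a relative path names a file relative to some base directory, while four slashes introduce an absolute path. The sketch below resolves such URLs, taking the base directory as a parameter because the sentence above is cut off before naming it. `resolve_sqlite_url` is hypothetical, not part of pytask, and only handles POSIX-style paths.

```python
from pathlib import PurePosixPath

SQLITE_PREFIX = "sqlite:///"


def resolve_sqlite_url(url: str, base: PurePosixPath) -> PurePosixPath:
    """Return the database file a SQLite URL points to (hypothetical helper)."""
    if not url.startswith(SQLITE_PREFIX):
        raise ValueError(f"Not a file-based SQLite URL: {url!r}")
    # After "sqlite:///", a relative path stays relative; "sqlite:////x" yields "/x".
    path = PurePosixPath(url[len(SQLITE_PREFIX):])
    return path if path.is_absolute() else base / path


base = PurePosixPath("/home/user/my_project")
print(resolve_sqlite_url("sqlite:///.pytask/pytask.sqlite3", base))
# /home/user/my_project/.pytask/pytask.sqlite3
print(resolve_sqlite_url("sqlite:////tmp/other.sqlite3", base))
# /tmp/other.sqlite3
```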
docs/source/tutorials/configuration.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ pytask can be configured via the command-line interface or permanently with a
 `pyproject.toml` file.
 
 The file also indicates the root of your project where pytask stores information in a
-`.pytask.sqlite3` database.
+`.pytask` folder.
 
 ## The configuration file
 
