Commit 3530a07

Merge 6fe3ec4 into 58d940e
2 parents 58d940e + 6fe3ec4 commit 3530a07

42 files changed, +1091 -35 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ _generated
 .eggs

 .pytask.sqlite3
+.pytask

 build
 dist

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -115,6 +115,7 @@ repos:
 docs/source/tutorials/repeating_tasks_with_different_inputs.md|
 docs/source/tutorials/selecting_tasks.md|
 docs/source/tutorials/set_up_a_project.md|
+docs/source/tutorials/using_a_data_catalog.md|
 docs/source/tutorials/write_a_task.md
 )$
 - repo: https://github.com/nbQA-dev/nbQA
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+<div class="termy">
+
+```console
+
+$ pytask
+──────────────────────────── Start pytask session ────────────────────────────
+Platform: win32 -- Python <span style="color: var(--termynal-blue)">3.10.0</span>, pytask <span style="color: var(--termynal-blue)">0.4.0</span>, pluggy <span style="color: var(--termynal-blue)">1.0.0</span>
+Root: C:\Users\pytask-dev\git\my_project
+Collected <span style="color: var(--termynal-blue)">2</span> tasks.
+
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
+┃ Task ┃ Outcome ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
+│ <span class="termynal-dim">task_data_preparation.py::</span>task_create_random_data │ <span class="termynal-success">.</span> │
+│ <span class="termynal-dim">task_plot_data.py::</span>task_plot_data │ <span class="termynal-success">.</span> │
+└───────────────────────────────────────────────────┴─────────┘
+
+<span class="termynal-dim">──────────────────────────────────────────────────────────────────────────────</span>
+<span class="termynal-success">╭───────────</span> <span style="font-weight: bold;">Summary</span> <span class="termynal-success">────────────╮</span>
+<span class="termynal-success">│</span> <span style="font-weight: bold;"> 2 Collected tasks </span> <span class="termynal-success">│</span>
+<span class="termynal-success">│</span> <span class="termynal-success-textonly"> 2 Succeeded (100.0%) </span> <span class="termynal-success">│</span>
+<span class="termynal-success">╰────────────────────────────────╯</span>
+<span class="termynal-success">───────────────────────── Succeeded in 0.06 seconds ──────────────────────────</span>
+```
+
+</div>

docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@

 intersphinx_mapping = {
 "click": ("https://click.palletsprojects.com/en/8.0.x/", None),
+"deepdiff": ("https://zepworks.com/deepdiff/current/", None),
 "networkx": ("https://networkx.org/documentation/stable", None),
 "pandas": ("https://pandas.pydata.org/docs", None),
 "pluggy": ("https://pluggy.readthedocs.io/en/latest", None),

docs/source/how_to_guides/hashing_inputs_of_tasks.md

Lines changed: 2 additions & 2 deletions
@@ -62,10 +62,10 @@ from interpreter session to interpreter session for security reasons (see
 ```

 {class}`list` and {class}`dict` are not hashable by default. Luckily, there are
-libraries who provide this functionality like `deepdiff`. We can use them to pass a
+libraries who provide this functionality like {mod}`deepdiff`. We can use them to pass a
 function to the {class}`~pytask.PythonNode` that generates a stable hash.

-First, install `deepdiff`.
+First, install {mod}`deepdiff`.

 ```console
 $ pip install deepdiff
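
The hunk above is about passing a function that produces a stable hash for unhashable inputs such as lists and dicts. As a rough illustration of what "stable" means here, the sketch below builds such a function from the standard library only. It is not the `deepdiff` approach the guide installs; the name `stable_hash` is hypothetical, and the sketch assumes the value is JSON-serializable.

```python
import hashlib
import json
from typing import Any


def stable_hash(value: Any) -> str:
    """Return a hex digest that is identical across interpreter sessions.

    Hypothetical helper: assumes ``value`` is JSON-serializable.
    """
    # Sorting keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


config = {"seed": 0, "columns": ["x", "y"]}
reordered = {"columns": ["x", "y"], "seed": 0}

# Same content in a different key order yields the same digest.
print(stable_hash(config) == stable_hash(reordered))
# Changing a value yields a different digest.
print(stable_hash(config) == stable_hash({"seed": 1, "columns": ["x", "y"]}))
```

Unlike the built-in `hash()`, whose results for strings vary between interpreter sessions, a digest over a canonical encoding stays the same from run to run, which is the property a change-detection hash needs.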

docs/source/how_to_guides/index.md

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ hashing_inputs_of_tasks
 using_task_returns
 writing_custom_nodes
 how_to_write_a_plugin
+the_data_catalog
 ```

 ## Best Practice Guides
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# The `DataCatalog` - Revisited
+
+An introduction to the data catalog can be found in the
+[tutorial](../tutorials/using_a_data_catalog.md).
+
+This guide explains some details that were left out of the tutorial.
+
+## Changing the default node
+
+The data catalog uses the {class}`~pytask.PickleNode` by default to serialize any kind
+of Python object. You can use any other node that follows the {protocol}`~pytask.PNode`
+protocol and register it when creating the data catalog.
+
+For example, use the {class}`~pytask.PythonNode` as the default.
+
+```python
+from pytask import PythonNode
+
+
+data_catalog = DataCatalog(default_node=PythonNode)
+```
+
+Or, learn to write your own node by reading {doc}`writing_custom_nodes`.
+
+Here is an example for a `PickleNode` that uses cloudpickle instead of the normal
+`pickle` module.
+
+```{literalinclude} ../../../docs_src/how_to_guides/the_data_catalog.py
+```
+
+## Changing the name and the default path
+
+By default, the data catalogs store their data in a directory `.pytask/data_catalogs`.
+If you use a `pyproject.toml` with a `[tool.pytask.ini_options]` section, then the
+`.pytask` folder is in the same folder as the configuration file.
+
+The default name for a catalog is `"default"` and so you will find its data in
+`.pytask/data_catalogs/default`. If you assign a different name like
+`"data_management"`, you will find the data in `.pytask/data_catalogs/data_management`.
+
+```python
+data_catalog = DataCatalog(name="data_management")
+```
+
+You can also change the path where the data catalogs will be stored by changing the
+`path` attribute. Here, we store the data catalog's data next to the module where the
+data catalog is defined in `.data`.
+
+```python
+from pathlib import Path
+
+
+data_catalog = DataCatalog(path=Path(__file__).parent / ".data")
+```
+
+## Multiple data catalogs
+
+You can use multiple data catalogs when you want to separate your datasets across
+multiple catalogs or when you want to use the same names multiple times (although it is
+not recommended!).
+
+Make sure you assign different names to the data catalogs so that their data is stored
+in different directories.
+
+```python
+# Stored in .pytask/data_catalogs/a
+data_catalog_a = DataCatalog(name="a")
+
+# Stored in .pytask/data_catalogs/b
+data_catalog_b = DataCatalog(name="b")
+```
+
+Or, use different paths as explained above.
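
The guide above refers to a cloudpickle-based node whose source (`docs_src/how_to_guides/the_data_catalog.py`) is not shown in this view. To give a feel for what such a node might look like, here is a minimal, self-contained sketch. It does not import pytask and is not the file from the commit. The class name `SketchPickleNode`, the `name` and `path` fields, the content-hash `state` method, and the file name `data.pkl` are all assumptions for illustration; only `load` and `save` are confirmed as node members by this diff. The standard `pickle` module stands in for cloudpickle so the example runs without extra dependencies.

```python
import hashlib
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class SketchPickleNode:
    """Hypothetical stand-in for a node; not pytask's actual class."""

    name: str
    path: Path

    def state(self) -> Optional[str]:
        # Assumption: a content hash signals changes; None means "not stored yet".
        if not self.path.exists():
            return None
        return hashlib.sha256(self.path.read_bytes()).hexdigest()

    def load(self) -> Any:
        # Swap ``pickle`` for ``cloudpickle`` here to mirror the guide's example.
        with self.path.open("rb") as f:
            return pickle.load(f)

    def save(self, value: Any) -> None:
        # Create the catalog directory on first use.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self.path.open("wb") as f:
            pickle.dump(value, f)


with tempfile.TemporaryDirectory() as tmp:
    # Mirrors the documented layout ``.pytask/data_catalogs/<name>``.
    target = Path(tmp, ".pytask", "data_catalogs", "default", "data.pkl")
    node = SketchPickleNode(name="data", path=target)
    node.save({"x": [1, 2, 3]})
    print(node.load())
```

Keeping serialization behind `load` and `save` is what lets a catalog swap its default node: callers never touch the file format directly, so replacing `pickle` with another serializer only changes these two methods.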

docs/source/reference_guides/api.md

Lines changed: 7 additions & 2 deletions
@@ -33,7 +33,7 @@ To write to the terminal, use pytask's console.
 pytask uses marks to attach additional information to task functions which is processed
 by the host or by plugins. The following marks are available by default.

-### Marks
+### Built-in marks

 ```{eval-rst}
 .. function:: pytask.mark.depends_on(objects: Any | Iterable[Any] | dict[Any, Any])
@@ -236,7 +236,8 @@ The remaining exceptions convey specific errors.

 ```{eval-rst}
 .. autoclass:: pytask.Session
-
+.. autoclass:: pytask.DataCatalog
+:members:
 ```

 ## Protocols
@@ -262,7 +263,11 @@ Nodes are the interface for different kinds of dependencies or products.

 ```{eval-rst}
 .. autoclass:: pytask.PathNode
+:members: load, save
+.. autoclass:: pytask.PickleNode
+:members: load, save
 .. autoclass:: pytask.PythonNode
+:members: load, save
 ```

 To parse dependencies and products from nodes, use the following functions.

docs/source/tutorials/defining_dependencies_products.md

Lines changed: 42 additions & 10 deletions
@@ -3,22 +3,47 @@
 To ensure pytask executes all tasks in the correct order, you need to define
 dependencies and products for each task.

-This tutorial offers you different interfaces. One important difference between them is
-that if you are comfortable with type annotations or not afraid to try them, take a look
-at the tabs named `Python 3.10+` or `Python 3.8+`.
+This tutorial offers you different interfaces. If you are comfortable with type
+annotations or not afraid to try them, take a look at the tabs named `Python 3.10+` or
+`Python 3.8+`.

 If you want to avoid type annotations for now, look at the tab named `produces`.

+The deprecated approaches can be found in the tabs named `Decorators`.
+
 ```{seealso}
 An overview on the different interfaces and their strength and weaknesses is given in
 {doc}`../explanations/interfaces_for_dependencies_products`.
 ```

-Let's first focus on how to define products which should already be familiar to you.
+First, we focus on how to define products which should already be familiar to you. Then,
+we focus on how task dependencies can be declared.
+
+We use the same project layout as before and add a `task_plot_data.py` module.
+
+```text
+my_project
+├───pyproject.toml
+
+├───src
+│   └───my_project
+│       ├────config.py
+│       ├────task_data_preparation.py
+│       └────task_plot_data.py
+
+├───setup.py
+
+├───.pytask.sqlite3
+
+└───bld
+    ├────data.pkl
+    └────plot.png
+```

 ## Products

-Let's revisit the task from the {doc}`previous tutorial <write_a_task>`.
+Let's revisit the task from the {doc}`previous tutorial <write_a_task>` that we defined
+in `task_data_preparation.py`.

 ::::{tab-set}

@@ -90,7 +115,9 @@ beneficial for handling paths conveniently and across platforms.
 Most tasks have dependencies and it is important to specify. Then, pytask ensures that
 the dependencies are available before executing the task.

-In the example you see a task that creates a plot while relying on some data set.
+As an example, we want to extend our project with another task that plots the data that
+we generated with `task_create_random_data`. The task is called `task_plot_data` and we
+will define it in `task_plot_data.py`.

 ::::{tab-set}

@@ -104,7 +131,7 @@ pytask assumes that all function arguments that do not have the {class}`~pytask.
 annotation are dependencies of the task.

 ```{literalinclude} ../../../docs_src/tutorials/defining_dependencies_products_dependencies_py310.py
-:emphasize-lines: 9
+:emphasize-lines: 11
 ```

 :::
@@ -119,7 +146,7 @@ pytask assumes that all function arguments that do not have the {class}`~pytask.
 annotation are dependencies of the task.

 ```{literalinclude} ../../../docs_src/tutorials/defining_dependencies_products_dependencies_py38.py
-:emphasize-lines: 9
+:emphasize-lines: 11
 ```

 :::
@@ -134,7 +161,7 @@ pytask assumes that all function arguments that are not passed to the argument
 `produces` are dependencies of the task.

 ```{literalinclude} ../../../docs_src/tutorials/defining_dependencies_products_dependencies_produces.py
-:emphasize-lines: 7
+:emphasize-lines: 9
 ```

 :::
@@ -152,12 +179,17 @@ Equivalent to products, you can use the
 access the dependency path inside the function and load the data.

 ```{literalinclude} ../../../docs_src/tutorials/defining_dependencies_products_dependencies_decorators.py
-:emphasize-lines: 7, 9
+:emphasize-lines: 9, 11
 ```

 :::
 ::::

+Now, let us execute the two tasks.
+
+```{include} ../_static/md/defining-dependencies-products.md
+```
+
 ## Relative paths

 Dependencies and products do not have to be absolute paths. If paths are relative, they

docs/source/tutorials/index.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ installation
 set_up_a_project
 write_a_task
 defining_dependencies_products
+using_a_data_catalog
 invoking_pytask
 configuration
 plugins
