Use pybaum for more flexible dependencies and products. #211

Merged
merged 27 commits into from
Feb 26, 2022
Commits
fd7f436
Add pybaum.
tobiasraabe Jan 30, 2022
fe6a89c
Move to conversion to dicts, disable tuples as name, value combinations.
tobiasraabe Jan 31, 2022
104e2dc
Use tree_map.
tobiasraabe Jan 31, 2022
c4383ad
Extend test to depends_on.
tobiasraabe Jan 31, 2022
a758e5a
Fix functio.
tobiasraabe Jan 31, 2022
29bc3ea
Fix docstrings and add examples.
tobiasraabe Jan 31, 2022
8e34a0a
Merge branch 'main' into make-deps-prods-flexible
tobiasraabe Jan 31, 2022
9f640d1
add a section to the docs.
tobiasraabe Feb 1, 2022
4538378
Merge branch 'make-deps-prods-flexible' of https://github.com/pytask-…
tobiasraabe Feb 1, 2022
9bb85fe
more tests.
tobiasraabe Feb 1, 2022
f449239
Add test csae.
tobiasraabe Feb 1, 2022
ea448f6
add more docs.
tobiasraabe Feb 1, 2022
86f2246
TEMP COMMIT USE SPECIAL KEY INSTEAD OF KEEP_DICT.
tobiasraabe Feb 1, 2022
8d1b3c2
Merge branch 'main' into make-deps-prods-flexible
tobiasraabe Feb 1, 2022
7559267
Revert "TEMP COMMIT USE SPECIAL KEY INSTEAD OF KEEP_DICT."
tobiasraabe Feb 2, 2022
abaddf7
Recategorize tests.
tobiasraabe Feb 2, 2022
7479d8e
Use pybaum throughout pytask.
tobiasraabe Feb 2, 2022
d7f1f9b
Test profile with complex deps.
tobiasraabe Feb 2, 2022
3abab34
Categorize tests.
tobiasraabe Feb 2, 2022
923dde5
Fix type issue.
tobiasraabe Feb 2, 2022
165bebf
Add some tests.
tobiasraabe Feb 5, 2022
c78a2be
Merge branch 'main' into make-deps-prods-flexible
tobiasraabe Feb 19, 2022
c585822
Use tree_just_yield.
tobiasraabe Feb 19, 2022
3c939a0
Merge branch 'main' into make-deps-prods-flexible
tobiasraabe Feb 21, 2022
3530641
Fix release notes.
tobiasraabe Feb 23, 2022
57a5368
Merge branch 'main' into make-deps-prods-flexible
tobiasraabe Feb 23, 2022
927a039
Merge branch 'main' into make-deps-prods-flexible
tobiasraabe Feb 25, 2022
1 change: 1 addition & 0 deletions docs/rtd_environment.yml
@@ -25,6 +25,7 @@ dependencies:
- networkx
- pluggy
- pony >=0.7.15
- pybaum
- pexpect
- rich
- typing-extensions
2 changes: 2 additions & 0 deletions docs/source/changes.rst
@@ -9,6 +9,8 @@ all releases are available on `PyPI <https://pypi.org/project/pytask>`_ and
0.2.0 - 2022-xx-xx
------------------

- :pull:`211` allows for flexible dependencies and products which can be any pytree of
native Python objects as supported by pybaum.
- :pull:`227` implements ``task.kwargs`` as a new way for a task to hold parametrized
arguments. It also implements :class:`_pytask.models.CollectionMetadata` to carry
parametrized arguments to the task class.
124 changes: 100 additions & 24 deletions docs/source/tutorials/how_to_define_dependencies_products.rst
@@ -85,8 +85,30 @@ Multiple dependencies and products
----------------------------------

Most tasks have multiple dependencies or products. The easiest way to attach multiple
dependencies or products to a task is to pass a :class:`dict`, :class:`list` or another
iterator to the marker containing the paths.
dependencies or products to a task is to pass a :class:`dict` (highly recommended),
:class:`list`, or another iterable to the marker containing the paths.

To assign labels to dependencies or products, pass a dictionary. For example,

.. code-block:: python

@pytask.mark.produces({"first": BLD / "data_0.pkl", "second": BLD / "data_1.pkl"})
def task_create_random_data(produces):
...

Then, use

.. code-block:: pycon

>>> produces["first"]
BLD / "data_0.pkl"

>>> produces["second"]
BLD / "data_1.pkl"

inside the task function.

You can also use lists and other iterables.

.. code-block:: python

@@ -102,8 +124,9 @@ where keys are the positions in the list.
>>> produces
{0: BLD / "data_0.pkl", 1: BLD / "data_1.pkl"}

Why dictionaries and not lists? First, dictionaries with positions as keys behave very
similar to lists and conversion between both is easy.
Why does pytask recommend dictionaries and even convert lists to dictionaries? First,
dictionaries with positions as keys behave very similarly to lists, and conversion
between the two is easy.
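
The conversion between lists and position-keyed dictionaries described above can be sketched as follows. This is only an illustration, not pytask's implementation: the helper `to_positional_dict` and the value of `BLD` are hypothetical names introduced for this example.

```python
from pathlib import Path

# Hypothetical build directory, standing in for the BLD used in the tutorial.
BLD = Path("bld")


def to_positional_dict(paths):
    """Return a dict keyed by position for lists and tuples; copy dicts as-is."""
    if isinstance(paths, dict):
        return dict(paths)
    return dict(enumerate(paths))


# A list of products becomes a dictionary with integer keys.
produces = to_positional_dict([BLD / "data_0.pkl", BLD / "data_1.pkl"])

# Converting back to a list only requires the values.
back_to_list = list(produces.values())
```

Because the keys are plain positions, no information is lost in either direction.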

.. tip::

@@ -113,47 +136,100 @@ Secondly, dictionaries use keys instead of positions which is more verbose and
descriptive and does not assume a fixed ordering. Both attributes are especially
desirable in complex projects.

To assign labels to dependencies or products, pass a dictionary. For example,

Multiple decorators
-------------------

You can also attach multiple decorators to a function which will be merged into a single
dictionary. This might help you to group certain dependencies and apply them to multiple
tasks.

.. code-block:: python

@pytask.mark.produces({"first": BLD / "data_0.pkl", "second": BLD / "data_1.pkl"})
def task_create_random_data(produces):
common_dependencies = ["text_1.txt", "text_2.txt"]


@pytask.mark.depends_on(common_dependencies)
@pytask.mark.depends_on("text_3.txt")
def task_example():
...

Then, use

.. code-block:: pycon
Nested dependencies and products
--------------------------------

>>> produces["first"]
BLD / "data_0.pkl"
Dependencies and products are allowed to be nested containers consisting of tuples,
lists, and dictionaries. In situations where you want more structure and are otherwise
forced to flatten your inputs, this can be beneficial.

>>> produces["second"]
BLD / "data_1.pkl"
Here is an example of a task that fits a model to data. It depends on the module
containing the model code, which the task does not use directly but which ensures that
the task is rerun when the model changes. It also depends on the data.

inside the task function.
.. code-block:: python

@pytask.mark.depends_on(
{
"model": [SRC / "models" / "model.py"],
"data": {"a": SRC / "data" / "a.pkl", "b": SRC / "data" / "b.pkl"},
}
)
@pytask.mark.produces(BLD / "models" / "fitted_model.pkl")
def task_fit_model():
...

Multiple decorators
-------------------
It is also possible to merge nested containers. For example, you might want to reuse
the dependency on models for other tasks as well.

You can also attach multiple decorators to a function which will be merged into a single
dictionary. This might help you to group certain dependencies and apply them to multiple
tasks.
.. code-block:: python

model_dependencies = pytask.mark.depends_on({"model": [SRC / "models" / "model.py"]})


@model_dependencies
@pytask.mark.depends_on(
{"data": {"a": SRC / "data" / "a.pkl", "b": SRC / "data" / "b.pkl"}}
)
@pytask.mark.produces(BLD / "models" / "fitted_model.pkl")
def task_fit_model():
...

In both cases, ``depends_on`` within the function will be

.. code-block:: python

common_dependencies = ["text_1.txt", "text_2.txt"]
{
"model": [SRC / "models" / "model.py"],
"data": {"a": SRC / "data" / "a.pkl", "b": SRC / "data" / "b.pkl"},
}

Tuples and lists are converted to dictionaries with integer keys. The innermost
decorator is evaluated first.

@pytask.mark.depends_on(common_dependencies)
@pytask.mark.depends_on("text_3.txt")
def task_example():
.. code-block:: python

@pytask.mark.depends_on([SRC / "models" / "model.py"])
@pytask.mark.depends_on([SRC / "data" / "a.pkl", SRC / "data" / "b.pkl"])
@pytask.mark.produces(BLD / "models" / "fitted_model.pkl")
def task_fit_model():
...

would give

.. code-block:: python

{0: SRC / "data" / "a.pkl", 1: SRC / "data" / "b.pkl", 2: SRC / "models" / "model.py"}
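
The ordering in this result can be reproduced with a small sketch. It assumes only the flat list case shown above and is not pytask's merging code; `merge_positional` and the value of `SRC` are hypothetical names used for illustration.

```python
from pathlib import Path

# Hypothetical source directory, standing in for the SRC used in the tutorial.
SRC = Path("src")


def merge_positional(*decorator_args):
    """Merge lists into one dict with running integer keys.

    Arguments are expected in evaluation order, innermost decorator first.
    """
    merged = {}
    for arg in decorator_args:
        for path in arg:
            # The next free position is the current number of entries.
            merged[len(merged)] = path
    return merged


innermost = [SRC / "data" / "a.pkl", SRC / "data" / "b.pkl"]
outermost = [SRC / "models" / "model.py"]
merged = merge_positional(innermost, outermost)
```

The data files come first because their decorator sits closest to the function and is therefore evaluated first.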

.. seealso::

Nested objects like tuples, lists, and dictionaries are called pytrees. The concept is
explained in more detail in the `documentation of pybaum
<https://github.com/OpenSourceEconomics/pybaum>`_, which pytask uses under the
hood.


References
----------

.. [1] The official documentation for :mod:`pathlib`.
.. [2] A guide for pathlib at `RealPython <https://realpython.com/python-pathlib/>`_.
.. [2] A guide for pathlib by `realpython <https://realpython.com/python-pathlib/>`_.
1 change: 1 addition & 0 deletions environment.yml
@@ -22,6 +22,7 @@ dependencies:
- networkx
- pluggy
- pony >=0.7.15
- pybaum
- rich
- typing-extensions

1 change: 1 addition & 0 deletions setup.cfg
@@ -41,6 +41,7 @@ install_requires =
packaging
pluggy
pony>=0.7.15
pybaum
rich
typing-extensions
python_requires = >=3.7
6 changes: 4 additions & 2 deletions src/_pytask/clean.py
@@ -9,6 +9,7 @@
from typing import Any
from typing import Generator
from typing import Iterable
from typing import List
from typing import TYPE_CHECKING

import attr
@@ -26,6 +27,7 @@
from _pytask.session import Session
from _pytask.shared import get_first_non_none_value
from _pytask.traceback import render_exc_info
from pybaum.tree_util import tree_just_yield


if TYPE_CHECKING:
@@ -190,7 +192,7 @@ def _yield_paths_from_task(task: MetaTask) -> Generator[Path, None, None]:
"""Yield all paths attached to a task."""
yield task.path
for attribute in ["depends_on", "produces"]:
for node in getattr(task, attribute).values():
for node in tree_just_yield(getattr(task, attribute)):
if hasattr(node, "path") and isinstance(node.path, Path):
yield node.path

@@ -234,7 +236,7 @@ class _RecursivePathNode:
"""

path = attr.ib(type=Path)
sub_nodes = attr.ib(type="list[_RecursivePathNode]")
sub_nodes = attr.ib(type=List["_RecursivePathNode"])
is_dir = attr.ib(type=bool)
is_file = attr.ib(type=bool)
is_unknown = attr.ib(type=bool)
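
The switch from `.values()` to `tree_just_yield` in `_yield_paths_from_task` matters because `.values()` only reaches the top level of a nested container. The sketch below shows what yielding the leaves of a nested structure means. It is a simplified stand-in written for this note, not pybaum's implementation, and the leaf order pybaum produces may differ. `yield_leaves` and `SRC` are hypothetical names.

```python
from pathlib import Path

# Hypothetical source directory used only for this example.
SRC = Path("src")


def yield_leaves(tree):
    """Yield the leaves of nested dicts, lists, and tuples depth-first."""
    if isinstance(tree, dict):
        for value in tree.values():
            yield from yield_leaves(value)
    elif isinstance(tree, (list, tuple)):
        for item in tree:
            yield from yield_leaves(item)
    else:
        # Anything that is not a container counts as a leaf.
        yield tree


depends_on = {
    "model": [SRC / "models" / "model.py"],
    "data": {"a": SRC / "data" / "a.pkl", "b": SRC / "data" / "b.pkl"},
}

leaves = list(yield_leaves(depends_on))
top_level_only = list(depends_on.values())
```

Here `top_level_only` still contains a list and a dictionary, while `leaves` contains the three paths themselves.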
18 changes: 9 additions & 9 deletions src/_pytask/collect_command.py
@@ -25,6 +25,7 @@
from _pytask.path import relative_to
from _pytask.pluginmanager import get_plugin_manager
from _pytask.session import Session
from pybaum.tree_util import tree_just_flatten
from rich.text import Text
from rich.tree import Tree

@@ -125,13 +126,8 @@ def _find_common_ancestor_of_all_nodes(
for task in tasks:
all_paths.append(task.path)
if show_nodes:
all_paths.extend(
[
node.path
for attr in ("depends_on", "produces")
for node in getattr(task, attr).values()
]
)
all_paths.extend(map(lambda x: x.path, tree_just_flatten(task.depends_on)))
all_paths.extend(map(lambda x: x.path, tree_just_flatten(task.produces)))

common_ancestor = find_common_ancestor(*all_paths, *paths)

@@ -201,7 +197,9 @@ def _print_collected_tasks(
)

if show_nodes:
for node in sorted(task.depends_on.values(), key=lambda x: x.path):
for node in sorted(
tree_just_flatten(task.depends_on), key=lambda x: x.path
):
reduced_node_name = relative_to(node.path, common_ancestor)
url_style = create_url_style_for_path(node.path, editor_url_scheme)
task_branch.add(
@@ -213,7 +211,9 @@ def _print_collected_tasks(
)
)

for node in sorted(task.produces.values(), key=lambda x: x.path):
for node in sorted(
tree_just_flatten(task.produces), key=lambda x: x.path
):
reduced_node_name = relative_to(node.path, common_ancestor)
url_style = create_url_style_for_path(node.path, editor_url_scheme)
task_branch.add(
9 changes: 2 additions & 7 deletions src/_pytask/execute.py
@@ -29,6 +29,7 @@
from _pytask.traceback import format_exception_without_traceback
from _pytask.traceback import remove_traceback_from_exc_info
from _pytask.traceback import render_exc_info
from pybaum.tree_util import tree_map
from rich.text import Text


@@ -157,13 +158,7 @@ def pytask_execute_task(task: MetaTask) -> bool:
for arg_name in ("depends_on", "produces"):
if arg_name in func_arg_names:
attribute = getattr(task, arg_name)
kwargs[arg_name] = (
attribute[0].value
if len(attribute) == 1
and 0 in attribute
and not task.keep_dict[arg_name]
else {name: node.value for name, node in attribute.items()}
)
kwargs[arg_name] = tree_map(lambda x: x.value, attribute)

task.execute(**kwargs)
return True
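
The `tree_map(lambda x: x.value, attribute)` call above replaces each node in the nested container with its value while keeping the container's shape. A simplified stand-in for that idea is sketched below. It is not pybaum's implementation; `map_leaves` and the `Node` class are hypothetical names, and real pytask nodes carry more than a `value`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Node:
    """Hypothetical minimal node that only holds a value."""

    value: Path


def map_leaves(func, tree):
    """Apply func to every leaf and rebuild the same nested structure."""
    if isinstance(tree, dict):
        return {key: map_leaves(func, value) for key, value in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(map_leaves(func, item) for item in tree)
    # Anything that is not a container counts as a leaf.
    return func(tree)


attribute = {
    "model": {0: Node(Path("model.py"))},
    "data": {"a": Node(Path("a.pkl")), "b": Node(Path("b.pkl"))},
}

kwargs_value = map_leaves(lambda node: node.value, attribute)
```

The task function then receives the same nesting it declared in the decorator, with paths in place of nodes.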