Implement a new loop-based approach to parametrizations. #229

Merged
merged 23 commits into from
Mar 7, 2022
e66bf07
Implement a couple of tests and add to changes.
tobiasraabe Feb 27, 2022
5390091
Implement functionality to make tests pass.
tobiasraabe Feb 28, 2022
aca9f60
Fix errors.
tobiasraabe Mar 1, 2022
9cc8007
Make name of task deco positional, rest kwargs, and add test for id.
tobiasraabe Mar 1, 2022
2819db6
Implement id as a kwarg to the task marker.
tobiasraabe Mar 1, 2022
ab69519
Implement parsing of args where id components are stringified.
tobiasraabe Mar 1, 2022
29c4170
Add task testing error handling.
tobiasraabe Mar 1, 2022
93e52d0
fix tests.
tobiasraabe Mar 1, 2022
8b296a2
Extend docstring.
tobiasraabe Mar 1, 2022
3636716
Add tabulate to environment.
hmgaudecker Mar 1, 2022
d1906ec
Rewrite tutorial and move explanation on @pytask.mark.parametrize to …
tobiasraabe Mar 3, 2022
7581871
Update the best-practices guide on parametrizations:
tobiasraabe Mar 3, 2022
ba912f4
Fix link error.
tobiasraabe Mar 3, 2022
b7975be
fix.
tobiasraabe Mar 3, 2022
ebb5dc9
fix.
tobiasraabe Mar 3, 2022
69054b9
Merge branch 'implement-new-parametrizations' of https://github.com/p…
tobiasraabe Mar 3, 2022
07994af
Integrate comments from HMG.
tobiasraabe Mar 3, 2022
7084b11
Clarify returnining None in the wrapper function.
tobiasraabe Mar 3, 2022
1d063b4
Remove function type with pytask_meta defined.
tobiasraabe Mar 3, 2022
e60574c
Add a test with irregular dicts which causes a TypeError during execu…
tobiasraabe Mar 3, 2022
58b9365
Highlight parametrizations via loops in README.rst.
tobiasraabe Mar 3, 2022
a511aad
Merge branch 'main' into implement-new-parametrizations
tobiasraabe Mar 7, 2022
caaaa91
Rerun ci.
tobiasraabe Mar 7, 2022
4 changes: 4 additions & 0 deletions README.rst
@@ -60,6 +60,10 @@ projects. Its features include:
<https://pytask-dev.readthedocs.io/en/stable/tutorials/how_to_debug.html>`_ if a task
fails, get feedback quickly, and be more productive.

- **Parametrizations via loops.** `Loop over task functions
<https://pytask-dev.readthedocs.io/en/stable/tutorials/how_to_parametrize_a_task.html>`_
to run the same task with different inputs.

- **Select tasks via expressions.** Run only a subset of tasks with `expressions and
marker expressions
<https://pytask-dev.readthedocs.io/en/stable/tutorials/how_to_select_tasks.html>`_
2 changes: 2 additions & 0 deletions docs/source/changes.rst
@@ -16,6 +16,8 @@ all releases are available on `PyPI <https://pypi.org/project/pytask>`_ and
parametrized arguments to the task class.
- :pull:`228` removes ``task.pytaskmark`` and moves the information to
:attr:`_pytask.models.CollectionMetadata.markers`.
- :pull:`229` implements a new loop-based approach to parametrizations using the
:func:`@pytask.mark.task <_pytask.task.task>` decorator.
- :pull:`230` implements :class:`_pytask.logging._TimeUnit` as a
:class:`typing.NamedTuple` for better typing.

49 changes: 27 additions & 22 deletions docs/source/how_to_guides/bp_parametrizations.rst
@@ -106,26 +106,27 @@ parametrization.


     def _create_parametrization(data):
-        parametrizations = []
-        ids = []
+        id_to_kwargs = {}
         for data_name in data:
-            ids.append(data_name)
             depends_on = path_to_input_data(data_name)
             produces = path_to_processed_data(data_name)
-            parametrizations.append((depends_on, produces))

-        return "depends_on, produces", parametrizations, ids
+            id_to_kwargs[data_name] = {"depends_on": depends_on, "produces": produces}

+        return id_to_kwargs

-    _SIGNATURE, _PARAMETRIZATION, _IDS = _create_parametrization(DATA)

+    _ID_TO_KWARGS = _create_parametrization(DATA)

-    @pytask.mark.parametrize(_SIGNATURE, _PARAMETRIZATION, ids=_IDS)
-    def task_prepare_data(depends_on, produces):
-        ...
+    for id_, kwargs in _ID_TO_KWARGS.items():

-All arguments for the ``parametrize`` decorator are built within a function to keep the
-logic in one place and the namespace of the module clean.
+        @pytask.mark.task(id=id_, kwargs=kwargs)
+        def task_prepare_data(depends_on, produces):
+            ...
+
+All arguments for the loop and the :func:`@pytask.mark.task <_pytask.task_utils.task>`
+decorator are built within a function to keep the logic in one place and the namespace
+of the module clean.
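
The loop may look surprising because every iteration redefines a function with the same
name. The following plain-Python sketch shows why this still works: the values are
handed to the decorator, which stores them when it is called. ``register`` and
``REGISTRY`` are hypothetical stand-ins written for this illustration and are not part
of pytask, whose collection mechanism differs.

```python
# Hypothetical stand-in for a task registry; pytask's real collection differs.
REGISTRY = {}


def register(id, kwargs):
    """Store the function together with the kwargs given at decoration time."""

    def decorator(func):
        # Copy the kwargs so later changes to the input dict do not leak in.
        REGISTRY[id] = (func, dict(kwargs))
        return func

    return decorator


ID_TO_KWARGS = {
    "a": {"depends_on": "a.csv", "produces": "a.pkl"},
    "b": {"depends_on": "b.csv", "produces": "b.pkl"},
}

for id_, kwargs in ID_TO_KWARGS.items():

    @register(id=id_, kwargs=kwargs)
    def task_prepare_data(depends_on, produces):
        return f"{depends_on} -> {produces}"


# Each entry keeps its own arguments although the function name is reused.
results = {id_: func(**kw) for id_, (func, kw) in REGISTRY.items()}
```

Because the arguments are bound per iteration, reusing the function name in the loop
does not overwrite earlier entries in this sketch.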

Ids are used to make the task :ref:`ids <ids>` more descriptive and to simplify their
selection with :ref:`expressions <expressions>`. Here is an example of the task ids with
@@ -183,25 +184,29 @@ And, here is the task file.


     def _create_parametrization(estimations):
-        parametrizations = []
-        ids = []
+        id_to_kwargs = {}
         for name, config in estimations.items():
-            ids.append(name)
             depends_on = path_to_processed_data(config["data"])
             produces = path_to_estimation_result(name)
-            parametrizations.append((depends_on, config["model"], produces))

-        return "depends_on, model, produces", parametrizations, ids
+            id_to_kwargs[name] = {
+                "depends_on": depends_on,
+                "model": config["model"],
+                "produces": produces,
+            }

+        return id_to_kwargs

-    _SIGNATURE, _PARAMETRIZATION, _IDS = _create_parametrization(ESTIMATIONS)

+    _ID_TO_KWARGS = _create_parametrization(ESTIMATIONS)

-    @pytask.mark.parametrize(_SIGNATURE, _PARAMETRIZATION, ids=_IDS)
-    def task_estmate_models(depends_on, model, produces):
-        if model == "linear_probability":
-            ...
-        ...
+
+    for id_, kwargs in _ID_TO_KWARGS.items():
+
+        @pytask.mark.task(id=id_, kwargs=kwargs)
+        def task_estmate_models(depends_on, model, produces):
+            if model == "linear_probability":
+                ...

Replicating this pattern across a project allows for a clean way to define
parametrizations.
225 changes: 225 additions & 0 deletions docs/source/how_to_guides/how_to_parametrize_a_task_the_pytest_way.rst
@@ -0,0 +1,225 @@
How to parametrize a task - The pytest way
==========================================

You want to define a task which should be repeated over a range of inputs? Parametrize
your task function!

.. important::

    This guide shows you how to parametrize tasks with the pytest approach. For the new
    and preferred approach, see this :doc:`tutorial
    <../tutorials/how_to_parametrize_a_task>`.

.. seealso::

    If you want to know more about best practices for parametrizations, check out this
    :doc:`guide <../how_to_guides/bp_parametrizations>` after you have made yourself
    familiar with this tutorial.


An example
----------

We reuse the previous example of a task which generates random data and repeat the same
operation over a number of seeds to receive multiple, reproducible samples.

First, we write the task for one seed.

.. code-block:: python

    import numpy as np
    import pytask


    @pytask.mark.produces(BLD / "data_0.pkl")
    def task_create_random_data(produces):
        rng = np.random.default_rng(0)
        ...

In the next step, we repeat the same task over the numbers 0, 1, and 2 and pass them to
the ``seed`` argument. We also vary the name of the produced file in every iteration.

.. code-block:: python

    @pytask.mark.parametrize(
        "produces, seed",
        [(BLD / "data_0.pkl", 0), (BLD / "data_1.pkl", 1), (BLD / "data_2.pkl", 2)],
    )
    def task_create_random_data(seed, produces):
        rng = np.random.default_rng(seed)
        ...

The parametrize decorator receives two arguments. The first argument is ``"produces,
seed"`` - the signature. It is a comma-separated string where each value specifies the
name of a task function argument.

.. seealso::

    The signature is explained in detail :ref:`below <parametrize_signature>`.

The second argument of the parametrize decorator is a list (or any iterable) which has
as many elements as there are iterations over the task function. Each element has to
provide one value for each argument name in the signature - two in this case.

Putting it all together, the task is executed three times. In each run, the path from
the list is mapped to the argument ``produces`` and ``seed`` receives the seed.
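
Conceptually, each name in the signature is paired with the value at the same position
in every element of the list. The following plain-Python sketch illustrates that
mapping. ``expand`` is a hypothetical helper written for this illustration and not how
pytask implements the decorator, and ``BLD`` is assumed to be a build directory.

```python
from pathlib import Path

BLD = Path("bld")  # assumed build directory, as in the example above


def expand(signature, values):
    """Return one dict of keyword arguments per iteration (illustration only)."""
    names = [name.strip() for name in signature.split(",")]
    return [dict(zip(names, element)) for element in values]


kwargs_per_run = expand(
    "produces, seed",
    [(BLD / "data_0.pkl", 0), (BLD / "data_1.pkl", 1), (BLD / "data_2.pkl", 2)],
)
```

Each dictionary in ``kwargs_per_run`` corresponds to one execution of the task function.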

.. note::

    If you use ``produces`` or ``depends_on`` in the signature of the parametrize
    decorator, the values are handled as if they were attached to the function with
    ``@pytask.mark.depends_on`` or ``@pytask.mark.produces``.

Un-parametrized dependencies
----------------------------

To specify a dependency which is the same for all parametrizations, add it with
``pytask.mark.depends_on``.

.. code-block:: python

    @pytask.mark.depends_on(SRC / "common_dependency.file")
    @pytask.mark.parametrize(
        "produces, seed",
        [(BLD / "data_0.pkl", 0), (BLD / "data_1.pkl", 1), (BLD / "data_2.pkl", 2)],
    )
    def task_create_random_data(seed, produces):
        rng = np.random.default_rng(seed)
        ...


.. _parametrize_signature:

The signature
-------------

The signature can be passed in three different formats.

1. The signature can be a comma-separated string like an entry in a CSV table. Note that
   whitespace is stripped from each name, so you can use it to separate the names for
   readability. Here are some examples:

   .. code-block:: python

       "single_argument"
       "first_argument,second_argument"
       "first_argument, second_argument"

2. The signature can be a tuple of strings where each string is one argument name. Here
   is an example.

   .. code-block:: python

       ("first_argument", "second_argument")

3. Finally, it is also possible to use a list of strings.

   .. code-block:: python

       ["first_argument", "second_argument"]
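
All three formats describe the same argument names. As a rough sketch, they could be
reduced to a common form as shown below. ``normalize_signature`` is a hypothetical
helper for illustration only and is not taken from pytask.

```python
def normalize_signature(signature):
    """Return the argument names as a tuple, whatever format was passed."""
    if isinstance(signature, str):
        # Comma-separated string: split and strip surrounding whitespace.
        return tuple(name.strip() for name in signature.split(","))
    # Tuples and lists of strings are taken as they are.
    return tuple(signature)


from_string = normalize_signature("first_argument, second_argument")
from_tuple = normalize_signature(("first_argument", "second_argument"))
from_list = normalize_signature(["first_argument", "second_argument"])
```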


The id
------

Every task has a unique id which can be used to :doc:`select it <how_to_select_tasks>`.
The normal id combines the path to the module where the task is defined, a double colon,
and the name of the task function. Here is an example.

.. code-block::

    ../task_example.py::task_example

This behavior would produce duplicate ids for parametrized tasks. Therefore, there exist
multiple mechanisms to produce unique ids.


.. _auto_generated_ids:

Auto-generated ids
~~~~~~~~~~~~~~~~~~

To avoid duplicate task ids, the ids of parametrized tasks are extended with
descriptions of the values they are parametrized with. Booleans, floats, integers and
strings enter the task id directly. For example, a task function which receives four
arguments, ``True``, ``1.0``, ``2``, and ``"hello"``, one of each dtype, has the
following id.

.. code-block::

    task_example.py::task_example[True-1.0-2-hello]

Arguments with other dtypes cannot be easily converted to strings and, thus, are
replaced with a combination of the argument name and the iteration counter.

For example, the following function is parametrized with tuples.

.. code-block:: python

    @pytask.mark.parametrize("i", [(0,), (1,)])
    def task_example(i):
        pass

Since the tuples are not converted to strings, the ids of the two tasks are

.. code-block::

    task_example.py::task_example[i0]
    task_example.py::task_example[i1]
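
The rule described in this section can be sketched in a few lines of plain Python.
``auto_id`` is a hypothetical function written for this guide and not pytask's internal
implementation. It only mirrors the behavior described here.

```python
def auto_id(names, values, iteration):
    """Build the bracketed id suffix for one parametrization (illustration only)."""
    parts = []
    for name, value in zip(names, values):
        if isinstance(value, (bool, float, int, str)):
            # Simple values enter the id directly.
            parts.append(str(value))
        else:
            # Other objects fall back to argument name plus iteration counter.
            parts.append(f"{name}{iteration}")
    return "[" + "-".join(parts) + "]"


simple = auto_id(("a", "b", "c", "d"), (True, 1.0, 2, "hello"), 0)
tuples = [auto_id(("i",), (value,), n) for n, value in enumerate([(0,), (1,)])]
```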


User-defined ids
~~~~~~~~~~~~~~~~

You can pass a list or another iterable of id values via ``ids``. Each value is used as
the id of the corresponding iteration.

This code

.. code-block:: python

    @pytask.mark.parametrize("i", [(0,), (1,)], ids=["first", "second"])
    def task_example(i):
        pass

produces these ids

.. code-block::

    task_example.py::task_example[first]  # (0,)
    task_example.py::task_example[second]  # (1,)


.. _how_to_parametrize_a_task_convert_other_objects:

Convert other objects
~~~~~~~~~~~~~~~~~~~~~

To change the representation of tuples and other objects, you can pass a function to the
``ids`` argument of the :func:`@pytask.mark.parametrize
<_pytask.parametrize.parametrize>` decorator. The function is called for every argument
and may return a boolean, number, or string which will be integrated into the id. For
every other return, the auto-generated value is used.

To get a unique representation of a tuple, we can use the hash value.

.. code-block:: python

    def tuple_to_hash(value):
        if isinstance(value, tuple):
            return hash(value)


    @pytask.mark.parametrize("i", [(0,), (1,)], ids=tuple_to_hash)
    def task_example(i):
        pass

This produces the following ids:

.. code-block::

    task_example.py::task_example[3430018387555]  # (0,)
    task_example.py::task_example[3430019387558]  # (1,)
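
Note that Python randomizes ``hash()`` for strings per interpreter process unless
``PYTHONHASHSEED`` is set, so ids built with ``hash()`` may change between runs when the
tuples contain strings. If reproducible ids matter, one possible alternative is to
derive the id from a digest of the value's ``repr``. The function below is a sketch
under that assumption. ``tuple_to_stable_id`` is a hypothetical name, and the length of
eight characters is an arbitrary choice.

```python
import hashlib


def tuple_to_stable_id(value):
    """Return a short, reproducible id for tuples and None for everything else."""
    if isinstance(value, tuple):
        digest = hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
        return digest[:8]
    # Returning None leaves the id to the auto-generated value.
    return None


first = tuple_to_stable_id(("a", 0))
second = tuple_to_stable_id(("b", 1))
```

Such a function could be passed to ``ids`` in the same way as ``tuple_to_hash`` above.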
1 change: 1 addition & 0 deletions docs/source/how_to_guides/index.rst
@@ -15,6 +15,7 @@ specific tasks with pytask.

how_to_write_a_plugin
how_to_influence_build_order
how_to_parametrize_a_task_the_pytest_way


Best Practice Guides