diff --git a/README.rst b/README.rst index 5a82f40c..f4b6b721 100644 --- a/README.rst +++ b/README.rst @@ -32,8 +32,7 @@ .. end-badges -Features --------- +.. start-features In its highest aspirations, pytask tries to be pytest as a build system. It's main purpose is to facilitate reproducible research by automating workflows in research @@ -63,6 +62,8 @@ projects. Its features include: `_ how you can use plugins. +.. end-features + Why do I need a build system? ----------------------------- diff --git a/docs/changes.rst b/docs/changes.rst index 369d225f..9f5d35bc 100644 --- a/docs/changes.rst +++ b/docs/changes.rst @@ -12,6 +12,7 @@ all releases are available on `Anaconda.org - :gh:`55` implements miscellaneous fixes to improve error message, tests and coverage. - :gh:`59` adds a tutorial on using plugins and features plugins more prominently. - :gh:`60` adds the MIT license to the project and mentions pytest and its developers. +- :gh:`61` adds many changes to the documentation. - :gh:`65` adds versioneer to pytask and :gh:`66` corrects the coverage reports which were deflated due to the new files. diff --git a/docs/explanations/why_another_build_system.rst b/docs/explanations/build_systems.rst similarity index 91% rename from docs/explanations/why_another_build_system.rst rename to docs/explanations/build_systems.rst index 08d500e5..12fd8333 100644 --- a/docs/explanations/why_another_build_system.rst +++ b/docs/explanations/build_systems.rst @@ -1,5 +1,8 @@ +Build Systems +============= + Why another build system? -========================= +------------------------- There are a lot of build systems out there with existing communities who accumulated a lot of experience over time. So why bother creating another build system? @@ -96,17 +99,3 @@ Pros Cons - Seems to have no plugin system. - - -`cook `_ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Pros - -- Still simple, maybe useful for some quick inspirations. - -Cons - -- Development is paused. 
-- Designed for compiling software. -- No plugin system, but extensible interfaces. diff --git a/docs/explanations/design.rst b/docs/explanations/design.rst deleted file mode 100644 index c5e09ead..00000000 --- a/docs/explanations/design.rst +++ /dev/null @@ -1,12 +0,0 @@ -Design -====== - -The design of pytask has some key objectives. - -1. The interface must be simple, easy-to-learn, and may have synergies with pytest. It - is important that even users without a strong background in computer science or - programming are able to use pytask. - -2. pytask must be easily extensible via plugins. Developers of pytask are naturally - unaware of all the possible applications of a build system. Thus, they must focus on - the host application and the design of the entry-points. diff --git a/docs/explanations/index.rst b/docs/explanations/index.rst index 69594e36..b89ee302 100644 --- a/docs/explanations/index.rst +++ b/docs/explanations/index.rst @@ -8,6 +8,5 @@ systems in general as well as its design. :maxdepth: 1 why_do_i_need_a_build_system - why_another_build_system - design + build_systems pluggy diff --git a/docs/explanations/why_do_i_need_a_build_system.rst b/docs/explanations/why_do_i_need_a_build_system.rst index 5382bb2e..8242f0e7 100644 --- a/docs/explanations/why_do_i_need_a_build_system.rst +++ b/docs/explanations/why_do_i_need_a_build_system.rst @@ -7,8 +7,8 @@ TL;DR Research projects consists of complex workflows which handle data, employ models, and produce figures, tables, and reports. -Making sure that all steps of the analysis are up-to-date should not be done by hand -since this process is error-prone and time-consuming. +Ensuring that all steps of the analysis are up-to-date should not be done by hand since +this process is error-prone and time-consuming. 
Build systems like pytask provide an easy interface for researchers to express the relationships among the tasks in a research project and conveniently manage the diff --git a/docs/index.rst b/docs/index.rst index 2e8d4551..6af6d42f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -6,13 +6,26 @@ pytask :end-before: end-badges +Features +-------- + +.. include:: ../README.rst + :start-after: start-features + :end-before: end-features + + +Installation +------------ + .. include:: ../README.rst :start-after: start-installation :end-before: end-installation -The documentation is build upon four pillars: +Documentation +------------- +If you want to know more about pytask, dive into one of the following topics. .. panels:: :container: container pb-4 @@ -28,8 +41,7 @@ The documentation is build upon four pillars: :text: Tutorials :classes: stretched-link font-weight-bold - Tutorials help you to get started with pytask, explain the interface and basic - capabilities. + Tutorials help you to get started with pytask and show you how to manage your first project. --- :img-top: _static/images/book.svg @@ -39,8 +51,8 @@ The documentation is build upon four pillars: :text: How-to Guides :classes: stretched-link font-weight-bold - How-to guides are designed to provide detailed instructions for very specific and - advanced tasks. + How-to guides provide instructions for very specific and advanced tasks and document + best practices. --- :img-top: _static/images/books.svg @@ -50,8 +62,7 @@ The documentation is build upon four pillars: :text: Explanations :classes: stretched-link font-weight-bold - Explanations give detailed information on key topics and concepts which underlie the - package. + Explanations deal with key topics and concepts which underlie the package. 
--- :img-top: _static/images/coding.svg diff --git a/docs/reference_guides/marks.rst b/docs/reference_guides/marks.rst index 18a3ceee..feaf67d5 100644 --- a/docs/reference_guides/marks.rst +++ b/docs/reference_guides/marks.rst @@ -24,3 +24,12 @@ pytask.mark.parametrize .. autofunction:: _pytask.parametrize.parametrize :noindex: + + +pytask.mark.try_first +--------------------- + +.. function:: try_first + :noindex: + + This diff --git a/docs/rtd_environment.yml b/docs/rtd_environment.yml index 56c9ea65..039bebc2 100644 --- a/docs/rtd_environment.yml +++ b/docs/rtd_environment.yml @@ -2,25 +2,26 @@ channels: - conda-forge dependencies: - - python=3.8 + - python >= 3.6 - pip - furo - ipython + - nbsphinx - sphinx - - sphinx-copybutton - - sphinx-autodoc-typehints - sphinx-autoapi + - sphinx-autodoc-typehints - sphinx-click + - sphinx-copybutton - sphinx-panels # Package dependencies necessary for sphinx-click - - attrs + - attrs >=17.4.0 - click - click-default-group - networkx - - pexpect - pluggy - - pony >= 0.7.13 + - pony >=0.7.13 + - pexpect - pip: - -e ../ diff --git a/docs/tutorials/how_to_debug.rst b/docs/tutorials/how_to_debug.rst index 42b37fd2..f5ea1f0f 100644 --- a/docs/tutorials/how_to_debug.rst +++ b/docs/tutorials/how_to_debug.rst @@ -1,18 +1,20 @@ How to debug ============ -To facilitate debugging, pytask offers two command-line options. +The debug mode is one of pytask's biggest strengths. Whenever you encounter an error in +one of your tasks, jump right into the code and inspect the cause of the exception. -.. tip:: +Quick and easy feedback through a debugger is immensely valuable because it helps you to +be more productive and gain more confidence in your code. - Instead of Python's :mod:`pdb`, use `pdb++ `_ which - is more convenient, colorful has some useful features like the `sticky mode - `_. +To facilitate debugging, pytask offers two command-line options. Debugging --------- +Running + .. 
code-block:: console $ pytask --pdb @@ -20,6 +22,17 @@ Debugging enables the post-mortem debugger. Whenever an exception is raised inside a task, the prompt will enter the debugger enabling you to discover the source of the exception. +.. seealso:: + + :doc:`A following tutorial ` shows you how to run only one or a + subset of tasks which can be combined with the debug mode. + +.. tip:: + + Instead of Python's :mod:`pdb`, use `pdb++ `_ which + is more convenient, colorful, and has some useful features like the `sticky mode + `_. + Tracing ------- diff --git a/docs/tutorials/how_to_define_dependencies_products.rst b/docs/tutorials/how_to_define_dependencies_products.rst index d0b9035d..c20f0b79 100644 --- a/docs/tutorials/how_to_define_dependencies_products.rst +++ b/docs/tutorials/how_to_define_dependencies_products.rst @@ -1,115 +1,130 @@ How to define dependencies and products ======================================= -Task have dependencies and products. Both can be attached to a task function with -decorators. This is necessary so that pytask knows when a task is able to run, needs to -be run again or has produced the desired outcome. +To make sure pytask executes all tasks in the correct order, we need to define which +dependencies are required and which products are produced by a task. -Let us have a look at some examples. +The information on dependencies and products can be attached to a task function with +special markers. Let us have a look at some examples. Products -------- -We take the task from the previous tutorial. +We first focus on products which we already encountered in the previous tutorial. Let us +take the task from the previous tutorial. .. code-block:: python import pytask - @pytask.mark.produces("hello_earth.txt") - def task_hello_earth(produces): - produces.write_text("Hello, earth!") - -The ``@pytask.mark.produces`` decorator attaches a product to a task. The string -``"hello_earth.txt"`` is converted to a :class:`pathlib.Path`. - -.. 
note:: + @pytask.mark.produces(BLD / "data.pkl") + def task_create_random_data(produces): + ... - If you do not know about :mod:`pathlib` check out [1]_ and [2]_. The module is very - useful to handle paths conveniently and cross-platform. +The ``@pytask.mark.produces`` marker attaches a product to a task which is a +:class:`pathlib.Path` to a file. After the task has finished, pytask will check whether +the file exists. -.. important:: +Optionally, you can use ``produces`` as an argument of the task function and get access +to the same path inside the task function. - Here are the rules to parse a path. +.. tip:: - 1. Paths can either be strings or :class:`pathlib.Path`. - 2. A string is converted to :class:`pathlib.Path`. - 3. If the path is relative, it is assumed to be relative to the directory where the - task is defined. + If you do not know about :mod:`pathlib` check out [1]_ and [2]_. The module is very + useful to handle paths conveniently and across platforms. Dependencies ------------ Most tasks have dependencies. Similar to products, you can use the -``@pytask.mark.depends_on`` decorator to attach a dependency to a task. +``@pytask.mark.depends_on`` marker to attach a dependency to a task. .. code-block:: python - @pytask.mark.depends_on("text.txt") - @pytask.mark.produces("bold_text.txt") - def task_make_text_bold(depends_on, produces): - text = depends_on.read_text() - bold_text = f"**{text}**" - produces.write_text(bold_text) + @pytask.mark.depends_on(BLD / "data.pkl") + @pytask.mark.produces(BLD / "plot.png") + def task_plot_data(depends_on, produces): + ... + +Use ``depends_on`` as a function argument to work with the path of the dependency and, +for example, load the data. + + +Conversion +---------- + +Dependencies and products do not have to be absolute paths. If paths are relative, they +are assumed to point to a location relative to the task module. 
+ +You can also use absolute and relative paths as strings which obey the same rules as the +:class:`pathlib.Path`. +.. code-block:: python -Optional usage in signature ---------------------------- + @pytask.mark.produces("../bld/data.pkl") + def task_create_random_data(produces): + ... -As seen before, if you have a task with products (or dependencies), you can use -``produces`` (``depends_on``) as a function argument and receive the path or a -dictionary of paths inside the functions. It helps to avoid repetition. +If you use ``depends_on`` or ``produces`` as arguments for the task function, you will +have access to the paths of the targets as :class:`pathlib.Path` even if strings were +used before. Multiple dependencies and products ---------------------------------- Most tasks have multiple dependencies or products. The easiest way to attach multiple -dependencies or products to a task is to pass a :class:`list`, :class:`tuple` or other -iterator to the decorator which contains :class:`str` or :class:`pathlib.Path`. +dependencies or products to a task is to pass a :class:`dict`, :class:`list` or another +iterator to the marker containing the paths. .. code-block:: python - @pytask.mark.depends_on(["text_1.txt", "text_2.txt"]) - def task_example(depends_on): - pass + @pytask.mark.produces([BLD / "data_0.pkl", BLD / "data_1.pkl"]) + def task_create_random_data(produces): + ... -The function argument ``depends_on`` or ``produces`` becomes a dictionary where keys are -the positions in the list and values are :class:`pathlib.Path`. +Inside the function, the arguments ``depends_on`` or ``produces`` become a dictionary +where keys are the positions in the list. -.. code-block:: python +.. code-block:: pycon - depends_on = {0: Path("text_1.txt"), 1: Path("text_2.txt")} + >>> produces + {0: BLD / "data_0.pkl", 1: BLD / "data_1.pkl"} Why dictionaries and not lists? 
First, dictionaries with positions as keys behave very similar to lists and conversion between both is easy. -Secondly, dictionaries allow to access paths to dependencies and products via labels -which is preferred over positional access when tasks become more complex and the order -changes. - -To assign labels to dependencies or products, pass a dictionary or a list of tuples with -the name in the first and the path in the second position to the decorator. For example, +.. tip:: -.. code-block:: python + Use ``list(produces.values())`` to convert a dictionary to a list. - @pytask.mark.depends_on({"first": "text_1.txt", "second": "text_2.txt"}) - @pytask.mark.produces("out.txt") - def task_example(depends_on, produces): - text = depends_on["first"].read_text() + " " + depends_on["second"].read_text() - produces.write_text(text) +Secondly, dictionaries use keys instead of positions which is more verbose and +descriptive and does not assume a fixed ordering. Both attributes are especially +desirable in complex projects. -or with tuples +To assign labels to dependencies or products, pass a dictionary. For example, .. code-block:: python - @pytask.mark.depends_on([("first", "text_1.txt"), ("second", "text_2.txt")]) - def task_example(): + @pytask.mark.produces({"first": BLD / "data_0.pkl", "second": BLD / "data_1.pkl"}) + def task_create_random_data(produces): ... +Then, use + +.. code-block:: pycon + + >>> produces["first"] + BLD / "data_0.pkl" + + >>> produces["second"] + BLD / "data_1.pkl" + +inside the task function. + Multiple decorators ------------------- diff --git a/docs/tutorials/how_to_parametrize_a_task.rst b/docs/tutorials/how_to_parametrize_a_task.rst index 2d13cfb9..d89f54ae 100644 --- a/docs/tutorials/how_to_parametrize_a_task.rst +++ b/docs/tutorials/how_to_parametrize_a_task.rst @@ -4,59 +4,59 @@ How to parametrize a task Often, you want to define a task which should be repeated over a range of inputs. 
pytask allows to parametrize task functions to accomplish this behavior. + An example ---------- -Let us focus on a simple example. In this setting, we want to define a task which -receives a number and saves it to a file. This task should be repeated for the numbers -from 0 to 2. +We reuse the previous example of a task which generates random data and repeat the same +operation over a number of seeds to receive multiple, reproducible samples. -First, we write the task for one number. +First, we write the task for one seed. .. code-block:: python + import numpy as np import pytask - @pytask.mark.produces("0.txt") - def task_save_number(produces, i=0): - produces.write_text(str(i)) + @pytask.mark.produces(BLD / "data_0.pkl") + def task_create_random_data(produces): + np.random.seed(0) + ... -In the next step, we parametrize the task by varying ``i``. +In the next step, we repeat the same task over the numbers 0, 1, and 2 and pass them to +the ``seed`` argument. We also vary the name of the produced file in every iteration. .. code-block:: python - @pytask.mark.parametrize("produces, i", [("0.txt", 0), ("1.txt", 1), ("2.txt", 2)]) - def task_save_number(produces, i): - produces.write_text(str(i)) + @pytask.mark.parametrize( + "produces, seed", + [(BLD / "data_0.pkl", 0), (BLD / "data_1.pkl", 1), (BLD / "data_2.pkl", 2)], + ) + def task_create_random_data(produces, seed): + np.random.seed(seed) + ... -The parametrize decorator receives two arguments. The first argument is ``produces, i`` -- the signature. It is a comma-separated string where each value specifies the name of a -task function argument. +The parametrize decorator receives two arguments. The first argument is ``"produces, +seed"`` - the signature. It is a comma-separated string where each value specifies the +name of a task function argument. .. seealso:: The signature is explained in detail :ref:`below `. -The second argument of the parametrize decorator is an iterable. 
Each entry in iterable -has to provide one value for each argument name in the signature. +The second argument of the parametrize decorator is a list (or any iterable) which has +as many elements as there are iterations over the task function. Each element has to +provide one value for each argument name in the signature - two in this case. Putting all together, the task is executed three times and each run the path from the -list is mapped to the argument ``produces`` and ``i`` receives the number. +list is mapped to the argument ``produces`` and ``seed`` receives the seed. -.. important:: +.. note:: If you use ``produces`` or ``depends_on`` in the signature of the parametrize - decorator, the values are automatically treated as if they were attached to the - function with ``@pytask.mark.depends_on`` or ``@pytask.mark.produces``. For - example, the generated task in which ``i = 1`` is identical to - - .. code-block:: python - - @pytask.mark.produces("1.txt") - def task_save_number(produces, i=1): - produces.write_text(str(i)) - + decorator, the values are treated as if they were attached to the function with + ``@pytask.mark.depends_on`` or ``@pytask.mark.produces``. Un-parametrized dependencies ---------------------------- @@ -66,11 +66,14 @@ To specify a dependency which is the same for all parametrizations, add it with .. code-block:: python - @pytask.mark.depends_on(Path("additional_text.txt")) - @pytask.mark.parametrize("produces, i", [("0.txt", 0), ("1.txt", 1), ("2.txt", 2)]) - def task_save_number(depends_on, produces, i): - additional_text = depends_on.read_text() - produces.write_text(additional_text + str(i)) + @pytask.mark.depends_on(SRC / "common_dependency.file") + @pytask.mark.parametrize( + "produces, seed", + [(BLD / "data_0.pkl", 0), (BLD / "data_1.pkl", 1), (BLD / "data_2.pkl", 2)], + ) + def task_create_random_data(produces, seed): + np.random.seed(seed) + ... .. 
_parametrize_signature: diff --git a/docs/tutorials/how_to_set_up_a_project.rst b/docs/tutorials/how_to_set_up_a_project.rst index a07c77b9..9a3004e8 100644 --- a/docs/tutorials/how_to_set_up_a_project.rst +++ b/docs/tutorials/how_to_set_up_a_project.rst @@ -3,8 +3,7 @@ How to set up a project ======================= -The previous sections in the tutorial explained the basic capabilities of pytask, but -how can we manage a bigger project with pytask? +This tutorial shows you how to set up your project. The directory structure @@ -25,8 +24,14 @@ The following directory tree is an example of how a project can be set up. └────... - The configuration file, ``pytask.ini``, ``tox.ini`` or ``setup.cfg``, should be placed - at the root of the project folder and should contain a ``[pytask]`` section even if it - is empty. + at the root of the project folder and should contain a ``[pytask]`` section which can + be left empty. + + .. code-block:: ini + + # Content of pytask.ini, tox.ini or setup.cfg. + + [pytask] The file in combination with the section will indicate the root of the project. This has two benefits. @@ -35,23 +40,14 @@ The following directory tree is an example of how a project can be set up. a ``.pytask.sqlite3`` database in the root folder. 2. Even if you start pytask from a different location inside the project folder than - the root, the database will be found. + the root, the database will be found and pytask runs as if it is run in the project + root. - Here is a configuration file without any information except the section header. +- The ``src`` directory contains the tasks and source files of the project. - .. code-block:: ini - - # Content of pytask.ini, tox.ini or setup.cfg. - - [pytask] - -- Then, there exist two folders. The ``src`` directory contains the tasks and other data - and code. - - It also contains a ``config.py`` or a similar module from where the project - configuration is read. 
You can store paths and other information which can be imported - in other files to specify dependencies and targets. Here is an example of a - ``config.py``. + It also contains a ``config.py`` or a similar module to store the configuration of the + project. For example, you should define paths pointing to the source and build + directory of the project. .. code-block:: python @@ -63,8 +59,9 @@ The following directory tree is an example of how a project can be set up. SRC = Path(__file__).parent BLD = SRC.joinpath("..", "bld").resolve() -- The build directory ``bld`` is used to store products of tasks. This makes it easy to - rerun the whole project by just deleting the entire build directory. +- The build directory ``bld`` is used to store products of tasks. The separation between + a source and build directory makes it easy to start from a clean project by deleting + the build directory. setup.py @@ -95,5 +92,9 @@ Then, install the package into your environment with $ pip install -e . Both commands will make an editable install of the project which means any changes in -the source files of the package are directly reflected in the installed version of the -package. +the source files of the package are reflected in the installed version of the package. + +.. tip:: + + Do not forget to rerun the editable install every time you recreate your Python + environment. diff --git a/docs/tutorials/how_to_write_a_task.rst b/docs/tutorials/how_to_write_a_task.rst index 1bc0e7ef..0948b22d 100644 --- a/docs/tutorials/how_to_write_a_task.rst +++ b/docs/tutorials/how_to_write_a_task.rst @@ -1,39 +1,69 @@ How to write a task =================== -A task is a function and is detected if the module and the function name are prefixed -with ``task_``. +Starting from the project structure in the :doc:`previous tutorial +`, this tutorial teaches you how to write your first task. -The following task :func:`task_hello_earth` lies in ``task_hello.py``. 
Its purpose is to -save the string ``"Hello, earth!"`` to a file called ``hello_earth.txt``. +The task will be defined in ``src/task_data_preparation.py`` and it will generate +artificial data which will be stored in ``bld/data.pkl``. We will call the function in +the module :func:`task_create_random_data`. + +.. code-block:: + + my_project + ├───pytask.ini or tox.ini or setup.cfg + │ + ├───src + │ ├────config.py + │ └────task_data_preparation.py + │ + └───bld + └────data.pkl + +Here, we define the function .. code-block:: python - # Content of task_hello.py. + # Content of task_data_preparation.py. import pytask + import numpy as np + import pandas as pd + + from src.config import BLD + + + @pytask.mark.produces(BLD / "data.pkl") + def task_create_random_data(produces): + beta = 2 + x = np.random.normal(loc=5, scale=10, size=1_000) + epsilon = np.random.standard_normal(1_000) + y = beta * x + epsilon - @pytask.mark.produces("hello_earth.txt") - def task_hello_earth(produces): - produces.write_text("Hello, earth!") + df = pd.DataFrame({"x": x, "y": y}) + df.to_pickle(produces) -To let pytask track dependencies and products of tasks, you need to use the -``@pytask.mark.produces`` decorator. You learn how to add dependencies and products to a -task in the next :doc:`tutorial `. +To let pytask track the product of the task, you need to use the +``@pytask.mark.produces`` decorator. -To execute the task, type the following command on the command-line +.. seealso:: + + You learn more about adding dependencies and products to a task in the next + :doc:`tutorial `. + +To execute the task, type the following command in your shell. .. code-block:: console - $ pytask task_hello.py + $ pytask task_data_preparation.py ========================= Start pytask session ========================= Platform: linux -- Python 3.x.y, pytask 0.x.y, pluggy 0.x.y Root: xxx Collected 1 task(s). . 
- ====================== 1 succeeded in 1 second(s) ====================== + ======================= 1 succeeded in 1 second ======================== Executing @@ -41,4 +71,12 @@ Executing $ pytask -would collect all tasks in the current working directory and in all folders below. +would collect all tasks in the current working directory and in all subsequent folders. + +.. important:: + + By default, pytask assumes that tasks are functions in modules whose names are both + prefixed with ``task_``. + + Use the configuration value :confval:`task_files` if you prefer a different naming + scheme for the task modules. diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst index 6ac51f76..97ac9d17 100644 --- a/docs/tutorials/index.rst +++ b/docs/tutorials/index.rst @@ -1,18 +1,17 @@ Tutorials ========= -Tutorials are written for new users and help them to get started. First, you learn some -basics of pytask's interface. The last sections provide you with the information to -organize and start your own project. +Tutorials take you by the hand through a series of steps to build and manage your first +project. Start here if you are a new user. .. 
toctree:: :maxdepth: 1 + how_to_set_up_a_project how_to_write_a_task how_to_define_dependencies_products how_to_debug how_to_parametrize_a_task - how_to_set_up_a_project how_to_configure_pytask how_to_select_tasks how_to_clean diff --git a/environment.yml b/environment.yml index dc621b98..e1069ffb 100644 --- a/environment.yml +++ b/environment.yml @@ -13,12 +13,12 @@ dependencies: - conda-verify # Package dependencies + - attrs >=17.4.0 - click - click-default-group - networkx - pluggy - - pony - - pytest + - pony >=0.7.13 # Misc - black @@ -26,7 +26,7 @@ dependencies: - matplotlib - pdbpp - pre-commit - - pydot + - pytest - pytest-cov - tox-conda - versioneer @@ -36,8 +36,8 @@ dependencies: - furo - nbsphinx - sphinx - - sphinx-copybutton - - sphinx-autodoc-typehints - sphinx-autoapi + - sphinx-autodoc-typehints - sphinx-click + - sphinx-copybutton - sphinx-panels