Add new command pytask dag to visualize the DAG. #101

Merged 5 commits on Jun 13, 2021
4 changes: 2 additions & 2 deletions docs/changes.rst
@@ -23,8 +23,8 @@ all releases are available on `PyPI <https://pypi.org/project/pytask>`_ and
 - :gh:`93` fixes the display of parametrized arguments in the console.
 - :gh:`94` adds ``--show-locals`` which allows to print local variables in tracebacks.
 - :gh:`96` implements a spinner to show the progress during the collection.
-- :gh:`99` enables color support for WSL in Windows Terminal and fixes ``show_locals``
-  in ``collect.py``.
+- :gh:`99` enables color support in WSL and fixes ``show_locals`` during collection.
+- :gh:`101` allows to visualize the project's DAG.


 0.0.14 - 2021-03-23
57 changes: 57 additions & 0 deletions docs/tutorials/how_to_visualize_the_dag.rst
@@ -0,0 +1,57 @@
How to visualize the DAG
========================

pytask offers two interfaces to visualize the :term:`DAG` of your project.


Command line interface
----------------------

You can quickly create a visualization from the command line by entering

.. code-block:: console

    $ pytask dag

in the root directory of your project. By default, this generates a ``dag.pdf`` in the
current working directory.

There are two ways to customize the visualization.

1. You can change the layout of the graph with the ``-l/--layout`` option. By default,
   it is set to ``dot`` and produces a hierarchical layout. Graphviz supports other
   layouts as well, which are listed `here <https://graphviz.org/#roadmap>`_.

2. With the ``-o/--output-path`` option, you can provide a file name for the graph. The
   file extension determines the output format if it is supported by `pydot
   <https://github.com/pydot/pydot>`_.


Programmatic interface
----------------------

Since the possibilities for customization are limited via the command line interface,
there also exists a programmatic and interactive interface.

Similar to :func:`pytask.main`, there exists :func:`pytask.build_dag` which returns the
DAG as a :class:`networkx.DiGraph`.

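.. code-block:: python

    import networkx as nx
    import pytask

    # BLD and SRC are assumed to be the project's build and source directories.


    @pytask.mark.produces(BLD / "dag.svg")
    def task_draw_dag(produces):
        dag = pytask.build_dag({"paths": SRC})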

Customization works best on the :class:`networkx.DiGraph`. For example, here we set the
shape of all nodes to hexagons by adding the property to the node attributes.

.. code-block:: python

    nx.set_node_attributes(dag, "hexagon", "shape")

For drawing, prefer pydot or pygraphviz since the matplotlib backend handles shapes
with text poorly. Here we use pydot and store the graph as an ``.svg``.

.. code-block:: python

    graph = nx.nx_pydot.to_pydot(dag)
    graph.write_svg(produces)
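
To make it concrete what a ``shape`` attribute amounts to once the graph reaches Graphviz, here is a stdlib-only sketch that writes a small DOT string by hand. It is an illustration under assumptions, not what pydot does internally: the helper ``to_dot`` and the node names are made up for this example.

```python
def to_dot(nodes, edges):
    """Serialize nodes (name -> attribute dict) and edges into DOT text."""
    lines = ["digraph {"]
    for name, attrs in nodes.items():
        attr_text = ", ".join(f"{key}={value}" for key, value in attrs.items())
        # Names are wrapped in double quotes so that colons are safe.
        lines.append(f'    "{name}" [{attr_text}];')
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical node names: one task and the file it produces.
nodes = {
    "task_data.py::task_create_data": {"shape": "hexagon"},
    "data.csv": {"shape": "box"},
}
edges = [("task_data.py::task_create_data", "data.csv")]

dot_source = to_dot(nodes, edges)
print(dot_source)
```

The resulting text can be passed to a Graphviz layout program such as ``dot``. In the tutorial above, pydot takes care of this serialization.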
1 change: 1 addition & 0 deletions docs/tutorials/index.rst
@@ -23,3 +23,4 @@ project. Start here if you are a new user.
    how_to_capture
    how_to_invoke_pytask
    how_to_use_plugins
+   how_to_visualize_the_dag
6 changes: 3 additions & 3 deletions src/_pytask/build.py
@@ -1,9 +1,9 @@
 """Implement the build command."""
 import sys
-import traceback

 import click
 from _pytask.config import hookimpl
+from _pytask.console import console
 from _pytask.enums import ExitCode
 from _pytask.exceptions import CollectionError
 from _pytask.exceptions import ConfigurationError
@@ -50,7 +50,7 @@ def main(config_from_cli):
         session = Session.from_config(config)

     except (ConfigurationError, Exception):
-        traceback.print_exception(*sys.exc_info())
+        console.print_exception()
         session = Session({}, None)
         session.exit_code = ExitCode.CONFIGURATION_FAILED

@@ -71,7 +71,7 @@ def main(config_from_cli):
             session.exit_code = ExitCode.FAILED

         except Exception:
-            traceback.print_exception(*sys.exc_info())
+            console.print_exception()
             session.exit_code = ExitCode.FAILED

     return session
2 changes: 2 additions & 0 deletions src/_pytask/cli.py
@@ -46,6 +46,7 @@ def pytask_add_hooks(pm):
     from _pytask import database
     from _pytask import debugging
     from _pytask import execute
+    from _pytask import graph
     from _pytask import logging
     from _pytask import mark
     from _pytask import parameters
@@ -64,6 +65,7 @@ def pytask_add_hooks(pm):
     pm.register(database)
     pm.register(debugging)
     pm.register(execute)
+    pm.register(graph)
     pm.register(logging)
     pm.register(mark)
     pm.register(parameters)
204 changes: 204 additions & 0 deletions src/_pytask/graph.py
@@ -0,0 +1,204 @@
"""This file contains the command and code for drawing the DAG."""
import shutil
from pathlib import Path
from typing import Any
from typing import Dict

import click
import networkx as nx
from _pytask.config import hookimpl
from _pytask.console import console
from _pytask.dag import descending_tasks
from _pytask.enums import ColorCode
from _pytask.enums import ExitCode
from _pytask.exceptions import CollectionError
from _pytask.exceptions import ConfigurationError
from _pytask.exceptions import ResolvingDependenciesError
from _pytask.nodes import reduce_names_of_multiple_nodes
from _pytask.pluginmanager import get_plugin_manager
from _pytask.session import Session
from _pytask.shared import get_first_non_none_value


@hookimpl(tryfirst=True)
def pytask_extend_command_line_interface(cli: click.Group):
    """Extend the command line interface."""
    cli.add_command(dag)


@hookimpl
def pytask_parse_config(config, config_from_cli, config_from_file):
    """Parse configuration."""
    config["output_path"] = get_first_non_none_value(
        config_from_cli,
        config_from_file,
        key="output_path",
        default=Path.cwd() / "dag.pdf",
        callback=lambda x: None if x is None else Path(x),
    )
    config["layout"] = get_first_non_none_value(
        config_from_cli,
        config_from_file,
        key="layout",
        default="dot",
    )
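
The hook above resolves each option by precedence: the command line first, then the configuration file, then a default. The sketch below shows that idea with a simplified stand-in. ``first_non_none`` is a hypothetical helper written for this note, and the real ``get_first_non_none_value`` in ``_pytask.shared`` may differ in details.

```python
from pathlib import Path


def first_non_none(*configs, key, default=None, callback=None):
    """Return the first non-None value for ``key`` (hypothetical stand-in)."""
    for config in configs:
        value = config.get(key)
        if callback is not None:
            value = callback(value)
        if value is not None:
            return value
    return default


def to_path(value):
    """Convert a value to a Path while letting None pass through."""
    return None if value is None else Path(value)


# Illustrative inputs: the CLI leaves the output path unset, the file sets it.
config_from_cli = {"output_path": None, "layout": "neato"}
config_from_file = {"output_path": "bld/dag.svg"}

output_path = first_non_none(
    config_from_cli,
    config_from_file,
    key="output_path",
    default=Path.cwd() / "dag.pdf",
    callback=to_path,
)
layout = first_non_none(config_from_cli, config_from_file, key="layout", default="dot")
fallback = first_non_none({}, {}, key="layout", default="dot")
```

Because the click options default to ``None``, an option that is not passed on the command line falls through to the configuration file and finally to the default.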


_HELP_TEXT_LAYOUT = (
    "The layout determines the structure of the graph. Here you find an overview of "
    "all available layouts: https://graphviz.org/#roadmap."
)


_HELP_TEXT_OUTPUT = (
    "The output path of the visualization. The format is inferred from the file "
    "extension."
)


@click.command()
@click.option("-l", "--layout", type=str, default=None, help=_HELP_TEXT_LAYOUT)
@click.option("-o", "--output-path", type=str, default=None, help=_HELP_TEXT_OUTPUT)
def dag(**config_from_cli):
    """Create a visualization of the project's DAG."""
    session = _create_session(config_from_cli)
    dag = _refine_dag(session)
    _write_graph(dag, session.config["output_path"], session.config["layout"])


def build_dag(config_from_cli: Dict[str, Any]) -> nx.DiGraph:
    """Build the DAG.

    This function is the programmatic interface to ``pytask dag`` and returns a
    preprocessed :class:`networkx.DiGraph`.

    To change the style of the graph, set attributes on the returned graph. To plot it,
    convert it to pydot or pygraphviz, which handle node shapes and labels better than
    matplotlib.

    Parameters
    ----------
    config_from_cli : Dict[str, Any]
        The configuration usually received from the CLI. For example, use ``{"paths":
        "example-directory/"}`` to collect tasks from a directory.

    Returns
    -------
    networkx.DiGraph
        A preprocessed graph which can be customized and exported.

    """
    session = _create_session(config_from_cli)
    dag = _refine_dag(session)
    return dag


def _refine_dag(session):
    dag = _shorten_node_labels(session.dag, session.config["paths"])
    dag = _add_root_node(dag)
    dag = _clean_dag(dag)
    dag = _style_dag(dag)
    dag = _escape_node_names_with_colons(dag)

    return dag


def _create_session(config_from_cli: Dict[str, Any]) -> Session:
    try:
        pm = get_plugin_manager()
        from _pytask import cli

        pm.register(cli)
        pm.hook.pytask_add_hooks(pm=pm)

        config = pm.hook.pytask_configure(pm=pm, config_from_cli=config_from_cli)

        session = Session.from_config(config)

    except (ConfigurationError, Exception):
        console.print_exception()
        session = Session({}, None)
        session.exit_code = ExitCode.CONFIGURATION_FAILED

    else:
        try:
            session.hook.pytask_log_session_header(session=session)
            session.hook.pytask_collect(session=session)
            session.hook.pytask_resolve_dependencies(session=session)

        except CollectionError:
            session.exit_code = ExitCode.COLLECTION_FAILED

        except ResolvingDependenciesError:
            session.exit_code = ExitCode.RESOLVING_DEPENDENCIES_FAILED

        except Exception:
            session.exit_code = ExitCode.FAILED
            console.print_exception()
            console.rule(style=ColorCode.FAILED)

    return session


def _shorten_node_labels(dag, paths):
    node_names = dag.nodes
    short_names = reduce_names_of_multiple_nodes(node_names, dag, paths)
    old_to_new = dict(zip(node_names, short_names))
    dag = nx.relabel_nodes(dag, old_to_new)
    return dag


def _add_root_node(dag):
    tasks_without_predecessor = [
        name
        for name in dag.nodes
        if len(list(descending_tasks(name, dag))) == 0 and "task" in dag.nodes[name]
    ]
    if tasks_without_predecessor:
        dag.add_node("root")
        for name in tasks_without_predecessor:
            dag.add_edge("root", name)

    return dag


def _clean_dag(dag):
    """Clean the DAG."""
    for node in dag.nodes:
        dag.nodes[node].clear()
    return dag


def _style_dag(dag: nx.DiGraph) -> nx.DiGraph:
    shapes = {name: "hexagon" if "::task_" in name else "box" for name in dag.nodes}
    nx.set_node_attributes(dag, shapes, "shape")
    return dag


def _escape_node_names_with_colons(dag: nx.DiGraph):
    """Escape node names with colons.

    pydot cannot handle colons in node names since it messes up some syntax. Escaping
    works by wrapping the string in double quotes. See this issue for more information:
    https://github.com/pydot/pydot/issues/224.

    """
    return nx.relabel_nodes(dag, {name: f'"{name}"' for name in dag.nodes})
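
A small stdlib-only sketch of the relabeling mapping built above may help when reading the helper. The function name ``quote_names`` and the node names are invented for this illustration.

```python
def quote_names(names):
    """Build an old-to-new mapping that wraps every name in double quotes."""
    return {name: f'"{name}"' for name in names}


# Hypothetical node names: a task identifier with colons and a plain file name.
mapping = quote_names(["task_data.py::task_create_data", "data.csv"])
for old, new in mapping.items():
    print(old, "->", new)
```

Note that the mapping quotes every node name, not only those that contain a colon, which mirrors the dictionary comprehension in the helper.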


def _write_graph(dag: nx.DiGraph, path: Path, layout: str) -> None:
    try:
        import pydot  # noqa: F401
    except ImportError:
        raise ImportError(
            "To visualize the project's DAG you need to install pydot which is "
            "available with pip and conda."
        ) from None
    if shutil.which(layout) is None:
        # The f-prefix is required so that the layout name is interpolated.
        raise RuntimeError(
            f"The layout program '{layout}' could not be found on your PATH. "
            "Please, install graphviz. It is, for example, available with conda."
        )

    path.parent.mkdir(exist_ok=True, parents=True)
    graph = nx.nx_pydot.to_pydot(dag)
    graph.write(path, prog=layout, format=path.suffix[1:])
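
The two checks that decide the output format and the layout program can be tried in isolation. The following sketch uses only the standard library; ``infer_format`` and ``layout_program_available`` are hypothetical helper names that restate the expressions used in ``_write_graph``.

```python
import shutil
from pathlib import Path


def infer_format(path):
    """Return the format implied by the file extension without the dot."""
    return Path(path).suffix[1:]


def layout_program_available(layout):
    """Check whether a layout program such as 'dot' is on the PATH."""
    return shutil.which(layout) is not None


fmt_pdf = infer_format("dag.pdf")
fmt_svg = infer_format("bld/dag.svg")
fmt_none = infer_format("dag")

# A made-up program name that should not exist on any system.
missing = layout_program_available("surely-not-a-layout-program")
```

One consequence worth knowing: a path without an extension yields an empty format string, so it seems safer to always pass an output path with an explicit extension.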
2 changes: 1 addition & 1 deletion src/_pytask/parameters.py
@@ -35,5 +35,5 @@ def pytask_extend_command_line_interface(cli):
         cli.commands[command].params.append(_CONFIG_OPTION)
     for command in ["build", "clean", "collect", "profile"]:
         cli.commands[command].params.append(_IGNORE_OPTION)
-    for command in ["build", "clean", "collect", "profile"]:
+    for command in ["build", "clean", "collect", "dag", "profile"]:
         cli.commands[command].params.append(_PATH_ARGUMENT)
3 changes: 2 additions & 1 deletion src/pytask/__init__.py
@@ -2,7 +2,8 @@
 from _pytask.build import main
 from _pytask.cli import cli
 from _pytask.config import hookimpl
+from _pytask.graph import build_dag
 from _pytask.mark import MARK_GEN as mark  # noqa: N811


-__all__ = ["__version__", "cli", "hookimpl", "main", "mark"]
+__all__ = ["__version__", "build_dag", "cli", "hookimpl", "main", "mark"]