
Commit a0cd7d9

Add new command pytask dag to visualize the DAG. (#101)
1 parent f17f499 commit a0cd7d9


11 files changed (+364, -10 lines)


docs/changes.rst

Lines changed: 2 additions & 2 deletions
@@ -23,8 +23,8 @@ all releases are available on `PyPI <https://pypi.org/project/pytask>`_ and
 - :gh:`93` fixes the display of parametrized arguments in the console.
 - :gh:`94` adds ``--show-locals`` which allows to print local variables in tracebacks.
 - :gh:`96` implements a spinner to show the progress during the collection.
-- :gh:`99` enables color support for WSL in Windows Terminal and fixes ``show_locals``
-  in ``collect.py``.
+- :gh:`99` enables color support in WSL and fixes ``show_locals`` during collection.
+- :gh:`101` adds the command ``pytask dag`` to visualize the project's DAG.
 
 
 0.0.14 - 2021-03-23
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+How to visualize the DAG
+========================
+
+pytask offers two interfaces to visualize the :term:`DAG` of your project.
+
+
+Command line interface
+----------------------
+
+You can quickly create a visualization from the command line by entering
+
+.. code-block:: console
+
+    $ pytask dag
+
+in the root directory of your project, which generates a ``dag.pdf``.
+
+There are two ways to customize the visualization.
+
+1. You can change the layout of the graph with the ``-l/--layout`` option. By
+   default, it is set to ``dot``, which produces a hierarchical layout. Graphviz supports
+   other layouts as well, which are listed `here <https://graphviz.org/#roadmap>`_.
+
+2. With the ``-o/--output-path`` option, you can provide a file name for the graph. The
+   file extension determines the output format if the format is supported by `pydot
+   <https://github.com/pydot/pydot>`_.
+
+
+Programmatic interface
+----------------------
+
+Since the command line interface offers only limited options for customization, there
+is also a programmatic and interactive interface.
+
+Similar to :func:`pytask.main`, there exists :func:`pytask.build_dag`, which returns the
+DAG as a :class:`networkx.DiGraph`.
+
+.. code-block:: python
+
+    @pytask.mark.produces(BLD / "dag.svg")
+    def task_draw_dag(produces):
+        dag = pytask.build_dag({"paths": SRC})
+
+Customization works best on the :class:`networkx.DiGraph`. For example, here we set the
+shape of all nodes to hexagons by adding the ``shape`` attribute to every node.
+
+.. code-block:: python
+
+    nx.set_node_attributes(dag, "hexagon", "shape")
+
+For drawing, it is better to switch to pydot or pygraphviz since the matplotlib backend
+handles shapes with text poorly. Here, we use pydot and store the graph as an ``.svg``.
+
+.. code-block:: python
+
+    graph = nx.nx_pydot.to_pydot(dag)
+    graph.write_svg(produces)
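The documentation above sets a ``shape`` attribute on every node and lets pydot serialize the graph. To show roughly what that serialization amounts to, here is a stdlib-only sketch that renders an adjacency mapping as DOT source with one ``shape`` per node. ``to_dot`` is a hypothetical helper, not part of pytask, networkx, or pydot. It quotes every node ID, which is the same escaping idea pytask applies to names containing ``::``, and it assumes node names contain no double quotes.

```python
from typing import Dict, List


def to_dot(edges: Dict[str, List[str]], shapes: Dict[str, str]) -> str:
    """Render ``edges`` (node -> successors) as DOT source.

    Hypothetical helper. Nodes without an entry in ``shapes`` get ``box``,
    mirroring the default shape pytask assigns to non-task nodes.
    """
    # Collect every node, including nodes that only appear as successors.
    nodes = set(edges)
    for successors in edges.values():
        nodes.update(successors)

    lines = ["digraph {"]
    for node in sorted(nodes):
        # Quoting keeps names like "task_a.py::task_a" valid; an unquoted colon
        # would be read as a port separator. Embedded double quotes are not escaped.
        lines.append(f'    "{node}" [shape={shapes.get(node, "box")}];')
    for source in sorted(edges):
        for target in edges[source]:
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    {"in.txt": ["task_a.py::task_a"], "task_a.py::task_a": ["out.txt"]},
    shapes={"task_a.py::task_a": "hexagon"},
)
print(dot)
```

Saving the string to ``dag.dot`` and running ``dot -Tsvg dag.dot -o dag.svg`` renders it with Graphviz. The networkx and pydot route described above remains the more convenient option inside a task.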

docs/tutorials/index.rst

Lines changed: 1 addition & 0 deletions
@@ -23,3 +23,4 @@ project. Start here if you are a new user.
    how_to_capture
    how_to_invoke_pytask
    how_to_use_plugins
+   how_to_visualize_the_dag

src/_pytask/build.py

Lines changed: 3 additions & 3 deletions
@@ -1,9 +1,9 @@
 """Implement the build command."""
 import sys
-import traceback
 
 import click
 from _pytask.config import hookimpl
+from _pytask.console import console
 from _pytask.enums import ExitCode
 from _pytask.exceptions import CollectionError
 from _pytask.exceptions import ConfigurationError
@@ -50,7 +50,7 @@ def main(config_from_cli):
         session = Session.from_config(config)
 
     except (ConfigurationError, Exception):
-        traceback.print_exception(*sys.exc_info())
+        console.print_exception()
         session = Session({}, None)
         session.exit_code = ExitCode.CONFIGURATION_FAILED
 
@@ -71,7 +71,7 @@ def main(config_from_cli):
             session.exit_code = ExitCode.FAILED
 
         except Exception:
-            traceback.print_exception(*sys.exc_info())
+            console.print_exception()
             session.exit_code = ExitCode.FAILED
 
     return session

src/_pytask/cli.py

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,7 @@ def pytask_add_hooks(pm):
     from _pytask import database
     from _pytask import debugging
     from _pytask import execute
+    from _pytask import graph
     from _pytask import logging
     from _pytask import mark
     from _pytask import parameters
@@ -64,6 +65,7 @@ def pytask_add_hooks(pm):
     pm.register(database)
     pm.register(debugging)
     pm.register(execute)
+    pm.register(graph)
     pm.register(logging)
     pm.register(mark)
     pm.register(parameters)

src/_pytask/graph.py

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
+"""This file contains the command and code for drawing the DAG."""
+import shutil
+from pathlib import Path
+from typing import Any
+from typing import Dict
+
+import click
+import networkx as nx
+from _pytask.config import hookimpl
+from _pytask.console import console
+from _pytask.dag import descending_tasks
+from _pytask.enums import ColorCode
+from _pytask.enums import ExitCode
+from _pytask.exceptions import CollectionError
+from _pytask.exceptions import ConfigurationError
+from _pytask.exceptions import ResolvingDependenciesError
+from _pytask.nodes import reduce_names_of_multiple_nodes
+from _pytask.pluginmanager import get_plugin_manager
+from _pytask.session import Session
+from _pytask.shared import get_first_non_none_value
+
+
+@hookimpl(tryfirst=True)
+def pytask_extend_command_line_interface(cli: click.Group):
+    """Extend the command line interface."""
+    cli.add_command(dag)
+
+
+@hookimpl
+def pytask_parse_config(config, config_from_cli, config_from_file):
+    """Parse configuration."""
+    config["output_path"] = get_first_non_none_value(
+        config_from_cli,
+        config_from_file,
+        key="output_path",
+        default=Path.cwd() / "dag.pdf",
+        callback=lambda x: None if x is None else Path(x),
+    )
+    config["layout"] = get_first_non_none_value(
+        config_from_cli,
+        config_from_file,
+        key="layout",
+        default="dot",
+    )
+
+
+_HELP_TEXT_LAYOUT = (
+    "The layout determines the structure of the graph. Here you find an overview of "
+    "all available layouts: https://graphviz.org/#roadmap."
+)
+
+
+_HELP_TEXT_OUTPUT = (
+    "The output path of the visualization. The format is inferred from the file "
+    "extension."
+)
+
+
+@click.command()
+@click.option("-l", "--layout", type=str, default=None, help=_HELP_TEXT_LAYOUT)
+@click.option("-o", "--output-path", type=str, default=None, help=_HELP_TEXT_OUTPUT)
+def dag(**config_from_cli):
+    """Create a visualization of the project's DAG."""
+    session = _create_session(config_from_cli)
+    dag = _refine_dag(session)
+    _write_graph(dag, session.config["output_path"], session.config["layout"])
+
+
+def build_dag(config_from_cli: Dict[str, Any]) -> nx.DiGraph:
+    """Build the DAG.
+
+    This function is the programmatic interface to ``pytask dag`` and returns a
+    preprocessed :class:`networkx.DiGraph` whose node names are escaped for pydot.
+
+    To change the style of the graph, set node attributes with networkx and convert the
+    graph to pydot or pygraphviz, which draw shapes better than matplotlib.
+
+    Parameters
+    ----------
+    config_from_cli : Dict[str, Any]
+        The configuration usually received from the CLI. For example, use ``{"paths":
+        "example-directory/"}`` to collect tasks from a directory.
+
+    Returns
+    -------
+    networkx.DiGraph
+        A preprocessed graph which can be customized and exported.
+
+    """
+    session = _create_session(config_from_cli)
+    dag = _refine_dag(session)
+    return dag
+
+
+def _refine_dag(session):
+    dag = _shorten_node_labels(session.dag, session.config["paths"])
+    dag = _add_root_node(dag)
+    dag = _clean_dag(dag)
+    dag = _style_dag(dag)
+    dag = _escape_node_names_with_colons(dag)
+
+    return dag
+
+
+def _create_session(config_from_cli: Dict[str, Any]) -> Session:
+    try:
+        pm = get_plugin_manager()
+        from _pytask import cli
+
+        pm.register(cli)
+        pm.hook.pytask_add_hooks(pm=pm)
+
+        config = pm.hook.pytask_configure(pm=pm, config_from_cli=config_from_cli)
+
+        session = Session.from_config(config)
+
+    except (ConfigurationError, Exception):
+        console.print_exception()
+        session = Session({}, None)
+        session.exit_code = ExitCode.CONFIGURATION_FAILED
+
+    else:
+        try:
+            session.hook.pytask_log_session_header(session=session)
+            session.hook.pytask_collect(session=session)
+            session.hook.pytask_resolve_dependencies(session=session)
+
+        except CollectionError:
+            session.exit_code = ExitCode.COLLECTION_FAILED
+
+        except ResolvingDependenciesError:
+            session.exit_code = ExitCode.RESOLVING_DEPENDENCIES_FAILED
+
+        except Exception:
+            session.exit_code = ExitCode.FAILED
+            console.print_exception()
+            console.rule(style=ColorCode.FAILED)
+
+    return session
+
+
+def _shorten_node_labels(dag, paths):
+    node_names = dag.nodes
+    short_names = reduce_names_of_multiple_nodes(node_names, dag, paths)
+    old_to_new = dict(zip(node_names, short_names))
+    dag = nx.relabel_nodes(dag, old_to_new)
+    return dag
+
+
+def _add_root_node(dag):
+    tasks_without_predecessor = [
+        name
+        for name in dag.nodes
+        if len(list(descending_tasks(name, dag))) == 0 and "task" in dag.nodes[name]
+    ]
+    if tasks_without_predecessor:
+        dag.add_node("root")
+        for name in tasks_without_predecessor:
+            dag.add_edge("root", name)
+
+    return dag
+
+
+def _clean_dag(dag):
+    """Clean the DAG."""
+    for node in dag.nodes:
+        dag.nodes[node].clear()
+    return dag
+
+
+def _style_dag(dag: nx.DiGraph) -> nx.DiGraph:
+    shapes = {name: "hexagon" if "::task_" in name else "box" for name in dag.nodes}
+    nx.set_node_attributes(dag, shapes, "shape")
+    return dag
+
+
+def _escape_node_names_with_colons(dag: nx.DiGraph):
+    """Escape node names with colons.
+
+    pydot cannot handle unquoted colons in node names because DOT uses them to mark
+    ports. Escaping works by wrapping the name in double quotes. See this issue for more
+    information: https://github.com/pydot/pydot/issues/224.
+
+    """
+    return nx.relabel_nodes(dag, {name: f'"{name}"' for name in dag.nodes})
+
+
+def _write_graph(dag: nx.DiGraph, path: Path, layout: str) -> None:
+    try:
+        import pydot  # noqa: F401
+    except ImportError:
+        raise ImportError(
+            "To visualize the project's DAG, you need to install pydot, which is "
+            "available with pip and conda."
+        ) from None
+    if shutil.which(layout) is None:
+        raise RuntimeError(
+            f"The layout program '{layout}' could not be found on your PATH. Please "
+            "install graphviz. It is, for example, available with conda."
+        )
+
+    path.parent.mkdir(exist_ok=True, parents=True)
+    graph = nx.nx_pydot.to_pydot(dag)
+    graph.write(path, prog=layout, format=path.suffix[1:])
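``pytask_parse_config`` resolves each option by taking the first value that is not ``None``: first from the command line, then from the configuration file, and finally a default. The stdlib sketch below mimics that precedence rule. ``first_non_none`` is a hypothetical stand-in, not the actual ``_pytask.shared.get_first_non_none_value``, and it assumes the callback is applied to every candidate before the ``None`` check, which is what the ``None``-tolerant ``output_path`` callback suggests.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def first_non_none(
    *configs: Dict[str, Any],
    key: str,
    default: Any = None,
    callback: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Return the first non-None value of ``key``, checking configs in order.

    Hypothetical stand-in for pytask's get_first_non_none_value; the callback
    is applied to every candidate (including None) before the None check.
    """
    for config in configs:
        value = config.get(key)
        if callback is not None:
            value = callback(value)
        if value is not None:
            return value
    return default


def to_path(value: Any) -> Optional[Path]:
    # Same conversion as the output_path callback in pytask_parse_config.
    return None if value is None else Path(value)


# --output-path was not given, so the configuration file wins for output_path,
# while the layout passed on the command line overrides the default "dot".
config_from_cli = {"output_path": None, "layout": "neato"}
config_from_file = {"output_path": "bld/dag.svg"}

output_path = first_non_none(
    config_from_cli,
    config_from_file,
    key="output_path",
    default=Path.cwd() / "dag.pdf",
    callback=to_path,
)
layout = first_non_none(config_from_cli, config_from_file, key="layout", default="dot")
print(output_path, layout)
```

Because the default is returned unchanged, it must already have the final type, which is why the commit passes ``Path.cwd() / "dag.pdf"`` rather than a string.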
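``_write_graph`` infers the output format from the file extension (``path.suffix[1:]``) and uses ``shutil.which`` to check that the Graphviz layout program is on the ``PATH`` before handing the graph to pydot. The stdlib sketch below isolates those two checks. ``resolve_output_format`` is a hypothetical helper, and the guard against a path without an extension is an extra assumption that the commit does not include.

```python
import shutil
import sys
from pathlib import Path


def resolve_output_format(path: Path, layout: str) -> str:
    """Return the output format for ``path`` after checking that ``layout`` exists.

    Hypothetical helper mirroring the checks in _write_graph.
    """
    # "dag.pdf" -> "pdf", "bld/dag.svg" -> "svg"; the leading dot is dropped.
    fmt = path.suffix[1:]
    if not fmt:
        # Extra guard (not in the commit): without a suffix there is no format.
        raise ValueError(f"Cannot infer an output format from '{path}'.")
    # shutil.which returns None if the program is not an executable on PATH.
    if shutil.which(layout) is None:
        # The f-prefix interpolates the program name into the message.
        raise RuntimeError(
            f"The layout program '{layout}' could not be found on your PATH."
        )
    return fmt


# sys.executable stands in for a layout program that is guaranteed to exist.
print(resolve_output_format(Path("bld/dag.svg"), sys.executable))  # svg
```

Checking for the program up front yields a readable error instead of a failure deep inside pydot's subprocess call.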

src/_pytask/parameters.py

Lines changed: 1 addition & 1 deletion
@@ -35,5 +35,5 @@ def pytask_extend_command_line_interface(cli):
         cli.commands[command].params.append(_CONFIG_OPTION)
     for command in ["build", "clean", "collect", "profile"]:
         cli.commands[command].params.append(_IGNORE_OPTION)
-    for command in ["build", "clean", "collect", "profile"]:
+    for command in ["build", "clean", "collect", "dag", "profile"]:
         cli.commands[command].params.append(_PATH_ARGUMENT)

src/pytask/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@
 from _pytask.build import main
 from _pytask.cli import cli
 from _pytask.config import hookimpl
+from _pytask.graph import build_dag
 from _pytask.mark import MARK_GEN as mark  # noqa: N811
 
 
-__all__ = ["__version__", "cli", "hookimpl", "main", "mark"]
+__all__ = ["__version__", "build_dag", "cli", "hookimpl", "main", "mark"]
