Skip to content

Fix pydot tests #2019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

lkk7
Copy link

@lkk7 lkk7 commented Jul 16, 2024

No description provided.

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.92%. Comparing base (6d8c2a4) to head (5dd4d08).

❗ There is a different number of reports uploaded between BASE (6d8c2a4) and HEAD (5dd4d08). Click for more details.

HEAD has 5 uploads less than BASE
Flag BASE (6d8c2a4) HEAD (5dd4d08)
16 11
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2019       +/-   ##
===========================================
- Coverage   83.78%   54.92%   -28.86%     
===========================================
  Files          46       46               
  Lines        8262     8260        -2     
  Branches     2199     2103       -96     
===========================================
- Hits         6922     4537     -2385     
- Misses        858     3185     +2327     
- Partials      482      538       +56     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lkk7
Copy link
Author

lkk7 commented Jul 17, 2024

The test fails for the same reason it failed for me locally: https://github.com/common-workflow-language/cwltool/actions/runs/9964048376/job/27541027817

The reason is that cwltool returns these strings with a path attached to them, while it wasn't expected.
It doesn't seem to be pydot related, does it?

@mr-c
Copy link
Member

mr-c commented Jul 17, 2024

@lkk7 Interesting, I've triggered a re-run of the CI on the main branch to see if there has been a regression related to one of our dependencies since July 12th

https://github.com/common-workflow-language/cwltool/actions/runs/9905754128

@mr-c
Copy link
Member

mr-c commented Jul 17, 2024

@lkk7 When we pin to a pydot before version 3, all the CI tests still pass: https://github.com/common-workflow-language/cwltool/actions/runs/9976419287/job/27568552139

@lkk7
Copy link
Author

lkk7 commented Aug 19, 2024

That's an old issue, but could we retry the CI now? Or has this been solved already?

@mr-c mr-c force-pushed the fix-tests-pydot branch from c5ce229 to 80e64f9 Compare August 20, 2024 07:32
@mr-c
Copy link
Member

mr-c commented Aug 20, 2024

@lkk7 Since you started this PR we capped the pydot version to be less than 3. So now that we've rebased the PR I had to add a revert of that pydot version cap commit. We can now compare the pending test results (with pydot 3.x) to the results without pydot 3.x.

@mr-c
Copy link
Member

mr-c commented Aug 20, 2024

@lkk7 Looks like there is a behavior change with pydot 3.x (3.0.1, specifically)

    [
        (
  -         '"command_line_tool"',
  -         '"file_output"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_output"',
        ),
        (
  -         '"file_input"',
  -         '"nested_workflow"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_input"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#nested_workflow"',
        ),
        (
  -         '"nested_workflow"',
  -         '"operation"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#nested_workflow"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
        ),
        (
  -         '"operation"',
  -         '"command_line_tool"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
        ),
        (
  -         '"operation"',
  -         '"string_output"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#string_output"',
        ),
    ]

https://github.com/common-workflow-language/cwltool/actions/runs/10468140236/job/28988377402?pr=2019#step:8:82

@lkk7
Copy link
Author

lkk7 commented Aug 20, 2024

I can't pinpoint the exact fix yet, but the bug appears in cwlrdf.py in this function:

def printdot(
    wf: Process,
    ctx: ContextType,
    stdout: IO[str],
) -> None:
    cwl_viewer: CWLViewer = CWLViewer(printrdf(wf, ctx, "n3"))
    stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))  ###  <<< HERE 

It's all about those file paths.

wf.metadata['id'] equals 'file:///<your_path>/three_step_color.cwl', and in pydot 2.0.0 it's removed from the string as expected.

In 3.0.0+ the cwl_viewer.dot() contains these paths in a changed form: file:"///<your_path>/three_step_color.cwl#string_output". There is an additional quote that causes the replacement not to work.

@ferdnyc, Could you confirm my suspicion. I know you were not involved in this one, but you made the change that affect this (ea0131517db2a0dbf58ec0f8bb3bd7b40bbd279d). The context of this bug is not important, but – file:///<path>#<...> changed to file:"///<path>#<...>". Is it because now pydot thinks it's a port, like node1:port1?

@mr-c
Copy link
Member

mr-c commented Jun 4, 2025

ping @lkk7 @@ferdnyc , thank you

@lkk7
Copy link
Author

lkk7 commented Jun 7, 2025

After a year, I've looked at this again, and I think again that this is because of the handling of port syntax, (the : sign which is used in the failing test).

The thing happens here in cwltool:

def printdot(
    wf: Process,
    ctx: ContextType,
    stdout: IO[str],
) -> None:
    cwl_viewer: CWLViewer = CWLViewer(printrdf(wf, ctx, "n3"))
    stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))

As we see in the failed test log posted above:

        (
  -         '"command_line_tool"',
  -         '"file_output"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_output"',
        ),
...

The problem occurs specifically here:

stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))`

The replacement isnt working anymore, because it wants to erase file:///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool, but it encounters file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool".

I'm not sure if we can just revert back to the previous behavior which ignored it, because theoretically it's unsafe. But it's possible that the quoting behavior is too tight and I'll mention it in the upcoming PR.

An ugly alternative is to adjust the .replace() call, here in cwltool, to just work around the problem.

@ferdnyc
Copy link

ferdnyc commented Jun 9, 2025

Apologies, just seeing this for the first time. The port conjecture is likely; the way to overcome that would be to explicitly quote the ID string. (So, pydot.Node('"file:///<path>#command_line_tool"') rather than pydot.Node("file:///<path>#command_line_tool").) A quoted string shouldn't be parsed apart into an id:port value, because the correct quoting there would be '"id":"port"'.

That's consistent with graphviz's own parser, since:

graph G {
  # Either of these will break (syntax error),
  # since it can't parse slashes in an unquoted port value
  file:///some/path#fragment [label="\N"];
  file:///some/path [label="\N"];
}
graph G {
  # These are all exactly equivalent,
  # any will create a node named "file"
  file:somepath [label="\N"];
  "file":"somepath" [label="\N"];
  file:"///some/path#fragment" [label="\N"];
}
graph G {
  # This will create a node named "file:///some/path#fragment"
  "file:///some/path#fragment" [label="\N"];
}

@ferdnyc
Copy link

ferdnyc commented Jun 9, 2025

Without knowing more about the context, I can't say where the behavior changed, but if Pydot 2.x was processing input differently it was probably doing it wrong in ways that would break on valid syntax. Like node IDs with :port components, which even though they're completely superfluous (ports are only meaningful for edges, not nodes) are still valid according to the grammar. A lot of that stuff was cleaned up for 3.x.

@lkk7 lkk7 force-pushed the fix-tests-pydot branch from 5dd4d08 to 1ed32af Compare June 9, 2025 19:55
@lkk7
Copy link
Author

lkk7 commented Jun 9, 2025

@ferdnyc Thanks for the suggestion! Of course we can quote it. Why didn't I think of that?

@lkk7
Copy link
Author

lkk7 commented Jun 9, 2025

@mr-c

Could you approve the workflow? The latest commit should make the tests pass.

Disclaimer: This change only satisfies the tests, which aren't 100% exhaustive.
If you want the tested output to match the expected output exactly) (which it doesn't right now, as there are some small differences), then the code needs to be refactored. Maybe the expected graph is not up-to-date or something. But this is outside the scope of the current issue, as it has been this way for a long time.
And generally, it works. But I'm not that experienced in cwltool to check it in detail.

@ferdnyc
Copy link

ferdnyc commented Jun 9, 2025

@lkk7

Why didn't I think of that?

Because then I wouldn't get the chance to sound Clever™!

@ferdnyc
Copy link

ferdnyc commented Jun 9, 2025

Without knowing more about the context, I can't say where the behavior changed, but if Pydot 2.x was processing input differently it was probably doing it wrong in ways that would break on valid syntax. Like node IDs with :port components, which even though they're completely superfluous (ports are only meaningful for edges, not nodes) are still valid according to the grammar. A lot of that stuff was cleaned up for 3.x.

Actually, it looks like these are edges, since even Pydot 4.x won't actually output the port part of a Node definition (as I said, it's meaningless); but the difference is in how the Edge endpoints are output (not parsed).

Pydot does a lot of DWIMming. So if the user creates a technically-invalid edge endpoint like file:///some/path#fragment, instead of throwing a syntax error like Graphviz would, we'll just add the necessary quotes for you.

But whereas Pydot 2.x, 3.x, and 4.x would all output a simple set of endpoints like pydot.Edge("a:p1", "b:p2") the same way:

>>> import pydot
>>> pydot.__version__
'2.0.0'
>>> e = pydot.Edge("a:p1", "b:p2")
>>> e.to_string()
'a:p1 -- b:p2;'

The difference is in how they DWIM endpoints that do require quoting.

Pydot 2.x just threw quotes around the entire endpoint string, making it unnecessarily difficult to create endpoints with port values:

>>> import pydot
>>> pydot.__version__
'2.0.0'
>>> e = pydot.Edge("a:p2!", "b:p4!")
>>> e.to_string()
'"a:p2!" -- "b:p4!";'
>>> e = pydot.Edge("a?:p1", "b?:p2")
>>> e.to_string()
'"a?:p1" -- "b?:p2";'

Pydot 3.x+ is smarter, and knows how to properly quote an id:port-format endpoint:

>>> import pydot
>>> pydot.__version__
'4.0.1.dev0'
>>> e = pydot.Edge("a:p2!", "b:p4!")
>>> e.to_string()
'a:"p2!" -- b:"p4!";'
>>> e = pydot.Edge("a?:p2", "b?:p4")
>>> e.to_string() 
'"a?":p2 -- "b?":p4;'

...which does mean that now, when you don't want that colon to represent the id:port split, you need to wrap the entire ID in quotes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants