Description
We should use consistent permalinks in URIs across our RDF to identify a workflow or a workflow file.
Currently (v1.1) we have:
- SPARQL uses Graph URIs like http://sparql:3030/cwlviewer/github.com/genome/cancer-genomics-workflow/blob/be7e682c6a2d0b24b949e022aeae7786bd8434ed/strelka/workflow.cwl that expose the origin of the git repository, its commit and file path
  - Statements within such graphs contain URIs like file:///data/git/1a2b5d62cde8555e5932907b28189585a2bf99d2/fp_filter/workflow.cwl that expose the working directory for the git clone
- The research object's .ro/annotations/workflow.ttl annotation contains URIs like https://raw.githubusercontent.com/common-workflow-language/workflows/master/workflows/make-to-cwl/dna.cwl#main
I propose we replace all of those (possibly with search-replace on the cwltool --print-rdf output) with a single location-free URI like: https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl
Permalink URI scheme
The new URI scheme is composed like this:
https://w3id.org/cwl/view/{scm}/{commit}/{path}#{anchor}
- https://w3id.org/cwl/view/ - fixed prefix at the permalink service https://w3id.org/ (/cwl is our namespace)
- {scm} - source code management protocol, currently only git is supported
- {commit} - full git commit sha1 id (no branches or short commits allowed)
- {path} - relative path to the .cwl file within a checkout of that git commit
- #{anchor} - an optional anchor, e.g. #main, as-is from cwltool --print-rdf; not passed on to the server
Anyone can construct a URI according to the above scheme for a given git commit and file - even if the commit only exists on a local disk or in a private git repository that the CWL Viewer does not know about.
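As a rough illustration of the scheme, here is a minimal Python sketch that builds such a URI. The function name build_permalink is made up for this example, and percent-encoding the path with urllib.parse.quote is my assumption; the proposal does not say how unusual characters in paths should be handled.

```python
import re
from urllib.parse import quote

PREFIX = "https://w3id.org/cwl/view/"
# A full sha1 is 40 hex digits; lowercase-only is an assumption of this sketch.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def build_permalink(commit, path, anchor=None, scm="git"):
    """Build a location-free permalink (hypothetical helper, not CWL Viewer code)."""
    if scm != "git":
        raise ValueError("only 'git' is currently supported")
    if not FULL_SHA1.fullmatch(commit):
        raise ValueError("commit must be a full sha1, not a branch or short id")
    if path.startswith("/"):
        raise ValueError("path must be relative to the checkout root")
    # Assumption: escape special characters in the path but keep '/' separators.
    uri = f"{PREFIX}{scm}/{commit}/{quote(path)}"
    if anchor:
        uri += "#" + anchor.lstrip("#")
    return uri


print(build_permalink(
    "933bf2a1a1cce32d88f88f136275535da9df0954",
    "workflows/lobSTR/lobSTR-workflow.cwl",
    anchor="main",
))
```

Nothing here talks to a server, which is the point: the URI depends only on the commit id and the path.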
These make good Linked Data identifiers for specific CWL workflow definitions because:
- The CWL file and its neighbors can't change within the git commit
- The URI is the same wherever the git repository is pushed or hosted
Anyone generating the URIs should be aware of some edge cases:
- An uncommitted file change
- The CWL file is within a git submodule, which could follow a movable branch (without any commits appearing on the master git repository)
- The CWL file is not tracked in the git repository (e.g. ../../outside.cwl)
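Only the last of these edge cases can be checked from the path alone. A small sketch, with a hypothetical helper name, that rejects relative paths climbing out of the checkout; a path that passes can still be untracked or modified, which only git itself can tell you:

```python
import posixpath


def stays_inside_checkout(path):
    """Return True if a relative path does not escape the checkout root.

    Hypothetical helper: this only inspects the path string. It does not
    ask git whether the file is tracked, committed or part of a submodule.
    """
    if not path or path.startswith("/"):
        return False
    normalized = posixpath.normpath(path)
    return normalized != ".." and not normalized.startswith("../")


print(stays_inside_checkout("workflows/lobSTR/lobSTR-workflow.cwl"))
print(stays_inside_checkout("../../outside.cwl"))
```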
Resolving
Resolving any URI starting with https://w3id.org/cwl/view/git/{rest} gives an HTTP 302 redirect to the corresponding resource https://view.commonwl.org/git/{rest}, which represents that path in that commit.
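The redirect is a plain prefix rewrite, which can be sketched in Python as below. The function name redirect_target is invented for this example, and the real w3id.org service is configured separately; this only mirrors the mapping described above, including the fact that the #anchor never reaches the server.

```python
from urllib.parse import urldefrag

W3ID_PREFIX = "https://w3id.org/cwl/view/git/"
VIEWER_PREFIX = "https://view.commonwl.org/git/"


def redirect_target(permalink):
    """Return the Location a 302 would point at, or None for other URIs."""
    # Fragments are handled by the client and are not sent to the server.
    url, _fragment = urldefrag(permalink)
    if not url.startswith(W3ID_PREFIX):
        return None
    return VIEWER_PREFIX + url[len(W3ID_PREFIX):]


print(redirect_target(
    "https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954"
    "/workflows/lobSTR/lobSTR-workflow.cwl#main"
))
```

On the wire the exchange looks like this: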
GET https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
HTTP/1.1 302 Found
Location: https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl
Unknown commit?
If the public CWL Viewer has never heard about the commit 933bf2a1a1cce32d88f88f136275535da9df0954 there is not much more to say:
HEAD https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
HTTP/1.1 404 Not Found
Unknown git commit `933bf2a1a1cce32d88f88f136275535da9df0954`
Content-negotiation
But if the commit is known and CWL Viewer finds a matching graph for that file in that commit, then the client can content-negotiate to get various RDF serializations like text/turtle or application/ld+json:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: text/turtle
HTTP/1.1 200 OK
Vary: Accept
Content-Type: text/turtle
@prefix cwl: <https://w3id.org/cwl/cwl#>.
<https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl#main> a cwl:Workflow .
# ....
Notice how the returned RDF uses the location-independent w3id.org namespace, not view.commonwl.org.
YAML
If the client asks for the CWL file with type application/x-yaml or application/octet-stream, and the git repository has a public "raw" option, then the server can redirect to that:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/x-yaml
HTTP/1.1 302 Found
Vary: Accept
Location: https://cdn.rawgit.com/common-workflow-language/workflows/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl
GET https://cdn.rawgit.com/common-workflow-language/workflows/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/x-yaml
HTTP/1.1 200 OK
Content-Type: application/octet-stream
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
inputs:
...
HTML and JSON API
If the client asks for text/html, it is probably a browser, so CWL Viewer will redirect to the normal workflow rendering:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: text/html
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/workflows/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
This also works for application/json, which then gives the JSON API output:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/json
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/workflows/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
GET https://view.commonwl.org/workflows/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/json
HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json
{
  "retrievedFrom": {
    "owner": "common-workflow-language",
    "repoName": "workflows",
    "branch": "master",
    "path": "workflows/lobSTR/lobSTR-workflow.cwl",
    "url": "https://github.com/common-workflow-language/workflows/tree/master/workflows/lobSTR/lobSTR-workflow.cwl"
  },
  "retrievedOn": 1499175275743,
  "lastCommit": "920c6be45f08e979e715a0018f22c532b024074f",
  "label": "lobSTR-workflow.cwl",
  ...
}
Images
OK, let's be cool and do images as well.
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: image/svg+xml
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/graph/svg/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
Research Object Bundle
...and of course our Research Object Bundle, if the client asks for application/ro+zip or application/zip:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/ro+zip
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/robundle/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
Packed workflows
If there's a packed CWL file with nested workflows, then a workflow is not matchable by its filename alone, as you also need to know the #{anchor}. This is not a problem for the RDF output, as it will contain all workflows found in the packed CWL file, and you just match by #anchor.
However, it can be a problem for the HTML and JSON rendering, which with #103 would have alternative URIs depending on the selected nested workflow. It could be confusing to redirect to the top-level workflow (if that can even be determined), as the user won't find their #nested1/step/nestedstep2 in there; we don't expand nested workflows in the UI.
So if the user asks for text/html or application/json for a packed workflow (multiple workflows found), then we'll give an error, with links to the candidates using #103 escaped URIs:
GET https://view.commonwl.org/git/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl HTTP/1.1
Accept: text/html
HTTP/1.1 300 Multiple Choices
Vary: Accept
Content-Type: text/uri-list
https://view.commonwl.org/workflows/example.com/blob/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl%23main
https://view.commonwl.org/workflows/example.com/blob/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl%23nested1
https://view.commonwl.org/workflows/example.com/blob/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl%23nested2
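To pull the negotiation rules above together, here is a rough Python sketch of the dispatch. All names (RESPONSE_KINDS, response_kind, packed_candidates, respond) are invented for illustration, the sketch only looks at a single media type rather than a full Accept header with several weighted entries, and what to answer for an unlisted media type is left open because the proposal does not cover it.

```python
from urllib.parse import quote

# Media type -> kind of response, as listed in the sections above.
RESPONSE_KINDS = {
    "text/turtle": "rdf",
    "application/ld+json": "rdf",
    "application/x-yaml": "raw",
    "application/octet-stream": "raw",
    "text/html": "workflow",
    "application/json": "workflow",
    "image/svg+xml": "graph/svg",
    "application/ro+zip": "robundle",
    "application/zip": "robundle",
}


def response_kind(accept):
    """Map one media type (parameters such as q= are ignored) to a kind."""
    media_type = accept.split(";")[0].strip().lower()
    return RESPONSE_KINDS.get(media_type)  # None: not covered by the proposal


def packed_candidates(workflow_url, anchors):
    """List one URI per workflow in a packed file, with '#' escaped as %23."""
    return [workflow_url + quote("#" + anchor, safe="") for anchor in anchors]


def respond(accept, workflow_url, anchors):
    """Return (status, candidate URIs) for a known commit and file."""
    kind = response_kind(accept)
    if kind is None:
        return None, []  # left undecided in this sketch
    if kind == "workflow" and len(anchors) > 1:
        return 300, packed_candidates(workflow_url, anchors)
    if kind == "rdf":
        return 200, []  # RDF is served directly and contains every workflow
    return 302, []  # everything else redirects to an existing rendering


status, candidates = respond(
    "text/html",
    "https://view.commonwl.org/workflows/example.com/blob/"
    "adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl",
    ["main", "nested1", "nested2"],
)
print(status)
print("\n".join(candidates))
```

For the packed example this yields a 300 status and the same three candidate URIs as in the text/uri-list response.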