Optional and multi-segment path parameters #2653
We have a few use cases when it comes to GitHub API v3. The current workaround is to annotate multi-segment parameters with an extension, but this is far from ideal as far as tooling support and specification adherence go.
|
The URI Template I know @awwright has written a URI matcher for URI Template (https://github.com/awwright/uri-template-router). He might have some insight on how to deal with the ambiguities. |
@jdesrosiers The Yeah, I've travelled that same path writing a URI matcher for URL Templates https://github.com/tavis-software/Tavis.UriTemplates/blob/master/src/UriTemplates/UriTemplate.cs#L355 It's not fun. |
uri-template-router tries to invent a general algorithm for reversing a URI Template and it is, indeed, somewhat complicated, in large part because URI Template isn't designed for matching. The first rule I try to follow is a simple, self-evident principle: If X is a strict subset of Y, then match X before Y. This means that
Then for picking between disjoint URI templates, prefer templates that are strict subsets when only looking at the first characters. Therefore, Finally, I match empty strings by default. Because URI Template isn't a matching syntax (like regexp), it doesn't have a way to require a minimum number of matches. Developers requiring a match are expected to validate the values for the matches, and call I recall there being one or two small bugs with the algorithm that I have; I was in the process of trying to prove mathematically that an algorithm implementing all these requirements is computationally possible, and then got sidetracked. |
I dove into a "parsing strings" rabbit hole and I've come up with some interesting findings: A URI Template is a subclass of a regular expression, which in turn can be expressed as a There is an optimal, normalized form for DFA. Once you build a DFA from a URI template, you can cache it/save it. You can reverse the DFA back into a regular expression or URI Template. Any URI that you can build with a URI Template can be reversed into one or more values for that URI template that will build that URI, and this can be done deterministically (so that there is a "canon" value). It is also mathematically possible to detect these ambiguous cases, where a single URI is potentially parsed into multiple values. The algorithms for building the DFA are complicated (at least 1k LoC), but the machine is computationally straightforward: Once you've built the state machine, it can be parsed in O(n) time, n = length of input string, no backtracking. You can even add complicated processing on top of the URI Template. For example, you could register two URI templates:
... and the DFA compiler could detect that (1) is a subset of (2), because of the regex constraints, even though the URI Templates by themselves would suggest the opposite. In a world where Gödel's incompleteness theorem usually rules, and O(n!) algorithms are often our best case, this is pretty exciting. The question is, what can OpenAPI take from this? My opinion is that frameworks exist to remove implementation burden from developers. But writing all of this into OpenAPI (e.g. the regex constraints on URI Template variables) could potentially be prohibitive to complete implementations being written at all. I'm going to continue to work on this; but what sort of work would you find useful? |
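To make the "reverse a URI Template into variable values" idea concrete, here is a minimal sketch in Python. It is not uri-template-router's algorithm and it does not build a DFA (Python's re module backtracks). It only handles simple {name} expressions, and it assumes each variable matches one non-empty path segment, which is stricter than the "match empty strings by default" rule described above. The helper names template_to_regex and match_template are made up for illustration.

```python
import re

def template_to_regex(template: str) -> re.Pattern:
    """Compile a template containing only simple {name} expressions.

    Assumption for this sketch: every variable matches exactly one
    non-empty path segment (no "/" inside a value).
    """
    # re.split with one capture group alternates literal text and names:
    # "/users/{id}" -> ["/users/", "id", ""]
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)           # literal text
        else:
            pattern += f"(?P<{part}>[^/]+)"      # variable
    return re.compile(pattern)

def match_template(template: str, uri: str):
    """Return the variable bindings for uri, or None if it does not match."""
    m = template_to_regex(template).fullmatch(uri)
    return m.groupdict() if m else None

print(match_template("/users/{id}/repos/{repo}", "/users/42/repos/demo"))
```

A URI with an extra segment, such as /users/42/repos/a/b, does not match under this single-segment assumption, which is exactly the limitation this issue is about.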
For all of the APIs that Google provides via protobuf/gRPC, they also specify a google.api.http option that provides an HTTP 1.0 endpoint. They often use the syntax: For development teams that choose to support both gRPC and RESTful endpoints, this means they can't follow the guidelines Google provides around API design, which save small teams a huge amount of documentation and maintenance and give their APIs a consistent usage. |
Hello, I would like to offer a real-world example of the need for this. I have a legacy API that uses optional path params for paging information. So URLs like: I'm trying to put a new API gateway in front of this API, and the gateway strictly requires all APIs to be described in OpenAPISpec. Because I cannot model the optional path param (that I can see), the Gateway returns 404 for any request containing it. Unfortunately this is not something I can change: the API has thousands of consumers and serves 40 billion requests a year. Switching to all query params or headers or something else would be a major breaking change. My only other option would be to write some proxy in front of the Gateway to rewrite the URLs, but that seems silly. So it would be great if the Spec could support this. |
I propose to add support of |
wouldn't |
No, not really. It's ambiguous in the sense that there's multiple values for a and b that can produce the same URI. However, because these forms all map to the same URI, they must identify equivalent things, so there's no ambiguity in practice. There's only one match that will be made by a finite state machine; the one such that |
Isn't that arbitrary? Might as well be parsed such that |
In our case we have multiple params per path but they always come at the end: |
Those are "matrix" parameters, not "path" parameters. |
@yinzara my bad, you are correct. Are Matrix params supported by OpenAPISpec? I have not really been able to get them to work when they are optional, which is what brought me here. I assume this would be a different issue? |
It's not arbitrary in the sense that someone chose it to work that way, from among other options; that's just how finite state machines work.
Sort of, the Besides that, yes, for any URI Template where there's multiple different values that map to the same URI, there's a normalized form that will match only the "first" of those equivalent values (leaving the "redundant" values to not match at all). It might not be expressible as a URI Template, but you can notate it as a regular expression. |
@ryber > my only other option would be to write some proxy in front of the Gateway to rewrite the URLs but that seems silly. That doesn't seem silly at all to me. It's the right thing to do when marking an endpoint as deprecated while still supporting it. If you can rewrite the old form to the new form, then your application only needs to support the new form, and it also lets you gate which clients are still allowed to use the old form (you can use the User-Agent or an authorization token to block new clients from using the old form). |
I am very much interested in this, because I would like to be able to document APIs that already exist, even if they don't conform to common designs. As such, I think it would be useful to look at the routing patterns actually supported by popular implementations, rather than trying to define a new system, or adapt an RFC intended for a different purpose. In PHP, the most popular router is probably nikic/fastroute, which allows arbitrary optional text (not just optional parameters), but in limited positions, presumably to limit the ambiguity:
In node.js the ExpressJS Routing component uses a "path to regexp" library, which allows optional parts to appear anywhere in the URL. Ambiguity is resolved by matching patterns in the order they're defined (it's also possible for a handler to call In ASP.net Core, the Routing system features "URL templates" supporting optional parameters with
Another way to look at it is that optional components or parameters are (at least in most cases) just a short-hand for something that can already be expressed by defining multiple routes with the same properties. Note the FastRoute example can be expressed as two separate routes; and the ASP.net example is ambiguous even though it uses two separate patterns, not an optional segment. It seems to me that for most use cases, it would be sufficient to have two algorithms specified:
|
Random dev who bumped into this challenge here... Maybe it would be helpful to pick a good, intuitive use case that there is a need for: expressing a file system path. So, https://example.com/files/a/path/to/a/file If we express our path like /files/{directories+}/{filename} then there will be no ambiguity. There are clearly one or more directories, and they should be deserialized into a list. The file is at the end. But if, in a slightly contrived example, we have packs of dogs and clowders of cats, how can this be expressed in a path? /fido/buster/kitty/bella/luna We'll find that this can't be done. A path definition like /{+dogs}/{+cats}/ doesn't work because there is no knowing where one list ends and the other starts. I think this is the ambiguity problem discussed and why having two multi-path segments next to each other won't work. This path definition, however, can be deserialized nicely. /dogs/{+dogs}/cats/{+cats} Well, unless there is a jokester that names their cat "cats"... |
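As a rough illustration of the file path case, the deserialization could look like the Python sketch below. The {directories+} syntax is the commenter's proposal, not something OpenAPI defines today, and the function name parse_file_path is hypothetical. The sketch assumes at least one directory and no empty segments.

```python
def parse_file_path(path: str):
    """Split /files/<dir>/.../<filename> into a directory list and a filename.

    Assumptions for this sketch: at least one directory is required,
    and empty segments (e.g. "//") are rejected.
    """
    prefix = "/files/"
    if not path.startswith(prefix):
        return None
    segments = path[len(prefix):].split("/")
    if len(segments) < 2 or any(segment == "" for segment in segments):
        return None
    # Everything but the last segment is a directory; the last is the file.
    return {"directories": segments[:-1], "filename": segments[-1]}

print(parse_file_path("/files/a/path/to/a/file"))
```

Because the last segment is always the filename, there is only one way to split the path, which is why this template is unambiguous.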
@jpsalvesen This actually has an answer, see my comment at #2653 (comment) In regular expressions, selecting between behaviors for how to handle this is called "greedy" and "non-greedy". |
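For readers unfamiliar with the terms, here is a small Python sketch of greedy versus non-greedy matching applied to the dogs and cats path. The two regular expressions are hand-written stand-ins for the /{+dogs}/{+cats} template (trailing slash dropped to fit the example path); they are an assumption of this sketch, not output from any URI Template library.

```python
import re

PATH = "/fido/buster/kitty/bella/luna"

# Greedy: the first group takes as many characters as it can.
GREEDY = re.compile(r"/(?P<dogs>.+)/(?P<cats>.+)")
# Non-greedy: the first group takes as few characters as it can.
NON_GREEDY = re.compile(r"/(?P<dogs>.+?)/(?P<cats>.+)")

greedy = GREEDY.fullmatch(PATH).groupdict()
non_greedy = NON_GREEDY.fullmatch(PATH).groupdict()

print(greedy)      # {'dogs': 'fido/buster/kitty/bella', 'cats': 'luna'}
print(non_greedy)  # {'dogs': 'fido', 'cats': 'buster/kitty/bella/luna'}
```

Both patterns accept the same URI and each gives one deterministic answer, but neither can know that the author meant two dogs and three cats.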
The point I'm trying to make is that being able to write a valid regular expression does not automatically mean it fishes out the data you want. Syntactically, http://example.com/{+a}/{+b}/ is valid. But semantically, it's still ambiguous. The more concrete example http://example.com/{+dogs}/{+cats}/ explains why and how. Just give this some thought, and you'll find out why this problem is controversial when building a syntax for describing data. |
Yeah, full control over regexp group definition must be exposed, not just greedy "any char". |
@jpsalvesen I get what you're saying, but there's multiple concepts to distinguish: A URI template maps a variable binding to a URI. When you write a URI like Further, among these alternate variable bindings, only one is canonical. In this example, the first match ( So, the URI Template may be surjective, but it is not ambiguous; there is an injective inverse. |
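The point that several variable bindings can expand to the same URI can be checked by brute force. The Python sketch below enumerates every binding of a and b that expands /{+a}/{+b}/ to a given path, assuming both values are non-empty and split only at slashes. The function name all_bindings is made up, and the order of the list says nothing about which binding a real matcher would treat as canonical.

```python
def all_bindings(path: str):
    """List every (a, b) pair that expands /{+a}/{+b}/ to the given path.

    Assumptions for this sketch: both values are non-empty and the
    split point always falls on a "/" between two segments.
    """
    segments = path.strip("/").split("/")
    return [
        {"a": "/".join(segments[:i]), "b": "/".join(segments[i:])}
        for i in range(1, len(segments))
    ]

bindings = all_bindings("/x/y/z/")
for binding in bindings:
    print(binding)
```

For /x/y/z/ this lists two bindings, a=x with b=y/z and a=x/y with b=z. A matcher that always returns the same one of them is deterministic, even though the template on its own admits both.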
:( |
As others above have stated, it should be a greedy state machine. The pattern of If you want a double wildcard in this case, you have to specify a deterministic split point other than just a
To use the Cats vs Dogs example, you would do At this point the majority of API server software supports varieties of wildcard APIs in this manner, so it seems prudent for the OpenAPI spec to support this, as it's already fully functional on the likes of Python, JS, and C# and has been for a while. |
The Moonwalk (OAS 4) proposal currently includes full support for RFC 6570 URI Templates. Further discussion should happen in that repository, as the change is too large to go in the 3.x line. |
This is a meta issue representing a number of requests over the years including #1459 #1840 #892
Currently the OpenAPI specification does not allow optional path parameters nor path parameter values that allow characters such as /. This means that if you had the following API description: then you would not be allowed to have myPath = "a/path/to/a/file" unless you escaped the forward slash character.
Also, the RFC 6570 URI Template /myreports/{reportName}{/nonDefaultFormat}, where the last path segment is optional, is not supported by OAS.
The primary reason for not supporting these types of APIs is that it creates the potential of an ambiguous match between a URL and the corresponding path item. Some tooling depends on being able to identify the API description for a specific URL. If multiple pathItems match, then some kind of alternate selection algorithm must be defined.
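One way to read the optional-segment template is as shorthand for two concrete routes, tried in a fixed order. The Python sketch below shows that reading. It is an illustration only: the regular expressions, the "longest route first" order, and the function name match_report are assumptions of this sketch, not a selection algorithm defined by OAS or RFC 6570.

```python
import re

# /myreports/{reportName}{/nonDefaultFormat} written out as two routes.
# Assumption: each variable matches one non-empty segment, and the
# longer route is tried first.
ROUTES = [
    re.compile(r"/myreports/(?P<reportName>[^/]+)/(?P<nonDefaultFormat>[^/]+)"),
    re.compile(r"/myreports/(?P<reportName>[^/]+)"),
]

def match_report(path: str):
    """Return the bindings from the first route that matches, else None."""
    for route in ROUTES:
        m = route.fullmatch(path)
        if m:
            return m.groupdict()
    return None

print(match_report("/myreports/sales"))
print(match_report("/myreports/sales/pdf"))
```

The ambiguity concern shows up as soon as another path item, for example one with a second required segment after the report name, matches the same URLs as the first route here.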
There have been a number of suggestions on how that selection algorithm might work, with varying levels of complexity. The open question is if there is enough community demand to justify the work necessary to find an acceptable solution.
If you have real-world scenarios where adding support for either multi-segment path parameters or optional path segments would make your life easier, please share them in this issue.