-
Notifications
You must be signed in to change notification settings - Fork 4
[compliance tests] initial server scenarios #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pcarleton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a bunch of comments, but they're non-blocking / thinking through where we go with this. I'd say let's merge this all in and iterate forward.
A couple things this prompted me to that I want to explore are:
- how to test a scenario, specifically making sure it fails when it should. I think we'll want to make this super easy to do
- what negative tests we might want for a server (i.e. server shouldn't do this)
- starting to compile the big "MUST" / "SHOULD" list and start burning down our coverage
|
|
||
| return { | ||
| client, | ||
| close: async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not call close directly?
| try { | ||
| const connection = await connectToServer(serverUrl); | ||
|
|
||
| // The connection process already does initialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an example of where if we use the SDK, we lose some visibility. this one seems fine as long as the error the SDK throws is easy to interpret in the check output, but just flagging the pattern where we may want to break out of the sdk
| } | ||
| } | ||
|
|
||
| export class PromptsGetSimpleScenario implements ClientScenario { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still getting a feel for the cost of an additional scenario, but I'd be inclined to do all these in 1 vs. separate scenarios for each.
| @@ -0,0 +1,874 @@ | |||
| # MCP Server Conformance Requirements | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was imagining this being a "suite" that is called something like "Basic", and it's a list of the tests to run, where each test describes the functionality it needs.
I like that this file describes the current SDK goal in a "1-stop" way though, so let's keep this, and we can explore "suites" or profiles later. Like for SDK's, we want this full set of features, but for servers that don't want some, we can have suites that don't fail on missing optional features.
| import { ClientScenario, ConformanceCheck } from '../../types.js'; | ||
| import { connectToServer } from './client-helper.js'; | ||
|
|
||
| export class ResourcesListScenario implements ClientScenario { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar here, my current take is these would be better as 1 scenario w/ multiple checks
| Progress | ||
| } from '@modelcontextprotocol/sdk/types.js'; | ||
|
|
||
| export class ToolsListScenario implements ClientScenario { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these ones seem good as separate, but I might group it like:
- list + call simple
- call content types (image, audio, mixed, resource)
- (rest as individual)
the rough heuristic I'm building up is "would it make sense for someone to implement a server that just does this scenario?" (open to iterating on that heuristic, I think we'll want it to be easy for folks to reason through when writing scenarios and using the scenarios). Right now, we're proposing folks have an everything-server that does all the things, but I don't want it to be a requirement that it's all 1 server since some scenarios will be mutually exclusive (i.e. an SDK should be able to support writing a server that doesn't support certain stateful features if that unlocks other transport things).
Some of this might also be fine to "start out everything super small, and then group things up later based on what's annoying or not as we use it more".
| console.log(`\n=== Running scenario: ${scenarioName} ===`); | ||
| try { | ||
| const result = await runServerConformanceTest(serverUrl, scenarioName); | ||
| allResults.push({ scenario: scenarioName, checks: result.checks }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hypothesis here is we'll want results logged as we go, and summary at the end for faster feedback loops
7635a76 to
7b3666a
Compare
This is a very simplified conformance tests suit with: