Skip to content

Conversation

@friggeri
Copy link
Collaborator

No description provided.

@friggeri friggeri requested a review from a team as a code owner January 30, 2026 22:42
Copilot AI review requested due to automatic review settings January 30, 2026 22:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens the snapshot-based replay harness so that CI runs fail when tests issue new or changed requests without corresponding snapshots, and it updates tests/snapshots to align with that stricter behavior. It also adds stubs for new endpoints and tweaks specific E2E tests to avoid relying on live LLM behavior in CI.

Changes:

  • Add or update session and permissions snapshot YAMLs (including a new sendandwait_throws_on_timeout snapshot and updated tool arguments/assistant messages).
  • Enhance ReplayingCapiProxy to (a) provide a default model for /models, (b) stub agent memory endpoints, and (c) treat missing cached responses as CI errors instead of warnings.
  • Update the Node.js session E2E tests to skip the timeout-specific sendAndWait test in CI, since it intentionally times out before a replayable response is available.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/snapshots/session/sendandwait_throws_on_timeout.yaml New snapshot capturing the model and initial system/user messages for the sendAndWait-timeout scenario.
test/snapshots/session/send_returns_immediately_while_events_stream_in_background.yaml Adjusted tool-call arguments (intent string, shell description, mode: "sync") and slightly rephrased final assistant text to match current behavior.
test/snapshots/permissions/should_receive_toolcallid_in_permission_requests.yaml Minor assistant message wording change in the permissions snapshot to align with updated responses.
test/harness/replayingCapiProxy.ts Introduced a default model for /models, added stub handlers for /agents/.../memory/... endpoints, and changed the CI fallback behavior so missing snapshots produce an error via exitWithNoMatchingRequestError.
nodejs/test/e2e/session.test.ts Marked the sendAndWait throws on timeout test as it.skipIf(process.env.CI === "true") with clarifying comments, so CI doesn't depend on a replayable LLM response for this client-side timeout behavior.
Comments suppressed due to low confidence (1)

test/harness/replayingCapiProxy.ts:296

  • When process.env.CI === "true" and no cached response is found, exitWithNoMatchingRequestError calls options.onError, but the code still falls through to super.performRequest(options). This can cause the proxy to both send a 500 Proxy error response via onError and then attempt to proxy the request normally, leading to double-callbacks and potential write after end/header-sent errors. To avoid conflicting responses and ensure CI fails immediately when snapshots are missing, this branch should return early after exitWithNoMatchingRequestError instead of continuing to super.performRequest.
        // Fallback to normal proxying if no cached response found
        // This implicitly captures the new exchange too
        if (process.env.CI === "true") {
          await exitWithNoMatchingRequestError(
            options,
            state.testInfo,
            state.workDir,
            state.toolResultNormalizers,
          );
        }
        super.performRequest(options);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +213 to +232
// Handle memory endpoints - return stub responses in tests
// Matches: /agents/*/memory/*/enabled, /agents/*/memory/*/recent, etc.
if (options.requestOptions.path?.match(/\/agents\/.*\/memory\//)) {
let body: string;
if (options.requestOptions.path.includes("/enabled")) {
body = JSON.stringify({ enabled: false });
} else if (options.requestOptions.path.includes("/recent")) {
body = JSON.stringify({ memories: [] });
} else {
body = JSON.stringify({});
}
const headers = {
"content-type": "application/json",
...commonResponseHeaders,
};
options.onResponseStart(200, headers);
options.onData(Buffer.from(body));
options.onResponseEnd();
return;
}
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new memory endpoint stubs (/agents/.../memory/...) and the CI-only error path for missing cached responses (exitWithNoMatchingRequestError) do not appear to have dedicated tests in replayingCapiProxy.test.ts, even though this file has comprehensive coverage for other behaviors. Given that these branches change how the proxy behaves in CI and for new endpoints, consider adding tests that (1) exercise the memory endpoint handlers and (2) verify that in CI, a request without a matching snapshot produces the expected ::error annotation and fails the request instead of silently proxying through.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

SDK Consistency Review: Test Infrastructure Changes

This PR modifies the shared test infrastructure to make CI builds fail when snapshots aren't present, and conditionally skips the Node.js sendAndWait throws on timeout test in CI environments.

✅ Good: Shared Test Harness Improvements

The changes to test/harness/replayingCapiProxy.ts improve the reliability of all SDK tests by:

  • Providing a default model (claude-sonnet-4.5) when no stored data is available
  • Adding stub responses for memory endpoints to support tests without requiring actual API calls
  • Converting warnings to errors when cached responses are missing in CI, ensuring test reliability

These changes benefit all four SDKs (Node.js, Python, Go, .NET) since they all use this shared test harness.

⚠️ Inconsistency: Timeout Test Coverage

However, there's a cross-SDK inconsistency with the timeout test:

Current State:

  • Node.js ✅ Has test, NOW skipped in CI (this PR)
  • .NET ✅ Has test (SendAndWait_Throws_On_Timeout), but NOT skipped in CI
  • Python ❌ Missing equivalent test
  • Go ❌ Missing equivalent test

Recommendation:

Since this PR is skipping the Node.js timeout test in CI (because it validates client-side behavior and has no LLM snapshot), the .NET test should be updated similarly:

  1. Apply the same CI skip logic to .NET's SendAndWait_Throws_On_Timeout test to maintain consistency with Node.js
  2. Consider adding equivalent tests to Python and Go for feature parity, also with CI skips

The .NET test currently looks like this:

[Fact]
public async Task SendAndWait_Throws_On_Timeout()
{
    var session = await Client.CreateSessionAsync();
    var ex = await Assert.ThrowsAsync<TimeoutException>(() =>
        session.SendAndWaitAsync(new MessageOptions { Prompt = "Run 'sleep 2 && echo done'" }, TimeSpan.FromMilliseconds(100)));
    Assert.Contains("timed out", ex.Message);
}

It should be updated to skip in CI, similar to the Node.js approach.

AI generated by SDK Consistency Review Agent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants