Fail CI if snapshots aren't present #304

friggeri · 2026-01-30T22:42:15Z

No description provided.

Copilot

Pull request overview

This PR tightens the snapshot-based replay harness so that CI runs fail when tests issue new or changed requests without corresponding snapshots, and it updates tests/snapshots to align with that stricter behavior. It also adds stubs for new endpoints and tweaks specific E2E tests to avoid relying on live LLM behavior in CI.

Changes:

Add or update session and permissions snapshot YAMLs (including a new sendandwait_throws_on_timeout snapshot and updated tool arguments/assistant messages).
Enhance ReplayingCapiProxy to (a) provide a default model for /models, (b) stub agent memory endpoints, and (c) treat missing cached responses as CI errors instead of warnings.
Update the Node.js session E2E tests to skip the timeout-specific sendAndWait test in CI, since it intentionally times out before a replayable response is available.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`test/snapshots/session/sendandwait_throws_on_timeout.yaml`	New snapshot capturing the model and initial system/user messages for the `sendAndWait`-timeout scenario.
`test/snapshots/session/send_returns_immediately_while_events_stream_in_background.yaml`	Adjusted tool-call arguments (intent string, shell description, `mode: "sync"`) and slightly rephrased final assistant text to match current behavior.
`test/snapshots/permissions/should_receive_toolcallid_in_permission_requests.yaml`	Minor assistant message wording change in the permissions snapshot to align with updated responses.
`test/harness/replayingCapiProxy.ts`	Introduced a default model for `/models`, added stub handlers for `/agents/.../memory/...` endpoints, and changed the CI fallback behavior so missing snapshots produce an error via `exitWithNoMatchingRequestError`.
`nodejs/test/e2e/session.test.ts`	Marked the `sendAndWait throws on timeout` test as `it.skipIf(process.env.CI === "true")` with clarifying comments, so CI doesn't depend on a replayable LLM response for this client-side timeout behavior.

Comments suppressed due to low confidence (1)

test/harness/replayingCapiProxy.ts:296

When process.env.CI === "true" and no cached response is found, exitWithNoMatchingRequestError calls options.onError, but the code still falls through to super.performRequest(options). This can cause the proxy to both send a 500 Proxy error response via onError and then attempt to proxy the request normally, leading to double-callbacks and potential write after end/header-sent errors. To avoid conflicting responses and ensure CI fails immediately when snapshots are missing, this branch should return early after exitWithNoMatchingRequestError instead of continuing to super.performRequest.

        // Fallback to normal proxying if no cached response found
        // This implicitly captures the new exchange too
        if (process.env.CI === "true") {
          await exitWithNoMatchingRequestError(
            options,
            state.testInfo,
            state.workDir,
            state.toolResultNormalizers,
          );
        }
        super.performRequest(options);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-30T22:44:50Z

test/harness/replayingCapiProxy.ts

+        // Handle memory endpoints - return stub responses in tests
+        // Matches: /agents/*/memory/*/enabled, /agents/*/memory/*/recent, etc.
+        if (options.requestOptions.path?.match(/\/agents\/.*\/memory\//)) {
+          let body: string;
+          if (options.requestOptions.path.includes("/enabled")) {
+            body = JSON.stringify({ enabled: false });
+          } else if (options.requestOptions.path.includes("/recent")) {
+            body = JSON.stringify({ memories: [] });
+          } else {
+            body = JSON.stringify({});
+          }
+          const headers = {
+            "content-type": "application/json",
+            ...commonResponseHeaders,
+          };
+          options.onResponseStart(200, headers);
+          options.onData(Buffer.from(body));
+          options.onResponseEnd();
+          return;
+        }


The new memory endpoint stubs (/agents/.../memory/...) and the CI-only error path for missing cached responses (exitWithNoMatchingRequestError) do not appear to have dedicated tests in replayingCapiProxy.test.ts, even though this file has comprehensive coverage for other behaviors. Given that these branches change how the proxy behaves in CI and for new endpoints, consider adding tests that (1) exercise the memory endpoint handlers and (2) verify that in CI, a request without a matching snapshot produces the expected ::error annotation and fails the request instead of silently proxying through.

github-actions · 2026-01-30T22:49:19Z

SDK Consistency Review: Test Infrastructure Changes

This PR modifies the shared test infrastructure to make CI builds fail when snapshots aren't present, and conditionally skips the Node.js sendAndWait throws on timeout test in CI environments.

✅ Good: Shared Test Harness Improvements

The changes to test/harness/replayingCapiProxy.ts improve the reliability of all SDK tests by:

Providing a default model (claude-sonnet-4.5) when no stored data is available
Adding stub responses for memory endpoints to support tests without requiring actual API calls
Converting warnings to errors when cached responses are missing in CI, ensuring test reliability

These changes benefit all four SDKs (Node.js, Python, Go, .NET) since they all use this shared test harness.

⚠️ Inconsistency: Timeout Test Coverage

However, there's a cross-SDK inconsistency with the timeout test:

Current State:

Node.js ✅ Has test, NOW skipped in CI (this PR)
.NET ✅ Has test (SendAndWait_Throws_On_Timeout), but NOT skipped in CI
Python ❌ Missing equivalent test
Go ❌ Missing equivalent test

Recommendation:

Since this PR is skipping the Node.js timeout test in CI (because it validates client-side behavior and has no LLM snapshot), the .NET test should be updated similarly:

Apply the same CI skip logic to .NET's SendAndWait_Throws_On_Timeout test to maintain consistency with Node.js
Consider adding equivalent tests to Python and Go for feature parity, also with CI skips

The .NET test currently looks like this:

[Fact]
public async Task SendAndWait_Throws_On_Timeout()
{
    var session = await Client.CreateSessionAsync();
    var ex = await Assert.ThrowsAsync&lt;TimeoutException&gt;(() =&gt;
        session.SendAndWaitAsync(new MessageOptions { Prompt = "Run 'sleep 2 &amp;&amp; echo done'" }, TimeSpan.FromMilliseconds(100)));
    Assert.Contains("timed out", ex.Message);
}

It should be updated to skip in CI, similar to the Node.js approach.

AI generated by SDK Consistency Review Agent

Fail CI if snapshots aren't present

7a9112c

friggeri requested a review from a team as a code owner January 30, 2026 22:42

Copilot AI review requested due to automatic review settings January 30, 2026 22:42

Copilot started reviewing on behalf of friggeri January 30, 2026 22:42 View session

Merge remote-tracking branch 'origin/main' into frg/e2e-only-snapshots

a0b15a0

Copilot AI reviewed Jan 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fail CI if snapshots aren't present #304

Fail CI if snapshots aren't present #304

friggeri commented Jan 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 30, 2026

Uh oh!

github-actions bot commented Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fail CI if snapshots aren't present #304

Are you sure you want to change the base?

Fail CI if snapshots aren't present #304

Conversation

friggeri commented Jan 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 30, 2026

SDK Consistency Review: Test Infrastructure Changes

✅ Good: Shared Test Harness Improvements

⚠️ Inconsistency: Timeout Test Coverage

Current State:

Recommendation:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants