Skip to content

Ability to perform bulk run backfills #2304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 23, 2025
Merged

Conversation

ericallam
Copy link
Member

No description provided.

Copy link

changeset-bot bot commented Jul 23, 2025

⚠️ No Changeset found

Latest commit: b6e4ba8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Jul 23, 2025

Walkthrough

This change introduces a new "admin worker" subsystem for batch operations such as backfilling task runs to a replication database. It adds environment variable schema entries to configure the admin worker’s concurrency, polling intervals, shutdown timeout, logging, and Redis connection settings. New authenticated API routes allow admin users to start and cancel backfill jobs by batch ID. The RunsBackfillerService class implements the backfilling logic with tracing and logging, supported by integration tests that verify replication correctness. The Redis-backed worker queue is extended with a cancellation mechanism using cancellation keys, including updates to queue and worker classes and new tests validating job cancellation behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45-60 minutes


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31c8562 and b6e4ba8.

📒 Files selected for processing (2)
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts (1 hunks)
  • apps/webapp/app/v3/services/adminWorker.server.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts
  • apps/webapp/app/v3/services/adminWorker.server.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
apps/webapp/test/runsBackfiller.test.ts (2)

8-8: Consider making the timeout configurable or documenting why 60 seconds is needed.

The 60-second timeout is quite long for a unit test. While integration tests with containers can take time, consider if this could be optimized or if the timeout should be environment-dependent.


68-86: Consider extracting the test data creation into a helper function.

The loop creating 11 task runs could be extracted into a helper function to improve readability and reusability. This would also make it easier to modify the test data structure if needed.

+const createTaskRun = async (prisma: any, index: number, runtimeEnvironment: any, project: any, organization: any) => {
+  return await prisma.taskRun.create({
+    data: {
+      friendlyId: `run_1234_${index}`,
+      taskIdentifier: "my-task",
+      payload: JSON.stringify({ foo: "bar" }),
+      traceId: "1234",
+      spanId: "1234",
+      queue: "test",
+      runtimeEnvironmentId: runtimeEnvironment.id,
+      projectId: project.id,
+      organizationId: organization.id,
+      environmentType: "DEVELOPMENT",
+      engine: "V2",
+      status: "COMPLETED_SUCCESSFULLY",
+    },
+  });
+};

-      // Insert 11 completed runs into the database
-      for (let i = 0; i < 11; i++) {
-        await prisma.taskRun.create({
-          data: {
-            friendlyId: `run_1234_${i}`,
-            taskIdentifier: "my-task",
-            payload: JSON.stringify({ foo: "bar" }),
-            traceId: "1234",
-            spanId: "1234",
-            queue: "test",
-            runtimeEnvironmentId: runtimeEnvironment.id,
-            projectId: project.id,
-            organizationId: organization.id,
-            environmentType: "DEVELOPMENT",
-            engine: "V2",
-            status: "COMPLETED_SUCCESSFULLY",
-          },
-        });
-      }
+      // Insert 11 completed runs into the database
+      for (let i = 0; i < 11; i++) {
+        await createTaskRun(prisma, i, runtimeEnvironment, project, organization);
+      }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b447a80 and ffb45a4.

📒 Files selected for processing (10)
  • apps/webapp/app/env.server.ts (1 hunks)
  • apps/webapp/app/routes/admin.api.v1.feature-flags.ts (1 hunks)
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts (1 hunks)
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.cancel.ts (1 hunks)
  • apps/webapp/app/services/runsBackfiller.server.ts (1 hunks)
  • apps/webapp/app/v3/services/adminWorker.server.ts (1 hunks)
  • apps/webapp/test/runsBackfiller.test.ts (1 hunks)
  • packages/redis-worker/src/queue.ts (4 hunks)
  • packages/redis-worker/src/worker.test.ts (1 hunks)
  • packages/redis-worker/src/worker.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • apps/webapp/app/routes/admin.api.v1.feature-flags.ts
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.cancel.ts
  • packages/redis-worker/src/worker.test.ts
  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/test/runsBackfiller.test.ts
  • packages/redis-worker/src/worker.ts
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts
  • apps/webapp/app/services/runsBackfiller.server.ts
  • packages/redis-worker/src/queue.ts
  • apps/webapp/app/env.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • apps/webapp/app/routes/admin.api.v1.feature-flags.ts
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.cancel.ts
  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/test/runsBackfiller.test.ts
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts
  • apps/webapp/app/services/runsBackfiller.server.ts
  • apps/webapp/app/env.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: In the webapp, all environment variables must be accessed through the env export of env.server.ts, instead of directly accessing process.env.
When importing from @trigger.dev/core in the webapp, never import from the root @trigger.dev/core path; always use one of the subpath exports as defined in the package's package.json.

Files:

  • apps/webapp/app/routes/admin.api.v1.feature-flags.ts
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.cancel.ts
  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/test/runsBackfiller.test.ts
  • apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts
  • apps/webapp/app/services/runsBackfiller.server.ts
  • apps/webapp/app/env.server.ts
**/*.test.{ts,tsx}

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

Our tests are all vitest

Files:

  • packages/redis-worker/src/worker.test.ts
  • apps/webapp/test/runsBackfiller.test.ts
apps/webapp/app/services/**/*.server.ts

📄 CodeRabbit Inference Engine (.cursor/rules/webapp.mdc)

For testable services, separate service logic and configuration, as exemplified by realtimeClient.server.ts (service) and realtimeClientGlobal.server.ts (configuration).

Files:

  • apps/webapp/app/services/runsBackfiller.server.ts
🧠 Learnings (7)
apps/webapp/app/routes/admin.api.v1.feature-flags.ts (6)

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-18T17:49:24.458Z
Learning: Applies to {packages/core,apps/webapp}/**/*.{ts,tsx} : We use zod a lot in packages/core and in the webapp

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/app/**/*.test.{ts,tsx} : Tests in the webapp should only import classes and functions from files matching app/**/*.ts, and those files should not use environment variables directly; all configuration should be passed as options.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : When importing from @trigger.dev/core in the webapp, never import from the root @trigger.dev/core path; always use one of the subpath exports as defined in the package's package.json.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-18T17:49:24.458Z
Learning: Applies to **/*.{ts,tsx} : No default exports, use function declarations

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/app/**/*.test.{ts,tsx} : Test files in the webapp should not import env.server.ts, either directly or indirectly. Tests should only import classes and functions from files matching app/**/*.ts of the webapp, and those files should not use environment variables directly; everything should be passed through as options instead.

Learnt from: matt-aitken
PR: #2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

packages/redis-worker/src/worker.test.ts (1)

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-07-18T17:50:24.984Z
Learning: Applies to /trigger//*.{ts,tsx,js,jsx} : When using retry, queue, machine, or maxDuration options, configure them as shown in the examples for Trigger.dev tasks.

apps/webapp/app/v3/services/adminWorker.server.ts (2)

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/app/services/**/*.server.ts : For testable services, separate service logic and configuration, as exemplified by realtimeClient.server.ts (service) and realtimeClientGlobal.server.ts (configuration).

Learnt from: matt-aitken
PR: #2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

apps/webapp/test/runsBackfiller.test.ts (5)

Learnt from: matt-aitken
PR: #2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-18T17:49:24.458Z
Learning: Applies to **/*.test.{ts,tsx} : Our tests are all vitest

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/app/**/*.test.{ts,tsx} : Tests in the webapp should only import classes and functions from files matching app/**/*.ts, and those files should not use environment variables directly; all configuration should be passed as options.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/app/**/*.test.{ts,tsx} : Test files in the webapp should not import env.server.ts, either directly or indirectly. Tests should only import classes and functions from files matching app/**/*.ts of the webapp, and those files should not use environment variables directly; everything should be passed through as options instead.

Learnt from: ericallam
PR: #2175
File: apps/webapp/app/services/environmentMetricsRepository.server.ts:202-207
Timestamp: 2025-06-14T08:07:46.625Z
Learning: In apps/webapp/app/services/environmentMetricsRepository.server.ts, the ClickHouse methods (getTaskActivity, getCurrentRunningStats, getAverageDurations) intentionally do not filter by the tasks parameter at the ClickHouse level, even though the tasks parameter is accepted by the public methods. This is done on purpose as there is not much benefit from adding that filtering at the ClickHouse layer.

apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts (1)

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-07-18T17:50:24.984Z
Learning: Applies to /trigger//*.{ts,tsx,js,jsx} : When triggering a task from backend code, use tasks.trigger, tasks.batchTrigger, or tasks.triggerAndPoll as shown in the examples.

apps/webapp/app/services/runsBackfiller.server.ts (3)

Learnt from: matt-aitken
PR: #2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-18T17:49:24.458Z
Learning: Applies to internal-packages/database/**/*.{ts,tsx} : We use prisma in internal-packages/database for our database interactions using PostgreSQL

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-07-18T17:50:24.984Z
Learning: Applies to /trigger//*.{ts,tsx,js,jsx} : When triggering a task from backend code, use tasks.trigger, tasks.batchTrigger, or tasks.triggerAndPoll as shown in the examples.

apps/webapp/app/env.server.ts (4)

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : In the webapp, all environment variables must be accessed through the env export of env.server.ts, instead of directly accessing process.env.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-07-18T17:49:47.168Z
Learning: Applies to apps/webapp/app/**/*.test.{ts,tsx} : Test files in the webapp should not import env.server.ts, either directly or indirectly. Tests should only import classes and functions from files matching app/**/*.ts of the webapp, and those files should not use environment variables directly; everything should be passed through as options instead.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-07-18T17:50:24.984Z
Learning: Applies to trigger.config.ts : Global lifecycle hooks, telemetry, runtime, machine settings, log level, max duration, and build configuration must be set in trigger.config.ts as shown.

Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-07-18T17:50:24.984Z
Learning: Applies to trigger.config.ts : Build extensions such as additionalFiles, additionalPackages, aptGet, emitDecoratorMetadata, prismaExtension, syncEnvVars, puppeteer, ffmpeg, and esbuildPlugin must be configured in trigger.config.ts as shown.

🧬 Code Graph Analysis (2)
apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.cancel.ts (3)
apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts (1)
  • action (19-65)
apps/webapp/app/services/personalAccessToken.server.ts (1)
  • authenticateApiRequestWithPersonalAccessToken (105-114)
apps/webapp/app/v3/services/adminWorker.server.ts (1)
  • adminWorker (100-100)
packages/redis-worker/src/worker.ts (2)
internal-packages/zod-worker/src/index.ts (1)
  • K (581-660)
internal-packages/tracing/src/index.ts (1)
  • startSpan (49-79)
🪛 Gitleaks (8.27.2)
packages/redis-worker/src/worker.test.ts

618-618: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (33)
apps/webapp/app/routes/admin.api.v1.feature-flags.ts (1)

4-4: LGTM! Clean import simplification.

The unused feature flag utilities related to runs replication have been appropriately removed, as this functionality is now handled by dedicated services and API routes. The remaining imports are still actively used in the action function.

packages/redis-worker/src/worker.ts (2)

261-312: LGTM! Well-implemented cancellation key support.

The optional cancellationKey parameter is cleanly integrated into the enqueue method signature and properly passed through to the underlying queue. The implementation maintains backward compatibility and follows existing patterns.


397-406: LGTM! Proper cancellation method implementation.

The new cancel method is well-implemented with:

  • Clear documentation explaining its purpose and behavior
  • Proper tracing integration using startSpan
  • Appropriate delegation to the queue layer
  • Clean separation of concerns
apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.cancel.ts (1)

1-45: LGTM! Well-structured admin API endpoint.

The cancellation endpoint is properly implemented with:

  • Correct authentication using personal access token
  • Admin privilege verification with appropriate 403 response
  • Input validation using Zod for the batchId parameter
  • Proper error handling with appropriate HTTP status codes
  • Clean integration with the admin worker service
  • Consistent patterns with other admin API routes
packages/redis-worker/src/worker.test.ts (1)

552-629: LGTM! Comprehensive test for cancellation functionality.

The test properly verifies both aspects of the cancellation feature:

  • Jobs with cancellation keys can be enqueued and processed normally
  • Jobs with cancelled keys are prevented from being enqueued

The test structure follows established patterns and includes appropriate assertions to verify the behavior.

Note: The static analysis hint about a "Generic API Key" on line 618 is a false positive - this is clearly a test cancellation key string, not a real API key.

apps/webapp/app/routes/admin.api.v1.runs-replication.$batchId.backfill.ts (1)

1-65: LGTM! Well-implemented backfill API endpoint.

The backfill endpoint is properly structured with:

  • Consistent authentication and admin privilege verification
  • Comprehensive input validation using Zod for both route parameters and request body
  • Appropriate use of z.coerce.date() for date parsing
  • Reasonable default batch size of 500
  • Proper job enqueueing with structured payload
  • Consistent error handling patterns with other admin routes

The implementation follows established patterns and integrates cleanly with the admin worker infrastructure.

apps/webapp/test/runsBackfiller.test.ts (4)

1-9: LGTM! Imports and setup are well-structured.

The imports follow the coding guidelines, using proper relative imports and organizing dependencies logically. The test setup is clean and follows vitest best practices.


14-38: Well-configured service dependencies.

The ClickHouse and RunsReplicationService configurations are properly set up with appropriate parameters for testing. The use of containerized services provides good isolation.


131-138: Verify the time window calculation logic.

The test creates a time window with from = Date.now() - 10000 (10 seconds ago) and to = Date.now() + 1000 (1 second in the future). However, the task runs are created with default timestamps (now), and one run is explicitly created 60 seconds ago. Ensure this timing logic is robust and won't cause flaky tests due to timing issues.

The timing logic could be made more explicit by using fixed dates rather than relative timestamps to avoid potential race conditions in the test.


162-173: Good verification of backfilled data.

The test properly verifies that exactly 11 completed runs within the time window were replicated to ClickHouse, which matches the expected behavior based on the test setup.

apps/webapp/app/services/runsBackfiller.server.ts (7)

1-7: LGTM! Clean imports following coding guidelines.

The imports are well-organized and follow the project's coding guidelines, using proper relative imports and organizing dependencies logically.


14-24: Well-designed constructor with proper dependency injection.

The constructor properly accepts all required dependencies and sets up logging with a configurable log level. The default "debug" level is appropriate for a service that might need detailed logging during backfill operations.


36-42: Good tracing implementation with comprehensive attributes.

The OpenTelemetry span setup captures all relevant parameters for observability. The attributes will be helpful for monitoring and debugging backfill operations.


43-58: Efficient database query with proper filtering and pagination.

The Prisma query is well-structured with:

  • Proper date range filtering
  • Status filtering using FINAL_RUN_STATUSES
  • Cursor-based pagination
  • Ascending order for consistent pagination
  • Configurable batch size

The query should perform well with proper database indexing.


60-64: Good early return for empty results.

The early return when no runs are found is efficient and includes proper logging with relevant context.


66-74: Comprehensive logging with valuable context.

The logging before backfilling includes all relevant metadata that would be useful for monitoring and debugging, including first/last timestamps and run counts.


76-90: Proper delegation and cursor management.

The service correctly delegates the actual backfilling to the replication service and returns the last run ID as a cursor for pagination. The final logging provides good visibility into the completion of the operation.

packages/redis-worker/src/queue.ts (5)

86-88: LGTM! Simple and effective cancellation key implementation.

The cancel method sets a Redis key with a 24-hour expiration, which is a reasonable default for job cancellation. The key naming with prefix is good for avoiding collisions.


97-105: Good API design for optional cancellation support.

The cancellationKey parameter is properly optional, maintaining backward compatibility while enabling the new cancellation feature.


118-127: Correct conditional logic for cancellation-aware enqueuing.

The logic properly chooses between the new cancellation-aware command and the existing command based on whether a cancellation key is provided. The parameter passing is correct.


421-442: Well-implemented Lua script for atomic cancellation check.

The Lua script correctly:

  • Takes 3 keys (queue, items, cancellationKey)
  • Checks for cancellation key existence before enqueuing
  • Returns early if cancelled
  • Performs the same atomic operations as the original command

The script maintains atomicity and prevents race conditions.


634-644: Proper TypeScript interface extension.

The new Redis command interface correctly mirrors the Lua script signature with proper typing for all parameters and return type.

apps/webapp/app/v3/services/adminWorker.server.ts (9)

1-11: LGTM! Clean imports following coding guidelines.

The imports are well-organized and follow the webapp coding guidelines, using proper relative imports and accessing environment variables through the env export.


13-21: Good Redis configuration with proper fallbacks.

The Redis configuration properly uses environment variables with appropriate fallbacks and handles TLS configuration conditionally. The key prefix helps avoid collisions with other workers.


28-40: Well-designed job schema and configuration.

The job schema is properly defined with:

  • Date coercion for from/to parameters
  • Optional cursor for pagination
  • Reasonable default batch size (500)
  • Appropriate visibility timeout (15 minutes)
  • Proper retry configuration (5 max attempts)

42-50: Good worker configuration using environment variables.

The concurrency, polling, and logging configurations are properly driven by environment variables, providing good operational flexibility.


53-56: Proper error handling for missing dependencies.

The early check for runsReplicationInstance with error logging is good defensive programming. This prevents silent failures.


58-69: Correct service instantiation and execution.

The RunsBackfillerService is properly instantiated with the replica database client, replication instance, and tracer. The service call passes through all the job payload parameters correctly.


71-84: Smart pagination logic with cancellation support.

The re-enqueuing logic is well-designed:

  • Only re-enqueues if there's more work (cursor returned)
  • Uses the same job ID for the cancellation key
  • Adds a 1-second delay to prevent busy polling
  • Maintains all original parameters while updating the cursor

This enables efficient pagination through large datasets.


89-98: Good conditional startup with comprehensive logging.

The worker only starts if enabled by environment configuration, and the startup logging includes all relevant configuration details for operational visibility.


100-100: Appropriate use of singleton pattern.

Using a singleton for the worker ensures a single shared instance across the application, which is correct for this infrastructure component.

apps/webapp/app/env.server.ts (2)

766-773: Well-designed admin worker configuration schema.

The admin worker environment variables follow the established patterns in the codebase with:

  • Proper fallback to WORKER_ENABLED for the enabled flag
  • Reasonable defaults for concurrency settings
  • Appropriate polling intervals
  • Sensible timeout values
  • Configurable log level with appropriate default

All defaults align well with other worker configurations in the file.


775-803: Consistent Redis configuration with proper fallbacks.

The Redis configuration for the admin worker properly:

  • Uses transform functions to fall back to general Redis environment variables
  • Follows the exact same pattern as other Redis configurations
  • Includes all necessary connection parameters
  • Has appropriate defaults for TLS and cluster mode

This ensures consistent Redis connectivity across all worker types.

@ericallam ericallam merged commit ef59a62 into main Jul 23, 2025
33 checks passed
@ericallam ericallam deleted the admin-worker-batch-runs-backfill branch July 23, 2025 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants