v4: Improved run locking #2173
Walkthrough
The changes introduce a manual lock acquisition and management system in the
Actionable comments posted: 2
🧹 Nitpick comments (2)
internal-packages/run-engine/src/engine/tests/locking.test.ts (1)
11-17: Consider centralising test setup to reduce repetition
Each test re-creates a `Logger` + `RunLocker` and manually calls `runLock.quit()` in `finally`. Extracting this into a `beforeEach` / `afterEach` (or a helper util) would DRY the suite and guarantee cleanup even if construction throws.
Also applies to: 18-29
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (1)
36-37: Default to explicit `false` for `skipRunLock` and invert the condition for readability
`skipRunLock` is optional; when omitted it is `undefined`, meaning `!skipRunLock` evaluates to `true` (locks).
Making the default explicit:
    - skipRunLock,
    + skipRunLock = false,
and using `skipRunLock ? routine() : lock(...)` would avoid the double negation and make the intent clearer.
Also applies to: 55-60
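A minimal sketch of the shape suggested above, assuming a much simplified `enqueueRun`. The `withLock` helper and the `EnqueueOptions` type are hypothetical stand-ins for illustration, not the engine's actual API.

```typescript
// Hypothetical stand-in for the run lock: records that a lock was taken, then runs the routine.
const lockCalls: string[] = [];

function withLock<T>(name: string, routine: () => T): T {
  lockCalls.push(name);
  return routine();
}

interface EnqueueOptions {
  runId: string;
  // Optional at the call site, defaulted explicitly in the function signature below.
  skipRunLock?: boolean;
}

function enqueueRun({ runId, skipRunLock = false }: EnqueueOptions): string {
  const routine = () => `enqueued:${runId}`;
  // Positive condition: skip the lock only when the caller asks for it explicitly.
  return skipRunLock ? routine() : withLock("enqueueRun", routine);
}
```

With the default spelled out, omitting `skipRunLock` and passing `false` visibly take the same (locking) branch.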
📜 Review details
📒 Files selected for processing (4)
- internal-packages/run-engine/src/engine/index.ts (1 hunk)
- internal-packages/run-engine/src/engine/locking.ts (4 hunks)
- internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (2 hunks)
- internal-packages/run-engine/src/engine/tests/locking.test.ts (12 hunks)
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/index.ts (1)
489-534: Re-evaluate concurrency guarantees after removing the run lock
Previously the `trigger` flow ran inside a 5 s `runLock`.
With the lock removed and `skipRunLock: true` passed to `enqueueRun`, concurrent invocations for the same run (or parent-blocking logic) now rely solely on DB constraints/transactions.
If two workers could call `trigger` with identical idempotency keys very close together, you may re-introduce races around:
- creating the associated waitpoint,
- blocking the parent run,
- double-enqueueing.
Please confirm that upstream uniqueness constraints (e.g. on `taskRun.id` / `idempotencyKey`) and transactional scopes are sufficient, or consider keeping a lightweight lock here.
internal-packages/run-engine/src/engine/locking.ts (1)
343-348: Missing abort signal when the lock is skipped
`lockIf(false, …)` invokes `routine()` with no arguments, but the callback type allows an optional signal.
Call sites that rely on `signal!` will now receive `undefined`.
Either:
    routine(signal ?? (undefined as any))
or document that the signal is only defined when locking occurs.
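One way to keep the callback contract uniform is to always hand the routine a real `AbortSignal`, even when no lock is taken. The sketch below assumes simplified signatures: `lock` here is a hypothetical stand-in, and the real `RunLocker` methods may differ.

```typescript
type LockedRoutine<T> = (signal: AbortSignal) => Promise<T>;

// Hypothetical stand-in for the real lock(); it only wires up a signal and runs the routine.
async function lock<T>(_name: string, _resources: string[], routine: LockedRoutine<T>): Promise<T> {
  const controller = new AbortController();
  return routine(controller.signal);
}

async function lockIf<T>(
  condition: boolean,
  name: string,
  resources: string[],
  routine: LockedRoutine<T>
): Promise<T> {
  if (condition) {
    return lock(name, resources, routine);
  }
  // No lock is held, so nothing will ever abort this signal,
  // but the routine still receives a defined value.
  return routine(new AbortController().signal);
}
```

The trade-off is that the unlocked path allocates a controller that is never used; the alternative is a separate callback type without a signal for the unlocked case.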
Actionable comments posted: 3
🧹 Nitpick comments (3)
internal-packages/run-engine/src/engine/types.ts (1)
49-52: Document numeric units and defaults for new props
`duration`, `automaticExtensionThreshold`, and `retryConfig` lack JSDoc in this public type. Downstream integrators will have to guess the unit (ms? seconds?). Please copy the inline docs from `LockRetryConfig` / `RunLocker` so that IDE tooltips stay useful.
internal-packages/run-engine/src/engine/locking.ts (1)
421-433: Potential race: release issued while extension still in flight
`#cleanupExtension` clears the timeout but does not await the `context.extension` promise.
If an extension RPC is still pending, calling `lock.release()` immediately afterwards can result in:
- `release()` succeeds
- extension completes ⇒ tries to mutate an already-released lock, throwing `ExecutionError`
Safer pattern:
      this.#cleanupExtension(manualContext);
    - const [releaseError] = await tryCatch(lock!.release());
    + if (manualContext.extension) {
    +   await manualContext.extension.catch(() => {}); // ignore failures
    + }
    + const [releaseError] = await tryCatch(lock!.release());
Pre-empts spurious errors in noisy logs.
internal-packages/run-engine/src/engine/tests/locking.test.ts (1)
901-914: Tests rely on real timers → high flake risk
Several cases (`extension-test`, `timing-test`, etc.) assert on wall-clock `Date.now()` differences with 100–200 ms tolerances; CI jitter regularly exceeds that.
Use Vitest fake timers (`vi.useFakeTimers()` / `vi.advanceTimersByTime()`) or widen the expectations:
    - expect(elapsed).toBeGreaterThan(190);
    - expect(elapsed).toBeLessThan(1000);
    + expect(elapsed).toBeGreaterThan(150);
    + expect(elapsed).toBeLessThan(1500);
to avoid intermittent failures.
📜 Review details
📒 Files selected for processing (6)
- apps/webapp/app/env.server.ts (1 hunk)
- apps/webapp/app/v3/runEngine.server.ts (1 hunk)
- internal-packages/run-engine/src/engine/index.ts (2 hunks)
- internal-packages/run-engine/src/engine/locking.ts (5 hunks)
- internal-packages/run-engine/src/engine/tests/locking.test.ts (13 hunks)
- internal-packages/run-engine/src/engine/types.ts (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- apps/webapp/app/env.server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- internal-packages/run-engine/src/engine/index.ts
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/types.ts (1)
16-17: Import path may break under classic TS module resolution
`import { LockRetryConfig } from "./locking.js";` assumes ESM resolution with `.js`-suffixed imports.
If the compiler is invoked with the default `moduleResolution: "node"` (classic CJS output), the file emitted on disk will be `locking.js`, but TS will look for `locking.js.ts` at build time and fail to locate it.
Two safer options:
    -import { LockRetryConfig } from "./locking.js";
    +import type { LockRetryConfig } from "./locking.js"; // keeps ESM style but types-only
or (works for both `commonjs` and `esnext` targets):
    -import { LockRetryConfig } from "./locking.js";
    +import type { LockRetryConfig } from "./locking";
internal-packages/run-engine/src/engine/locking.ts (1)
442-453: `automaticExtensionThreshold` > `duration` silently disables extensions
The early-exit guard checks `this.automaticExtensionThreshold > duration - 100`, but nothing prevents callers from passing an absurd threshold (e.g. 10 000 ms on a 5 000 ms lock).
Recommend clamping the threshold to a sensible fraction of `duration` (e.g. `Math.min(threshold, duration / 2)`) and logging a warning.
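The clamping suggestion can be sketched as a small pure function. The name `clampExtensionThreshold` and the injectable `warn` callback are assumptions made for illustration; only the `Math.min(threshold, duration / 2)` rule comes from the comment above.

```typescript
// Hypothetical helper: caps the extension threshold at half the lock duration.
function clampExtensionThreshold(
  threshold: number,
  duration: number,
  warn: (message: string) => void = console.warn
): number {
  const max = duration / 2;
  if (threshold > max) {
    warn(`automaticExtensionThreshold ${threshold}ms exceeds ${max}ms; clamping`);
    return max;
  }
  return threshold;
}
```

For the example in the comment, a 10 000 ms threshold on a 5 000 ms lock would be clamped to 2 500 ms and a warning emitted, instead of silently disabling extensions.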
Actionable comments posted: 4
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
69-76: Unused `LockOptions` interface – dead code
`LockOptions` is declared but never referenced. Remove it (or wire the constructor to accept a single `options: LockOptions & { redis: Redis; logger: … }` object) to avoid confusion and stale documentation.
📜 Review details
📒 Files selected for processing (1)
- internal-packages/run-engine/src/engine/locking.ts (6 hunks)
Actionable comments posted: 0
🔭 Outside diff range comments (1)
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
84-90: 🛠️ Refactor suggestion
Multiple run-attempt code paths lack dedicated handling for `LockAcquisitionTimeoutError`
`startRunAttempt`, `attemptSucceeded`, `attemptFailed`, `tryNackAndRequeue`, and `cancelRun` all switched to the 3-arg `lock()` but still assume the lock will always be acquired.
When contention is high the new locker can raise the timeout error, and these methods will bubble an unhandled exception up the RPC boundary.
Minimal fix per method:
    try {
      return await this.$.runLock.lock("startRunAttempt", [runId], async () => {
        …
      });
    } catch (e) {
      if (e instanceof LockAcquisitionTimeoutError) {
        // Visibility for ops + retry behaviour
        this.$.logger.debug("Could not acquire run lock, will retry", { runId, op: "startRunAttempt" });
        throw new ServiceValidationError("Run is locked, please retry", 423);
      }
      throw e;
    }
(Adjust status code/message as appropriate.)
Adds resilience without changing core logic.
Also applies to: 440-446, 596-602, 906-912, 983-989
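Because the same pattern applies to five call sites, it could be factored into one wrapper. The following is only a sketch: `withLockTimeoutHandling` is a hypothetical helper, both error classes are simplified stand-ins for the real ones, and `console.debug` stands in for the engine's logger.

```typescript
// Stand-ins for the real error classes; the actual constructors may take different arguments.
class LockAcquisitionTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LockAcquisitionTimeoutError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ServiceValidationError extends Error {
  readonly status: number;

  constructor(message: string, status: number) {
    super(message);
    this.name = "ServiceValidationError";
    this.status = status;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: runs a locked operation and maps a lock timeout to a retryable 423.
async function withLockTimeoutHandling<T>(
  op: string,
  runId: string,
  lockedOperation: () => Promise<T>
): Promise<T> {
  try {
    return await lockedOperation();
  } catch (e) {
    if (e instanceof LockAcquisitionTimeoutError) {
      console.debug("Could not acquire run lock, will retry", { runId, op });
      throw new ServiceValidationError("Run is locked, please retry", 423);
    }
    throw e; // unrelated errors propagate unchanged
  }
}
```

Each method would then pass its existing `runLock.lock(...)` call as `lockedOperation`, so the status code and message live in one place.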
♻️ Duplicate comments (5)
internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts (1)
38-42: Same lock-timeout concern as in ttlSystem – see previous comment for rationale and suggested pattern.
internal-packages/run-engine/src/engine/locking.ts (4)
481-492: `lockIf(false, …)` leaks the `AbortSignal` contract
When the condition is `false`, the routine is invoked without a signal, diverging from code paths that expect one.
Either always supply a dummy `AbortSignal` or tighten the callback type for the unlocked case.
25-40: ⚠️ Potential issue
Preserve `Error` prototype & stack
`LockAcquisitionTimeoutError` still omits the customary `Object.setPrototypeOf` (and optional `captureStackTrace`). Without this, `instanceof` checks can fail once transpiled.
      super( /* … */ );
      this.name = "LockAcquisitionTimeoutError";
    +
    + // Fix prototype chain for ES5 targets
    + Object.setPrototypeOf(this, new.target.prototype);
    + /* c8 ignore next 3 */
    + if (Error.captureStackTrace) {
    +   Error.captureStackTrace(this, this.constructor);
    + }
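A self-contained version of the pattern, for reference. The constructor parameters and field names (`resources`, `totalWaitTime`) are assumptions, since the real class body is elided in the diff; the optional `captureStackTrace` call is left out here because it is V8-specific.

```typescript
class LockAcquisitionTimeoutError extends Error {
  // Field names are assumptions for illustration; the real class may carry different details.
  readonly resources: string[];
  readonly totalWaitTime: number;

  constructor(resources: string[], totalWaitTime: number, message: string) {
    super(message);
    this.name = "LockAcquisitionTimeoutError";
    this.resources = resources;
    this.totalWaitTime = totalWaitTime;
    // Restore the prototype chain, which ES5-targeted output would otherwise lose.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
```

On ES2015+ targets native class semantics already keep `instanceof` working, so the extra call only matters when the build down-levels classes.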
95-101: 🛠️ Refactor suggestion
Redundant auto-extension: disable Redlock’s built-in timer
The constructor forwards `automaticExtensionThreshold` to Redlock while also running a custom extension loop.
This doubles extension traffic and introduces race conditions when either timer fails.
    - automaticExtensionThreshold: this.automaticExtensionThreshold,
    + // Disable Redlock's internal extender – handled manually
    + automaticExtensionThreshold: 0,
431-441: 🛠️ Refactor suggestion
Timer-cleanup race still present
`#extendLock` sets `context.timeout = undefined` before checking the sentinel (`!== null`).
If `quit()` has already set the sentinel to `null`, the check passes and a new timer is scheduled, leaking after cleanup.
Move the null-check before clobbering the value, or guard with a local copy.
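The ordering issue can be illustrated with a simplified, synchronous model. `ExtensionContext`, `cleanup`, and `onExtensionTimer` are hypothetical names, and the real `#extendLock` is asynchronous, so this only shows the "check the sentinel first" ordering rather than the full fix.

```typescript
interface ExtensionContext {
  // undefined: no timer pending; null: cleaned up, never schedule again (sentinel).
  timeout: ReturnType<typeof setTimeout> | undefined | null;
}

function cleanup(context: ExtensionContext): void {
  if (context.timeout != null) clearTimeout(context.timeout);
  context.timeout = null;
}

// Called when the extension timer fires. Returns true if a follow-up timer was scheduled.
function onExtensionTimer(context: ExtensionContext, delayMs: number, extend: () => void): boolean {
  // Check the sentinel before touching the field, so a prior cleanup is never overwritten.
  if (context.timeout === null) {
    return false;
  }
  context.timeout = undefined;
  extend();
  context.timeout = setTimeout(() => onExtensionTimer(context, delayMs, extend), delayMs);
  return true;
}
```

In an asynchronous version the sentinel would need to be re-checked after the awaited extension call as well, since cleanup can happen while the extension is in flight.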
🧹 Nitpick comments (4)
internal-packages/run-engine/src/engine/systems/ttlSystem.ts (1)
24-27: Lock now waits for the default timeout – verify that’s acceptable & add graceful fallback
Dropping the explicit `5_000` ms makes the call rely entirely on the new `RunLocker` defaults.
If `expireRun` occasionally runs while another long-running operation still owns the lock, it may sit in the retry loop far longer than the original 5 s, or eventually throw `LockAcquisitionTimeoutError`.
Consider guarding the call so the job simply bails out (and logs) when the lock can’t be obtained, rather than bubbling the error up and turning a routine TTL sweep into an exception:
    @@
    - await this.$.runLock.lock("expireRun", [runId], async () => {
    + try {
    +   await this.$.runLock.lock("expireRun", [runId], async () => {
        …
    - });
    +   });
    + } catch (err) {
    +   if (err instanceof LockAcquisitionTimeoutError) {
    +     this.$.logger.debug("Could not acquire run lock to expire run, skipping", { runId });
    +     return;
    +   }
    +   throw err;
    + }
(import `LockAcquisitionTimeoutError` from the locking module).
This mirrors the behaviour you already added in `dequeueSystem` and keeps the TTL worker robust.
internal-packages/run-engine/src/engine/systems/checkpointSystem.ts (1)
54-58: Lock acquisition error handling missing for checkpoint paths
`createCheckpoint` and `continueRunExecution` now rely on default retry/timeout but don’t catch `LockAcquisitionTimeoutError`.
If a worker is holding the run lock for an extended period, these public API calls will throw 500s instead of returning a clean “try again later”.
Apply the graceful-bail pattern (see ttlSystem comment) or propagate a domain-specific 409/423 response so callers can safely retry.
Also applies to: 268-272
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (1)
548-552: Pending-version helper mirrors previous pattern – same optional timeout-handling advice
The helper currently throws on lock-timeout; if that’s acceptable, nothing to do.
Otherwise consider the graceful fallback pattern discussed earlier.
internal-packages/run-engine/src/engine/tests/locking.test.ts (1)
8-18: Close Redis clients in tests
Each test creates its own Redis connection via `createRedisClient` but never calls `redis.quit()` / `redis.disconnect()`.
Over dozens of cases this can leak sockets and slow CI. Add a `finally` block (or `afterEach`) to close the client alongside `runLock.quit()`.
Example:
    - const redis = createRedisClient(redisOptions);
    + const redis = createRedisClient(redisOptions);
      const runLock = new RunLocker({ /* … */ });
      try {
        …
      } finally {
        await runLock.quit();
    +   await redis.quit();
      }
Also applies to: 32-36
📜 Review details
📒 Files selected for processing (11)
- internal-packages/run-engine/src/engine/index.ts (3 hunks)
- internal-packages/run-engine/src/engine/locking.ts (6 hunks)
- internal-packages/run-engine/src/engine/systems/checkpointSystem.ts (2 hunks)
- internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts (1 hunk)
- internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (2 hunks)
- internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (2 hunks)
- internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts (0 hunks)
- internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (5 hunks)
- internal-packages/run-engine/src/engine/systems/ttlSystem.ts (1 hunk)
- internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (2 hunks)
- internal-packages/run-engine/src/engine/tests/locking.test.ts (10 hunks)
💤 Files with no reviewable changes (1)
- internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts
✅ Files skipped from review due to trivial changes (1)
- internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
- internal-packages/run-engine/src/engine/index.ts
🔇 Additional comments (1)
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (1)
78-84: 👍 Wrapped in try/catch already – looks good
`dequeueFromWorkerQueue` is already inside a `try` that logs and requeues on failure, so removing the explicit timeout is safe here.
Actionable comments posted: 2
♻️ Duplicate comments (2)
internal-packages/run-engine/src/engine/locking.ts (2)
25-40: ⚠️ Potential issue
Preserve the `Error` prototype chain & stack
`LockAcquisitionTimeoutError` still misses `Object.setPrototypeOf` (and optional `captureStackTrace`). `instanceof LockAcquisitionTimeoutError` can fail in transpiled code, and stacks can be misleading.
      this.name = "LockAcquisitionTimeoutError";
    + Object.setPrototypeOf(this, new.target.prototype);
    + /* c8 ignore next 3 */
    + if (Error.captureStackTrace) {
    +   Error.captureStackTrace(this, this.constructor);
    + }
90-98: ⚠️ Potential issue
Built-in auto-extension still enabled – double extending the lock
`automaticExtensionThreshold` is omitted from the Redlock options, so the library applies its default (~500 ms). Coupled with your custom `#setupAutoExtension`, this re-introduces duplicate extensions and the race condition previously flagged.
Explicitly disable it:
      this.redlock = new Redlock([options.redis], {
        retryCount: 0,
    +   automaticExtensionThreshold: 0, // disable Redlock’s internal extender
      });
🧹 Nitpick comments (2)
internal-packages/run-engine/src/engine/locking.ts (2)
445-457: Constant 100 ms retry on extension can thrash Redis
When an extension fails but the lock hasn’t expired, the code retries every 100 ms indefinitely. Under a Redis outage this results in a hot loop.
Consider re-using the exponential back-off logic (or at least increasing the interval progressively) and enforcing a cap on total retry time to avoid unnecessary load.
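A capped exponential back-off with a total time budget might look like the sketch below. The `ExtensionRetryPolicy` shape and the function name are assumptions for illustration; the existing back-off logic in `locking.ts` is not shown in this review and may be structured differently.

```typescript
interface ExtensionRetryPolicy {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound for any single delay
  maxTotalMs: number; // stop retrying once this much time has been spent
}

// Returns the delay before retry `attempt` (0-based), or null once the budget is exhausted.
function nextExtensionRetryDelay(
  attempt: number,
  elapsedMs: number,
  policy: ExtensionRetryPolicy
): number | null {
  if (elapsedMs >= policy.maxTotalMs) return null;
  const delay = Math.min(policy.baseDelayMs * 2 ** attempt, policy.maxDelayMs);
  // Never sleep past the end of the budget.
  return Math.min(delay, policy.maxTotalMs - elapsedMs);
}
```

With `baseDelayMs: 100` and `maxDelayMs: 1000`, the delays grow as 100, 200, 400, 800, then stay at 1000 ms until `maxTotalMs` is reached, at which point the caller would give up on extending.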
503-513: Expose config immutably
`getRetryConfig`, `getDuration`, and `getAutomaticExtensionThreshold` return internal objects/primitives directly. While primitives are safe, the config object can be mutated by callers. Return a frozen copy to avoid accidental mutation.
    - return { ...this.retryConfig };
    + return Object.freeze({ ...this.retryConfig });
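A small sketch of the frozen-copy getter follows. The `ConfigHolder` class and the `LockRetryConfig` field names are hypothetical, chosen only to make the example self-contained.

```typescript
// Field names are assumptions; the real LockRetryConfig may differ.
interface LockRetryConfig {
  maxRetries: number;
  baseDelay: number;
}

class ConfigHolder {
  private retryConfig: LockRetryConfig = { maxRetries: 10, baseDelay: 200 };

  // Returns a frozen shallow copy: callers cannot mutate it, and the internal object is untouched.
  getRetryConfig(): Readonly<LockRetryConfig> {
    return Object.freeze({ ...this.retryConfig });
  }
}
```

Note that `Object.freeze` is shallow. That is sufficient for a flat config of primitives like this one; a config with nested objects would need each level frozen separately.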
📜 Review details
📒 Files selected for processing (1)
- internal-packages/run-engine/src/engine/locking.ts (6 hunks)
Actionable comments posted: 1
🔭 Outside diff range comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
1-5: 🛠️ Refactor suggestion
Avoid mixed CJS/ESM import styles for the same module
`Redlock` is imported twice – once via `require()` and once via `import * as`.
Besides being redundant, bundlers/TS emit can end up with two distinct instances of the class, breaking `instanceof` checks and typings.
    -// import { default: Redlock } from "redlock";
    -const { default: Redlock } = require("redlock");
    -import * as redlock from "redlock";
    +import Redlock, * as redlock from "redlock";
Keep a single ESM import and (optionally) add a separate `import type { Lock } from "redlock";` for types only.
This also lets the build tool tree-shake unused CJS helpers.
♻️ Duplicate comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
25-40: `LockAcquisitionTimeoutError` still misses prototype fix & stack capture
The earlier review pointed this out and it remains unresolved – the subclass will fail `instanceof` after transpilation to ES5.
      this.name = "LockAcquisitionTimeoutError";
    + Object.setPrototypeOf(this, new.target.prototype);
    + /* c8 ignore next 3 */
    + if (Error.captureStackTrace) {
    +   Error.captureStackTrace(this, this.constructor);
    + }
      }
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
219-233: Minor logging inconsistency – use the sorted resource list
Logs inside retry acquisition mix `resources` (caller order) and `sortedResources` (lock order).
This hampers grep-ability when diagnosing contention.
    - this.logger.debug("[RunLocker] Lock acquired after retries", {
    -   name,
    -   resources: sortedResources,
    + this.logger.debug("[RunLocker] Lock acquired after retries", {
    +   name,
    +   resources: sortedResources, // keep consistently sorted everywhere
Apply the same change to other log blocks in this method.
📜 Review details
📒 Files selected for processing (1)
- internal-packages/run-engine/src/engine/locking.ts (6 hunks)
Actionable comments posted: 1
🔭 Outside diff range comments (1)
internal-packages/run-engine/src/engine/tests/locking.test.ts (1)
110-148: 🛠️ Refactor suggestion
Long `setTimeout` calls will slow the suite and flake on CI
Several tests wait on real timers for 10–15 s (`setTimeout(10000)`, `15000`, etc.). This pushes total test time into minutes and is prone to flakiness on loaded CI workers.
Recommendation: use fake timers (e.g. `vi.useFakeTimers()` / `advanceTimersByTime`) or stub the Redlock clock so you can assert behaviour without real delays.
♻️ Duplicate comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
25-40: ⚠️ Potential issue
Error subclass still lacks prototype fix; `instanceof` may fail
The constructor sets `this.name`, but it never re-establishes the prototype chain, so `error instanceof LockAcquisitionTimeoutError` can fail once the code is transpiled to ES5. The suggested change also captures the stack trace where `Error.captureStackTrace` is available.
      this.name = "LockAcquisitionTimeoutError";
    + // Maintain prototype chain for down-level transpilation
    + /* c8 ignore next 4 */
    + if (Object.setPrototypeOf) {
    +   Object.setPrototypeOf(this, new.target.prototype);
    + }
    + /* c8 ignore next 3 */
    + if (Error.captureStackTrace) {
    +   Error.captureStackTrace(this, this.constructor);
    + }
      }
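For context, a self-contained sketch of the pattern the comment asks for might look like the following. The constructor parameters (`resources`, `totalWaitTime`) and the `describeError` helper are assumptions made for illustration; the real class in `locking.ts` may be shaped differently.

```typescript
// Hypothetical constructor signature; the real class in locking.ts may differ.
class LockAcquisitionTimeoutError extends Error {
  constructor(
    public readonly resources: string[],
    public readonly totalWaitTime: number,
    message: string
  ) {
    super(message);
    this.name = "LockAcquisitionTimeoutError";
    // Restore the prototype chain, which an ES5 compile target would
    // otherwise lose when extending the built-in Error.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical caller that relies on instanceof to branch on the error type.
function describeError(err: unknown): string {
  return err instanceof LockAcquisitionTimeoutError
    ? `timeout after ${err.totalWaitTime}ms on ${err.resources.join(",")}`
    : "other error";
}
```

With an ES2015 or later compile target the `setPrototypeOf` call is redundant but harmless, which is why it is often kept as a defensive default.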
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
279-304: Shadowed `error` variable clouds logs
Inside the retry loop the name `error` is reused for both the result of `tryCatch` and the outer `lastError`. When you later log `error.message`, it may reference an out-of-scope value (or `undefined`) depending on control flow, confusing diagnostics.
Consider renaming the inner destructured variable:
    -const [error, acquiredLock] = await tryCatch(this.redlock.acquire(sortedResources, duration));
    +const [acquireErr, acquiredLock] =
    +  await tryCatch(this.redlock.acquire(sortedResources, duration));
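As a rough sketch of how the renamed variable could sit inside a retry loop: the `tryCatch` helper below is a local stand-in that assumes the `[error, result]` tuple shape implied by the diff, and `acquireWithRetry` is a hypothetical function, not the PR's actual implementation.

```typescript
// Hypothetical helper mirroring the [error, result] tuple shape used in the diff.
async function tryCatch<T>(
  promise: Promise<T>
): Promise<[Error, null] | [null, T]> {
  try {
    return [null, await promise];
  } catch (e) {
    return [e instanceof Error ? e : new Error(String(e)), null];
  }
}

// Hypothetical retry loop: `acquire` stands in for redlock.acquire(...).
async function acquireWithRetry<T>(
  acquire: () => Promise<T>,
  maxAttempts: number
): Promise<T> {
  let lastError: Error | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Distinct name, so nothing in the loop shadows an outer `error`.
    const [acquireErr, acquired] = await tryCatch(acquire());
    if (acquireErr === null) return acquired as T;
    lastError = acquireErr;
  }
  throw new Error(
    `gave up after ${maxAttempts} attempts: ${lastError?.message ?? "no attempts made"}`
  );
}
```

Keeping `acquireErr` scoped to a single attempt and `lastError` scoped to the whole loop makes it unambiguous which value a later log statement refers to.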
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- apps/webapp/app/v3/runEngine.server.ts (1 hunk)
- internal-packages/run-engine/src/engine/index.ts (3 hunks)
- internal-packages/run-engine/src/engine/locking.ts (6 hunks)
- internal-packages/run-engine/src/engine/tests/locking.test.ts (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/webapp/app/v3/runEngine.server.ts
- internal-packages/run-engine/src/engine/index.ts
Actionable comments posted: 1
🔭 Outside diff range comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
164-166: 🛠️ Refactor suggestion
`lockId` based solely on `Date.now()` risks collision under high throughput
Using `Date.now()` (millisecond resolution) can generate identical `lockId`s when many locks are taken in the same ms, causing map key clashes and metric mis-aggregation. Replace with a monotonic counter or `crypto.randomUUID()`.
    - const lockId = `${name}:${joinedResources}:${Date.now()}`;
    + const lockId = `${name}:${joinedResources}:${crypto.randomUUID()}`;
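The monotonic-counter alternative mentioned in the comment could be sketched as below. `createLockIdFactory` is a hypothetical helper, and the way resources are sorted and joined with a comma is an assumption; the actual `joinedResources` format in `locking.ts` is not shown in this review.

```typescript
// Hypothetical ID factory: a per-process counter keeps IDs unique even when
// several locks are created within the same millisecond.
function createLockIdFactory(now: () => number = Date.now) {
  let counter = 0;
  return (name: string, resources: string[]): string => {
    // Assumed format: resources sorted and comma-joined.
    const joinedResources = [...resources].sort().join(",");
    counter += 1;
    return `${name}:${joinedResources}:${now()}:${counter}`;
  };
}
```

A counter like this is only unique within one process. If lock IDs from several processes are ever compared or aggregated together, the `crypto.randomUUID()` variant from the suggested diff is the safer choice.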
♻️ Duplicate comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
25-40: `LockAcquisitionTimeoutError` still misses prototype fix; `instanceof` will break
Earlier feedback highlighted the need to reset the prototype (and optionally capture the stack) when extending `Error`. The current implementation still omits it, so `err instanceof LockAcquisitionTimeoutError` can fail when transpiled to ES5.
Please add `Object.setPrototypeOf(this, new.target.prototype)` (and optionally `Error.captureStackTrace`).
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/locking.ts (1)
411-433: Race-condition window still exists before `setTimeout`
`context.timeout !== null` is checked outside the callback passed to `setTimeout`. If `#cleanupExtension` runs after the check but before the timer schedules, a new timer can still be created.
Re-check the sentinel immediately before scheduling:
    - if (context.timeout !== null) {
    -   context.timeout = setTimeout(() => {
    + if (context.timeout !== null) {
    +   const t = setTimeout(() => {
          …
    -   }, timeUntilExtension);
    +   }, timeUntilExtension);
    +   // verify cleanup wasn’t requested in the meantime
    +   if (context.timeout !== null) context.timeout = t;
    +   else clearTimeout(t);
      }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- internal-packages/run-engine/src/engine/locking.ts (6 hunks)
… RunLock instance)
No description provided.