-
Notifications
You must be signed in to change notification settings - Fork 1
Refactor etcd, victorialogs, and victoriametrics to share base component #533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor etcd, victorialogs, and victoriametrics to share base component #533
Conversation
Extract common container lifecycle management into components/base package. This eliminates ~700 lines of duplicated code across the three components and adds auto-restart capability to victoriametrics (previously missing). The BaseComponent provides shared functionality for: - Container and task lifecycle (Start/Stop) - Exit monitoring with auto-restart and exponential backoff - Readiness checking via TCP connectivity - Graceful shutdown with SIGTERM/SIGKILL escalation Each component embeds BaseComponent and provides callbacks for component-specific behavior (task creation, ready port).
📝 WalkthroughWalkthroughAdds a new Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @components/base/base.go:
- Around line 243-249: After sending SIGKILL with task.Kill, the code calls
task.Wait(killCtx) but does not block on the returned channel, so the process
may be deleted before it fully exits; change the logic in the kill/cleanup block
(references: task.Kill, task.Wait, task.Delete, b.ComponentName) to capture the
wait channel/result from task.Wait(killCtx) and block on it (e.g. receive from
the channel or select) to obtain the wait result/error, log any waitErr, and
only proceed to task.Delete after the wait completes; ensure you still handle
and log errors from task.Kill and from the wait result.
🧹 Nitpick comments (3)
components/base/base.go (2)
128-134: Consider documenting the side effect of SetTask.
SetTaskimplicitly setsrunning = true. This coupling is reasonable semantically (a task means running), but callers may not expect it. Consider adding a brief comment noting this side effect.📝 Suggested documentation
// SetTask sets the task reference and marks the component as running. +// Note: This also sets running=true since having a task implies the component is running. func (b *BaseComponent) SetTask(task containerd.Task) {
283-286: Consider respecting context cancellation during retry sleep.The retry loop uses
time.Sleepwhich doesn't respect context cancellation. If the parent context is canceled, the retry will still wait the full delay.♻️ Proposed improvement
if attempt < maxRetries { - time.Sleep(retryDelay) + select { + case <-ctx.Done(): + b.Log.Warn(b.ComponentName+" container delete retry cancelled", "error", ctx.Err()) + return + case <-time.After(retryDelay): + } }components/etcd/etcd.go (1)
54-56: Prefertime.Secondfor clarity.Using
1e9for nanoseconds works but is less readable thantime.Second.♻️ Suggested improvement
- bc.ReadyConfig.DialTimeout = 1e9 // 1 second in nanoseconds - bc.ReadyConfig.Interval = 1e9 // 1 second in nanoseconds + bc.ReadyConfig.DialTimeout = time.Second + bc.ReadyConfig.Interval = time.Second
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
components/base/base.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.gocomponents/victoriametrics/victoriametrics.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.go: Always runmake lintbefore committing - runs golangci-lint on the entire codebase
Runmake lint-fixto automatically fix linting issues where possible
Only add comments when they provide valuable context or explain 'why' something is done - avoid redundant comments that restate what the code does
Function/method comments should explain the purpose and any important side effects, not just restate the name
Codebase follows standard Go formatting conventions
Files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.gocomponents/base/base.go
🧠 Learnings (6)
📓 Common learnings
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
Learnt from: phinze
Repo: mirendev/runtime PR: 173
File: cli/commands/commands_linux.go:29-31
Timestamp: 2025-08-12T20:24:39.993Z
Learning: In the MIR-309 CLI rename from runtime to miren, the containerd namespace should also be changed from "runtime" to "miren" to align with the overall rebranding effort, along with other component names like etcd and clickhouse containers.
📚 Learning: 2025-05-30T21:01:16.000Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/integration_test.go:13-16
Timestamp: 2025-05-30T21:01:16.000Z
Learning: The containerd v2 client should be imported using the path "github.com/containerd/containerd/v2/client", not from the root module path. The "/client" suffix is required for the v2 client import.
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2025-05-30T21:01:16.000Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/integration_test.go:13-16
Timestamp: 2025-05-30T21:01:16.000Z
Learning: The containerd v2 client import path is "github.com/containerd/containerd/v2/client" with the "/client" suffix required. This is the official import path for containerd 2.x client library, not the root module path.
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2026-01-07T23:46:12.875Z
Learnt from: CR
Repo: mirendev/runtime PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-07T23:46:12.875Z
Learning: Project uses iso for containerized development with all dependencies (Go 1.25, containerd, buildkit, etc.) - defined in `.iso/Dockerfile`
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2025-05-29T23:20:55.872Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/clickhouse.go:259-299
Timestamp: 2025-05-29T23:20:55.872Z
Learning: In containerd task management, if task.Kill() fails, it usually indicates fundamental issues (task already dead, permission problems, or containerd connection issues) where subsequent operations like task.Wait() or task.Delete() would likely also fail. Early return after Kill failure is often the appropriate behavior rather than attempting further cleanup operations.
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2025-05-22T21:54:17.159Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
Applied to files:
components/etcd/etcd.go
🧬 Code graph analysis (3)
components/victoriametrics/victoriametrics.go (1)
components/base/base.go (2)
BaseComponent(62-89)NewBaseComponent(92-102)
components/etcd/etcd.go (2)
components/base/base.go (2)
BaseComponent(62-89)NewBaseComponent(92-102)pkg/slogout/slogout.go (3)
WithLogger(296-307)WithJSONParsing(45-49)WithMaxLevel(59-64)
components/victorialogs/victorialogs.go (3)
components/base/base.go (2)
BaseComponent(62-89)NewBaseComponent(92-102)pkg/serverconfig/config.gen.go (1)
VictoriaLogsConfig(422-427)discovery/network.go (1)
HTTPEndpoint(16-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-binaries (darwin, arm64, macos-latest)
- GitHub Check: build-binaries (darwin, amd64, macos-latest)
- GitHub Check: lint
- GitHub Check: test
- GitHub Check: test-e2e
🔇 Additional comments (28)
components/base/base.go (8)
1-18: LGTM - Package documentation and imports are well-structured.Clear package-level documentation explaining the purpose of the base component framework.
20-52: LGTM - Well-documented configuration structs with sensible defaults.The restart policy with exponential backoff and the readiness check configuration provide good flexibility for component-specific tuning.
54-102: LGTM - Clean separation of concerns with callback-based customization.The dual-mutex design appropriately separates operation serialization (
opMu) from state protection (stateMu).
157-220: LGTM - Careful lock management in stop sequence.The pattern of capturing state under lock, releasing, then operating is correct. Clearing monitor channels before releasing the lock prevents races with
StartExitMonitor.
301-313: LGTM - Clean container cleanup for restart scenarios.Properly stops any existing task before deleting the container.
315-337: LGTM - Standard TCP readiness check with proper context handling.The implementation correctly respects context cancellation and cleans up connections.
339-411: LGTM - Well-designed exit monitoring with restart loop.The monitor correctly captures state under lock before async operations, handles restart failures gracefully, and obtains a fresh exit channel after successful restarts.
413-496: LGTM - Robust restart handling with exponential backoff.The restart logic correctly:
- Resets the counter after the reset window
- Enforces max restart limits
- Implements capped exponential backoff
- Respects stop signals during backoff
- Cleans up the old task before creating a new one
components/etcd/etcd.go (6)
44-50: LGTM - Clean struct embedding of BaseComponent.The component correctly embeds the base and adds only component-specific fields.
69-76: LGTM - Appropriate logging configuration for etcd.Using JSON parsing and clamping to Info level for etcd's structured logs is appropriate.
78-175: LGTM - Clean integration with base component lifecycle.The Start method properly uses base component APIs for locking, state management, readiness checking, and exit monitoring.
177-189: LGTM - Endpoint methods correctly use base state checking.
191-259: LGTM - Comprehensive restart logic for existing containers.The method handles all cases: running task, stopped task, and failed task, with proper integration into the base component lifecycle.
261-314: LGTM - Container creation logic preserved.The container specification is unchanged; the method correctly accesses base component fields via embedding.
components/victoriametrics/victoriametrics.go (7)
36-41: LGTM - Clean struct embedding matching etcd pattern.
43-55: LGTM - Standard base component initialization.
57-63: LGTM - Appropriate logging configuration for VictoriaMetrics.
65-150: LGTM - Start method follows established pattern with proper cleanup.The ordering of
WaitForReadybeforeSetTaskensures the component only reports as running after readiness is confirmed. Failure paths correctly clean up both task and container.
152-157: LGTM - Endpoint method correctly uses base state checking.
159-219: Verify: Different readiness failure handling vs etcd.Unlike etcd (which logs a warning and continues if
WaitForReadyfails), victoriametrics returns an error. This makes the component stricter about readiness. Verify this behavioral difference is intentional.
221-260: LGTM - Container creation with appropriate VictoriaMetrics configuration.components/victorialogs/victorialogs.go (7)
36-41: LGTM - Consistent structure with VictoriaMetrics component.
43-55: LGTM - Standard base component initialization.
57-63: LGTM - Consistent callback implementations.
65-150: LGTM - Start method consistent with VictoriaMetrics.
152-157: LGTM - Endpoint method matches VictoriaMetrics pattern.
159-219: LGTM - Restart logic consistent with VictoriaMetrics.Same stricter readiness requirement as VictoriaMetrics (returns error on failure).
221-259: LGTM - Container creation with appropriate VictoriaLogs configuration.
The previous code called task.Wait() but didn't actually wait on the returned channel, so it would return immediately. Now properly select on the status channel with timeout handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @components/base/base.go:
- Around line 421-504: handleRestart may call b.CreateTask(ctx, container) with
a nil b.container; add a nil check for container (the local variable captured
from b.container at the top of handleRestart) before attempting to create the
new task and return a clear error if it's nil (e.g., "no container available for
restart"), so CreateTask is never invoked with a nil container; update any
logging to indicate the missing container and keep references to b.container,
handleRestart, and CreateTask to locate the change.
- Around line 347-379: StartExitMonitor can panic if b.task is nil before
calling task.Wait; add a nil check for b.task (or task after assignment)
immediately after capturing task and before calling task.Wait, and handle it by
logging a debug/error message referencing b.ComponentName, cleaning up/resetting
b.stopMonitor and b.monitorDone under b.stateMu (matching the existing cleanup
path), closing monitorDone, and returning without starting monitorExit; ensure
monitorExit is only invoked when task is non-nil.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/base/base.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.go: Always runmake lintbefore committing - runs golangci-lint on the entire codebase
Runmake lint-fixto automatically fix linting issues where possible
Only add comments when they provide valuable context or explain 'why' something is done - avoid redundant comments that restate what the code does
Function/method comments should explain the purpose and any important side effects, not just restate the name
Codebase follows standard Go formatting conventions
Files:
components/base/base.go
🧠 Learnings (3)
📓 Common learnings
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
📚 Learning: 2025-05-29T23:20:55.872Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/clickhouse.go:259-299
Timestamp: 2025-05-29T23:20:55.872Z
Learning: In containerd task management, if task.Kill() fails, it usually indicates fundamental issues (task already dead, permission problems, or containerd connection issues) where subsequent operations like task.Wait() or task.Delete() would likely also fail. Early return after Kill failure is often the appropriate behavior rather than attempting further cleanup operations.
Applied to files:
components/base/base.go
📚 Learning: 2025-09-11T22:31:18.478Z
Learnt from: phinze
Repo: mirendev/runtime PR: 216
File: cli/commands/upgrade_linux.go:0-0
Timestamp: 2025-09-11T22:31:18.478Z
Learning: In cli/commands/upgrade_linux.go, the timeout handler intentionally kills only the child process with cmd.Process.Kill() rather than the entire process group with syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL). This is a deliberate design choice by the maintainers.
Applied to files:
components/base/base.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-binaries (darwin, amd64, macos-latest)
- GitHub Check: lint
- GitHub Check: build-binaries (darwin, arm64, macos-latest)
- GitHub Check: test-e2e
- GitHub Check: test
🔇 Additional comments (8)
components/base/base.go (8)
1-102: LGTM on package structure and configuration types.The dual-mutex pattern (opMu for serializing major operations, stateMu for state access) is a sound design for managing concurrent access. Default policies are reasonable.
104-155: LGTM on state accessors.Thread-safe state management with proper mutex usage. The coupling of
SetTaskwithrunning=trueis a sensible convenience for the typical use pattern.
157-220: LGTM on stop logic.The careful sequencing of capturing monitor channels before releasing the lock, then signaling after, correctly avoids races with
StartExitMonitor.
222-267: LGTM on graceful shutdown escalation.The SIGTERM → wait → SIGKILL → wait pattern with proper timeout handling is well implemented. Per project learnings, the early return after
Killfailure is the appropriate behavior since subsequent operations would likely also fail.
274-307: LGTM on container deletion retry logic.The retry with explicit snapshot leak warning on final failure is good defensive programming for containerd cleanup scenarios.
323-345: LGTM on readiness check.Clean TCP connectivity check with proper context cancellation handling and connection cleanup.
381-419: LGTM on exit monitoring loop.The loop correctly handles the stop signal, exit events, and obtaining a new exit channel after restart. The nil task check after restart (line 408) properly guards against continuing when restart fails.
309-321: LGTM on cleanup helper.
CleanupExistingContainerproperly handles both task and container cleanup with appropriate error handling for restart scenarios.
- StartExitMonitor: Check task is not nil before calling Wait - handleRestart: Check container is not nil before calling CreateTask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/base/base.go (1)
452-456: Consider guarding against exponential backoff overflow.The calculation
1 << (restartCount-1)on line 453 could theoretically overflow ifMaxRestartsis set to a very high value (e.g., 64). While the defaultMaxRestartsis 5 (making this safe) andBackoffMaxcaps the result, you could add a safeguard:♻️ Proposed safeguard
// Calculate backoff -backoff := policy.BackoffBase * time.Duration(1<<(restartCount-1)) +shift := restartCount - 1 +if shift > 30 { + shift = 30 +} +backoff := policy.BackoffBase * time.Duration(1<<shift) if backoff > policy.BackoffMax { backoff = policy.BackoffMax }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/base/base.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.go: Always runmake lintbefore committing - runs golangci-lint on the entire codebase
Runmake lint-fixto automatically fix linting issues where possible
Only add comments when they provide valuable context or explain 'why' something is done - avoid redundant comments that restate what the code does
Function/method comments should explain the purpose and any important side effects, not just restate the name
Codebase follows standard Go formatting conventions
Files:
components/base/base.go
🧠 Learnings (3)
📓 Common learnings
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
📚 Learning: 2025-05-29T23:20:55.872Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/clickhouse.go:259-299
Timestamp: 2025-05-29T23:20:55.872Z
Learning: In containerd task management, if task.Kill() fails, it usually indicates fundamental issues (task already dead, permission problems, or containerd connection issues) where subsequent operations like task.Wait() or task.Delete() would likely also fail. Early return after Kill failure is often the appropriate behavior rather than attempting further cleanup operations.
Applied to files:
components/base/base.go
📚 Learning: 2025-09-11T22:31:18.478Z
Learnt from: phinze
Repo: mirendev/runtime PR: 216
File: cli/commands/upgrade_linux.go:0-0
Timestamp: 2025-09-11T22:31:18.478Z
Learning: In cli/commands/upgrade_linux.go, the timeout handler intentionally kills only the child process with cmd.Process.Kill() rather than the entire process group with syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL). This is a deliberate design choice by the maintainers.
Applied to files:
components/base/base.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-binaries (darwin, arm64, macos-latest)
- GitHub Check: lint
- GitHub Check: test-e2e
- GitHub Check: test
🔇 Additional comments (2)
components/base/base.go (2)
157-267: Well-designed shutdown and cleanup logic.The concurrency handling is correct:
- Clearing monitor channels under lock before closing them prevents races with new
StartExitMonitorcalls- Graceful shutdown escalation (SIGTERM → SIGKILL) with appropriate timeouts
- Early return after SIGTERM failure aligns with best practices (subsequent operations would likely fail anyway)
- Best-effort task deletion even after Wait errors is appropriate for cleanup
347-426: Exit monitoring logic is robust.The nil checks for
task(line 360) and after restart (line 415) prevent panics as mentioned in the PR objectives. The monitor lifecycle is correctly managed:
- Idempotency check prevents duplicate monitors
- Channels are captured under lock before release
- Error paths properly clean up state
- The restart loop correctly obtains new exit channels
- Add IfRunning() helper for thread-safe endpoint access, ensuring running state and port reads are atomic - Add context cancellation check in restart backoff wait to respect shutdown requests during backoff delays - Fix etcd ReadyConfig to use time.Duration constants instead of raw nanoseconds for consistency with the rest of the codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
components/base/base.go (3)
128-134: SetTask has a subtle side effect that may cause confusion.
SetTaskboth sets the task and marksrunning = true. This couples two distinct operations and could lead to incorrect state if a caller sets a task without intending to mark the component as running.Consider either:
- Documenting this behavior prominently, or
- Separating the concerns into two calls
Since this is a private-facing API within the component framework and the pattern is consistent across usage, this is acceptable but worth noting.
287-310: Consider respecting context cancellation during retry loop.The retry loop uses
time.Sleepwhich doesn't respect context cancellation. If the parent context is cancelled during cleanup, the function will continue retrying for up to ~6 seconds. This is likely acceptable for cleanup operations but worth noting.Optional: Add context check
for attempt := 1; attempt <= maxRetries; attempt++ { + select { + case <-ctx.Done(): + b.Log.Warn(b.ComponentName+" container deletion cancelled", "error", ctx.Err()) + return + default: + } + deleteCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
491-499: Task deletion error is silently ignored.The
task.Deletecall on line 494 ignores any error. While this is often acceptable for cleanup (as noted in learnings about containerd), consider logging at debug level for troubleshooting.Optional: Log deletion error at debug level
// Clean up old task if task != nil { deleteCtx, cancel := context.WithTimeout(ctx, 10*time.Second) - task.Delete(deleteCtx) + if _, err := task.Delete(deleteCtx); err != nil { + b.Log.Debug("failed to delete old task during restart", "error", err) + } cancel()components/victoriametrics/victoriametrics.go (1)
136-143: SetTask is called after WaitForReady, which differs from etcd.In
Start,SetTaskis called at line 143 afterWaitForReady(line 137). However, in etcd,SetTaskis called at line 164 beforeWaitForReady(line 168). This inconsistency means:
- In victoriametrics,
IsRunning()returns false during readiness checks- The base component's
GetReadyPortcallback might behave unexpectedlyConsider aligning the order with etcd for consistency.
Align with etcd's ordering
+ c.SetTask(task) + // Wait for VictoriaMetrics to be ready if err := c.WaitForReady(ctx, "localhost", config.HTTPPort); err != nil { + c.SetRunning(false) task.Delete(ctx) container.Delete(ctx, containerd.WithSnapshotCleanup) return err } - c.SetTask(task) c.Log.Info("victoriametrics server started", "container_id", container.ID(), "http_port", config.HTTPPort)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
components/base/base.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.gocomponents/victoriametrics/victoriametrics.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.go: Always runmake lintbefore committing - runs golangci-lint on the entire codebase
Runmake lint-fixto automatically fix linting issues where possible
Only add comments when they provide valuable context or explain 'why' something is done - avoid redundant comments that restate what the code does
Function/method comments should explain the purpose and any important side effects, not just restate the name
Codebase follows standard Go formatting conventions
Files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.gocomponents/base/base.go
🧠 Learnings (7)
📓 Common learnings
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
📚 Learning: 2025-05-30T21:01:16.000Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/integration_test.go:13-16
Timestamp: 2025-05-30T21:01:16.000Z
Learning: The containerd v2 client should be imported using the path "github.com/containerd/containerd/v2/client", not from the root module path. The "/client" suffix is required for the v2 client import.
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2025-05-30T21:01:16.000Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/integration_test.go:13-16
Timestamp: 2025-05-30T21:01:16.000Z
Learning: The containerd v2 client import path is "github.com/containerd/containerd/v2/client" with the "/client" suffix required. This is the official import path for containerd 2.x client library, not the root module path.
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2026-01-07T23:46:12.875Z
Learnt from: CR
Repo: mirendev/runtime PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-07T23:46:12.875Z
Learning: Project uses iso for containerized development with all dependencies (Go 1.25, containerd, buildkit, etc.) - defined in `.iso/Dockerfile`
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.go
📚 Learning: 2025-05-29T23:20:55.872Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/clickhouse.go:259-299
Timestamp: 2025-05-29T23:20:55.872Z
Learning: In containerd task management, if task.Kill() fails, it usually indicates fundamental issues (task already dead, permission problems, or containerd connection issues) where subsequent operations like task.Wait() or task.Delete() would likely also fail. Early return after Kill failure is often the appropriate behavior rather than attempting further cleanup operations.
Applied to files:
components/victoriametrics/victoriametrics.gocomponents/etcd/etcd.gocomponents/victorialogs/victorialogs.gocomponents/base/base.go
📚 Learning: 2025-05-22T21:54:17.159Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
Applied to files:
components/etcd/etcd.go
📚 Learning: 2025-09-11T22:31:18.478Z
Learnt from: phinze
Repo: mirendev/runtime PR: 216
File: cli/commands/upgrade_linux.go:0-0
Timestamp: 2025-09-11T22:31:18.478Z
Learning: In cli/commands/upgrade_linux.go, the timeout handler intentionally kills only the child process with cmd.Process.Kill() rather than the entire process group with syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL). This is a deliberate design choice by the maintainers.
Applied to files:
components/base/base.go
🧬 Code graph analysis (2)
components/etcd/etcd.go (2)
components/base/base.go (2)
BaseComponent(62-89)NewBaseComponent(92-102)pkg/slogout/slogout.go (3)
WithLogger(296-307)WithJSONParsing(45-49)WithMaxLevel(59-64)
components/victorialogs/victorialogs.go (3)
components/base/base.go (2)
BaseComponent(62-89)NewBaseComponent(92-102)pkg/serverconfig/config.gen.go (1)
VictoriaLogsConfig(422-427)discovery/network.go (1)
HTTPEndpoint(16-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-binaries (darwin, arm64, macos-latest)
- GitHub Check: build-binaries (darwin, amd64, macos-latest)
- GitHub Check: test-e2e
- GitHub Check: test
- GitHub Check: lint
🔇 Additional comments (22)
components/base/base.go (8)
20-52: Well-structured configuration with sensible defaults.The RestartPolicy and ReadyCheckConfig provide good flexibility with reasonable defaults (5 max restarts, exponential backoff up to 60s, 30 readiness attempts).
61-102: Clean BaseComponent design with proper separation of concerns.The dual-mutex approach (opMu for operations, stateMu for state) is appropriate for this use case. The constructor initializes defaults correctly.
179-233: Stop logic correctly handles synchronization with exit monitor.The pattern of capturing channels before releasing the lock, then signaling and waiting, correctly prevents races with
StartExitMonitor. The multiple lock/unlock cycles avoid holding locks during blocking operations like container deletion.
235-280: Graceful shutdown with proper escalation is well-implemented.The SIGTERM → SIGKILL escalation with appropriate timeouts follows best practices. Using
context.Background()for the SIGKILL path ensures cleanup can complete even if the parent context is cancelled. Based on learnings, the early return afterKillfailure is appropriate since it indicates fundamental issues.
336-358: Readiness check implementation is solid.Proper use of
net.JoinHostPortfor IPv6 compatibility, context cancellation check between attempts, and connection cleanup on success.
360-399: Exit monitor startup with proper idempotency and error handling.The check for existing monitor prevents duplicate goroutines. The pattern of capturing channels before releasing the lock correctly handles races with
stopInternal.
401-439: Exit monitoring loop correctly handles restarts and error conditions.The loop properly gets a new exit channel after successful restart and has clean exit paths for all failure conditions.
520-526: Readiness failure after restart is logged but not treated as failure.If
WaitForReadyfails after a restart, it only logs a warning and reports success. This means the restart is counted as successful even though the component may not be functioning. Consider whether this should trigger another restart attempt or be treated as a restart failure.The current behavior may be acceptable since transient readiness failures can self-recover, but it's worth verifying this aligns with the intended behavior.
components/etcd/etcd.go (5)
45-68: Clean integration with BaseComponent.The component properly embeds the base component and wires the callbacks. The custom ReadyConfig for etcd (1s timeouts vs default 2s) is appropriately specific to etcd's characteristics.
70-77: Callback implementations are straightforward.The
createTaskuses structured JSON logging with level capping, andgetReadyPortcorrectly returns the client port for readiness checks.
79-176: Start method properly delegates to base component.The flow correctly:
- Uses
LockOp/UnlockOpfor operation serialization- Checks
IsRunning()for idempotency- Uses
SetContainer/SetTaskfor state management- Delegates readiness and exit monitoring to base
178-188: Endpoint methods use IfRunning for thread-safe access.Using
IfRunningensures the port is only accessed when the component is running, returning empty string otherwise. This is consistent with the PR objective of improving thread safety.
190-258: restartExistingContainer correctly uses base component methods.The restart logic properly:
- Sets container/task state via base methods
- Uses
WaitForReadyfor readiness checks- Starts exit monitoring via
StartExitMonitor- Handles various restart scenarios (running, stopped, dead task)
components/victoriametrics/victoriametrics.go (4)
36-55: Clean integration matching etcd pattern.The component correctly embeds BaseComponent and wires callbacks.
152-156: HTTPEndpoint uses IfRunning correctly.Consistent with etcd's endpoint methods.
158-218: restartExistingContainer has same ordering issue as Start.Lines 211 calls
SetTaskafterWaitForReady(line 206), which is inconsistent with etcd. However, both victoriametrics and victorialogs follow this pattern, so this might be intentional for non-etcd components.Verify if the ordering difference is intentional:
- etcd: SetTask → WaitForReady
- victoriametrics/victorialogs: WaitForReady → SetTask
57-63: VictoriaMetrics is a metrics storage system and does not output JSON logs like etcd does. JSON parsing options are specific to etcd's log format, not a general pattern—buildkit and victorialogs similarly don't use these options. No changes needed.Likely an incorrect or invalid review comment.
components/victorialogs/victorialogs.go (5)
36-55: Clean integration matching the pattern established in other components.Constructor properly creates BaseComponent and wires callbacks.
57-63: Consistent with victoriametrics createTask implementation.No slogout options used, same as victoriametrics.
65-150: Start method follows victoriametrics pattern.Same SetTask/WaitForReady ordering as victoriametrics. The refactor suggestion from victoriametrics applies here as well for consistency with etcd.
152-156: HTTPEndpoint correctly uses IfRunning.Consistent implementation across all three components.
158-218: restartExistingContainer follows same pattern as victoriametrics.The implementation is consistent with victoriametrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all 3 of these components have the same restart policy?
Right now all 3 are using DefaultRestartPolicy() but I think we would want the restart policy for etcd to be more aggressive than VictoriaLogs and VictoriaMetrics since etcd is more crucial? Happy to discuss! 😄
|
@teresamychu really good call out, let me adjust that. |
- Change default restart policy to never give up (MaxRestarts=0) - Add AggressiveRestartPolicy for critical components with shorter backoff times (500ms base, 10s max vs 2s base, 60s max) - Apply aggressive restart policy to etcd since the system cannot run without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/base/base.go (1)
502-510: Log error if task deletion fails during cleanup.Line 505 ignores the error from
task.Delete(). While the old task should be in a stopped state and deletion should succeed, logging failures would help diagnose potential resource leaks.📝 Log deletion errors
// Clean up old task if task != nil { deleteCtx, cancel := context.WithTimeout(ctx, 10*time.Second) - task.Delete(deleteCtx) + if _, err := task.Delete(deleteCtx); err != nil { + b.Log.Warn("failed to delete old task during restart", "error", err) + } cancel() b.stateMu.Lock() b.task = nil b.stateMu.Unlock() }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
components/base/base.gocomponents/etcd/etcd.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.go: Always runmake lintbefore committing - runs golangci-lint on the entire codebase
Runmake lint-fixto automatically fix linting issues where possible
Only add comments when they provide valuable context or explain 'why' something is done - avoid redundant comments that restate what the code does
Function/method comments should explain the purpose and any important side effects, not just restate the name
Codebase follows standard Go formatting conventions
Files:
components/etcd/etcd.gocomponents/base/base.go
🧠 Learnings (7)
📓 Common learnings
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
Learnt from: phinze
Repo: mirendev/runtime PR: 173
File: cli/commands/commands_linux.go:29-31
Timestamp: 2025-08-12T20:24:39.993Z
Learning: In the MIR-309 CLI rename from runtime to miren, the containerd namespace should also be changed from "runtime" to "miren" to align with the overall rebranding effort, along with other component names like etcd and clickhouse containers.
📚 Learning: 2025-05-22T21:54:17.159Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 59
File: cli/commands/dev.go:80-117
Timestamp: 2025-05-22T21:54:17.159Z
Learning: In the context of etcd component integration, evanphx preferred to keep the original implementation approach for error handling in deferred cleanup functions and variable shadowing patterns.
Applied to files:
components/etcd/etcd.go
📚 Learning: 2025-05-30T21:01:16.000Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/integration_test.go:13-16
Timestamp: 2025-05-30T21:01:16.000Z
Learning: The containerd v2 client should be imported using the path "github.com/containerd/containerd/v2/client", not from the root module path. The "/client" suffix is required for the v2 client import.
Applied to files:
components/etcd/etcd.go
📚 Learning: 2025-05-30T21:01:16.000Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/integration_test.go:13-16
Timestamp: 2025-05-30T21:01:16.000Z
Learning: The containerd v2 client import path is "github.com/containerd/containerd/v2/client" with the "/client" suffix required. This is the official import path for containerd 2.x client library, not the root module path.
Applied to files:
components/etcd/etcd.go
📚 Learning: 2026-01-07T23:46:12.875Z
Learnt from: CR
Repo: mirendev/runtime PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-07T23:46:12.875Z
Learning: Project uses iso for containerized development with all dependencies (Go 1.25, containerd, buildkit, etc.) - defined in `.iso/Dockerfile`
Applied to files:
components/etcd/etcd.go
📚 Learning: 2025-05-29T23:20:55.872Z
Learnt from: evanphx
Repo: mirendev/runtime PR: 73
File: components/clickhouse/clickhouse.go:259-299
Timestamp: 2025-05-29T23:20:55.872Z
Learning: In containerd task management, if task.Kill() fails, it usually indicates fundamental issues (task already dead, permission problems, or containerd connection issues) where subsequent operations like task.Wait() or task.Delete() would likely also fail. Early return after Kill failure is often the appropriate behavior rather than attempting further cleanup operations.
Applied to files:
components/etcd/etcd.gocomponents/base/base.go
📚 Learning: 2025-09-11T22:31:18.478Z
Learnt from: phinze
Repo: mirendev/runtime PR: 216
File: cli/commands/upgrade_linux.go:0-0
Timestamp: 2025-09-11T22:31:18.478Z
Learning: In cli/commands/upgrade_linux.go, the timeout handler intentionally kills only the child process with cmd.Process.Kill() rather than the entire process group with syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL). This is a deliberate design choice by the maintainers.
Applied to files:
components/base/base.go
🧬 Code graph analysis (1)
components/etcd/etcd.go (2)
components/base/base.go (4)
BaseComponent(73-100)NewBaseComponent(103-113)RestartPolicy(21-26)AggressiveRestartPolicy(40-47)pkg/slogout/slogout.go (3)
WithLogger(296-307)WithJSONParsing(45-49)WithMaxLevel(59-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test
- GitHub Check: test-e2e
- GitHub Check: build-binaries (darwin, arm64, macos-latest)
- GitHub Check: lint
- GitHub Check: build-binaries (darwin, amd64, macos-latest)
🔇 Additional comments (11)
components/base/base.go (6)
20-63: LGTM! Well-designed restart and readiness configuration.The separation between default and aggressive restart policies is excellent. The aggressive policy's shorter backoff (500ms base vs 2s, 10s max vs 60s) is appropriate for critical infrastructure like etcd.
72-113: LGTM! Clean separation of concerns with dual mutex design.The separation between
opMu(for serializing Start/Stop operations) andstateMu(for protecting state fields) is excellent design that prevents lock contention and enables fine-grained synchronization.
181-244: LGTM! Correct shutdown sequence with proper lock ordering.The pattern of capturing monitor channels under lock, then releasing the lock before closing them and waiting is correct—this avoids deadlock while preventing races with new
StartExitMonitorcalls. The cleanup order (monitor → task → container → running flag) is appropriate.
246-296: LGTM! Correct graceful shutdown with SIGTERM/SIGKILL escalation.The early return after
Kill()failure (Line 252-254) is appropriate. Based on learnings, iftask.Kill()fails, it indicates fundamental issues where subsequent operations would likely also fail.
371-450: LGTM! Robust exit monitoring with proper cleanup.The channel capture pattern (lines 392-394) before releasing the lock is crucial to prevent races with
stopInternal. Thedefer close(monitorDone)ensures the channel is always closed, even when returning early from error paths.
298-369: LGTM! Robust container cleanup and readiness checking.The retry logic in
deleteContainerWithRetry(3 attempts with 2s delay) handles transient failures well. The TCP-based readiness check is a standard, reliable approach.components/etcd/etcd.go (5)
45-70: LGTM! Clean refactoring with appropriate aggressive restart policy.The aggressive restart policy (Line 56) is well-suited for etcd as a critical infrastructure component. The faster readiness checks (1s timeout and interval vs the default 2s) are appropriate for etcd's typically quick startup.
72-79: LGTM! Callbacks correctly implement component-specific behavior.The
createTaskcallback properly configures structured logging with JSON parsing for etcd's output, andgetReadyPortcorrectly returns the client port for readiness checks.
81-178: LGTM! Start method correctly refactored to use BaseComponent.The migration from direct field access and manual synchronization to
BaseComponentmethods (LockOp,IsRunning,SetContainer,SetTask,WaitForReady,StartExitMonitor) is clean and correct. The use ofe.createTaskat Line 153 properly invokes the callback.
180-190: LGTM! Endpoint methods use thread-safe IfRunning pattern.Using
IfRunningensures these methods return empty strings when the component is stopped, preventing use of stale port values. Callers should handle the empty string case appropriately.
192-260: LGTM! Existing container restart logic correctly migrated.All three restart paths (existing running task, existing stopped task, new task) consistently use
SetTask,WaitForReady, andStartExitMonitorfrom the base component. The callback pattern is applied uniformly.
| func DefaultRestartPolicy() RestartPolicy { | ||
| return RestartPolicy{ | ||
| MaxRestarts: 5, | ||
| MaxRestarts: 0, // Never give up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥

Summary
components/basepackageBaseComponentThe
BaseComponentprovides shared functionality for:Each component embeds
BaseComponentand provides callbacks for component-specific behavior (task creation, ready port).Test plan