-
Notifications
You must be signed in to change notification settings - Fork 7
Feature/rule engine redesign #685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a comprehensive rule engine redesign to support multiple execution contexts beyond Kubernetes: host-level monitoring and standalone container monitoring. The changes introduce a context detection system that automatically identifies the source of events (K8s, Host, or Standalone) and routes them appropriately through the rule engine.
Changes:
- Introduces a context detection framework with registry and detector system to classify events by source (K8s, Host, Standalone)
- Adds configuration options for host and standalone monitoring with context-specific rule filtering
- Enables host monitoring by creating a virtual container representing the host's mount namespace
- Extends cloud metadata fetching with fallback support for DigitalOcean, GCP, and Azure metadata services
Reviewed changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/contextdetection/types.go | Defines core types for context detection (EventSourceContext, ContextInfo, ContextDetector interfaces) |
| pkg/contextdetection/registry.go | Implements mount namespace registry for mapping containers to their execution contexts |
| pkg/contextdetection/detectors/*.go | Implements detector chain (K8s, Host, Standalone) with priority-based context detection |
| pkg/rulemanager/rule_manager.go | Integrates context detection into rule evaluation pipeline with context-aware rule filtering |
| pkg/rulemanager/rulepolicy.go | Adds RuleAppliesToContext function for tag-based context filtering |
| pkg/rulemanager/ruleadapters/creator.go | Enriches rule failures with context-specific fields (hostname, container details) |
| pkg/rulemanager/types/failure.go | Extends GenericRuleFailure with SourceContext field |
| pkg/ebpf/events/enriched_event.go | Adds SourceContext and MountNamespaceID fields to track event origins |
| pkg/utils/datasource_event.go | Adds nil-safe handling for field accessors and fallback path resolution |
| pkg/containerwatcher/v2/container_watcher_collection.go | Creates virtual host container for host-level event processing |
| pkg/containerwatcher/v2/containercallback.go | Skips shared data setup for virtual host container |
| pkg/containerwatcher/v2/tracers/*.go | Adds "operator.LocalManager.host" parameter to all tracers for host monitoring |
| pkg/cloudmetadata/metadata.go | Implements cloud provider fallback detection for DigitalOcean, GCP, and Azure |
| pkg/config/config.go | Adds HostMonitoringEnabled and StandaloneMonitoringEnabled configuration flags |
| configuration/config.json | Sets default values for host and standalone monitoring flags |
| cmd/main.go | Initializes MntnsRegistry and passes to RuleManager |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 37 out of 37 changed files in this pull request and generated 14 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 38 out of 38 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
308f941 to
20ace02
Compare
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds a multi-context detection and enrichment system (Kubernetes, Host, Standalone), a mount-namespace registry, context-aware rule evaluation and gating, host-as-container virtual host monitoring, tracer local-manager parameters, cloud metadata IMDS fallback, and related config flags and event accessor resilience. Changes
Sequence DiagramsequenceDiagram
participant CW as ContainerWatcher
participant DM as DetectorManager
participant MR as MntnsRegistry
participant RM as RuleManager
participant RE as RuleEvaluator
CW->>DM: DetectContext(container)
DM->>DM: K8sDetector.Detect()
alt K8s found
DM-->>CW: K8sContextInfo
else Host found
DM-->>CW: HostContextInfo
else Standalone found
DM-->>CW: StandaloneContextInfo
end
CW->>MR: Register(mntnsID, ContextInfo)
MR-->>CW: OK
Note over CW,RM: Event reported with MountNamespaceID
CW->>RM: ReportEnrichedEvent(event)
RM->>MR: Lookup(event.MountNamespaceID)
MR-->>RM: ContextInfo
RM->>RM: enrichEventWithContext()
RM->>RE: Evaluate(event, context)
RE->>RE: RuleAppliesToContext() & monitoring enabled check
RE-->>RM: Matched rule results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/RULE_ENGINE_MULTI_CONTEXT_REDESIGN.md`:
- Around line 29-31: The Go code blocks contain hard tab characters (e.g., lines
defining Kubernetes/Host/Standalone as EventSourceContext and other blocks at
the ranges noted); replace all hard tabs inside those fenced code blocks with
spaces so markdownlint MD010 is satisfied. Locate the Go snippets that define
EventSourceContext constants and similar blocks (symbols like
EventSourceContext, Kubernetes, Host, Standalone and other constant/grouped
lines at the noted ranges) and convert indentation and separators to use spaces
instead of \t throughout each code fence.
In `@pkg/containerwatcher/v2/container_watcher_collection.go`:
- Around line 109-128: The AddContainer return value is not checked: after
calling cw.containerCollection.AddContainer(virtualHostContainer) you must
handle failure by checking its error, logging an explicit error (use
logger.L().Error with helpers.Error(err) and include virtualHostContainer
identifiers like Mntns and Runtime.ContainerPID), and not invoking
cw.containerCallback or logging the success Info when AddContainer failed; if
the enclosing function returns an error propagate/return that error, otherwise
disable/skipping host-monitoring steps (e.g., clear cw.cfg.HostMonitoringEnabled
or set a failure flag) so the system does not silently degrade. Ensure you
modify the block around GetHostAsContainer, cw.containerCollection.AddContainer,
cw.containerCallback and the logger calls accordingly.
♻️ Duplicate comments (4)
pkg/cloudmetadata/metadata.go (2)
137-140: DigitalOcean validation heuristic may produce false positives.The string containment checks could match unrelated content. A more robust approach would be to attempt fetching a required endpoint (e.g.,
id) directly and validate based on success.
166-169: Empty check is fragile and needs updating if fields are added.Consider adding a helper method like
IsEmpty()onCloudMetadataor using reflection for maintainability.pkg/contextdetection/registry.go (1)
31-34: UnusedhostMntnsfield and missingSetHostMntnsmethod.This was already flagged in a previous review. The
hostMntnsfield is declared but never used, and the referencedSetHostMntnsmethod in the comment doesn't exist.pkg/rulemanager/rule_manager.go (1)
167-174: Rule filtering for non-K8s contexts is incomplete (acknowledged TODO).As noted in past reviews, non-K8s contexts load all rules via
CreateAllRules()without filtering. This could impact performance in environments with many rules. The TODO acknowledges this limitation.Consider adding a debug/warning log when using unfiltered rules to help operators understand potential performance implications.
🧹 Nitpick comments (13)
pkg/cloudmetadata/metadata.go (1)
92-117: Consider reusinghttp.Clientfor connection pooling.Creating a new
http.Clienton each call tofetchHTTPMetadataprevents connection reuse and adds overhead.http.Clientis safe for concurrent use and should be reused.♻️ Suggested refactor
Define a package-level client:
const ( azureApiVersion = "2021-12-13" metadataTimeout = 2 * time.Second ) +var metadataHTTPClient = &http.Client{ + Timeout: metadataTimeout, +}Then use it in the function:
func fetchHTTPMetadata(ctx context.Context, url string, headers map[string]string) (string, error) { - client := &http.Client{ - Timeout: metadataTimeout, - } req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) if err != nil { return "", err } for k, v := range headers { req.Header.Set(k, v) } - resp, err := client.Do(req) + resp, err := metadataHTTPClient.Do(req)pkg/containerwatcher/v2/containercallback.go (1)
63-67: Make virtual-host detection explicit and verify the PID sentinel.
ContainerPID == 1is a magic number; if any legitimate container can report PID 1 on a supported runtime/hostPID setup, shared data won’t be set and downstream enrichment may break. Please verify the sentinel across runtimes and consider centralizing it (const/helper or explicit host-context flag) for clarity.♻️ Suggested refactor
+const virtualHostPID = 1 ... - if notif.Container.Runtime.ContainerPID == 1 { + if notif.Container.Runtime.ContainerPID == virtualHostPID {pkg/containerwatcher/v2/tracers/iouring.go (1)
85-88: LGTM!Consistent with the PR-wide pattern for enabling local-manager aware gadget execution.
Consider extracting the common parameter key and value into a package-level constant or helper function to reduce duplication across all tracers:
♻️ Optional: Extract common params
// In a shared location (e.g., tracers/common.go) const LocalManagerHostParam = "operator.LocalManager.host" func DefaultGadgetParams() map[string]string { return map[string]string{ LocalManagerHostParam: "true", } }This would centralize the parameter definition and make future updates easier across all tracers.
pkg/containerwatcher/v2/tracers/open.go (1)
72-75: Refactor config initialization pattern to avoid implicit ordering dependency.The
cfgfield is assigned inIsEnabled()(line 104) but used inStart()(line 73), creating a fragile ordering dependency. WhileTracerManagerensuresIsEnabled()is called beforeStart(), this pattern is inconsistent with all other tracers and breaks ifStart()is called directly. Store the config during tracer initialization inNewOpenTracer()instead of inIsEnabled().pkg/contextdetection/detectors/k8s.go (2)
10-15:WorkloadandWorkloadUIDfields are declared but never populated.The
K8sContextInfostruct declaresWorkloadandWorkloadUIDfields, but theDetect()method (lines 42-45) only populatesNamespaceandPodName. If these fields are intended for future use, consider adding a TODO comment. Otherwise, remove them to avoid confusion.
25-31: Thenamefield is unused.The
namefield is initialized inNewK8sDetector()but never accessed. Consider removing it or utilizing it for logging/debugging purposes.pkg/contextdetection/registry.go (1)
73-80: Minor: Same check-then-act pattern inUnregister.Similar to
Register, theHas()+Delete()pattern isn't atomic. However, this is less impactful here since deleting a non-existent key is typically idempotent. The main effect is that the debug log might be skipped in a race scenario.pkg/rulemanager/ruleadapters/creator.go (2)
337-343: Consider logging or handling the hostname error.When
os.Hostname()fails, the error is silently ignored andk8sDetails.NodeNameremains empty. While this may be acceptable for non-critical metadata, consider logging the error at debug level for troubleshooting purposes.Proposed improvement
case contextdetection.Host: ruleFailure.SetSourceContext(contextdetection.Host) // For Host context, use NodeName to store the hostname hostname, err := os.Hostname() - if err == nil { + if err != nil { + logger.L().Debug("setContextSpecificFields - failed to get hostname", helpers.Error(err)) + } else { k8sDetails.NodeName = hostname }
345-361: Redundant field population for Standalone context.For Standalone context, this code sets
ImageandContainerNamefromtriggerEvent(lines 354-360), butsetRuntimeAlertK8sDetails(called earlier at line 88) already populates these same fields fromtriggerEventat lines 264-286. The checks for empty strings prevent overwriting, but these assignments will never execute since the fields are already set.This redundancy could cause confusion about the source of truth for these fields. Consider either:
- Removing the redundant assignments here, or
- Adding a comment explaining this is intentional fallback logic
pkg/contextdetection/detectors/manager.go (1)
21-34: Detector ordering is hardcoded despite Priority() methods.The detectors have
Priority()methods (K8s=0, Host=1, Standalone=2), but the ordering inNewDetectorManageris hardcoded rather than sorted by priority. This creates potential inconsistency if priorities change.Consider either:
- Sorting detectors by
Priority()after initialization, or- Removing unused
Priority()methods from detectorsOption 1: Sort by priority
func NewDetectorManager(registry contextdetection.Registry) *DetectorManager { dm := &DetectorManager{ registry: registry, detectors: []contextdetection.ContextDetector{ NewK8sDetector(), NewHostDetector(), NewStandaloneDetector(), }, } + + // Sort detectors by priority (lower = higher priority) + sort.Slice(dm.detectors, func(i, j int) bool { + return dm.detectors[i].Priority() < dm.detectors[j].Priority() + }) return dm }pkg/contextdetection/detectors/standalone.go (1)
17-25: WorkloadID logic can be simplified.The current logic is correct but verbose. The past review comment suggested a cleaner approach:
Simplified WorkloadID implementation
func (s *StandaloneContextInfo) WorkloadID() string { - if s.ContainerID == "" { - return s.ContainerName - } - if s.ContainerName != "" { - return s.ContainerName + ":" + s.ContainerID - } - return s.ContainerID + switch { + case s.ContainerName == "": + return s.ContainerID + case s.ContainerID == "": + return s.ContainerName + default: + return s.ContainerName + ":" + s.ContainerID + } }pkg/rulemanager/rule_manager.go (1)
187-190: Silent early exit when monitoring is disabled.When monitoring is disabled for a context, the function returns silently without logging. Consider adding a debug log to help operators understand why events are being dropped.
Add debug logging for disabled monitoring
// Early exit if monitoring is disabled for this context - skip rule evaluation if !rm.isMonitoringEnabledForContext(enrichedEvent.SourceContext) { + logger.L().Debug("RuleManager - monitoring disabled for context, skipping rule evaluation", + helpers.String("context", string(enrichedEvent.SourceContext.Context()))) return }pkg/utils/datasource_event.go (1)
86-171: Return an error fromSeton missing fields.
Setcurrently returns nil, making writes to absent fields appear successful. All other error-returning methods onnullFieldAccessor(e.g.,PutUint8,PutString,Rename) returnerrFieldNotFound, soSetshould too for consistency and to avoid silent failures.♻️ Proposed change
-func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return nil } +func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return errFieldNotFound }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (38)
cmd/main.goconfiguration/config.jsondocs/RULE_ENGINE_MULTI_CONTEXT_REDESIGN.mdpkg/cloudmetadata/metadata.gopkg/config/config.gopkg/containerwatcher/v2/container_watcher_collection.gopkg/containerwatcher/v2/containercallback.gopkg/containerwatcher/v2/tracers/bpf.gopkg/containerwatcher/v2/tracers/capabilities.gopkg/containerwatcher/v2/tracers/dns.gopkg/containerwatcher/v2/tracers/exec.gopkg/containerwatcher/v2/tracers/exit.gopkg/containerwatcher/v2/tracers/fork.gopkg/containerwatcher/v2/tracers/hardlink.gopkg/containerwatcher/v2/tracers/http.gopkg/containerwatcher/v2/tracers/iouring.gopkg/containerwatcher/v2/tracers/kmod.gopkg/containerwatcher/v2/tracers/network.gopkg/containerwatcher/v2/tracers/open.gopkg/containerwatcher/v2/tracers/ptrace.gopkg/containerwatcher/v2/tracers/randomx.gopkg/containerwatcher/v2/tracers/ssh.gopkg/containerwatcher/v2/tracers/symlink.gopkg/containerwatcher/v2/tracers/syscall.gopkg/containerwatcher/v2/tracers/unshare.gopkg/contextdetection/detectors/host.gopkg/contextdetection/detectors/k8s.gopkg/contextdetection/detectors/manager.gopkg/contextdetection/detectors/standalone.gopkg/contextdetection/registry.gopkg/contextdetection/types.gopkg/ebpf/events/enriched_event.gopkg/rulemanager/containercallbacks.gopkg/rulemanager/rule_manager.gopkg/rulemanager/ruleadapters/creator.gopkg/rulemanager/rulepolicy.gopkg/rulemanager/types/failure.gopkg/utils/datasource_event.go
🧰 Additional context used
🧬 Code graph analysis (12)
pkg/rulemanager/rulepolicy.go (2)
pkg/rulemanager/types/v1/types.go (1)
Rule(20-35)pkg/contextdetection/types.go (3)
ContextInfo(15-18)EventSourceContext(7-7)Kubernetes(10-10)
pkg/rulemanager/types/failure.go (1)
pkg/contextdetection/types.go (1)
EventSourceContext(7-7)
pkg/contextdetection/detectors/standalone.go (1)
pkg/contextdetection/types.go (3)
EventSourceContext(7-7)Standalone(12-12)ContextInfo(15-18)
pkg/ebpf/events/enriched_event.go (2)
pkg/utils/events.go (1)
K8sEvent(21-29)pkg/contextdetection/types.go (1)
ContextInfo(15-18)
pkg/rulemanager/ruleadapters/creator.go (5)
pkg/objectcache/applicationprofilecache_interface.go (1)
ApplicationProfileCache(9-14)pkg/objectcache/networkneighborhoodcache_interface.go (1)
NetworkNeighborhoodCache(8-12)pkg/rulemanager/types/failure.go (1)
GenericRuleFailure(15-28)pkg/ebpf/events/enriched_event.go (1)
EnrichedEvent(21-36)pkg/contextdetection/types.go (3)
Kubernetes(10-10)Host(11-11)Standalone(12-12)
pkg/containerwatcher/v2/container_watcher_collection.go (1)
pkg/objectcache/shared_container_data.go (1)
Container(27-27)
pkg/contextdetection/registry.go (1)
pkg/contextdetection/types.go (1)
ContextInfo(15-18)
pkg/rulemanager/containercallbacks.go (1)
pkg/contextdetection/detectors/standalone.go (1)
StandaloneContextInfo(8-11)
pkg/contextdetection/detectors/k8s.go (1)
pkg/contextdetection/types.go (3)
EventSourceContext(7-7)Kubernetes(10-10)ContextInfo(15-18)
pkg/rulemanager/rule_manager.go (9)
pkg/contextdetection/registry.go (1)
Registry(17-28)pkg/contextdetection/detectors/manager.go (2)
DetectorManager(16-19)NewDetectorManager(23-34)pkg/rulemanager/ruleadapters/creator.go (1)
NewRuleFailureCreator(49-57)pkg/rulemanager/rulepolicy.go (1)
RuleAppliesToContext(38-59)pkg/contextdetection/types.go (4)
Kubernetes(10-10)ContextInfo(15-18)Host(11-11)Standalone(12-12)pkg/utils/k8s.go (1)
CreateK8sPodID(20-22)pkg/rulemanager/types/v1/types.go (1)
Rule(20-35)pkg/ebpf/events/enriched_event.go (1)
EnrichedEvent(21-36)pkg/utils/events.go (1)
EnrichEvent(31-49)
pkg/contextdetection/detectors/host.go (1)
pkg/contextdetection/types.go (3)
EventSourceContext(7-7)Host(11-11)ContextInfo(15-18)
pkg/utils/datasource_event.go (1)
pkg/utils/events.go (1)
EventType(190-190)
🪛 markdownlint-cli2 (0.18.1)
docs/RULE_ENGINE_MULTI_CONTEXT_REDESIGN.md
29-29: Hard tabs
Column: 1
(MD010, no-hard-tabs)
30-30: Hard tabs
Column: 1
(MD010, no-hard-tabs)
31-31: Hard tabs
Column: 1
(MD010, no-hard-tabs)
44-44: Hard tabs
Column: 1
(MD010, no-hard-tabs)
45-45: Hard tabs
Column: 1
(MD010, no-hard-tabs)
59-59: Hard tabs
Column: 1
(MD010, no-hard-tabs)
60-60: Hard tabs
Column: 1
(MD010, no-hard-tabs)
61-61: Hard tabs
Column: 1
(MD010, no-hard-tabs)
71-71: Hard tabs
Column: 1
(MD010, no-hard-tabs)
72-72: Hard tabs
Column: 1
(MD010, no-hard-tabs)
74-74: Hard tabs
Column: 1
(MD010, no-hard-tabs)
75-75: Hard tabs
Column: 1
(MD010, no-hard-tabs)
76-76: Hard tabs
Column: 1
(MD010, no-hard-tabs)
77-77: Hard tabs
Column: 1
(MD010, no-hard-tabs)
78-78: Hard tabs
Column: 1
(MD010, no-hard-tabs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-and-push-image
- GitHub Check: pr-created / test / Create cross-platform build
🔇 Additional comments (52)
pkg/cloudmetadata/metadata.go (5)
19-22: LGTM!The constants are well-defined. The Azure API version has been updated to
2021-12-13, and the 2-second timeout is appropriate for metadata service calls.
61-90: LGTM!The fallback mechanism is well-structured and extensible. The error wrapping with
%wproperly preserves the original error context for debugging.
119-125: LGTM!The helper correctly extracts the last path segment and handles the empty string edge case.
174-197: LGTM!The GCP metadata fetcher correctly uses the required
Metadata-Flavor: Googleheader and validates instance presence viamachine-type. The use ofgetLastPathPartproperly normalizes the full resource paths.
199-224: LGTM!The Azure metadata fetcher correctly uses the required
Metadata: trueheader and theformat=textparameter. The API version2021-12-13is appropriately recent.pkg/containerwatcher/v2/tracers/symlink.go (1)
71-74: LGTM!The addition of the
operator.LocalManager.hostparameter aligns with the PR-wide pattern to enable local-manager aware gadget execution. The inline comment clearly documents the purpose.pkg/containerwatcher/v2/tracers/syscall.go (1)
68-72: LGTM!Clean integration of the
operator.LocalManager.hostparameter into the existing params map while preserving the eBPF map fetch configuration.pkg/containerwatcher/v2/tracers/kmod.go (1)
71-74: LGTM!Consistent with the PR-wide pattern for enabling local-manager aware gadget execution.
pkg/containerwatcher/v2/tracers/hardlink.go (1)
71-74: LGTM!Consistent with the PR-wide pattern for enabling local-manager aware gadget execution.
pkg/containerwatcher/v2/tracers/exec.go (1)
71-75: LGTM!The addition of the
operator.LocalManager.hostparameter alongside the existingoperator.oci.ebpf.pathsparameter is consistent with the broader PR pattern to enable local-manager-aware gadget execution. The inline comment clearly documents the purpose.pkg/containerwatcher/v2/tracers/network.go (1)
85-89: LGTM!The params map correctly combines the existing
operator.oci.annotateconfiguration with the newoperator.LocalManager.hostflag, maintaining backward compatibility while enabling local-manager-aware execution.pkg/containerwatcher/v2/tracers/bpf.go (1)
71-74: LGTM!The addition of the
operator.LocalManager.hostparameter is consistent with the other tracer implementations in this PR, enabling local-manager-aware gadget execution.pkg/containerwatcher/v2/tracers/randomx.go (1)
68-71: LGTM!The addition of the
operator.LocalManager.hostparameter is consistent with the other tracer implementations in this PR.pkg/containerwatcher/v2/tracers/capabilities.go (1)
70-75: Good addition of local-manager parameter.This keeps RunGadget params non-nil and aligns with local-manager handling expectations.
pkg/containerwatcher/v2/tracers/http.go (1)
85-88: LGTM: params map added for local-manager host.pkg/containerwatcher/v2/tracers/ssh.go (1)
72-75: Looks good — local-manager param added.pkg/containerwatcher/v2/tracers/exit.go (1)
67-70: LGTM: local-manager params added.pkg/containerwatcher/v2/tracers/dns.go (1)
76-79: Good: keeps existing params and adds local-manager host.pkg/containerwatcher/v2/tracers/ptrace.go (1)
67-70: LGTM!The params map addition follows the consistent pattern established across other tracers in this PR, enabling local-manager-aware gadget execution without errors when container-collection is nil.
pkg/containerwatcher/v2/tracers/fork.go (1)
67-70: LGTM!Consistent with the local-manager host parameter pattern applied across tracers.
pkg/containerwatcher/v2/tracers/unshare.go (1)
71-74: LGTM!Consistent with the local-manager host parameter pattern applied across tracers.
pkg/containerwatcher/v2/container_watcher_collection.go (1)
175-194: LGTM!The function is clean and well-documented. It correctly retrieves the mount namespace for PID 1 and creates a minimal synthetic container representation suitable for host-level event processing.
pkg/contextdetection/types.go (1)
1-23: LGTM!Clean and minimal interface definitions that establish a well-structured foundation for the context detection system. The
ContextDetectorinterface withPriority()enables flexible, priority-based detection ordering, and theContextInfoabstraction cleanly separates context data from detection logic.pkg/contextdetection/detectors/k8s.go (2)
33-48: LGTM!The
Detectmethod has proper nil checks and validates that bothPodNameandNamespaceare non-empty before returning a valid context. The defensive programming approach is appropriate.
50-52: LGTM!Priority 0 correctly establishes K8s detection as the highest priority detector.
pkg/contextdetection/registry.go (2)
67-71: LGTM!The
Lookupmethod is a clean delegation to the underlyingSafeMap.Load().
51-58: Race condition betweenHas()andSet()inRegister(), and betweenHas()andDelete()inUnregister().The check-then-act patterns are not atomic. In concurrent scenarios, two goroutines could both pass the
Has()check and proceed toSet()(orDelete()), potentially causing data races or silent overwrites. WhileSafeMapoperations are individually thread-safe, the compound operations are not.Unfortunately,
goradd/maps.SafeMapdoes not provide atomic check-and-set methods likeLoadOrStore. To fix this, either:
- Protect the compound operations with an external
sync.Mutex, or- Accept this limitation if registration/unregistration only occurs during initialization when concurrency is not a concern (and document this assumption)
configuration/config.json (1)
18-19: LGTM!The new configuration flags
hostMonitoringEnabledandstandaloneMonitoringEnabledare correctly added with booleanfalsedefaults, matching the corresponding Go struct fields.pkg/config/config.go (2)
69-70: LGTM!The new configuration fields are properly declared with correct mapstructure tags that match the JSON configuration keys.
184-185: Config flags are properly utilized in the codebase.Both
HostMonitoringEnabledandStandaloneMonitoringEnabledare consumed as expected:
HostMonitoringEnabledgates behavior inpkg/containerwatcher/v2/container_watcher_collection.go:110- Both flags are accessed in
pkg/rulemanager/rule_manager.go:287-289The defaults are correctly set and the flags are actively used to control behavior.
cmd/main.go (2)
32-32: LGTM!The contextdetection package import is correctly added to support the new mount namespace registry functionality.
294-297: LGTM! Proper integration of the mount namespace registry.The
mntnsRegistryis correctly instantiated within the runtime detection block (guarded bycfg.EnableRuntimeDetection) and properly passed toCreateRuleManager, where it is immediately used to initialize theDetectorManager. This ensures the registry is only created when runtime detection is enabled, while the else branch correctly creates mocks instead.pkg/ebpf/events/enriched_event.go (1)
7-7: Context metadata fields are well-integrated.The new
SourceContextandMountNamespaceIDfields align with the enrichment flow, and their zero values are safe until populated.Also applies to: 22-35
pkg/rulemanager/containercallbacks.go (2)
66-84: Context detection + registry registration looks solid.The fallback to standalone and conditional registry registration keep enrichment resilient without blocking container tracking.
100-102: Registry cleanup on removal is a good safeguard.Unregistering the mount namespace avoids stale context mappings after container teardown.
pkg/rulemanager/rulepolicy.go (1)
37-58: Context tag filtering logic is clear and backward compatible.Defaulting to Kubernetes when no context tags are present matches the intended behavior and keeps existing rules safe.
pkg/rulemanager/types/failure.go (1)
27-28: No action required: RuleFailure interface expansion is complete.GenericRuleFailure is the only concrete implementation of the RuleFailure interface in the codebase, and it already includes both GetSourceContext() (line 172) and SetSourceContext() (line 176) methods. The mock implementations (mockExporter, MockExporter, ExporterMock) in test files only accept RuleFailure as a parameter type in SendRuleAlert methods; they do not implement the interface themselves. No additional updates are needed.
pkg/rulemanager/ruleadapters/creator.go (2)
322-327: Nil context defaults to Kubernetes - LGTM.The backward-compatible default to Kubernetes context when
SourceContextis nil is appropriate for maintaining existing behavior.
114-118: Nil check pattern is consistent with other setters.The early return pattern for nil
triggerEventaligns with the same pattern insetCloudServices,setBaseRuntimeAlert, andsetRuntimeAlertK8sDetails. This addresses the nil pointer dereference concern raised in past reviews.pkg/contextdetection/detectors/manager.go (1)
39-67: DetectContext implementation is sound.The method correctly:
- Validates nil input with appropriate error
- Iterates detectors in priority order
- Returns first matching context
- Falls back to Standalone with warning log
The design appropriately handles the case where no detector explicitly matches.
pkg/contextdetection/detectors/host.go (2)
30-36: Hostname error silently ignored but has fallback.If
os.Hostname()fails,hostNamewill be empty, butWorkloadID()(lines 18-23) falls back to"host"whenHostNameis empty. This is acceptable behavior, though you may want to log the error for debugging.
38-53: Host detection logic correctly identifies virtual host container.The logic properly:
- Excludes containers with K8s metadata (lines 43-45)
- Identifies host context via
ContainerPid() == 1(line 47)This addresses the past review concern about misclassifying standalone containers as host context, since standalone containers will have regular PIDs, not PID 1.
pkg/contextdetection/detectors/standalone.go (1)
35-46: Standalone detector correctly acts as catch-all.The
Detectmethod unconditionally returnstruefor non-nil containers, which is the intended behavior as the fallback detector after K8s and Host detectors fail to match. This design is consistent with theDetectorManagerfallback logic.pkg/rulemanager/rule_manager.go (4)
144-165: Context determination logic is sound.The logic correctly determines the context type and retrieves workload details:
- K8s context: uses pod/namespace to look up WLID
- Host/Standalone: uses
SourceContext.WorkloadID()The nil check at line 158 is defensive but technically redundant since lines 156 branch on
!isK8sContextwhich already impliesSourceContext != nil(from line 144).
259-276: Context enrichment logic is correct.The
enrichEventWithContextmethod properly:
- Extracts mount namespace ID via type assertion
- Sets
MountNamespaceIDon the event- Looks up context from registry and populates
SourceContext- Includes appropriate debug logging
278-296: Monitoring config check handles all contexts correctly.The
isMonitoringEnabledForContextmethod properly:
- Defaults to
truefor nil context (backward compatible)- Checks
HostMonitoringEnabledfor Host context- Checks
StandaloneMonitoringEnabledfor Standalone context- Always enables Kubernetes monitoring
The default case returning
true(line 294) is a safe fallback for any future context types.
197-205: Context-aware profile dependency check is appropriate.The logic correctly skips profile dependency checks for non-K8s contexts since application profiles are K8s-specific. The condition
isK8sContext && !profileExists && rule.ProfileDependency == armotypes.Requiredproperly gates this check.pkg/utils/datasource_event.go (5)
3-21: No review notes for the import additions.
173-187: Nice guard for missing fields in the accessor cache.This prevents nil dereference when a field is absent while preserving caching behavior.
256-265: LGTM for accessor-basedcommlookup.
430-437: Good fallback tofnamewhenfpathis empty.
576-590: Accessor-based PID lookup looks good.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| Kubernetes EventSourceContext = "kubernetes" | ||
| Host EventSourceContext = "host" | ||
| Standalone EventSourceContext = "standalone" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix markdownlint MD010 hard tabs in code blocks.
Markdownlint flags hard tabs in the Go snippets. Replace tabs with spaces to keep doc linting clean.
🧹 Example fix
- Kubernetes EventSourceContext = "kubernetes"
+ Kubernetes EventSourceContext = "kubernetes"Also applies to: 44-45, 59-61, 71-78
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
29-29: Hard tabs
Column: 1
(MD010, no-hard-tabs)
30-30: Hard tabs
Column: 1
(MD010, no-hard-tabs)
31-31: Hard tabs
Column: 1
(MD010, no-hard-tabs)
🤖 Prompt for AI Agents
In `@docs/RULE_ENGINE_MULTI_CONTEXT_REDESIGN.md` around lines 29 - 31, The Go code
blocks contain hard tab characters (e.g., lines defining
Kubernetes/Host/Standalone as EventSourceContext and other blocks at the ranges
noted); replace all hard tabs inside those fenced code blocks with spaces so
markdownlint MD010 is satisfied. Locate the Go snippets that define
EventSourceContext constants and similar blocks (symbols like
EventSourceContext, Kubernetes, Host, Standalone and other constant/grouped
lines at the noted ranges) and convert indentation and separators to use spaces
instead of \t throughout each code fence.
| // Create virtual host container if host monitoring enabled | ||
| if cw.cfg.HostMonitoringEnabled { | ||
| virtualHostContainer, err := GetHostAsContainer() | ||
| if err != nil { | ||
| logger.L().Warning("ContainerManager - failed to create virtual host container", | ||
| helpers.Error(err)) | ||
| } else { | ||
| cw.containerCollection.AddContainer(virtualHostContainer) | ||
|
|
||
| // Manually trigger callbacks to ensure context detection runs | ||
| cw.containerCallback(containercollection.PubSubEvent{ | ||
| Type: containercollection.EventTypeAddContainer, | ||
| Container: virtualHostContainer, | ||
| }) | ||
|
|
||
| logger.L().Info("ContainerManager - virtual host container created", | ||
| helpers.String("mntns", fmt.Sprintf("%d", virtualHostContainer.Mntns)), | ||
| helpers.String("pid", fmt.Sprintf("%d", virtualHostContainer.Runtime.ContainerPID))) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unchecked error from AddContainer and silent degradation.
Two issues:
-
Line 116: The return value from
cw.containerCollection.AddContainer(virtualHostContainer)is not checked. If adding the container fails, the code proceeds to trigger callbacks for a container that wasn't actually added. -
As noted in a previous review, when host monitoring is explicitly enabled via configuration but fails, only a warning is logged. This silent degradation may leave operators unaware that their requested feature is not functioning.
🔧 Proposed fix
// Create virtual host container if host monitoring enabled
if cw.cfg.HostMonitoringEnabled {
virtualHostContainer, err := GetHostAsContainer()
if err != nil {
- logger.L().Warning("ContainerManager - failed to create virtual host container",
+ logger.L().Error("ContainerManager - failed to create virtual host container",
helpers.Error(err))
- } else {
- cw.containerCollection.AddContainer(virtualHostContainer)
+ return fmt.Errorf("creating virtual host container: %w", err)
+ }
+
+ if err := cw.containerCollection.AddContainer(virtualHostContainer); err != nil {
+ logger.L().Error("ContainerManager - failed to add virtual host container to collection",
+ helpers.Error(err))
+ return fmt.Errorf("adding virtual host container to collection: %w", err)
+ }
- // Manually trigger callbacks to ensure context detection runs
- cw.containerCallback(containercollection.PubSubEvent{
- Type: containercollection.EventTypeAddContainer,
- Container: virtualHostContainer,
- })
+ // Manually trigger callbacks to ensure context detection runs
+ cw.containerCallback(containercollection.PubSubEvent{
+ Type: containercollection.EventTypeAddContainer,
+ Container: virtualHostContainer,
+ })
- logger.L().Info("ContainerManager - virtual host container created",
- helpers.String("mntns", fmt.Sprintf("%d", virtualHostContainer.Mntns)),
- helpers.String("pid", fmt.Sprintf("%d", virtualHostContainer.Runtime.ContainerPID)))
- }
+ logger.L().Info("ContainerManager - virtual host container created",
+ helpers.String("mntns", fmt.Sprintf("%d", virtualHostContainer.Mntns)),
+ helpers.String("pid", fmt.Sprintf("%d", virtualHostContainer.Runtime.ContainerPID)))
}🤖 Prompt for AI Agents
In `@pkg/containerwatcher/v2/container_watcher_collection.go` around lines 109 -
128, The AddContainer return value is not checked: after calling
cw.containerCollection.AddContainer(virtualHostContainer) you must handle
failure by checking its error, logging an explicit error (use logger.L().Error
with helpers.Error(err) and include virtualHostContainer identifiers like Mntns
and Runtime.ContainerPID), and not invoking cw.containerCallback or logging the
success Info when AddContainer failed; if the enclosing function returns an
error propagate/return that error, otherwise disable/skipping host-monitoring
steps (e.g., clear cw.cfg.HostMonitoringEnabled or set a failure flag) so the
system does not silently degrade. Ensure you modify the block around
GetHostAsContainer, cw.containerCollection.AddContainer, cw.containerCallback
and the logger calls accordingly.
# Conflicts: # pkg/rulemanager/types/failure.go
Update SourceContext values to use consistent terminology: - "kubernetes" for Kubernetes context - "host" for Host context - "standalone" for Standalone/Docker container context Aligns with contextdetection.Standalone enum naming and improves clarity that this context supports any container runtime, not just Docker.
Signed-off-by: Yakir Oren <yakiroren@gmail.com>
# Conflicts: # pkg/config/config.go
Signed-off-by: Yakir Oren <yakiroren@gmail.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
…ategy and timeout settings Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Signed-off-by: Yakir Oren <yakiroren@gmail.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
20ace02 to
f483f66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@pkg/cloudmetadata/metadata.go`:
- Around line 81-89: The current return always wraps only the original IMDS
error (err) even when fallbacks fail for other reasons; update the loop over
fallbacks (the fallbacks slice and fb.fetch calls) to capture the last non-nil
fallback error (e.g., lastFallbackErr) or aggregate errors as you iterate, then
when returning use fmt.Errorf to include both the IMDS error and the fallback
failure context (e.g., "failed to get cloud metadata from IMDS or fallbacks:
imds: %v; fallback: %w" or similar), so the final error preserves the IMDS error
plus the final fallback error from fb.fetch for easier troubleshooting.
- Around line 93-96: The http.Client in fetchHTTPMetadata currently uses the
default transport which may proxy metadata requests; create and assign a custom
transport with Proxy set to nil (e.g., transport := &http.Transport{Proxy: nil})
and set client.Transport = transport so metadata traffic bypasses any
HTTP_PROXY/NO_PROXY environment settings; keep the existing Timeout on the
http.Client and preserve any other transport fields if needed.
In `@pkg/utils/datasource_event.go`:
- Line 94: The nullFieldAccessor.Set method currently returns nil which hides
attempts to write missing fields; change the implementation of
nullFieldAccessor.Set to return the same sentinel error used by the Put* methods
(errFieldNotFound) so callers receive a consistent error when writing to a
non-existent field; update the Set function (symbol: nullFieldAccessor.Set) to
return errFieldNotFound instead of nil so behavior matches the Put* methods.
♻️ Duplicate comments (3)
pkg/utils/datasource_event.go (1)
430-442: Previous nil-check concern is now addressed.The fallback to
fnamewhenfpathis empty is now safe with themissingFieldAccessorimplementation. Both accessor calls will return empty strings (rather than panicking) if the fields don't exist, and the fallback logic correctly triesfnameas an alternative.pkg/contextdetection/registry.go (1)
31-34: UnusedhostMntnsfield.The
hostMntnsfield is declared but never used. The comment on line 40 references aSetHostMntnsmethod that doesn't exist. Either implement the intended functionality or remove the unused field.docs/RULE_ENGINE_MULTI_CONTEXT_REDESIGN.md (1)
29-31: Hard tabs in code blocks trigger markdownlint MD010.The Go code snippets use hard tabs for indentation which violates markdownlint rules. Replace tabs with spaces for consistency.
🧹 Nitpick comments (8)
pkg/containerwatcher/v2/tracers/hardlink.go (1)
71-74: LGTM!Consistent with the PR-wide pattern for local manager support.
Consider extracting shared params to reduce duplication.
The same params map is duplicated across all tracers. You could define a package-level constant or helper function to reduce repetition:
// In a shared location (e.g., tracers/common.go) var defaultGadgetParams = map[string]string{ "operator.LocalManager.host": "true", }This is a minor improvement and can be deferred if preferred.
pkg/containerwatcher/v2/containercallback.go (1)
63-67: Consider adding symmetric handling in RemoveContainer path.The early return for
ContainerPID == 1correctly skips K8s-dependent shared data setup for the virtual host container. However, theRemoveContainerpath at Line 78 will still callDeleteSharedContainerDatafor virtual host containers, attempting to delete data that was never set.While this is likely benign (a no-op delete), adding a symmetric check would improve clarity and consistency:
♻️ Suggested improvement
case containercollection.EventTypeRemoveContainer: logger.L().Debug("ContainerWatcher.containerCallback - remove container event received", helpers.String("container ID", notif.Container.Runtime.ContainerID), helpers.String("k8s workload", k8sContainerID), helpers.String("ContainerImageDigest", notif.Container.Runtime.ContainerImageDigest), helpers.String("ContainerImageName", notif.Container.Runtime.ContainerImageName)) cw.metrics.ReportContainerStop() + // Skip shared data cleanup for virtual host container (was never set) + if notif.Container.Runtime.ContainerPID == 1 { + return + } cw.objectCache.K8sObjectCache().DeleteSharedContainerData(notif.Container.Runtime.ContainerID)pkg/contextdetection/detectors/host.go (1)
30-36: Consider logging when hostname retrieval fails.The error from
os.Hostname()is silently discarded. While the fallback to empty string withWorkloadID()returning"host"is reasonable, logging the error would aid debugging in environments where hostname resolution fails unexpectedly.♻️ Suggested improvement
func NewHostDetector() *HostDetector { - hostName, _ := os.Hostname() + hostName, err := os.Hostname() + if err != nil { + logger.L().Warning("HostDetector - failed to get hostname, using fallback", + helpers.Error(err)) + } return &HostDetector{ name: "HostDetector", hostName: hostName, } }pkg/contextdetection/registry.go (1)
51-58: Potential race condition between Has() and Set().The check-then-act pattern (
Has()followed bySet()) has a TOCTOU (time-of-check to time-of-use) window. Two concurrent goroutines could both pass theHas()check and proceed toSet(), resulting in a missed warning log and potential data overwrite.Consider using an atomic operation if the SafeMap supports it, or accept that duplicate registrations may occasionally slip through the warning.
♻️ Option: Use SetIfAbsent pattern if available
If
maps.SafeMapsupportsLoadOrStoreor similar atomic operations, use that instead:// Pseudo-code - depends on SafeMap API if existed := r.entries.SetIfAbsent(mntns, info); existed { logger.L().Warning("MntnsRegistry - mount namespace already registered, skipping", ...) return nil }Otherwise, document that the warning log is best-effort and duplicates may occasionally be accepted.
pkg/contextdetection/detectors/standalone.go (1)
17-25: WorkloadID logic is functionally correct but could be cleaner.The current logic works but has some redundancy. After the check on line 18,
ContainerIDis guaranteed non-empty, so the condition on line 21 (ContainerName != "") could be the primary check.♻️ Optional simplification
func (s *StandaloneContextInfo) WorkloadID() string { - if s.ContainerID == "" { - return s.ContainerName - } - if s.ContainerName != "" { - return s.ContainerName + ":" + s.ContainerID + if s.ContainerName == "" { + return s.ContainerID } - return s.ContainerID + if s.ContainerID == "" { + return s.ContainerName + } + return s.ContainerName + ":" + s.ContainerID }pkg/contextdetection/detectors/manager.go (2)
16-19: Unusedregistryfield in DetectorManager.The
registryfield is stored but never used withinDetectorManager. The actual registration is handled by the caller incontainercallbacks.go. Consider removing this field if it's not needed, or document why it's stored for future use.
59-66: Unreachable fallback code given current detector implementations.Since
StandaloneDetector.Detect()always returnstruefor non-nil containers, the warning and fallback on lines 60-66 are currently unreachable. Theforloop will always find a match via theStandaloneDetector.This defensive code isn't harmful, but the warning message could be misleading during debugging. Consider either:
- Removing this fallback since
StandaloneDetectorguarantees a match, or- Keeping it as a safety net but updating the comment to explain it's a defensive measure
pkg/rulemanager/rule_manager.go (1)
167-174: TODO: Rule filtering for non-K8s contexts remains unimplemented.For non-K8s contexts,
CreateAllRules()loads all rules without context-based filtering. WhileRuleAppliesToContext(line 198) filters per-rule based on "context:" tags (and defaults most rules to Kubernetes-only), pre-filtering at retrieval time would avoid loading unnecessary rules.Consider adding a method like
CreateRulesByContext(context)to the rule creator interface that filters rules during retrieval. Alternatively, leverage the existingCreateRulesByTags()method to filter by context tags where rules define them. This becomes important as the rule set grows.
| for _, fb := range fallbacks { | ||
| if meta, ferr := fb.fetch(ctx); ferr == nil && meta != nil { | ||
| logger.L().Info(fmt.Sprintf("retrieved cloud metadata from %s metadata service", fb.name)) | ||
| return meta, nil | ||
| } | ||
| } | ||
|
|
||
| // Wrap the underlying error with additional context so logs make it clearer why metadata is missing. | ||
| return nil, fmt.Errorf("failed to get cloud metadata from IMDS or fallbacks: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve fallback failure context in the returned error.
Right now the final error always wraps the original IMDS failure, even if fallbacks fail for different reasons. Capturing the last fallback error (or aggregating) will make troubleshooting much clearer.
🛠️ Proposed fix (keep the IMDS error but include fallback failure)
@@
- for _, fb := range fallbacks {
- if meta, ferr := fb.fetch(ctx); ferr == nil && meta != nil {
- logger.L().Info(fmt.Sprintf("retrieved cloud metadata from %s metadata service", fb.name))
- return meta, nil
- }
- }
-
- // Wrap the underlying error with additional context so logs make it clearer why metadata is missing.
- return nil, fmt.Errorf("failed to get cloud metadata from IMDS or fallbacks: %w", err)
+ var lastErr error
+ for _, fb := range fallbacks {
+ if meta, ferr := fb.fetch(ctx); ferr == nil && meta != nil {
+ logger.L().Info(fmt.Sprintf("retrieved cloud metadata from %s metadata service", fb.name))
+ return meta, nil
+ } else if ferr != nil {
+ lastErr = fmt.Errorf("%s: %w", fb.name, ferr)
+ }
+ }
+
+ if lastErr != nil {
+ return nil, fmt.Errorf("failed to get cloud metadata from IMDS (%v) or fallbacks (%w)", err, lastErr)
+ }
+ return nil, fmt.Errorf("failed to get cloud metadata from IMDS or fallbacks: %w", err)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for _, fb := range fallbacks { | |
| if meta, ferr := fb.fetch(ctx); ferr == nil && meta != nil { | |
| logger.L().Info(fmt.Sprintf("retrieved cloud metadata from %s metadata service", fb.name)) | |
| return meta, nil | |
| } | |
| } | |
| // Wrap the underlying error with additional context so logs make it clearer why metadata is missing. | |
| return nil, fmt.Errorf("failed to get cloud metadata from IMDS or fallbacks: %w", err) | |
| var lastErr error | |
| for _, fb := range fallbacks { | |
| if meta, ferr := fb.fetch(ctx); ferr == nil && meta != nil { | |
| logger.L().Info(fmt.Sprintf("retrieved cloud metadata from %s metadata service", fb.name)) | |
| return meta, nil | |
| } else if ferr != nil { | |
| lastErr = fmt.Errorf("%s: %w", fb.name, ferr) | |
| } | |
| } | |
| if lastErr != nil { | |
| return nil, fmt.Errorf("failed to get cloud metadata from IMDS (%v) or fallbacks (%w)", err, lastErr) | |
| } | |
| return nil, fmt.Errorf("failed to get cloud metadata from IMDS or fallbacks: %w", err) |
🤖 Prompt for AI Agents
In `@pkg/cloudmetadata/metadata.go` around lines 81 - 89, The current return
always wraps only the original IMDS error (err) even when fallbacks fail for
other reasons; update the loop over fallbacks (the fallbacks slice and fb.fetch
calls) to capture the last non-nil fallback error (e.g., lastFallbackErr) or
aggregate errors as you iterate, then when returning use fmt.Errorf to include
both the IMDS error and the fallback failure context (e.g., "failed to get cloud
metadata from IMDS or fallbacks: imds: %v; fallback: %w" or similar), so the
final error preserves the IMDS error plus the final fallback error from fb.fetch
for easier troubleshooting.
| func fetchHTTPMetadata(ctx context.Context, url string, headers map[string]string) (string, error) { | ||
| client := &http.Client{ | ||
| Timeout: metadataTimeout, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Read the metadata.go file to understand the full context
cat -n pkg/cloudmetadata/metadata.go | head -120Repository: kubescape/node-agent
Length of output: 4780
🌐 Web query:
Go http.Client default proxy behavior HTTP_PROXY environment variable
💡 Result:
Short answer
- The default http.Client uses http.DefaultTransport, whose Transport.Proxy is set to http.ProxyFromEnvironment — so by default Go will use HTTP_PROXY / HTTPS_PROXY and NO_PROXY (also lowercase) from the environment. [1][2]
Key details
- Proxy selection: requests use the proxy var matching the request scheme (HTTPS_PROXY takes precedence for https); NO_PROXY excludes hosts. [1][2]
- Special cases: ProxyFromEnvironment treats localhost/loopback as non-proxied by default. [2]
- Security: do not trust HTTP_PROXY in CGI environments without mitigation (the httpoxy issue — incoming Proxy header can set HTTP_PROXY). Sanitize environment in CGI contexts. [3]
If you want code to rely on environment proxies explicitly, set Transport.Proxy = http.ProxyFromEnvironment on your client’s Transport. [2]
Sources
- net/http documentation — ProxyFromEnvironment. [1]
- net/http DefaultTransport (Proxy: ProxyFromEnvironment) and golang.org/x/net/http/httpproxy docs. [2]
- httpoxy / Go vulnerability guidance (GO-2022-0761). [3]
🌐 Web query:
Go http.Transport Proxy nil disable proxies metadata
💡 Result:
Yes — setting Transport.Proxy to nil disables proxies. By default net/http.Transport.Proxy is http.ProxyFromEnvironment; if Proxy is nil or returns a nil *url.URL, no proxy is used. Example:
- Default (uses environment):
tr := &http.Transport{} // ProxyFromEnvironment is used by default when nil on a default Client - Explicitly disable proxies:
tr := &http.Transport{ Proxy: nil }
Or explicitly return nil:
tr := &http.Transport{ Proxy: func(*http.Request) (*url.URL, error) { return nil, nil } }
References: net/http docs and Transport source. [1][2]
Disable proxies for metadata traffic.
http.Client without a custom transport defaults to http.ProxyFromEnvironment, which respects HTTP_PROXY and NO_PROXY environment variables. Metadata service endpoints (169.254.169.254 for AWS, 169.254.170.2 for Azure, etc.) are not automatically in NO_PROXY by default. If a proxy is configured without explicitly excluding metadata hosts, metadata traffic—including credentials—may be proxied through an untrusted proxy, risking credential leakage or service failure. Set transport.Proxy = nil to disable proxying for these requests.
🔒 Proposed fix (disable proxies for metadata calls)
@@
- client := &http.Client{
- Timeout: metadataTimeout,
- }
+ transport := http.DefaultTransport.(*http.Transport).Clone()
+ transport.Proxy = nil
+ client := &http.Client{
+ Timeout: metadataTimeout,
+ Transport: transport,
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func fetchHTTPMetadata(ctx context.Context, url string, headers map[string]string) (string, error) { | |
| client := &http.Client{ | |
| Timeout: metadataTimeout, | |
| } | |
| func fetchHTTPMetadata(ctx context.Context, url string, headers map[string]string) (string, error) { | |
| transport := http.DefaultTransport.(*http.Transport).Clone() | |
| transport.Proxy = nil | |
| client := &http.Client{ | |
| Timeout: metadataTimeout, | |
| Transport: transport, | |
| } |
🤖 Prompt for AI Agents
In `@pkg/cloudmetadata/metadata.go` around lines 93 - 96, The http.Client in
fetchHTTPMetadata currently uses the default transport which may proxy metadata
requests; create and assign a custom transport with Proxy set to nil (e.g.,
transport := &http.Transport{Proxy: nil}) and set client.Transport = transport
so metadata traffic bypasses any HTTP_PROXY/NO_PROXY environment settings; keep
the existing Timeout on the http.Client and preserve any other transport fields
if needed.
| func (n *nullFieldAccessor) FullName() string { return "" } | ||
| func (n *nullFieldAccessor) Size() uint32 { return 0 } | ||
| func (n *nullFieldAccessor) Get(data datasource.Data) []byte { return nil } | ||
| func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return nil } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent error return in Set method.
The Set method returns nil while all Put* methods (lines 157-169) return errFieldNotFound. This inconsistency could mask errors when callers attempt to write to a missing field.
Proposed fix
-func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return nil }
+func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return errFieldNotFound }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return nil } | |
| func (n *nullFieldAccessor) Set(data datasource.Data, value []byte) error { return errFieldNotFound } |
🤖 Prompt for AI Agents
In `@pkg/utils/datasource_event.go` at line 94, The nullFieldAccessor.Set method
currently returns nil which hides attempts to write missing fields; change the
implementation of nullFieldAccessor.Set to return the same sentinel error used
by the Put* methods (errFieldNotFound) so callers receive a consistent error
when writing to a non-existent field; update the Set function (symbol:
nullFieldAccessor.Set) to return errFieldNotFound instead of nil so behavior
matches the Put* methods.
matthyx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done @YakirOren
Summary by CodeRabbit
New Features
Improvements
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.