-
Notifications
You must be signed in to change notification settings - Fork 7
Replace host sensor with node agent sensing #681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
|
this is very early draft :) |
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
Signed-off-by: Bezalel Brandwine <bez@softwine.net>
📝 WalkthroughWalkthroughThis pull request introduces a Host Sensor Manager system that periodically collects host system information (OS release, kernel version, security settings, network ports, cloud provider metadata, CNI configuration, kubelet/kube-proxy details, and control plane components) and persists collected data as Kubernetes Custom Resources. The system integrates configuration, multiple sensor implementations, a CRD client, and lifecycle management. Changes
Sequence Diagram(s)sequenceDiagram
participant Main as Main Process
participant HSM as HostSensorManager
participant Sensors as Sensors
participant CRDClient as CRDClient
participant K8s as Kubernetes API
Main->>HSM: NewHostSensorManager(config)
HSM->>HSM: Initialize sensors<br/>(OS, Kernel, Security, Network, etc.)
HSM->>CRDClient: NewCRDClient(nodeName)
CRDClient->>K8s: Build in-cluster config
Main->>HSM: Start(ctx)
activate HSM
HSM->>HSM: runSensing() - initial sense
loop Periodic Sensing
HSM->>Sensors: runSensor(sensor)
activate Sensors
Sensors->>Sensors: Sense()
Sensors-->>HSM: sensor data
deactivate Sensors
alt On Sensor Success
HSM->>CRDClient: CreateOrUpdateHostData(spec)
CRDClient->>K8s: Create/Update CRD
K8s-->>CRDClient: Resource Version
else On Sensor Error
HSM->>CRDClient: UpdateStatus(errorMsg)
CRDClient->>K8s: Patch status
end
HSM->>HSM: Wait for interval
end
Main->>HSM: Stop()
HSM->>HSM: Signal stop, drain in-flight
deactivate HSM
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@pkg/hostsensormanager/crd_client.go`:
- Around line 107-131: The status patch must target the CRD status subresource
if the CRD exposes .status; update the Patch call in UpdateStatus to use the
status subresource by calling
c.dynamicClient.Resource(gvr).SubResource("status").Patch(...) (keep
MergePatchType and the same patchData and PatchOptions). Ensure the call still
uses c.nodeName as the resource name and context ctx; adjust any imports/err
handling as needed.
In `@pkg/hostsensormanager/sensor_cloudprovider.go`:
- Around line 74-80: The loop currently uses defer res.Body.Close() after
client.Do(httpReq), which delays closing until function exit and leaks
resources; change it to close the response body immediately when you are done
with it (for example after checking res.StatusCode), by ensuring you check err
and res != nil, read or discard the body if needed, and call res.Body.Close()
directly instead of deferring; update the logic around client.Do, httpReq, res
and the http.StatusOK check so each response body is promptly closed within the
iteration.
In `@pkg/hostsensormanager/sensor_controlplane.go`:
- Around line 7-21: The Sense() implementation currently omits Etcd and the API
server encryption provider arg: add code to locate the etcd process (using
constant etcdExe), parse its command-line for the --data-dir (etcdDataDirArg)
and set ControlPlaneInfoSpec.EtcdDataDir; likewise, when inspecting the API
server process (apiServerExe) parse its command-line for
--encryption-provider-config (and existing auditPolicyFileArg) and populate
ApiServerInfo.EncryptionProviderConfigFile (and ensure AuditPolicyFile remains
set). Implement these extra parses in the same process-walking logic used for
kube-apiserver/controller-manager/scheduler inside Sense() so the Etcd data
directory and API server encryption file are filled before returning
ControlPlaneInfoSpec.
In `@pkg/hostsensormanager/sensor_kernelvars.go`:
- Around line 68-76: The loop opens files into varFile and defers
varFile.Close() inside the loop (using hVarFileName and varFile) which leaks FDs
across iterations due to batched Readdirnames(100) and recursive traversal;
change the code to close varFile before the next iteration/return by replacing
the defer with an explicit Close call scoped per iteration (e.g., wrap the
per-file processing in an anonymous function that calls varFile.Close() at the
end or call varFile.Close() immediately after use and before continue/return) so
each opened file is closed promptly.
In `@pkg/hostsensormanager/sensor_network.go`:
- Around line 12-14: The current constant tcpListeningState (value 10) is
misleading and is being applied to ICMP where "listening" has no semantic
meaning; rename the constant to listeningState and change the filtering logic in
sensor_network.go so the listeningState filter is applied only to protocols that
support socket states (e.g., TCP and UDP) and skip or treat ICMP entries
separately (do not filter ICMP by listeningState or explicitly exclude ICMP when
checking state). Update all references to tcpListeningState to listeningState
and add a small conditional around the state check that checks the protocol
(e.g., protocol == "tcp" || protocol == "udp") before comparing to
listeningState.
In `@pkg/hostsensormanager/sensor_osrelease.go`:
- Around line 67-76: The loop over etcDir.Readdirnames(100) currently swallows
non-EOF errors and always returns "not found"; update the logic so that
Readdirnames errors are propagated: call etcDir.Readdirnames in a loop and if
err != nil then if err == io.EOF break the loop, else return "", err; continue
scanning filenames for osReleaseFileSuffix (logger.L().Debug uses helpers.String
for filename) and only after exhausting entries return the os-release not found
error referencing hEtcDir. Ensure you use the existing etcDir.Readdirnames,
osReleaseFileSuffix and hEtcDir identifiers and avoid masking errors by
returning the actual error when non-EOF occurs.
In `@pkg/hostsensormanager/sensor_security.go`:
- Around line 7-10: The SELinux constant is pointing at the wrong file; update
seLinuxConfigFileName (currently set to "/etc/selinux/semanage.conf") to
"/etc/selinux/config" and change any code that reads the file (e.g., functions
referencing seLinuxConfigFileName) to parse the "SELINUX=" line and return the
mode (enforcing/permissive/disabled) instead of returning the whole file
contents.
In `@pkg/hostsensormanager/types.go`:
- Around line 134-139: OpenPortsSpec's ICMPPorts field is misleading because
ICMP uses type/code pairs instead of ports; replace or rename it: either (A)
rename ICMPPorts to ICMPConnections to clarify semantics, or (B) better, define
a new struct (e.g., ICMPRecord { Type int `json:"type"`; Code int `json:"code"`;
/* optional meta fields */ }) and change OpenPortsSpec.ICMPPorts from
[]Connection to []ICMPRecord with an appropriate json tag, and update all usages
(constructors, serializers, tests) that reference OpenPortsSpec.ICMPPorts or the
Connection type to use the new field/type (or new name) so code no longer stores
ICMP data in LocalPort/RemotePort fields of Connection.
- Around line 76-229: The CRD types (OsReleaseFile, KernelVersion,
LinuxSecurityHardening, OpenPorts, LinuxKernelVariables, KubeletInfo,
KubeProxyInfo, ControlPlaneInfo, CloudProviderInfo, CNIInfo) lack DeepCopy
methods required by controller-runtime/client-go; add codegen by installing and
running controller-gen (sigs.k8s.io/controller-tools) and annotate each type
with +kubebuilder:object:root=true and +kubebuilder:subresource:status as
appropriate, then run controller-gen object:headerFile=<path> paths=./... to
generate zz_generated.deepcopy.go files, or alternatively implement manual
DeepCopyInto/DeepCopy/DeepCopyObject methods for the listed types (and their
Spec structs) to satisfy kubernetes runtime.Object and deepcopy-gen
requirements; ensure generated files are committed and the build/Makefile
updated to run controller-gen in CI.
🧹 Nitpick comments (12)
pkg/hostsensormanager/sensor_network.go (1)
80-100: Error handling stops on first missing file, potentially skipping valid data.If any file in
pathsListdoesn't exist (e.g.,/proc/net/icmp6on systems without IPv6, or/proc/net/udplite*if the module isn't loaded), the function returns early and skips remaining files. Consider continuing onos.ErrNotExisterrors.♻️ Proposed fix
func (s *OpenPortsSensor) getOpenedPorts(pathsList []string) ([]Connection, error) { res := make([]Connection, 0) for _, p := range pathsList { hPath := hostPath(p) bytesBuf, err := os.ReadFile(hPath) if err != nil { + if os.IsNotExist(err) { + continue // File doesn't exist on this system, skip + } return res, fmt.Errorf("failed to ReadFile(%s): %w", hPath, err) } netCons := procspy.NewProcNet(bytesBuf, tcpListeningState) for c := netCons.Next(); c != nil; c = netCons.Next() { res = append(res, Connection{ Transport: c.Transport, LocalAddress: c.LocalAddress.String(), LocalPort: c.LocalPort, RemoteAddress: c.RemoteAddress.String(), RemotePort: c.RemotePort, }) } } return res, nil }pkg/hostsensormanager/sensor_cloudprovider.go (1)
45-58: Consider AWS IMDSv2 support for broader compatibility.The AWS metadata check uses IMDSv1 (direct GET). Many AWS environments now require IMDSv2, which needs a session token obtained via PUT request first. This could result in false negatives on security-hardened instances.
pkg/config/config.go (1)
101-103: Validate HostSensorInterval to prevent invalid scheduling.Consider guarding against zero/negative values after unmarshal so misconfigurations fail fast rather than causing runtime instability.
Proposed validation
// Validate seccompProfileBackend value if config.SeccompProfileBackend != "" && config.SeccompProfileBackend != SeccompBackendStorage && config.SeccompProfileBackend != SeccompBackendCRD { return Config{}, fmt.Errorf("invalid seccompProfileBackend value: %q (must be %q or %q)", config.SeccompProfileBackend, SeccompBackendStorage, SeccompBackendCRD) } + + if config.HostSensorInterval <= 0 { + return Config{}, fmt.Errorf("invalid hostSensorInterval value: %s (must be > 0)", config.HostSensorInterval) + }pkg/hostsensormanager/sensor_kubeproxy.go (1)
43-46: Consider soft-failing when kube-proxy is absent.On clusters that don’t run kube-proxy, returning an error may cause noisy logs or failed sensing cycles. Please verify the manager’s error handling and, if needed, downgrade “not found” to a non-fatal outcome (empty spec + nil error).
pkg/hostsensormanager/crd_client.go (2)
133-147: Remove unusedtoUnstructuredfunction.This function is defined but never called anywhere in the codebase.
88-91: Remove unnecessary UID assignment on update.The UID is server-assigned and immutable. Setting it on the update object is unnecessary—only
resourceVersionis required for optimistic concurrency.// Update the object with the resource version and other metadata unstructuredObj.SetResourceVersion(existing.GetResourceVersion()) - unstructuredObj.SetUID(existing.GetUID())pkg/hostsensormanager/sensor_cni.go (1)
54-81: Consider extending CNI detection list.The current list covers major CNIs but misses some cloud-provider-specific ones like Azure CNI (
azure-vnet) or GKE's CNI. This could be extended in future iterations.pkg/hostsensormanager/sensor_security.go (2)
43-56: Redundant file operations ingetAppArmorStatus.The code opens the file with
os.Opento check existence, then separately callsreadFileOnHostFileSystemwhich likely opens and reads the file again. Consolidate into a single read operation.Proposed simplification
func (s *LinuxSecurityHardeningSensor) getAppArmorStatus() string { statusStr := "unloaded" - hAppArmorProfilesFileName := hostPath(appArmorProfilesFileName) - profFile, err := os.Open(hAppArmorProfilesFileName) - if err == nil { - defer profFile.Close() - statusStr = "stopped" - content, err := readFileOnHostFileSystem(appArmorProfilesFileName) - if err == nil && len(content) > 0 { - statusStr = string(content) - } + content, err := readFileOnHostFileSystem(appArmorProfilesFileName) + if err != nil { + return statusStr // "unloaded" if file doesn't exist + } + if len(content) == 0 { + return "stopped" } - return statusStr + return string(content) }
58-70: Same redundant file operations and raw content return.Similar to
getAppArmorStatus, this has redundant file operations. Additionally, returning the raw file content isn't useful—the SELinux status should be parsed from theSELINUX=line in the config.Proposed improvement
func (s *LinuxSecurityHardeningSensor) getSELinuxStatus() string { content, err := readFileOnHostFileSystem(seLinuxConfigFileName) if err != nil { return "not found" } // Parse SELINUX= line for _, line := range strings.Split(string(content), "\n") { line = strings.TrimSpace(line) if strings.HasPrefix(line, "SELINUX=") { return strings.TrimPrefix(line, "SELINUX=") } } return "unknown" }pkg/hostsensormanager/manager.go (2)
70-76: Initial sensing blocksStart()return.The initial
runSensing(ctx)call at line 72 runs synchronously, which could blockStart()for several seconds while all sensors execute. Consider making this async or document thatStart()may take time.If fast startup is desired:
// Run initial sensing immediately - m.runSensing(ctx) + go m.runSensing(ctx) // Start periodic sensing
131-157: Consider adding timeout for individual sensor execution.If a sensor hangs (e.g., blocking on a slow/unresponsive filesystem), the entire sensing loop is blocked. Consider wrapping sensor execution with a per-sensor timeout.
func (m *manager) runSensor(ctx context.Context, sensor Sensor) error { sensorCtx, cancel := context.WithTimeout(ctx, 30*time.Second) defer cancel() // Pass sensorCtx to any context-aware operations // ... }Note: This requires sensors to respect context cancellation, which they currently don't. This could be addressed in a follow-up.
pkg/hostsensormanager/types.go (1)
24-32: Consider stronger typing forSense()return value.Returning
interface{}provides no compile-time guarantees about the returned data. Callers must perform type assertions, which can lead to runtime errors if a sensor returns an unexpected type.If all sensors return a CRD-like struct, consider defining a common interface or using generics:
type SensorData interface { GetSpec() interface{} GetStatus() Status }This is a recommended improvement that can be addressed in a follow-up iteration.
Overview
The
host-scanneris a K8s daemonset which sensing some basic stuff from a K8s node and expose them in a K8s YAML-like format via HTTP handlers and it runs by Kubescape only for the period of Kubescape scanning process.We want to merge host-scanner into node-agent and let the node-agent itself to sense the stuff and send it to K8s API server as new CRDs.
The motivation for this change is well explained in this slack thread
How to Test
As ususal
Related issues/PRs:
kubescape/helm-charts#773
kubescape/kubescape#1916
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.