LukeAVanDrie
Contributor

@LukeAVanDrie LukeAVanDrie commented Aug 6, 2025

This PR introduces the complete, concrete implementation of the FlowRegistry, the stateful control plane for the flow control system. This is a foundational architectural component that manages the lifecycle of all flows, queues, and policies, providing a sharded, concurrent-safe view of its state to the FlowController workers.

The architecture is designed to prioritize data path performance and strict correctness for control plane state transitions, resulting in a robust, scalable, and maintainable foundation for the flow control engine.

This tracks #674

Architectural Highlights

The design introduces a clear separation between the control plane (FlowRegistry) and the data plane (registryShard), employing several patterns to ensure correctness and performance under high concurrency:

  • Serialized Control Plane (Actor Model): The FlowRegistry uses an actor-like pattern. A single background goroutine processes all state change events from a channel. This serializes all mutations to the registry's core state (like scaling or GC), eliminating a significant class of race conditions.

  • Sharded Data Plane with Fine-Grained Locking: The registry's state is partitioned across multiple registryShard instances. Each shard uses fine-grained, per-priority-band locks, allowing concurrent data path operations across different priorities and dramatically reducing lock contention.

  • Asynchronous, Lock-Free Signaling: A lock-free atomic state machine is used for signaling between the data path and the control plane (e.g., for queue empty/non-empty transitions). This completely decouples the data path from control plane backpressure, guarantees strictly ordered signals, and prevents lost transitions even under high contention by coalescing signals at the source.

  • "Trust but Verify" Garbage Collection: A periodic, time-based scanner manages the lifecycle of idle flows. It uses a "Trust but Verify" pattern: it identifies candidate flows using an eventually-consistent cache ("Trust"), then performs a "stop-the-world" live check on the relevant priority band across all shards ("Verify") before deletion. This provides strong consistency precisely when needed while minimizing data path disruption.

  • Immutable Flow Identity: The FlowKey (ID + Priority) is immutable. To change the priority of traffic, a caller simply registers a new flow. The old flow is gracefully and automatically garbage collected once it becomes idle. This elegant design completely avoids complex and error-prone state migration logic.

Suggested Review Path

  1. Start with the contracts/ directory to understand the high-level interfaces and public API contracts.
  2. Next, please read the comprehensive package documentation (in registry/doc.go). This file contains the detailed architectural overview, including the concurrency and garbage collection strategies.
  3. Review the implementation files in a logical order: shard.go (the data plane slice), managedqueue.go (the stateful decorator with its lock-free signaler), flowstate.go (the GC cache), and finally registry.go (the orchestrator).
  4. Finally, review the configuration and validation logic in config.go.

Testing Philosophy and Validation

This PR includes a complete and robust test suite that provides extremely high confidence in the correctness of this complex, concurrent system. The testing strategy is a key feature of this contribution:

  • Deterministic Asynchronous Testing: The primary tests for the FlowRegistry (registry_test.go) use a test harness with an "event tap". This allows for gray-box testing of the actor model, enabling fast, deterministic, and race-free validation of asynchronous operations without sleeps or polling.
  • Targeted Concurrency Tests: Dedicated concurrency tests (Test..._Concurrency_...) exist to validate thread-safety under stress, specifically targeting the most critical race conditions like the GC/scaling lock interaction, the draining state transition, and the lock-free signaling mechanism.
  • Isolated Unit Tests: Lower-level components (config, flowstate, etc.) are tested in strict isolation to exhaustively validate their specific logic, error paths, and invariants.
  • Comprehensive Coverage: The suite achieves near-100% statement coverage on all key components and provides high behavioral coverage for the entire package.
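
To make the "event tap" idea concrete, here is a small sketch under stated assumptions. The `loop` type and its fields are hypothetical and are not taken from registry_test.go. The loop reports each event on an optional tap channel after processing it, so a test can block on the tap instead of sleeping or polling.

```go
package main

// loop is a hypothetical single-goroutine event processor.
type loop struct {
	events chan string
	tap    chan string // nil in production; set by tests to observe progress
	seen   []string    // state owned by the loop goroutine
}

// run processes events serially and, if a tap is installed, reports each
// event on it only after the event has been fully applied.
func (l *loop) run() {
	for ev := range l.events {
		l.seen = append(l.seen, ev)
		if l.tap != nil {
			l.tap <- ev
		}
	}
}
```

A test would start `run` in a goroutine, send an event, and then receive from `tap`. Once the receive returns, the event's effects are guaranteed to be visible, with no timing assumptions.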

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Aug 6, 2025
netlify bot commented Aug 6, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit eefc461
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68a7919661be160008cfa4d6
😎 Deploy Preview https://deploy-preview-1319--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Aug 6, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Aug 6, 2025
@LukeAVanDrie
Contributor Author

/assign @kfswain

@k8s-ci-robot k8s-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) on Aug 6, 2025
@ahg-g
Contributor

ahg-g commented Aug 7, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Aug 7, 2025
Collaborator

@kfswain kfswain left a comment

Still reviewing, I just have some comments that have been hanging since last night

@shmuelk
Contributor

shmuelk commented Aug 10, 2025

pkg/epp/flowcontrol/registry/registry.go: The FlowRegistry itself. The central orchestrator.

If the code is an orchestrator, why is it called a registry and not simply FlowOrchestrator?

@LukeAVanDrie
Contributor Author

> pkg/epp/flowcontrol/registry/registry.go: The FlowRegistry itself. The central orchestrator.
>
> If the code is an orchestrator, why is it called a registry and not simply FlowOrchestrator?

That is a very precise question. You've hit on a key point about the component's dual role. The name FlowRegistry was chosen deliberately because its primary external-facing responsibility is to act as a stateful catalog of flow instances. While it uses orchestration as an internal implementation strategy, its core purpose in the system's architecture is that of a registry.

Registry Pattern:

The FlowRegistry fits this pattern from a client's perspective:

  • Registration: The primary entry point is RegisterOrUpdateFlow(). A client explicitly registers a flow's specification.
  • Lookup/Discovery: Clients discover the state of the system via Shards(), which provides access to the registered entities.
  • Deregistration: It manages the removal of stale registrations via garbage collection.

Its main job is to be the single source of truth for "what flows exist and what is their configuration?"
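
A toy sketch of the registry pattern described above, for illustration only. The `catalog` type, its methods, and the `FlowKey` field names are assumptions and are not the signatures of RegisterOrUpdateFlow() or Shards(). The sketch is not concurrency-safe. It shows registration, lookup, and deregistration of idle entries, keyed by an immutable (ID + Priority) value.

```go
package main

// FlowKey mirrors the (ID + Priority) identity mentioned in this PR.
// The field names and types are assumptions made for this sketch.
type FlowKey struct {
	ID       string
	Priority uint
}

// catalog is a hypothetical, single-goroutine stand-in for a registry.
type catalog struct {
	flows map[FlowKey]struct{}
}

func newCatalog() *catalog {
	return &catalog{flows: make(map[FlowKey]struct{})}
}

// register adds the key if it is absent; calling it again is a no-op.
func (c *catalog) register(k FlowKey) {
	c.flows[k] = struct{}{}
}

// lookup reports whether the key is currently registered.
func (c *catalog) lookup(k FlowKey) bool {
	_, ok := c.flows[k]
	return ok
}

// collect removes every key that the caller-supplied predicate reports as
// idle. Deleting from a map while ranging over it is safe in Go.
func (c *catalog) collect(isIdle func(FlowKey) bool) {
	for k := range c.flows {
		if isIdle(k) {
			delete(c.flows, k)
		}
	}
}
```

Because the key is a comparable value type, registering the same ID at a new priority creates a second, independent entry, and the old one disappears only when it is collected.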

Orchestrator Pattern:

The FlowRegistry uses this pattern internally to maintain its own state correctly in a concurrent environment:

  • It orchestrates shard scaling by telling individual registryShard instances to re-partition their configuration or enter a draining state.
  • It orchestrates garbage collection by reacting to events from queues and timers and then commanding shards to delete specific queue instances.

Why Registry is the More Fitting Name:

The orchestration is the how, not the what. It's the complex internal machinery that makes the registry robust.

We chose the name FlowRegistry because:

  1. It describes the public contract. From the outside, you interact with it as a registry.
  2. It avoids confusion with the FlowController. In our system, the FlowController is the component that actually orchestrates the dispatch of user requests. Calling this component FlowOrchestrator would create a significant naming collision and make it unclear which component is responsible for what part of the workflow.

This is an excellent point of feedback, though. It's clear my documentation could be more precise. I will update the GoDoc comment to clarify this distinction: that it is a Registry which uses an internal Actor-based orchestrator to manage its state.

Thank you for the sharp observation!

@k8s-ci-robot k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Aug 11, 2025
@LukeAVanDrie LukeAVanDrie changed the title from "[WIP] feat(flowcontrol): Implement the FlowRegistry" to "feat(flowcontrol): Implement the FlowRegistry" on Aug 15, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Aug 15, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Aug 15, 2025
@ahg-g
Contributor

ahg-g commented Aug 16, 2025

/approve
/lgtm
/hold

I approved in case you would like to address the comments in the followup PR.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Aug 16, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm label ("Looks good to me"; indicates that a PR is ready to be merged) on Aug 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Aug 16, 2025
@LukeAVanDrie
Contributor Author

@ahg-g (cc: @kfswain) -- Thanks again for the LGTM and approval on the initial version.

After it was approved, I did another deep pass on the implementation with a focus on hardening the concurrency model and improving the long-term maintainability before merging. I've just pushed up the refined version for your final review.

The core logic is the same, but I've made a few significant enhancements that I believe are critical for this foundational layer:

  1. Hardened Concurrency & Performance:
     • I've moved the registryShard from a single coarse-grained lock to a fine-grained locking model (per-priority-band). This significantly reduces lock contention and improves data path parallelism.
     • The signaling mechanism in managedQueue has been upgraded to a lock-free atomic state machine. This completely decouples the data path from control plane backpressure, guarantees no signals are lost, and makes the system far more resilient under high contention. It also significantly reduces noise on the events channel by coalescing signals when thrashing.
  2. More Robust GC & Scaling:
     • I've introduced an RWMutex (gcScaleLock) to explicitly synchronize the Garbage Collection and shard scaling operations, eliminating a potential TOCTOU race condition and making the system safer.
     • The GC logic itself was simplified from a generational model (still "mark and sweep" -- the algorithm is the same; only the marker has changed) to a more intuitive time-based approach, which is easier to reason about.
  3. Improved Testability: The test suite was overhauled to use a new harness with an "event tap," allowing for faster, fully deterministic validation of the asynchronous logic without any sleeps or flaky polling.
  4. Improved Documentation Maintainability:
     • While I still have detailed docs, they now follow the principle of first disclosure and information exists closer to the relevant types.
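
For readers unfamiliar with the coalescing idea behind the lock-free signaling mentioned above, here is a simplified sketch. It is not the state machine in managedqueue.go. The `signaler` type and its methods are hypothetical, and it assumes a single consumer. Producers record the latest state atomically and send at most one wake-up while a notification is outstanding, so rapid empty/non-empty thrashing collapses into one signal that carries the most recent state.

```go
package main

import "sync/atomic"

// signaler is a hypothetical coalescing notifier for a single consumer.
type signaler struct {
	empty   atomic.Bool   // most recently published queue state
	pending atomic.Bool   // true while a wake-up is outstanding
	wake    chan struct{} // capacity 1: holds at most one wake-up token
}

func newSignaler() *signaler {
	return &signaler{wake: make(chan struct{}, 1)}
}

// publish records the latest state. Only the caller that flips pending from
// false to true sends a token. The buffer slot is free at that point because
// the consumer clears pending only after it has taken the previous token, so
// the send does not block.
func (s *signaler) publish(empty bool) {
	s.empty.Store(empty)
	if s.pending.CompareAndSwap(false, true) {
		s.wake <- struct{}{}
	}
}

// next blocks until a wake-up arrives, re-arms the signaler, and returns the
// most recent state. Clearing pending before loading the state means that a
// publish which lost the CompareAndSwap is still observed by this load.
func (s *signaler) next() bool {
	<-s.wake
	s.pending.Store(false)
	return s.empty.Load()
}
```

The trade-off in this sketch is that the consumer sees the latest state rather than every intermediate transition. That is the coalescing behavior, and it is a simplification of whatever ordering guarantees the real implementation provides.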

I'm much more confident in the robustness and performance of this version. Let me know what you think.

@k8s-ci-robot k8s-ci-robot removed the lgtm label ("Looks good to me"; indicates that a PR is ready to be merged) on Aug 21, 2025
@ahg-g
Contributor

ahg-g commented Aug 21, 2025

How can I tell the difference between the latest commit and what I reviewed before? In the future, I'd prefer that we not force-push, so that it is easier for the reviewer to diff.

@LukeAVanDrie
Contributor Author

> How can I tell the difference between the latest commit and what I reviewed before? In the future, I'd prefer that we not force-push, so that it is easier for the reviewer to diff.

I split it out from my reflog, so it should be visible as an independent commit now. What are the repo's best practices for merging? Do we squash?

@ahg-g
Contributor

ahg-g commented Aug 21, 2025

> > How can I tell the difference between the latest commit and what I reviewed before? In the future, I'd prefer that we not force-push, so that it is easier for the reviewer to diff.

> I split it out from my reflog, so it should be visible as an independent commit now. What are the repo's best practices for merging? Do we squash?

We have GitHub configured to squash automatically, so just send responses to review comments as separate commits.

This commit introduces the complete, concrete implementation of the
`FlowRegistry`. As the stateful control plane for the flow control
system, it provides a scalable, concurrent-safe, and robust foundation
for managing the lifecycle of all flows, queues, and shards.

The architecture is designed to prioritize data path performance and
strict correctness for control plane state transitions.

Key architectural features include:

- **Serialized Control Plane (Actor Model):** All administrative
  operations and internal state change events are processed serially by
  a single background event loop. This fundamental design choice
  eliminates race conditions for complex, multi-step operations like
  shard scaling and garbage collection, simplifying the logic and
  guaranteeing correctness.

- **Sharded Architecture:** The registry's state is partitioned across
  multiple `registryShard` instances. This allows the data path
  (enqueue/dispatch operations) to scale linearly with the number of
  workers and CPU cores by minimizing global lock contention.

- **Generational Garbage Collection:** We employ a periodic,
  generational scanner. This uses a "Trust but Verify" pattern: it
  identifies candidate flows using an eventually-consistent cache
  ("Trust"), then performs a "stop-the-world" live check against all
  shards ("Verify") before deletion. This provides strong consistency
  precisely when needed.

- **Immutable Flow Identity (`FlowKey`):** The `FlowKey` (ID + Priority)
  is treated as an immutable identifier. To change the priority of
  traffic, a caller simply registers a new flow with the new priority.
  The old flow is gracefully and automatically garbage collected once it
  becomes idle. This elegant design completely avoids complex and
  error-prone state migration logic.

- **Hybrid Concurrency Model:** A multi-tiered locking strategy is
  employed to maximize performance and correctness:
    - `FlowRegistry`: Coarse-grained lock for the serialized control
      plane.
    - `registryShard`: R/W locks to allow parallel reads from workers.
    - `managedQueue`: A hybrid mutex/atomic model to guarantee strict
      consistency between queue contents and statistics, which is
      critical for GC correctness.
@ahg-g
Contributor

ahg-g commented Aug 21, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label ("Looks good to me"; indicates that a PR is ready to be merged) on Aug 21, 2025
@LukeAVanDrie
Contributor Author

/remove-hold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Aug 21, 2025
@k8s-ci-robot k8s-ci-robot merged commit 46b4553 into kubernetes-sigs:main Aug 21, 2025
10 checks passed
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

5 participants