Skip to content

Conversation

@tshtark
Copy link

@tshtark tshtark commented Dec 11, 2025

Summary

Adds support for user-controlled build IDs via spec.workerOptions.buildID, enabling rolling updates for non-workflow code changes while preserving new deployment creation for workflow code changes.

Problem

With PINNED versioning strategy and long-running workflows, any pod spec change (image tag, env vars, resources) generates a new build ID, causing deployment proliferation.

Result: 10-15 active deployments running simultaneously, causing resource waste and operational complexity.

Solution

Allow users to set a stable build ID via spec.workerOptions.buildID. When the build ID is stable but pod spec changes, trigger a rolling update instead of creating a new deployment.

Key Changes

  • Add BuildID field to WorkerOptions struct in API types
  • Update ComputeBuildID to use spec field instead of annotation
  • Implement hash-based drift detection using SHA256 of user-provided pod template spec
  • Store hash in temporal.io/pod-template-spec-hash annotation on deployments
  • Only check for drift when buildID is explicitly set by user

Drift Detection

When spec.workerOptions.buildID is set, the controller detects spec drift by comparing a SHA256 hash of the user-provided pod template spec against the hash stored in the deployment annotation.

How it works:

  1. When a deployment is created, the controller computes a hash of the user-provided pod template spec (before controller modifications) and stores it in an annotation
  2. On each reconciliation, the controller computes the hash of the current spec and compares it to the stored hash
  3. If hashes differ, a rolling update is triggered

This approach:

  • Detects ALL changes to the pod template spec (images, env vars, commands, volumes, resources, etc.)
  • Avoids issues with cluster-provided values and timestamps
  • Maintains backwards compatibility (legacy deployments without the hash annotation are not affected)

When drift is detected: Controller triggers a rolling update of the existing deployment (preserves same build ID).

Usage

apiVersion: temporal.io/v1alpha1
kind: TemporalWorkerDeployment
metadata:
  name: my-worker
spec:
  workerOptions:
    connectionRef:
      name: my-connection
    temporalNamespace: default
    # Set this to your workflow code hash
    buildID: "wf-a1b2c3d4"
  template:
    spec:
      containers:
        - name: worker
          image: my-worker:v1.2.3  # Can change without new deployment

Behavior Matrix

Scenario Build ID Result
No buildID field Auto-generated from image + hash Existing behavior
BuildID set, first deploy Uses field value New deployment
BuildID unchanged, spec changed Same build ID Rolling update
BuildID changed New build ID New deployment
Empty/invalid buildID Falls back to auto-generated Existing behavior

Backwards Compatibility

  • Empty or invalid buildID values fall back to existing hash-based generation
  • No changes required for users who don't use this feature
  • Drift detection only runs when spec.workerOptions.buildID is explicitly set
  • Legacy deployments without the pod template spec hash annotation are not affected

Test plan

  • Build ID spec field override (6 test cases)
  • Hash computation tests (determinism, different images/env vars/commands/volumes)
  • Drift detection with hash comparison (7 test cases)
  • Backwards compatibility for legacy deployments without hash annotation
  • Edge cases: empty values, invalid chars, long values
  • go test ./internal/k8s/... ./internal/planner/... -v passes

@tshtark tshtark requested review from a team and jlegrone as code owners December 11, 2025 11:44
Adds support for user-controlled build IDs via spec.workerOptions.buildID,
enabling rolling updates for non-workflow code changes while preserving
new deployment creation for workflow code changes.

Key changes:
- Add BuildID field to WorkerOptions struct in API types
- Update ComputeBuildID to use spec field instead of annotation
- Implement drift detection by comparing deployed spec with desired spec
- Only check for drift when buildID is explicitly set by user

Drift detection currently monitors: replicas, minReadySeconds, container
images, container resources (limits/requests), and init container images.
Other fields (env vars, volumes, commands) are not monitored - this is
documented in the BuildID field comment.

Note: CRD regeneration includes some unrelated changes from controller-gen
(default values for name fields, x-kubernetes-map-type annotations). These
are standard regeneration artifacts and don't affect functionality.

This solves deployment proliferation for PINNED versioning strategy where
any pod spec change (image tag, env vars, resources) would generate a new
build ID and create unnecessary deployments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@tshtark tshtark force-pushed the feat/stable-build-id-override branch from 4f23d55 to 6362a09 Compare December 11, 2025 11:58
Copy link
Collaborator

@carlydf carlydf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, sorry for the late review, just getting back to this after the holidays. I'm committed to turning this PR around quickly though and I think it's really close! I'll be keeping an eye on this tomorrow so we can iterate quickly.

Replace per-field comparison with SHA256 hash of user-provided pod template
spec. This detects ALL changes (env vars, commands, volumes, etc.) rather
than a subset of fields.

Changes:
- Add ComputePodTemplateSpecHash() using spew for deterministic hashing
- Store hash in pod-template-spec-hash annotation on deployments
- Compare hashes instead of individual fields for drift detection
- Add backwards compatibility for legacy deployments without hash

Tests:
- 6 tests for hash computation (images, env vars, commands, volumes)
- Updated drift detection tests including env var change case
- Backwards compatibility test for deployments without hash annotation

Docs:
- Update BuildID field documentation to reflect hash-based detection
@tshtark tshtark force-pushed the feat/stable-build-id-override branch from 8ba2666 to 7aad356 Compare January 8, 2026 06:53
Copy link
Collaborator

@carlydf carlydf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So close to approving! The main decision I still want to make together is how to name the new field.

My review was delayed today because I wanted to write an integration test for this new functionality, which you can look at in this commit. Confirmed it works :) In the early phases of this project, features that only had unit tests and no integration tests experienced regressions, so I want to make sure that doesn't happen here. Feel free to cherry-pick my test commit onto your fork (if that can be done with a fork), or I can merge my integration test right after your PR merges.


// updateDeploymentWithPodTemplateSpec updates an existing deployment with a new pod template spec
// from the TWD spec. This applies all the controller modifications that NewDeploymentWithOwnerRef does.
func updateDeploymentWithPodTemplateSpec(
Copy link
Collaborator

@carlydf carlydf Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you re-use the pod-spec-modification code from NewDeploymentWithOwnerRef (if the whole thing can't be reused, maybe a shared helper function) so that we don't have to keep two separate functions in sync?

// +optional
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?$`
BuildID string `json:"buildID,omitempty"`
Copy link
Collaborator

@carlydf carlydf Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to name this in a way that makes it clear that providing your own Build ID is not required, since as we say in the warning, it's not safe unless you do work on your own side to ensure that Build ID is updated when incompatible changes happen.

I worry that people could get confused, not read the doc-string description of the field, and copy-pasta themselves into a situation where they're not updating the Build ID and get NDEs in their workflows despite "using versioning."

Would you be opposed to calling it something like CustomBuildID?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants