Commits (36)
dd2bbfd
chore(release): add Helm chart, Grafana dashboard, GoReleaser + workf…
flyingrobots Sep 12, 2025
37ca42d
docs: add evidence README; add confidence scores to promotion checkli…
flyingrobots Sep 12, 2025
605ea9d
chore(gitignore): ignore Obsidian (.obsidian) and refine VS Code rule…
flyingrobots Sep 12, 2025
66ca59a
chore(vscode): add minimal workspace settings and extension recommend…
flyingrobots Sep 12, 2025
8b6a17f
chore(license): add copyright header to all Go source files
flyingrobots Sep 12, 2025
d45946d
chore(ci,release): align Go toolchain to 1.25.x across workflows and …
flyingrobots Sep 12, 2025
8dac5ed
feat(admin): add purge-all command to clear queues, heartbeats, proce…
flyingrobots Sep 12, 2025
b223210
docs(evidence): add run_bench.sh harness and README usage
flyingrobots Sep 12, 2025
8ddbb5a
fix(breaker): enforce single probe in HalfOpen; count trips; improve …
flyingrobots Sep 12, 2025
207ca84
chore: remove CLAUDE-CODE-REVIEW.md from VCS (review doc not to be co…
flyingrobots Sep 12, 2025
25d8cc7
docs(decisions): add did-not-agree index
flyingrobots Sep 12, 2025
4918a18
docs(decisions): document rate limiter choice and rationale
flyingrobots Sep 12, 2025
4bb66e0
docs(decisions): document BRPOPLPUSH prioritization approach and devi…
flyingrobots Sep 12, 2025
773b361
docs(decisions): document SCAN-based reaper rationale and revisit cri…
flyingrobots Sep 12, 2025
f25f669
docs(decisions): document metrics scope deferral and rationale
flyingrobots Sep 12, 2025
2a026f3
docs(decisions): document Go 1.25.x toolchain choice and rationale
flyingrobots Sep 12, 2025
5ddfd98
test(breaker): add HalfOpen single-probe load test under concurrent A…
flyingrobots Sep 12, 2025
842f342
test(worker): add breaker integration test ensuring Open state pauses…
flyingrobots Sep 12, 2025
4b1ef37
docs: add testing guide with per-suite descriptions and isolated run …
flyingrobots Sep 12, 2025
09eddd7
docs: normalize all code examples to fenced Markdown blocks with lang…
flyingrobots Sep 12, 2025
9ee3e5e
docs(testing): restore code-fenced, copy/paste test commands per suit…
flyingrobots Sep 12, 2025
a3dead2
chore(docs): add markdownlint (CI via GitHub Action) and pre-commit a…
flyingrobots Sep 12, 2025
dbd897f
docs: add contributing/docs linting section (markdownlint hooks + loc…
flyingrobots Sep 12, 2025
4f44ea6
chore(docs): markdownlint repo-wide pass; update config to disable MD…
flyingrobots Sep 12, 2025
738e606
chore(make): add mdlint target to run markdownlint-cli2 across docs
flyingrobots Sep 13, 2025
202c488
Update .github/workflows/goreleaser.yml
flyingrobots Sep 13, 2025
d09edac
Update .goreleaser.yaml
flyingrobots Sep 13, 2025
59e1d27
Update .goreleaser.yaml
flyingrobots Sep 13, 2025
11220a1
Update .vscode/settings.json
flyingrobots Sep 13, 2025
518d193
Update .githooks/pre-commit
flyingrobots Sep 15, 2025
72f6e31
Update .github/workflows/changelog.yml
flyingrobots Sep 15, 2025
190293a
Update .github/workflows/changelog.yml
flyingrobots Sep 15, 2025
423df1e
Update .github/workflows/ci.yml
flyingrobots Sep 15, 2025
203fdf5
Update .github/workflows/ci.yml
flyingrobots Sep 15, 2025
698655f
Update .github/workflows/goreleaser.yml
flyingrobots Sep 15, 2025
2f598e2
Update deploy/grafana/dashboards/work-queue.json
flyingrobots Sep 15, 2025
30 changes: 30 additions & 0 deletions .github/workflows/changelog.yml
@@ -0,0 +1,30 @@
name: Update Changelog

on:
workflow_dispatch:
push:
tags:
- 'v*'

permissions:
contents: write

jobs:
changelog:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install git-chglog
run: go install github.com/git-chglog/git-chglog/cmd/git-chglog@latest
- name: Generate CHANGELOG.md
run: |
$(go env GOPATH)/bin/git-chglog -o CHANGELOG.md || echo "git-chglog not configured; keeping existing CHANGELOG.md"
- name: Commit changes
run: |
git config user.name "github-actions"
git config user.email "[email protected]"
git add CHANGELOG.md || true
git commit -m "chore(changelog): update CHANGELOG for ${GITHUB_REF_NAME}" || echo "no changes"
- name: Push changes
run: |
git push || echo "no push"
38 changes: 38 additions & 0 deletions .github/workflows/goreleaser.yml
@@ -0,0 +1,38 @@
name: GoReleaser

on:
push:
tags:
- 'v*'
workflow_dispatch: {}

permissions:
contents: write
packages: write

jobs:
release:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.24.x'
- name: Set repo env
run: |
echo "GITHUB_REPOSITORY_OWNER=${GITHUB_REPOSITORY%/*}" >> $GITHUB_ENV
echo "GITHUB_REPOSITORY_NAME=${GITHUB_REPOSITORY#*/}" >> $GITHUB_ENV
- name: Login to GHCR
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
with:
distribution: goreleaser
version: latest
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Comment on lines +34 to +41
🧹 Nitpick (assertive)

Scope permissions narrowly and surface attestations.

Consider adding provenance/signing (SLSA/cosign) and setting tighter permissions per step. Not blocking, but you’ll thank me later.

🤖 Prompt for AI Agents
.github/workflows/goreleaser.yml lines 31-38: tighten the step permissions and
enable provenance/signing for releases by (1) adding a top-level permissions
block that limits the job to the minimum rights (e.g., contents: read, id-token:
write for OIDC token exchange, and only add packages or actions permissions if
absolutely required), (2) granting the goreleaser step only the specific
environment variables/secrets it needs instead of full GITHUB_TOKEN scope, and
(3) enabling artifact provenance and signing by calling goreleaser with
provenance flags and/or invoking a cosign/sigstore step after build (or
configure goreleaser to sign) using OIDC (id-token) and the COSIGN_* secrets so
releases are attested and signed. Ensure the workflow creates and exposes
attestations (SLSA provenance) and that any tokens/secrets are minimal and
scoped.
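A minimal sketch of what this comment describes, assuming keyless cosign signing via OIDC; the `sigstore/cosign-installer` step, the pinned action version, and the image reference are illustrative assumptions, not part of this PR:

```yaml
permissions:
  contents: write   # create the GitHub release
  packages: write   # push images to GHCR
  id-token: write   # OIDC token for keyless cosign signing

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      # ...checkout, setup-go, GHCR login, goreleaser steps as above...
      - name: Install cosign
        uses: sigstore/cosign-installer@v3
      - name: Sign release image (keyless, OIDC)
        run: cosign sign --yes "ghcr.io/${GITHUB_REPOSITORY}:${GITHUB_REF_NAME#v}"
```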

Comment on lines +35 to +41
🛠️ Refactor suggestion

“version: latest” is non-deterministic. Pin your toolchain.

Releases must be reproducible.

-      - name: Run GoReleaser
-        uses: goreleaser/goreleaser-action@v6
+      - name: Run GoReleaser
+        uses: goreleaser/goreleaser-action@v6
         with:
           distribution: goreleaser
-          version: latest
+          version: v2.6.1 # pin a known-good Goreleaser version
           args: release --clean

Also consider guarding workflow_dispatch to snapshots to avoid accidental publishes:

-          args: release --clean
+          args: ${{ github.event_name == 'workflow_dispatch' && 'release --clean --skip=publish --snapshot' || 'release --clean' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uses: goreleaser/goreleaser-action@v6
with:
distribution: goreleaser
version: latest
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
with:
distribution: goreleaser
version: v2.6.1 # pin a known-good Goreleaser version
args: ${{ github.event_name == 'workflow_dispatch' && 'release --clean --skip=publish --snapshot' || 'release --clean' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
🤖 Prompt for AI Agents
.github/workflows/goreleaser.yml lines 35-41: the action uses a
non-deterministic "version: latest" which prevents reproducible releases and
risks accidental publishes; replace "latest" with a pinned Goreleaser version
(e.g. a specific tag or semver like v1.22.0 or the exact commit/tag you
validated) so the toolchain is fixed, and add a guard to workflow_dispatch (for
example require a specific input like "type=snapshot" or restrict dispatch to a
snapshot branch or protect with a conditional that only runs on tagged or main
branch) to prevent accidental publishes.

9 changes: 8 additions & 1 deletion .gitignore
@@ -24,10 +24,17 @@ go.work

# IDE/Editor
.idea/
.vscode/
# VS Code: ignore all by default, allow key shared files
.vscode/*
!.vscode/extensions.json
!.vscode/settings.json
!.vscode/launch.json
Comment on lines +27 to +31
🧹 Nitpick (assertive)

Selective .vscode tracking is fine; ensure no secrets slip in.

extensions.json/settings.json/launch.json can hold credentials (e.g., debug env). Recommend adding a pre-commit check to block secrets in these files.

🤖 Prompt for AI Agents
In .gitignore around lines 27 to 31, the reviewer asks for a safeguard because
tracked .vscode files (extensions.json, settings.json, launch.json) can contain
secrets; add a pre-commit hook or integrate a repository check (e.g., Husky +
lint-staged or a CI check) that scans those specific files for
secrets/credentials and rejects commits when patterns are detected. Implement
the hook to run a lightweight secret scanner (regex checks for common patterns
like AWS keys, private keys, auth tokens, env var declarations) only on
.vscode/extensions.json, .vscode/settings.json, and .vscode/launch.json, and
ensure it exits non-zero with a clear error message instructing the developer to
remove secrets or move them to environment variables; add documentation in
CONTRIBUTING.md about the check and how to bypass temporarily with
maintainer-approved workflow.
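As a sketch of such a hook (the .vscode file names are real; the regexes, demo paths, and file contents below are illustrative assumptions — a dedicated scanner such as gitleaks would be more robust):

```shell
#!/usr/bin/env sh
# Minimal secret scan for tracked .vscode files. The patterns are example
# heuristics (AWS access key IDs, private-key headers, key/token/password
# assignments), not an exhaustive credential list.
scan() {
  grep -qE 'AKIA[0-9A-Z]{16}|PRIVATE KEY|(api[_-]?key|token|password)"?[[:space:]]*[:=]' "$1"
}

# Demo on hypothetical files; a real pre-commit hook would iterate the
# staged copies of .vscode/settings.json, launch.json, extensions.json.
mkdir -p /tmp/vscheck
printf '{ "go.testFlags": ["-race", "-count=1"] }\n' > /tmp/vscheck/settings.json
printf '{ "env": { "AWS_KEY": "AKIAABCDEFGHIJKLMNOP" } }\n' > /tmp/vscheck/launch.json

for f in /tmp/vscheck/launch.json /tmp/vscheck/settings.json; do
  if scan "$f"; then
    echo "BLOCK $f"   # a real hook would exit 1 with remediation advice
  else
    echo "OK $f"
  fi
done
```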

*.swp
*.swo

# Obsidian
.obsidian/

# Logs
*.log
logs/
45 changes: 45 additions & 0 deletions .goreleaser.yaml
@@ -0,0 +1,45 @@
version: 2

project_name: job-queue-system

builds:
- id: job-queue-system
main: ./cmd/job-queue-system
env:
- CGO_ENABLED=0
goos: [linux, darwin, windows]
goarch: [amd64, arm64]
ldflags:
- -s -w -X main.version={{.Version}}
Comment on lines +12 to +13
🧹 Nitpick (assertive)

Harden builds: add -trimpath for reproducibility.

Reduces path leakage and improves determinism.

     ldflags:
-      - -s -w -X main.version={{.Version}}
+      - -s -w -trimpath -X main.version={{.Version}}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
.goreleaser.yaml around lines 12 to 13: the current ldflags "-s -w -X
main.version={{.Version}}" can leak build paths and harm reproducibility; add
the "-trimpath" linker flag to the ldflags list (e.g. include -trimpath
alongside -s and -w) so builds are hardened and path information is removed for
deterministic outputs.


archives:
- id: binaries
builds: [job-queue-system]
format: tar.gz
name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"

checksum:
name_template: "checksums_{{ .Version }}.txt"

changelog:
sort: desc
use: github

dockers:
- image_templates:
- ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
- ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest
Comment on lines +31 to +33
💡 Verification agent

🧩 Analysis chain

Don’t slap :latest on alpha builds.

Tagging “latest” for pre-releases is how you torch users. Gate “latest” to stable releases only (e.g., via templating/conditions or a separate docker entry that runs only when not prerelease). Verify the exact GoReleaser field to key off (e.g., .Prerelease or similar) and wire it accordingly.


🌐 Web query:

How to conditionally add the "latest" Docker tag only for non-prerelease builds in GoReleaser v2? Provide the recommended config snippet.

💡 Result:

Use GoReleaser template conditionals in image_templates to emit :latest only when .Prerelease is empty (i.e., not a prerelease). Example config snippet:

.goreleaser.yaml
dockers:
  - image_templates:
      - "myuser/myimage:{{ .Tag }}"
      - '{{ if eq .Prerelease "" }}myuser/myimage:latest{{ end }}'

References: GoReleaser Docker docs [1], Template variables (.Prerelease) [2].


DO NOT tag prereleases with :latest — gate :latest to stable releases.

File: .goreleaser.yaml (lines 31–33) — replace the unconditional :latest with a template conditional that only emits :latest when .Prerelease is empty:

  - image_templates:
      - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
      - '{{ if eq .Prerelease "" }}ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest{{ end }}'

This uses .Prerelease to ensure only stable releases get :latest.

🤖 Prompt for AI Agents
.goreleaser.yaml around lines 31 to 33: the image_templates list unconditionally
emits a :latest tag which will tag prereleases; modify the template to only emit
the :latest entry when .Prerelease is empty by replacing the unconditional
:latest line with a conditional template that checks eq .Prerelease "" and only
outputs the ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest entry in that case,
keeping the version-tag line unchanged.

dockerfile: Dockerfile
build_flag_templates:
- "--pull"

Comment on lines +30 to +37
🛠️ Refactor suggestion

Enable multi-arch Docker images via buildx + add OCI labels.

Without buildx/platforms, images will be single-arch (runner default). Add platforms and basic OCI labels.

 dockers:
   - image_templates:
       - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:{{ .Version }}
       - ghcr.io/{{ .RepoOwner }}/{{ .RepoName }}:latest
     dockerfile: Dockerfile
+    use: buildx
+    platforms:
+      - linux/amd64
+      - linux/arm64
     build_flag_templates:
       - "--pull"
+    labels:
+      - "org.opencontainers.image.source={{.GitURL}}"
+      - "org.opencontainers.image.revision={{.FullCommit}}"
+      - "org.opencontainers.image.version={{.Version}}"

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In .goreleaser.yaml around lines 28-35, the dockers entry needs multi-arch
buildx configuration and basic OCI labels; update the dockers block to include a
platforms array (e.g. linux/amd64, linux/arm64) so GoReleaser will use buildx to
produce multi-arch images, and add an oci/labels map with standard keys
(org.opencontainers.image.title, description, url, source, created, version,
authors) populated from template variables ({{ .ProjectName }}, {{ .Version }},
{{ .RepoURL }}, {{ .CommitDate }} etc.); ensure the labels are nested under the
docker build entry and keep existing image_templates and build_flag_templates
intact so images are built multi-arch and carry OCI metadata.

release:
github:
owner: {{ .Env.GITHUB_REPOSITORY_OWNER }}
name: {{ .Env.GITHUB_REPOSITORY_NAME }}
draft: false
prerelease: auto

snapshot:
name_template: SNAPSHOT-{{ .ShortCommit }}
🧹 Nitpick (assertive)

Kill the stray trailing blank line.

YAMLlint already complained. Remove it.

-  name_template: SNAPSHOT-{{ .ShortCommit }}
-
+  name_template: SNAPSHOT-{{ .ShortCommit }}
📝 Committable suggestion


Suggested change
name_template: SNAPSHOT-{{ .ShortCommit }}
name_template: SNAPSHOT-{{ .ShortCommit }}
🤖 Prompt for AI Agents
In .goreleaser.yaml around line 46, there is a stray trailing blank line after
the "name_template: SNAPSHOT-{{ .ShortCommit }}" entry; remove that empty line
so the file has no extra blank line at the end (save the file without the
trailing newline) to satisfy yamllint.


6 changes: 6 additions & 0 deletions .vscode/extensions.json
@@ -0,0 +1,6 @@
{
"recommendations": [
"golang.go"
],
"unwantedRecommendations": []
}
Comment on lines +1 to +6
🧹 Nitpick (assertive)

Add YAML/Helm tooling to match new chart assets

Recommend adding Kubernetes/Helm and YAML support to align with the new Helm chart and K8s manifests.

 {
   "recommendations": [
-    "golang.go"
+    "golang.go",
+    "redhat.vscode-yaml",
+    "ms-kubernetes-tools.vscode-kubernetes-tools"
   ],
   "unwantedRecommendations": []
 }
📝 Committable suggestion


Suggested change
{
"recommendations": [
"golang.go"
],
"unwantedRecommendations": []
}
{
"recommendations": [
"golang.go",
"redhat.vscode-yaml",
"ms-kubernetes-tools.vscode-kubernetes-tools"
],
"unwantedRecommendations": []
}
🤖 Prompt for AI Agents
.vscode/extensions.json around lines 1 to 6: the workspace extensions list only
recommends the Go extension but the repo now contains Helm charts and Kubernetes
YAML manifests; update the recommendations array to include YAML and
Helm/Kubernetes tooling (for example add "redhat.vscode-yaml",
"ms-azuretools.vscode-helm" and "ms-kubernetes-tools.vscode-kubernetes-tools")
so users get proper linting, schema support and Helm assistance when opening the
workspace.

15 changes: 15 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,15 @@
{
"editor.formatOnSave": false,
"[go]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.organizeImports": true
}
},
"gopls": {
"ui.semanticTokens": true,
"build.experimentalWorkspaceModule": true
},
Comment on lines +12 to +15
💡 Verification agent

🧩 Analysis chain

Verify gopls experimental flag; it may be obsolete

gopls’s build.experimentalWorkspaceModule has changed across versions; confirm it’s still valid with your pinned Go extension/gopls. If obsolete, drop it.

   "gopls": {
     "ui.semanticTokens": true,
-    "build.experimentalWorkspaceModule": true
+    // "build.experimentalWorkspaceModule": true
   },

To verify with upstream docs:


🌐 Web query:

Is gopls setting "build.experimentalWorkspaceModule" still supported in the latest gopls? If deprecated, what is the recommended default behavior or replacement?

💡 Result:

Short answer: No — it was deprecated and then removed. Use go.work (multi-module workspace support in the go command) instead.

Details / sources:

  • Deprecation announced (replaced by go.work; removed in gopls v0.11.0+). [1]
  • Release notes / removal and recommended replacement (create a go.work; e.g. go work use -r .). [2][3]

Sources:
[1] gopls v0.10.x release notes (deprecates experimentalWorkspaceModule in favor of go.work).
[2] gopls v0.12.0 notes (removes the setting; instructs using go.work / go work use -r .).
[3] GitHub issue tracking removal of experimentalWorkspaceModule.


Remove deprecated gopls setting "build.experimentalWorkspaceModule"

Deprecated in gopls v0.10.x and removed by v0.12.0; use go.work (e.g. go work use -r .) for multi-module workspaces. Action: delete the "build.experimentalWorkspaceModule" entry from .vscode/settings.json (lines 9–12); do not comment it out (JSON has no comments).

🤖 Prompt for AI Agents
In .vscode/settings.json around lines 9 to 12, the gopls setting
"build.experimentalWorkspaceModule" is deprecated and should be removed; edit
the file and delete the "build.experimentalWorkspaceModule": true entry (do not
comment it out), leaving the rest of the "gopls" object intact and ensure the
JSON remains valid.
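For reference, the go.work replacement described above is generated with `go work init` followed by `go work use -r .` at the repository root, producing a file roughly like this (Go version and module list are assumptions for this single-module repo):

```
go 1.25

use (
    .
)
```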

"go.testFlags": ["-race", "-count=1"],
"go.toolsManagement.autoUpdate": true
Comment on lines +16 to +17
🧹 Nitpick (assertive)

Test flags trade-off

Running tests with -race and -count=1 by default is great for catching issues but slower. If dev feedback becomes a concern, consider moving -race to a separate profile/task.

🤖 Prompt for AI Agents
In .vscode/settings.json at lines 13-14 the default go.testFlags include
["-race", "-count=1"], which slows local feedback; remove "-race" from the
default go.testFlags and add a separate configuration/task for race-enabled
tests (or a VS Code test profile) so developers can run fast tests by default
while still having an easy opt-in for race detection; update settings.json to
set go.testFlags to ["-count=1"] and add a new task or profile (e.g., in
.vscode/tasks.json or launch configurations) named "Go: test (race)" that
invokes tests with ["-race","-count=1"] and document usage in the project
README.

}
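The opt-in race profile suggested in the comment above could live in a hypothetical .vscode/tasks.json entry like this (the label and layout are assumptions, not part of this PR):

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Go: test (race)",
      "type": "shell",
      "command": "go",
      "args": ["test", "-race", "-count=1", "./..."],
      "group": "test",
      "problemMatcher": ["$go"]
    }
  ]
}
```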
12 changes: 11 additions & 1 deletion README.md
@@ -53,4 +53,14 @@ See docs/ for the Product Requirements Document (PRD) and detailed design. A sam

## Status

Scaffolding in place. Implementation, PRD, tests, and CI are coming next per plan.
Release branch open for v0.4.0-alpha: see PR https://github.com/flyingrobots/go-redis-work-queue/pull/1

Promotion gates and confidence summary (details in docs/15_promotion_checklists.md):
- Alpha → Beta: overall confidence ~0.85 (functional/observability/CI strong; perf and coverage improvements planned)
- Beta → RC: overall confidence ~0.70 (needs controlled perf run, chaos tests, soak)
- RC → GA: overall confidence ~0.70 (release flow ready; soak and rollback rehearsal pending)

Evidence artifacts (docs/evidence/):
- ci_run.json (CI URL), bench.json (throughput/latency), metrics_before/after.txt, config.alpha.yaml

To reproduce evidence locally, see docs/evidence/README.md.
37 changes: 37 additions & 0 deletions deploy/grafana/dashboards/work-queue.json
@@ -0,0 +1,37 @@
{
"title": "Go Redis Work Queue",
"schemaVersion": 38,
"panels": [
{
"type": "timeseries",
"title": "Job Processing Duration (p95)",
"targets": [{"expr": "histogram_quantile(0.95, sum(rate(job_processing_duration_seconds_bucket[5m])) by (le))"}]
},
Comment on lines +6 to +9
🧹 Nitpick (assertive)

Histogram quantile: aggregate away volatile labels (e.g. sum without (instance, pod)) while keeping le for the quantile.

Current expr is fine; this version avoids accidental label fan-out.

-      "targets": [{"expr": "histogram_quantile(0.95, sum(rate(job_processing_duration_seconds_bucket[5m])) by (le))"}]
+      "targets": [{"expr": "histogram_quantile(0.95, sum without (instance, pod) (rate(job_processing_duration_seconds_bucket[5m])))"}]

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In deploy/grafana/dashboards/work-queue.json around lines 6 to 9, the PromQL
should explicitly sum by(le) and avoid accidental label fan-out; replace the
current targets expr with one that wraps rate(...) in sum by (le) (i.e.
histogram_quantile(0.95, sum by (le)
(rate(job_processing_duration_seconds_bucket[5m])))) and if you need to retain
or drop other labels use sum without(<labels-to-drop>) or include those labels
in the by(...) clause so only le is used for the histogram_quantile aggregation
and other labels are preserved correctly.

{
"type": "timeseries",
"title": "Jobs Completed / Failed / Retried",
"targets": [
{"expr": "rate(jobs_completed_total[5m])"},
{"expr": "rate(jobs_failed_total[5m])"},
{"expr": "rate(jobs_retried_total[5m])"}
]
Comment on lines +13 to +17
🧹 Nitpick (assertive)

Counter rates: ensure aggregation to a single series per metric.

Add sum without volatile labels to avoid per-instance sprawl.

-        {"expr": "rate(jobs_completed_total[5m])"},
-        {"expr": "rate(jobs_failed_total[5m])"},
-        {"expr": "rate(jobs_retried_total[5m])"}
+        {"expr": "sum without (instance, pod) (rate(jobs_completed_total[5m]))"},
+        {"expr": "sum without (instance, pod) (rate(jobs_failed_total[5m]))"},
+        {"expr": "sum without (instance, pod) (rate(jobs_retried_total[5m]))"}
📝 Committable suggestion


Suggested change
"targets": [
{"expr": "rate(jobs_completed_total[5m])"},
{"expr": "rate(jobs_failed_total[5m])"},
{"expr": "rate(jobs_retried_total[5m])"}
]
"targets": [
{"expr": "sum without (instance, pod) (rate(jobs_completed_total[5m]))"},
{"expr": "sum without (instance, pod) (rate(jobs_failed_total[5m]))"},
{"expr": "sum without (instance, pod) (rate(jobs_retried_total[5m]))"}
]
🤖 Prompt for AI Agents
In deploy/grafana/dashboards/work-queue.json around lines 13 to 17, the panel is
using rate(...) directly which produces one series per scrapped instance and
causes per-instance sprawl; wrap each rate(...) with a sum(...) to aggregate
into a single series (e.g. replace rate(jobs_completed_total[5m]) with
sum(rate(jobs_completed_total[5m]))) so volatile labels like instance are
collapsed; apply the same change to jobs_failed_total and jobs_retried_total
expressions.

},
{
"type": "stat",
"title": "Circuit Breaker State",
"targets": [{"expr": "circuit_breaker_state"}],
"options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
},
Comment on lines +21 to +24
🧹 Nitpick (assertive)

Stat panel may show multiple series; reduce in query.

Use max() or sum() over labels and map values to text.

-      "targets": [{"expr": "circuit_breaker_state"}],
+      "targets": [{"expr": "max without (instance, pod) (circuit_breaker_state)"}],
       "options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
📝 Committable suggestion


Suggested change
"title": "Circuit Breaker State",
"targets": [{"expr": "circuit_breaker_state"}],
"options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
},
"title": "Circuit Breaker State",
"targets": [{"expr": "max without (instance, pod) (circuit_breaker_state)"}],
"options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
},
🤖 Prompt for AI Agents
In deploy/grafana/dashboards/work-queue.json around lines 21–24, the stat panel
currently pulls raw circuit_breaker_state which can return multiple series;
change the query to reduce across series (e.g., use max() or sum() over the
relevant labels in the PromQL expression) instead of relying on panel
reduceOptions, and then configure the stat panel to map numeric values to text
(e.g., 0→"closed", 1→"open", etc.) so the panel shows a single reduced value and
human-readable state.
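Combining the reduced query with value-to-text mappings might look like this panel sketch (the 0/1/2 state encoding is an assumption — confirm it against the breaker's metric exporter before adopting):

```json
{
  "type": "stat",
  "title": "Circuit Breaker State",
  "targets": [{"expr": "max without (instance, pod) (circuit_breaker_state)"}],
  "options": {"reduceOptions": {"calcs": ["last"], "fields": ""}},
  "fieldConfig": {
    "defaults": {
      "mappings": [
        {
          "type": "value",
          "options": {
            "0": {"text": "Closed", "index": 0},
            "1": {"text": "HalfOpen", "index": 1},
            "2": {"text": "Open", "index": 2}
          }
        }
      ]
    }
  }
}
```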

{
"type": "timeseries",
"title": "Queue Lengths",
"targets": [{"expr": "queue_length"}]
},
{
"type": "stat",
"title": "Active Workers",
"targets": [{"expr": "worker_active"}],
"options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
}
Comment on lines +31 to +35
🧹 Nitpick (assertive)

Active workers stat: reduce to a single value.

Aggregate across labels to avoid arbitrary series selection.

-      "targets": [{"expr": "worker_active"}],
+      "targets": [{"expr": "sum without (instance, pod) (worker_active)"}],
       "options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
📝 Committable suggestion


Suggested change
"type": "stat",
"title": "Active Workers",
"targets": [{"expr": "worker_active"}],
"options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
}
"type": "stat",
"title": "Active Workers",
"targets": [{"expr": "sum without (instance, pod) (worker_active)"}],
"options": {"reduceOptions": {"calcs": ["last"], "fields": ""}}
}
🤖 Prompt for AI Agents
In deploy/grafana/dashboards/work-queue.json around lines 31 to 35, the stat
panel target uses the raw metric "worker_active" which can return multiple
series and leads to arbitrary series selection; change the Prometheus expression
to aggregate across labels (for example use sum(worker_active)) so the query
returns a single value for the stat panel and keep the reduceOptions as-is or
adjust if needed.

]
}
75 changes: 75 additions & 0 deletions docs/15_promotion_checklists.md
@@ -0,0 +1,75 @@
# Promotion Checklists

- Last updated: 2025-09-12

## Alpha → Beta Checklist
- [ ] Functional completeness: producer, worker, all-in-one, reaper, breaker, admin CLI
- [ ] Observability: /metrics, /healthz, /readyz live and correct
- [ ] CI green on main (build, vet, race, unit, integration, e2e)
- [ ] Unit coverage ≥ 80% core packages (attach coverage report)
- [ ] E2E passes deterministically (≥ 5 runs)
- [ ] govulncheck: no Critical/High in code paths/stdlib
- [ ] Performance baseline: 1k jobs at 500/s complete; limiter ±10%/60s
- [ ] Docs: README, PRD, test plan, deployment, runbook updated
- Evidence links:
- CI run URL: …
- Bench JSON: …
- Metrics snapshot(s): …
- Issues list: …

### Confidence Scores (Alpha → Beta)

| Criterion | Confidence | Rationale | How to improve |
|---|---:|---|---|
| Functional completeness | 0.9 | All core roles implemented and tested; admin CLI present | Add more end-to-end tests for admin flows; document edge cases |
| Observability endpoints | 0.95 | Live and exercised in CI/e2e; stable | Add /healthz readiness probes to example manifests; alert rules examples |
| CI health | 0.9 | CI green with race, vet, e2e, govulncheck | Increase matrix (Go versions, OS); add flaky-test detection |
| Coverage ≥ 80% | 0.75 | Core packages covered; gaps in admin/obs | Add tests for admin and HTTP server handlers |
| E2E determinism | 0.8 | E2E with Redis service stable locally and in CI | Add retries and timing buffers; run 5x in workflow and gate |
| Security (govulncheck) | 0.95 | Using Go 1.24; no critical findings | Add image scanning; pin base image digest |
| Performance baseline | 0.7 | Bench harness exists; sample run meets ~960 jobs/min, latency sampling coarse | Improve latency measurement via metrics; run on 4 vCPU node and document env |
| Documentation completeness | 0.9 | PRD, runbook, deployment, perf, checklists present | Add Helm usage examples and alert rules |

## Beta → RC Checklist
- [ ] Throughput ≥ 1k jobs/min for ≥ 10m; p95 < 2s (<1MB files)
- [ ] Chaos tests: Redis outage/latency/worker crash → no lost jobs; breaker transitions
- [ ] Admin CLI validated against live instance
- [ ] Queue gauges and breaker metric accurate under load
- [ ] 24–48h soak: error rate < 0.5%, no leaks
- [ ] govulncheck clean; deps pinned
- [ ] Docs: performance report and tuning
- [ ] No P0/P1; ≤ 3 P2s w/ workarounds
- Evidence links as above

## RC → GA Checklist
- [ ] Code freeze; only showstopper fixes
- [ ] 0 P0/P1; ≤ 2 P2s with workarounds; no flakey tests across 10 runs
- [ ] Release workflow proven; rollback rehearsal complete
- [ ] Config/backcompat validated or migration guide
- [ ] Docs complete; README examples validated
- [ ] govulncheck clean; image scan no Critical
- [ ] 7-day RC soak: readiness > 99.9%, DLQ < 0.5%
- Evidence links as above
### Confidence Scores (Beta → RC)

| Criterion | Confidence | Rationale | How to improve |
|---|---:|---|---|
| ≥1k jobs/min for ≥10m | 0.6 | Not yet run on dedicated 4 vCPU node | Schedule controlled benchmark; record metrics and environment |
| p95 < 2s (<1MB) | 0.6 | Latency sampling method is coarse | Use Prometheus histogram quantiles on /metrics; run sustained test |
| Chaos (outage/latency/crash) | 0.7 | Logic supports recovery; tests cover happy-path and reaper | Add chaos e2e in CI (stop Redis container; tc latency); verify no loss |
| Admin validation | 0.85 | Admin commands tested manually; unit tests for helpers | Add e2e assertions for stats and peek outputs |
| Gauges/breaker accuracy | 0.85 | Metrics wired; observed locally | Add metric assertions in e2e; dashboards and alerts validate |
| 24–48h soak | 0.5 | Not yet executed | Run soak in staging and record dashboards |
| Security and deps | 0.9 | govulncheck in CI; deps pinned | Add Renovate/Dependabot; image scanning stage |
| Issue hygiene | 0.9 | No open P0/P1 | Enforce labels and triage automation |
### Confidence Scores (RC → GA)
🧹 Nitpick (assertive)

Surround table blocks with blank lines (MD058) and add space before next heading (MD022).

Also consider linking Evidence placeholders to docs/evidence/* files.

-| Criterion | Confidence | Rationale | How to improve |
+| Criterion | Confidence | Rationale | How to improve |
 |---|---:|---|---|
 ...
-| Issue hygiene | 0.9 | No open P0/P1 | Enforce labels and triage automation |
-### Confidence Scores (RC → GA)
+| Issue hygiene | 0.9 | No open P0/P1 | Enforce labels and triage automation |
+
+### Confidence Scores (RC → GA)
+
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

64-64: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


65-65: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
In docs/15_promotion_checklists.md around lines 55–65, the markdown table is not
surrounded by blank lines (MD058) and the next heading lacks a preceding blank
line (MD022); edit the file to ensure there is an empty line before the table
and an empty line after the table, and also insert a blank line immediately
before the "### Confidence Scores (RC → GA)" heading; additionally replace or
augment the "Evidence" placeholders in the table with links to actual
docs/evidence/* files (or add relative links to new evidence files) so each
evidence cell points to the corresponding document.


| Criterion | Confidence | Rationale | How to improve |
|---|---:|---|---|
| Code freeze discipline | 0.8 | Process defined; branch protection enabled | Require 1 review and passing checks (enabled); add CODEOWNERS |
| Zero P0/P1; ≤2 P2 | 0.85 | Current backlog clean | Maintain triage; add SLOs for bug classes |
| Release workflow | 0.9 | GoReleaser + GHCR configured; test via pre-release | Dry-run snapshot and tag a pre-release on branch |
| Rollback rehearsal | 0.6 | Procedure documented | Execute runbook in staging and document proof |
| Backward compatibility | 0.8 | Config stable; validation added | Add versioned config schema and migration notes |
| Docs completeness | 0.9 | Extensive docs present | Add Grafana/Prometheus import snippets and examples (added dashboard) |
| 7-day soak | 0.5 | Not yet executed | Run RC soak with dashboard snapshots and attach to evidence |
17 changes: 17 additions & 0 deletions docs/evidence/README.md
@@ -0,0 +1,17 @@
# Evidence for v0.4.0-alpha Promotion

- CI run: see `ci_run.json` (contains URL to the successful workflow run)
- Bench JSON: `bench.json` (admin bench with 1000 jobs at 500 rps)
- Config used: `config.alpha.yaml`
- Metrics snapshots: `metrics_before.txt`, `metrics_after.txt`

Reproduce locally

1) Ensure Redis is running on `localhost:6379` (e.g., `docker run -p 6379:6379 redis:7-alpine`)
2) Build binary: `make build`
3) Start worker: `./bin/job-queue-system --role=worker --config=docs/evidence/config.alpha.yaml`
4) In another terminal, run bench: `./bin/job-queue-system --role=admin --config=docs/evidence/config.alpha.yaml --admin-cmd=bench --bench-count=1000 --bench-rate=500 --bench-priority=low --bench-timeout=60s`
5) Capture metrics: `curl -sS localhost:9191/metrics | head -n 200 > docs/evidence/metrics_after.txt`

Notes
- The simple latency reported in `bench.json` is measured by comparing the current time to each job's `creation_time` when completions are sampled, so it is a coarse approximation. For precise latency distributions, prefer the Prometheus histogram `job_processing_duration_seconds` and compute quantiles from it.
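For example, assuming the histogram is exported with standard Prometheus buckets, the p95 processing latency over the last 5 minutes can be queried as:

```promql
histogram_quantile(0.95, sum(rate(job_processing_duration_seconds_bucket[5m])) by (le))
```

This computes quantiles from per-job observations on the server side, avoiding the sampling skew of the bench harness's wall-clock comparison.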
7 changes: 7 additions & 0 deletions docs/evidence/bench.json
@@ -0,0 +1,7 @@
{
"count": 1000,
"duration": 62125591875,
"throughput_jobs_per_sec": 16.096426123586127,
"p50_latency": 252446328000,
"p95_latency": 252878289000
}
Comment on lines +1 to +7
💡 Verification agent

🧩 Analysis chain

Evidence inconsistency: units/values don’t match the “1000 jobs @ 500 rps” claim.

  • duration=62.1s and throughput≈16.1/s contradict 500 rps.
  • p50/p95 latencies ≈252s suggest unit or measurement bugs.

Please verify collection units and recompute metrics before using this for promotion.

Run to sanity-check units and derived values:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
jq -r '
  . as $r |
  "count=\($r.count)",
  "duration_ns=\($r.duration) (~\($r.duration/1e9) s)",
  "throughput_jobs_per_sec=\($r.throughput_jobs_per_sec)",
  "p50_ns=\($r.p50_latency) (~\($r.p50_latency/1e9) s)",
  "p95_ns=\($r.p95_latency) (~\($r.p95_latency/1e9) s)"
' docs/evidence/bench.json

Length of output: 167


Fix inconsistent benchmark metrics in docs/evidence/bench.json

  • duration = 62.1256 s and throughput = 16.0964 jobs/s — does not match the claimed "1000 jobs @ 500 rps" (1000/500 = 2.0 s).
  • p50 = 252.4463 s and p95 = 252.8783 s exceed total duration (62.1256 s) — latency units or aggregation are incorrect.
    Action: verify measurement units, recompute percentiles and throughput, and update docs/evidence/bench.json before promotion.
🤖 Prompt for AI Agents
docs/evidence/bench.json lines 1-7: the recorded duration, throughput and
percentiles are inconsistent (duration=62125591875, throughput=16.0964 jobs/s,
p50/p95 ~252s) versus the claimed "1000 jobs @ 500 rps" (expected duration ~2s
and latencies <= duration). Fix by checking the raw timing units (nanoseconds vs
microseconds vs milliseconds vs seconds), convert the stored numeric values to
consistent seconds (or explicitly choose ms and document unit), recompute
throughput as count / duration_in_seconds, recalculate p50 and p95 from the
per-request latency samples using the correct unit so they are <= total
duration, and then update the JSON fields (duration, throughput_jobs_per_sec,
p50_latency, p95_latency) and add a units note if not present.

1 change: 1 addition & 0 deletions docs/evidence/ci_run.json
@@ -0,0 +1 @@
[{"conclusion":"success","displayTitle":"chore(release): add Helm chart, Grafana dashboard, GoReleaser + workflows, changelog automation, promotion checklists, evidence harness","headBranch":"release/alpha-v0.4.0","status":"completed","url":"https://github.com/flyingrobots/go-redis-work-queue/actions/runs/17684747392"}]
47 changes: 47 additions & 0 deletions docs/evidence/config.alpha.yaml
@@ -0,0 +1,47 @@
redis:
addr: "localhost:6379"
pool_size_multiplier: 10
min_idle_conns: 5
dial_timeout: 5s
read_timeout: 3s
write_timeout: 3s
max_retries: 3

worker:
count: 8
heartbeat_ttl: 30s
max_retries: 3
backoff:
base: 100ms
max: 2s
priorities: ["high", "low"]
queues:
high: "jobqueue:high_priority"
low: "jobqueue:low_priority"
processing_list_pattern: "jobqueue:worker:%s:processing"
heartbeat_key_pattern: "jobqueue:processing:worker:%s"
Comment on lines +19 to +22
⚠️ Potential issue

YAML lint: extra spaces after colon (lines 20, 22).

Normalize spacing.

-    low:  "jobqueue:low_priority"
+    low: "jobqueue:low_priority"
 ...
-  heartbeat_key_pattern:  "jobqueue:processing:worker:%s"
+  heartbeat_key_pattern: "jobqueue:processing:worker:%s"
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 20-20: too many spaces after colon

(colons)


[error] 22-22: too many spaces after colon

(colons)

🤖 Prompt for AI Agents
In docs/evidence/config.alpha.yaml around lines 19 to 22, there are extra spaces
after the mapping colons on lines 20 and 22; normalize the YAML spacing by
ensuring exactly one space after each colon in mappings (e.g., change "low:  "
and "heartbeat_key_pattern:  " to use a single space) so the file conforms to
YAML lint rules.

completed_list: "jobqueue:completed"
dead_letter_list: "jobqueue:dead_letter"
brpoplpush_timeout: 1s

producer:
scan_dir: "./data"
include_globs: ["**/*"]
exclude_globs: ["**/*.tmp", "**/.DS_Store"]
default_priority: "low"
high_priority_exts: [".pdf", ".docx", ".xlsx", ".zip"]
rate_limit_per_sec: 1000
rate_limit_key: "jobqueue:rate_limit:producer"

Comment on lines +27 to +35
🧹 Nitpick (assertive)

Producer limiter defaults: 1000/s may exceed Alpha perf targets.

If Alpha bench targets 1k jobs/min, consider 20/s default in alpha config to avoid accidental overload during demos.

-  rate_limit_per_sec: 1000
+  rate_limit_per_sec: 20
🤖 Prompt for AI Agents
In docs/evidence/config.alpha.yaml around lines 27 to 35, the producer
rate_limit_per_sec is set to 1000 which can exceed Alpha performance targets;
change the default to 20 to match the 1k jobs/min (≈20/s) bench target to
prevent accidental overload during demos and update any nearby comment or
documentation to note this alpha-safe default.

circuit_breaker:
failure_threshold: 0.5
window: 1m
cooldown_period: 30s
min_samples: 20

observability:
metrics_port: 9191
log_level: "info"
tracing:
enabled: false
endpoint: ""
Comment on lines +42 to +47
💡 Verification agent

🧩 Analysis chain

Port mismatch vs README (9090 vs 9191). Align or document.

Either switch metrics_port to 9090 (matches README) or update README and Helm values to 9191 to avoid confusion.

-observability:
-  metrics_port: 9191
+observability:
+  metrics_port: 9090

🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail

echo "PWD: $(pwd)"
echo "---- search for 'metrics_port' ----"
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S '\bmetrics_port\b' -g '!*node_modules/**' -C3 || true
else
  grep -RIn --exclude-dir=node_modules -E '\bmetrics_port\b' . || true
fi

echo "---- search for '9090' or '9191' ----"
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S '\b9090\b|\b9191\b' -g '!*node_modules/**' -C3 || true
else
  grep -RIn --exclude-dir=node_modules -E '\b9090\b|\b9191\b' . || true
fi

echo "---- search README* for 9090/9191 ----"
if [ -e README* ]; then
  if command -v rg >/dev/null 2>&1; then
    rg -n --hidden -S '9090|9191' README* || true
  else
    grep -RIn --exclude-dir=node_modules -E '9090|9191' README* || true
  fi
else
  echo "No README files found at repo root."
fi

echo "---- show docs/evidence/config.alpha.yaml (first 200 lines) ----"
if [ -f docs/evidence/config.alpha.yaml ]; then
  sed -n '1,200p' docs/evidence/config.alpha.yaml
else
  echo "docs/evidence/config.alpha.yaml: not found"
fi

Length of output: 1883


Fix metrics port mismatch — align README (9090) and config (9191)

README documents Prometheus/health endpoints on localhost:9090 (README, lines 33, 37, 38, 51) but docs/evidence/config.alpha.yaml sets observability.metrics_port: 9191 — either set metrics_port to 9090 or update README/Helm to 9191.

Locations:

  • docs/evidence/config.alpha.yaml — observability.metrics_port (lines ~42-47)
  • README (root) — references to 9090 at lines 33, 37, 38, 51

Suggested quick fix:

-observability:
-  metrics_port: 9191
+observability:
+  metrics_port: 9090
🤖 Prompt for AI Agents
In docs/evidence/config.alpha.yaml around lines 42-47,
observability.metrics_port is set to 9191 which conflicts with README references
to 9090; change observability.metrics_port to 9090 to align with README (or if
the intended port is 9191, update the README and any Helm values/README
references at lines ~33,37,38,51 to 9191) — ensure the chosen port is
consistently reflected in docs/evidence/config.alpha.yaml, README, and Helm
values.
