@wdberkeley

Add retry logic to reconciler's metastore add_objects calls to handle transport errors. Previously, the reconciler would immediately fail and abandon the reconciliation round on any metastore error.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x

Release Notes

  • none

Change add_objects, replace_objects, and commit_objects method
signatures to take const refs instead of taking unique_ptr by value.
This enables retry scenarios where the builder needs to remain usable
after the call.
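
To illustrate why the signature change matters for retries, here is a minimal sketch. The names `objects_builder`, `add_objects_by_value`, and `add_objects_by_ref` are hypothetical stand-ins rather than the actual metastore API, and both calls are hard-coded to fail so that only the ownership difference is visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the builder that accumulates objects to add.
struct objects_builder {
    std::vector<std::string> objects;
};

// Old shape (assumed): the callee takes ownership of the builder.
// The call is hard-coded to fail to simulate a metastore error.
inline bool add_objects_by_value(std::unique_ptr<objects_builder> b) {
    (void)b; // the builder is destroyed when this function returns
    return false;
}

// New shape (assumed): the callee only reads the builder.
// The call is hard-coded to fail to simulate a metastore error.
inline bool add_objects_by_ref(const objects_builder& b) {
    (void)b; // ownership stays with the caller
    return false;
}
```

With the by-value shape, the caller must `std::move` the builder into the call, so after a failure the caller holds a null pointer and has nothing to resend. With the const-ref shape, the same builder can be passed to a second attempt unchanged.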

The replicated metastore's release method is no longer used and is
removed. The simple metastore still uses this method for testing, so it
remains.
They aren't needed for metastore add_objects, replace_objects, and
commit_objects calls, and are potentially confusing.
…errors

Add retry logic to reconciler's metastore add_objects calls to handle
transport errors. Previously, the reconciler would immediately fail and
abandon the reconciliation round on any metastore error.
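
The retry behavior described in this commit message could look roughly like the sketch below. It is a synchronous simplification under stated assumptions: the `errc` enum, `retry_result`, and `retry_on_transport_error` are hypothetical names, the attempt limit is arbitrary, and treating non-transport errors as non-retryable is an assumption that the description does not confirm. The real reconciler is asynchronous and may also apply backoff between attempts.

```cpp
#include <cassert>
#include <functional>

// Hypothetical error codes; the real metastore error type is not shown here.
enum class errc { ok, transport_error, invalid_request };

// Outcome of a retried call: the last error code and how many attempts ran.
struct retry_result {
    errc ec;
    int attempts;
};

// Invoke `call` until it returns something other than a transport error,
// or until `max_attempts` attempts have been made.
inline retry_result
retry_on_transport_error(const std::function<errc()>& call, int max_attempts) {
    retry_result r{errc::transport_error, 0};
    while (r.attempts < max_attempts) {
        ++r.attempts;
        r.ec = call();
        if (r.ec != errc::transport_error) {
            break; // success, or an error this sketch treats as non-retryable
        }
    }
    return r;
}
```

A caller would wrap the `add_objects` call in the callable and give up on the reconciliation round only if the returned code is still an error after the last attempt.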
@wdberkeley wdberkeley force-pushed the retry-builder-metastore branch 2 times, most recently from eebea5e to 7f5a333 Compare September 24, 2025 16:47
wdberkeley pushed a commit that referenced this pull request Nov 11, 2025
Fix a leak when the keytab cannot be found.

```
Direct leak of 120 byte(s) in 3 object(s) allocated from:
    #0 0x58cc434ea154 in malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3
    #1 0x736a590858ad in krb5_build_principal_alloc_va ../../.././src/lib/krb5/krb/bld_princ.c:124:9
    #2 0x736a590858ad in krb5_build_principal ../../.././src/lib/krb5/krb/bld_princ.c:148:14
    #3 0x736a595ae86a in kg_acceptor_princ ../../.././src/lib/gssapi/krb5/naming_exts.c:165:12
    #4 0x736a59560092 in acquire_accept_cred ../../.././src/lib/gssapi/krb5/acquire_cred.c:199:16
    #5 0x736a59560092 in acquire_cred_context ../../.././src/lib/gssapi/krb5/acquire_cred.c:845:15
    #6 0x736a5955f43d in acquire_cred_from ../../.././src/lib/gssapi/krb5/acquire_cred.c:1320:11
    #7 0x736a5955ed49 in krb5_gss_acquire_cred_from ../../.././src/lib/gssapi/krb5/acquire_cred.c:1348:12
    #8 0x736a594fa4c4 in gss_add_cred_from ../../.././src/lib/gssapi/mechglue/g_acquire_cred.c:544:11
    #9 0x736a594f9361 in gss_acquire_cred_from ../../.././src/lib/gssapi/mechglue/g_acquire_cred.c:190:10
    #10 0x736a595cde33 in get_available_mechs ../../.././src/lib/gssapi/spnego/spnego_mech.c:3109:18
    #11 0x736a595cd788 in spnego_gss_acquire_cred_from ../../.././src/lib/gssapi/spnego/spnego_mech.c:377:11
    #12 0x736a594fa4c4 in gss_add_cred_from ../../.././src/lib/gssapi/mechglue/g_acquire_cred.c:544:11
    #13 0x736a594f9361 in gss_acquire_cred_from ../../.././src/lib/gssapi/mechglue/g_acquire_cred.c:190:10
    #14 0x58cc523cf7de in security::gssapi_authenticator::impl::init() src/v/security/gssapi_authenticator.cc:293:20
    redpanda-data#14 0x58cc523cf7de in security::gssapi_authenticator::impl::init() src/v/security/gssapi_authenticator.cc:293:20
```

The associated log line is:
```
INFO  2025-11-10 12:29:46,682 security - gssapi_authenticator.cc:71 - GSS_API error gss init failed to acquire credentials for principal redpanda in keytab /var/lib/redpanda/redpanda.keytab: No credentials were supplied, or the credentials were unavailable or inaccessible
```

Signed-off-by: Ben Pope <[email protected]>
wdberkeley added a commit that referenced this pull request Nov 12, 2025