libgit2: does not always seem to agree on host key while known_hosts is valid #397

@hiddeco

A user on Slack reported that after upgrading their Flux components, the image-automation-controller (which at the moment still depends on the Git libraries from this controller, and recently switched to using libgit2 only) stopped working with the following error:

{"level":"error","ts":"2021-07-01T17:52:47.736Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://[email protected]/repo.git', error: Certificate"}

While isolating the issue, we discovered that although the known_hosts entry in their Secret did contain an ssh-rsa item that matched the host key of the server, host key validation still reported a mismatch.

Once the user had updated the known_hosts entry in the Secret with the output of ssh-keyscan example.com 2>/dev/null | base64 (containing both an ssh-rsa and an ssh-ed25519 item), the image-automation-controller started working again.

My educated guess is that the custom code we have for validating host keys with libgit2 does not work correctly in all cases: https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/transport.go#L147-L239. The error logged by the controller matches the git2go.ErrCertificate returned by the certCallback.

Slack thread reference: https://cloud-native.slack.com/archives/CLAJ40HV3/p1625162540293300

Metadata

Assignees: No one assigned
Labels: area/git (Git related issues and pull requests), bug (Something isn't working)
Status: Done