@pjbgf
Member

@pjbgf pjbgf commented Jun 8, 2022

A typical SubTransport lifecycle encompasses two Action calls. Previously,
the code attempted to share the same connection across both calls. That did
not work, as some Git servers do not support multiple sessions on the same
connection. The implementation was never fully transitioned to the
"one connection per action" model, which led to connections being leaked.

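As a rough illustration of the "one connection per action" model described above, here is a minimal sketch. All names in it (`conn`, `dialer`, `subTransport`, `runLifecycle`) are hypothetical stand-ins and not the types used in this PR; the only point is that every Action call opens its own connection and closes the previous one, so nothing is left open at the end of the lifecycle.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for an SSH client connection.
type conn struct{ closed bool }

// dialer is a hypothetical helper that counts opened and closed connections.
type dialer struct{ opened, closed int }

func (d *dialer) dial() *conn {
	d.opened++
	return &conn{}
}

// subTransport is an illustrative type, not the one from the PR.
type subTransport struct {
	d       *dialer
	current *conn
}

// closeCurrent releases the connection of the previous action, if any.
func (t *subTransport) closeCurrent() {
	if t.current != nil {
		t.current.closed = true
		t.d.closed++
		t.current = nil
	}
}

// Action closes any connection left over from the previous action
// and then opens a fresh one for this action.
func (t *subTransport) Action(name string) {
	t.closeCurrent()
	t.current = t.d.dial()
}

// Close releases the connection of the last action.
func (t *subTransport) Close() { t.closeCurrent() }

// runLifecycle runs the given actions on one subTransport and reports
// how many connections were opened and closed.
func runLifecycle(actions ...string) (opened, closed int) {
	d := &dialer{}
	t := &subTransport{d: d}
	for _, a := range actions {
		t.Action(a)
	}
	t.Close()
	return d.opened, d.closed
}

func main() {
	opened, closed := runLifecycle("first", "second")
	fmt.Printf("opened=%d closed=%d\n", opened, closed) // opened=2 closed=2
}
```

If the open and close counts differ at the end of a lifecycle, a connection has leaked.
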
The switch to an RW mutex avoids unnecessary blocking in the goroutine at
the start of the second action call.
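
To show why a read/write mutex can reduce blocking, here is a small sketch using `sync.RWMutex` from the Go standard library. The `transportState` type and the helper functions are hypothetical and only illustrate the general behaviour: several readers can hold the read lock at once, while a plain `sync.Mutex` admits one holder at a time. `TryRLock` and `TryLock` require Go 1.18 or newer and are used here only to make the difference observable without blocking.

```go
package main

import (
	"fmt"
	"sync"
)

// transportState is a hypothetical piece of shared state, not the PR's type.
type transportState struct {
	mu     sync.RWMutex
	closed bool
}

// isClosed only needs the read lock, so concurrent checks do not
// block one another.
func (s *transportState) isClosed() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.closed
}

// markClosed needs the write lock because it mutates the state.
func (s *transportState) markClosed() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.closed = true
}

// readersCanOverlap reports whether a second reader can acquire the
// read lock while a first reader still holds it.
func readersCanOverlap() bool {
	var s transportState
	s.mu.RLock()
	defer s.mu.RUnlock()
	ok := s.mu.TryRLock()
	if ok {
		s.mu.RUnlock()
	}
	return ok
}

// exclusiveLockBlocks reports whether a plain mutex rejects a second
// holder while the first one still holds the lock.
func exclusiveLockBlocks() bool {
	var mu sync.Mutex
	mu.Lock()
	defer mu.Unlock()
	if mu.TryLock() {
		mu.Unlock()
		return false
	}
	return true
}

func main() {
	var s transportState
	fmt.Println("closed before:", s.isClosed())
	s.markClosed()
	fmt.Println("closed after:", s.isClosed())
	fmt.Println("readers overlap:", readersCanOverlap())
	fmt.Println("plain mutex blocks:", exclusiveLockBlocks())
}
```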

Note that when the context is done, the client-level resources (the
connection) are now freed as well. This ensures that SSH connections
do not outlive the subtransport.
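
The idea of tying the connection to the context can be sketched as follows. This is an assumption-laden illustration, not the code from this PR: `fakeConn`, `closeOnDone`, and `closedAroundCancel` are hypothetical names, and a real implementation would close an SSH client rather than a channel.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// fakeConn is a hypothetical stand-in for a client-level connection.
type fakeConn struct {
	once sync.Once
	done chan struct{}
}

func newFakeConn() *fakeConn { return &fakeConn{done: make(chan struct{})} }

// Close is safe to call more than once.
func (c *fakeConn) Close() { c.once.Do(func() { close(c.done) }) }

// closeOnDone starts a goroutine that closes the connection as soon
// as the context is done, so the connection cannot outlive it.
func closeOnDone(ctx context.Context, c *fakeConn) {
	go func() {
		<-ctx.Done()
		c.Close()
	}()
}

// closedAroundCancel reports whether the connection was closed before
// the context was cancelled and whether it was closed afterwards.
func closedAroundCancel() (before, after bool) {
	ctx, cancel := context.WithCancel(context.Background())
	c := newFakeConn()
	closeOnDone(ctx, c)

	// Non-blocking check: the connection should still be open here.
	select {
	case <-c.done:
		before = true
	default:
	}

	cancel()

	// Wait for the watcher goroutine, with a generous timeout.
	select {
	case <-c.done:
		after = true
	case <-time.After(time.Second):
	}
	return before, after
}

func main() {
	before, after := closedAroundCancel()
	fmt.Printf("closed before cancel: %v, closed after cancel: %v\n", before, after)
}
```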

Relates to fluxcd/image-automation-controller#334

Signed-off-by: Paulo Gomes <[email protected]>
@pjbgf pjbgf added the area/git Git related issues and pull requests label Jun 8, 2022
@pjbgf pjbgf added this to the GA milestone Jun 8, 2022
@pjbgf pjbgf mentioned this pull request Jun 8, 2022
@pjbgf
Member Author

pjbgf commented Jun 8, 2022

When testing against IAC, the goroutine count stayed at low, healthy levels (75-100), as opposed to the previous 1500+:
image

Contributor

@darkowlzz darkowlzz left a comment

LGTM!
Thanks.

@pjbgf pjbgf merged commit 1faa547 into fluxcd:main Jun 9, 2022
@pjbgf pjbgf deleted the leak-conns branch June 9, 2022 07:54
@kallaics

Hi @pjbgf,
could you please release this fix soon? Our system cannot handle the large number of connections, and our Git server drops all connections 2-3 times per day. We are currently on 031.1, and as far as I can see, this fix is unfortunately not included.
Csaba

@pjbgf
Member Author

pjbgf commented Jun 13, 2022

@kallaics we have a release candidate from Friday with this fix and the logging improvements (#778):

ghcr.io/fluxcd/source-controller:rc-b877bc21
ghcr.io/fluxcd/image-automation-controller:rc-843074dd

@kallaics

@pjbgf The patched images are working well, thanks.

@pjbgf pjbgf mentioned this pull request Jun 16, 2022