Sporadic error call on null client for first RPC call after upgrading from 3.0.0-alpha.30 to 3.0.1-alpha.1, ~1% chance #572

@mologie

After updating from 3.0.0-alpha.30 to 3.0.1-alpha.1, one of my load tests now occasionally fails with the error schema/avas/corehost/corehost.capnp:CoreFactory.buildInfo: call on null client.

I am trying to build a minimal reproducer to narrow this down and will post it here. That might take some time, so I'm deferring the update for now. In the meantime, I'll leave the following details here in case someone else stumbles upon the issue:

  • The environment is a Go HTTP server, which connects to a C++ capnp RPC server while handling an HTTP request.
  • 0.1-2% of requests fail with capnp 3.0.1-alpha.1 regardless of request rate or number of parallel requests, with high variance in the failure rate. The number of failures grows linearly with the number of requests.
  • The affected code is heavily used in production with tens of thousands of daily connections, and has run without issues on capnp 3.0.0-alpha.30.
  • No errors are logged to Logger.

The affected code is as simple as the following after a successful connection:

// (establishment + error checks for tcpConn omitted)
conn := rpc.NewConn(rpc.NewStreamTransport(tcpConn), &rpc.Options{
	Logger: &rpcErrorReporter{sess: s},
})
factory := corehost.CoreFactory(conn.Bootstrap(s.ctx))
buildInfoFuture, buildInfoRelease := factory.BuildInfo(s.ctx, nil)
defer buildInfoRelease()
buildInfo, err := buildInfoFuture.Struct() // err is not nil for 0.2-2% of requests
