"integer divide by zero" when dialing #401

@cannium

Describe the bug
I got this panic (caller frames omitted) during chaos tests:

panic: runtime error: integer divide by zero

goroutine 310 [running]:
github.com/cloudwego/netpoll.(*roundRobinLB).Pick(0xc001280001?)
        /go/pkg/mod/github.com/cloudwego/[email protected]/poll_loadbalance.go:90 +0x6b
github.com/cloudwego/netpoll.(*manager).Pick(0xc0003dccf0)
        /go/pkg/mod/github.com/cloudwego/[email protected]/poll_manager.go:151 +0x79
github.com/cloudwego/netpoll.newPollDesc(0x77)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_polldesc.go:26 +0x36
github.com/cloudwego/netpoll.(*netFD).connect(0xc0003828c0, {0x15eef38, 0xc0003827e0}, {0xc0003828c0?, 0xc0012baa08?}, {0x15d8b40?, 0xc0012800e0?})
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_netfd.go:134 +0xf9
github.com/cloudwego/netpoll.(*netFD).dial(0xc0003828c0, {0x15eef38, 0xc0003827e0}, {0x15f4780?, 0x0?}, {0x15f4780?, 0xc0005cc450?})
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_netfd.go:84 +0x155
github.com/cloudwego/netpoll.socket({0x15eef38, 0xc0003827e0}, {0x142a6fd, 0x3}, 0x2, 0x1, 0xc00011ac40?, 0x0, {0x15f4780, 0x0}, ...)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_sock.go:119 +0x145
github.com/cloudwego/netpoll.internetSocket({0x15eef38, 0xc0003827e0}, {0x142a6fd, 0x3}, {0x15f4780, 0x0}, {0x15f4780, 0xc0005cc450}, 0x1, 0x0, ...)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_sock.go:47 +0xdc
github.com/cloudwego/netpoll.(*sysDialer).dialTCP(0xc0012bac30, {0x15eef38, 0xc0003827e0}, 0x0, 0xc0005cc450)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_tcpsock.go:178 +0x94
github.com/cloudwego/netpoll.DialTCP({0x15eef38, 0xc0003827e0}, {0x142a6fd, 0x3}, 0x0, 0xc0005cc450)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_tcpsock.go:170 +0x20f
github.com/cloudwego/netpoll.(*dialer).dialTCP(0x15eee58?, {0x15eef38, 0xc0003827e0}, {0x142a6fd, 0x3}, {0xc0005922c0?, 0x0?})
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_dialer.go:116 +0x338
github.com/cloudwego/netpoll.(*dialer).DialConnection(0x20b7f60?, {0x142a6fd, 0x3}, {0xc0005922c0, 0x10}, 0x2540be400?)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_dialer.go:72 +0x125
github.com/cloudwego/netpoll.DialConnection(...)
        /go/pkg/mod/github.com/cloudwego/[email protected]/net_dialer.go:28

I skimmed the code, and my guess is that the return value of m.Run() is not checked: if m.Run() fails and closes m, the balancer is left with no polls, so m.balance.Pick() panics on the division.

netpoll/poll_manager.go

Lines 143 to 152 in b0bf57d

    // adjust polls
    // m.Run() will finish very quickly, so will not many goroutines block on Pick.
    _ = m.Run()
    //nolint:staticcheck // SA9003: empty branch
    if !atomic.CompareAndSwapInt32(&m.status, managerInitializing, managerInitialized) {
        // SetNumLoops called during m.Run() which cause CAS failed
        // The polls will be adjusted next Pick
    }
    return m.balance.Pick()
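
To illustrate the suspected failure mode, here is a minimal sketch of a round-robin picker whose poll list is empty. It is not netpoll's actual code: the `poll` and `roundRobin` types, `pickUnchecked`, `pickChecked`, `errNoPolls`, and `panicMessage` are all hypothetical names made up for this example, and it assumes the balancer selects a poll by taking a counter modulo the number of polls.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// poll is a hypothetical stand-in for a poller; it is not a netpoll type.
type poll struct{ id int }

// roundRobin is a simplified, hypothetical balancer used only for illustration.
type roundRobin struct {
	polls  []*poll
	cursor uint32
}

// pickUnchecked mirrors the suspected bug: with zero polls the modulo
// operation divides by zero and the runtime panics.
func (b *roundRobin) pickUnchecked() *poll {
	idx := int(atomic.AddUint32(&b.cursor, 1)) % len(b.polls)
	return b.polls[idx]
}

// errNoPolls is a hypothetical error returned when there is nothing to pick.
var errNoPolls = errors.New("no polls available")

// pickChecked is one possible guard: report an error instead of dividing by zero.
func (b *roundRobin) pickChecked() (*poll, error) {
	n := len(b.polls)
	if n == 0 {
		return nil, errNoPolls
	}
	idx := int(atomic.AddUint32(&b.cursor, 1)) % n
	return b.polls[idx], nil
}

// panicMessage runs f and returns the recovered panic text, or "" if f did not panic.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	empty := &roundRobin{} // simulates a balancer whose polls were never created or were closed

	fmt.Println(panicMessage(func() { empty.pickUnchecked() }))

	_, err := empty.pickChecked()
	fmt.Println(err)
}
```

In this sketch the unchecked pick reproduces the same runtime error as the stack trace above, while the checked variant surfaces an ordinary error. Whether the real fix belongs in the balancer, or in checking the error from m.Run() before calling Pick, is a design choice for the maintainers.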
