Skip to content

runtime: crash near runtime.assertI2T2 including address 0xdeaddeaddeaddead #11643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rhysh opened this issue Jul 9, 2015 · 12 comments
Closed
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Jul 9, 2015

With go version go1.5beta1 on linux/amd64, I see the following crash in an http server that serves short-lived requests. The process is stable with go1.4.1.

The appearance of 0xdeaddeaddeaddead makes me suspicious of memory corruption in the runtime. (@rsc)

unexpected fault address 0x0
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x411393]

goroutine 727961 [running]:
runtime.throw(0xa45e70, 0x5)
    /usr/local/go/src/runtime/panic.go:527 +0x96 fp=0xc82027dcf8 sp=0xc82027dce0
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:27 +0x2ab fp=0xc82027dd48 sp=0xc82027dcf8
runtime.assertI2T2(0x9ba380, 0xdeaddeaddeaddead, 0x0, 0xc82027de38, 0x8)
    /usr/local/go/src/runtime/iface.go:201 +0x33 fp=0xc82027dd68 sp=0xc82027dd48
redacted(0x0, 0x0)
    /redacted.go:30 +0x11c fp=0xc82027de58 sp=0xc82027dd68
redacted(0xc821b2f5e0, 0xc820057b00, 0xecd2ffd34, 0x32c0c225, 0xd42120, 0x0, 0xc82187bec0, 0xc820877aa0, 0xc82025e370, 0xc82187be60, ...)
    /redacted.go:87 +0x287 fp=0xc82027df40 sp=0xc82027de58
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc82027df48 sp=0xc82027df40
created by redacted
    /redacted.go:107 +0x252

goroutine 1 [IO wait]:
net.runtime_pollWait(0x7f61dc9381d8, 0x72, 0xc820012130)
    /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc8201af1e0, 0x72, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8201af1e0, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0xc8201af180, 0x0, 0x7f61de7807f0, 0xc821962080)
    /usr/local/go/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0xc8200c80f0, 0xc8203b0090, 0x0, 0x0)
    /usr/local/go/src/net/tcpsock_posix.go:249 +0x4d
net.(*TCPListener).Accept(0xc8200c80f0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/tcpsock_posix.go:259 +0x3d
redacted(0x7f61de780738, 0xc8200c80f0, 0x0, 0x0, 0x0, 0x0)
    /redacted.go:26 +0x53
redacted(0xc8201fb330, 0x0, 0x0, 0x0, 0x0)
    <autogenerated>:3 +0xb0
net/http.(*Server).Serve(0xc820220480, 0x7f61dc93d868, 0xc8201fb330, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:1820 +0xb3
redacted(0xc820220420, 0x7f61dc93d868, 0xc8201fb330, 0x0, 0x0)
    /redacted.go:32 +0x1a7
redacted(0x7f61de780738, 0xc8200c80f0, 0x7f61dc93d840, 0xc8201b5380, 0x0, 0x0)
    /redacted.go:99 +0x85
main.main()
    /redacted.go:147 +0x12ea

[snip]
@ianlancetaylor ianlancetaylor added this to the Go1.5 milestone Jul 9, 2015
@ianlancetaylor
Copy link
Contributor

CC @RLH @aclements

@aclements
Copy link
Member

Hi @rhysh. A few questions:

Can you run this with GODEBUG=gccheckmark=1?

Can you say something about what's happening at the caller of assertI2T2 (/redacted.go:30 in your traceback)? In particular, I'm curious about where the value that you're type-asserting came from. I'm guessing it came from the second word of a struct, and that we failed to mark that containing object, but it would be nice to know if that's right and anything you can say about that containing object.

Finally, is this the same program as #11644?

@rsc
Copy link
Contributor

rsc commented Jul 10, 2015

Same questions as Austin. The line in question says x.(T), and it looks like x came from near the beginning of an allocated struct (specifically, starting at the second word of the struct). That allocated struct containing the interface value x was prematurely collected by the garbage collector. The question is why. How long has the containing struct existed? Where do the references to it live? How often are those places updated? Any information like that would help.

Thanks.

@rsc
Copy link
Contributor

rsc commented Jul 10, 2015

Rhys and I looked a little bit into this at Gophercon.

That bad error value may have come from a struct { uintptr; error } that was sent over an unbuffered channel of the same. Maybe we are still screwing up the stack barriers or the write barriers there?

Edit: update to reflect some uncertanity. We traced through a lot of error results to find this, so maybe we went down the wrong code path, but it seems to match the evidence so far.

@aclements
Copy link
Member

aclements commented Jul 10, 2015 via email

@rhysh
Copy link
Contributor Author

rhysh commented Jul 10, 2015

This is the same program as #11644. I'll see if it's reproducible with gcstackbarrieroff=1 (and then also try gccheckmark=1).

I've reduced it to the following test, which crashes on darwin/amd64 with GOMAXPROCS=4 after 5-10 seconds:

package main

import (
    "runtime"
    "sync"
)

type response struct {
    a bool
    b int
    c string
    d []interface{}
}

type myError struct {
    a int
    b string
}

func (myError) Error() string { return "" }
func doRequest() (*response, error) {
    type async struct {
        resp *response
        err  error
    }
    ch := make(chan *async, 0)
    done := make(chan struct{}, 0)
    cancel := make(chan struct{}, 0)

    go func() {
        select {
        case ch <- &async{resp: nil, err: myError{}}:
        case <-done:
        }
    }()

    select {
    case r := <-ch:
        runtime.Gosched()
        return r.resp, r.err
    case <-cancel:
        panic("unreachable")
    }
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            var garbage []byte
            for j := 0; j < 1e9; j++ {

                // BEGIN
                func() error {
                    resp, err := doRequest()
                    if err != nil {
                        if merr, ok := err.(myError); ok { // <- this is the line that calls runtime.assertI2T2
                            _ = merr
                            return nil
                        }
                        return err
                    }
                    _ = resp
                    return nil
                }()
                // END

                garbage = make([]byte, 1<<10)
            }
            _ = garbage
        }()
    }
    wg.Wait()
}
unexpected fault address 0xb01dfacedebac1e
fatal error: fault
[signal 0xb code=0x1 addr=0xb01dfacedebac1e pc=0x9053]

goroutine 125 [running]:
runtime.throw(0x72e50, 0x5)
    /usr/local/go/src/runtime/panic.go:527 +0x96 fp=0x820771e58 sp=0x820771e40
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:27 +0x2ba fp=0x820771ea8 sp=0x820771e58
runtime.assertI2T2(0x66c20, 0xdeaddeaddeaddead, 0x820e79940, 0x820771f28, 0x20c01)
    /usr/local/go/src/runtime/iface.go:201 +0x33 fp=0x820771ec8 sp=0x820771ea8
main.main.func1.1(0x0, 0x0)
    /Users/rhysh/Downloads/lucky.go:59 +0x99 fp=0x820771f48 sp=0x820771ec8
main.main.func1(0x820164000)
    /Users/rhysh/Downloads/lucky.go:67 +0x6f fp=0x820771fa8 sp=0x820771f48
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x820771fb0 sp=0x820771fa8
created by main.main
    /Users/rhysh/Downloads/lucky.go:73 +0x7e

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x82016400c)
    /usr/local/go/src/runtime/sema.go:43 +0x26
sync.(*WaitGroup).Wait(0x820164000)
    /usr/local/go/src/sync/waitgroup.go:126 +0xb4
main.main()
    /Users/rhysh/Downloads/lucky.go:75 +0xaf

[snip]

And with GODEBUG=gcstackbarrieroff=1,gccheckmark=1,gctrace=1 go run -gcflags '-N -l' ~/Downloads/lucky.go, I see the following crash:

gc #1403 @9.184s 17%: 0.082+0.45+2.1+0.46+0.80 ms clock, 0.24+0.45+0+0/0.34/0+2.4 ms cpu, 4->5->0 MB, 4 MB goal, 4 P
gc #1404 @9.192s 17%: 0.028+0.21+1.5+0.70+1.1 ms clock, 0.084+0.21+0+0/0.50/0+3.4 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc #1405 @9.200s 17%: 0.064+2.7+0.026+0.61+0.72 ms clock, 0.19+2.7+0+0/0.42/0+2.1 ms cpu, 4->5->0 MB, 4 MB goal, 4 P
runtime:greyobject: checkmarks finds unexpected unmarked object obj=0x820fefac0
runtime: found obj at *(0x82047adf8+0x20)
base=0x82047adf8 k=0x41023d s.start*_PageSize=0x82047a000 s.limit=0x820456000 s.sizeclass=0 s.elemsize=0
obj=0x820fefac0 k=0x4107f7 s.start*_PageSize=0x820fee000 s.limit=0x820ff0000 s.sizeclass=3 s.elemsize=32
 *(obj+0) = 0x0
 *(obj+8) = 0x88202a2028
 *(obj+16) = 0x820fefb60
 *(obj+24) = 0x0
fatal error: checkmark found unmarked object

runtime stack:
runtime.throw(0x80460, 0x1f)
    /usr/local/go/src/runtime/panic.go:527 +0x96 fp=0x820157918 sp=0x820157900
runtime.greyobject(0x820fefac0, 0x82047adf8, 0x20, 0x820078029, 0x8800000000, 0x8820372f50, 0x82010d220)
    /usr/local/go/src/runtime/mgcmark.go:867 +0x204 fp=0x8201579a0 sp=0x820157918
runtime.scanblock(0x82047adf8, 0xc8, 0x86ec0, 0x82010d220)
    /usr/local/go/src/runtime/mgcmark.go:755 +0x152 fp=0x820157a28 sp=0x8201579a0
runtime.scanframeworker(0x820157be0, 0x0, 0x82010d220)
    /usr/local/go/src/runtime/mgcmark.go:424 +0x14c fp=0x820157ac0 sp=0x820157a28
runtime.scanstack.func1(0x820157be0, 0x0, 0x1)
    /usr/local/go/src/runtime/mgcmark.go:344 +0x65 fp=0x820157b08 sp=0x820157ac0
runtime.gentraceback(0x24d59, 0x82047abe0, 0x0, 0x82019c780, 0x0, 0x0, 0x7fffffff, 0x820157d00, 0x0, 0x0, ...)
    /usr/local/go/src/runtime/traceback.go:336 +0xa7e fp=0x820157c38 sp=0x820157b08
runtime.scanstack(0x82019c780)
    /usr/local/go/src/runtime/mgcmark.go:366 +0x391 fp=0x820157d40 sp=0x820157c38
runtime.scang(0x82019c780)
    /usr/local/go/src/runtime/proc1.go:417 +0x96 fp=0x820157d60 sp=0x820157d40
runtime.markroot(0x82010a000, 0x4c)
    /usr/local/go/src/runtime/mgcmark.go:135 +0x1ba fp=0x820157e00 sp=0x820157d60
runtime.parfordo(0x82010a000)
    /usr/local/go/src/runtime/parfor.go:110 +0x1d4 fp=0x820157e88 sp=0x820157e00
runtime.gchelper()
    /usr/local/go/src/runtime/mgc.go:1617 +0x59 fp=0x820157ed0 sp=0x820157e88
runtime.stopm()
    /usr/local/go/src/runtime/proc1.go:1125 +0x15e fp=0x820157ef8 sp=0x820157ed0
runtime.gcstopm()
    /usr/local/go/src/runtime/proc1.go:1315 +0xfe fp=0x820157f28 sp=0x820157ef8
runtime.schedule()
    /usr/local/go/src/runtime/proc1.go:1574 +0xa5 fp=0x820157f60 sp=0x820157f28
runtime.goschedImpl(0x820172600)
    /usr/local/go/src/runtime/proc1.go:1691 +0x136 fp=0x820157f78 sp=0x820157f60
runtime.gosched_m(0x820172600)
    /usr/local/go/src/runtime/proc1.go:1699 +0x32 fp=0x820157f88 sp=0x820157f78
runtime.mcall(0x0)
    /usr/local/go/src/runtime/asm_amd64.s:204 +0x5e fp=0x820157f98 sp=0x820157f88

goroutine 1 [semacquire]:
runtime.gopark(0x8ab80, 0xbc8c0, 0x763f0, 0xa, 0x19, 0x4)
    /usr/local/go/src/runtime/proc.go:185 +0x169 fp=0x8201a7e40 sp=0x8201a7e18
runtime.goparkunlock(0xbc8c0, 0x763f0, 0xa, 0x21d19, 0x4)
    /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0x8201a7e78 sp=0x8201a7e40
runtime.semacquire(0x82016400c, 0x51701)
    /usr/local/go/src/runtime/sema.go:100 +0x2fa fp=0x8201a7ed8 sp=0x8201a7e78
sync.runtime_Semacquire(0x82016400c)
    /usr/local/go/src/runtime/sema.go:43 +0x26 fp=0x8201a7ef0 sp=0x8201a7ed8
sync.(*WaitGroup).Wait(0x820164000)
    /usr/local/go/src/sync/waitgroup.go:126 +0xb4 fp=0x8201a7f38 sp=0x8201a7ef0
main.main()
    /Users/rhysh/Downloads/lucky.go:75 +0xaf fp=0x8201a7f70 sp=0x8201a7f38
runtime.main()
    /usr/local/go/src/runtime/proc.go:111 +0x2bf fp=0x8201a7fc0 sp=0x8201a7f70
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x8201a7fc8 sp=0x8201a7fc0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x8ab80, 0xb6b70, 0x75240, 0xf, 0x14, 0x1)
    /usr/local/go/src/runtime/proc.go:185 +0x169 fp=0x820118758 sp=0x820118730
runtime.goparkunlock(0xb6b70, 0x75240, 0xf, 0x8200f0114, 0x1)
    /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0x820118790 sp=0x820118758
runtime.forcegchelper()
    /usr/local/go/src/runtime/proc.go:152 +0xb8 fp=0x8201187c0 sp=0x820118790
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x8201187c8 sp=0x8201187c0
created by runtime.init.4
    /usr/local/go/src/runtime/proc.go:141 +0x2b

goroutine 17 [GC sweep wait]:
runtime.gopark(0x8ab80, 0xb6c40, 0x74520, 0xd, 0x44214, 0x1)
    /usr/local/go/src/runtime/proc.go:185 +0x169 fp=0x820114748 sp=0x820114720
runtime.goparkunlock(0xb6c40, 0x74520, 0xd, 0x14, 0x1)
    /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0x820114780 sp=0x820114748
runtime.bgsweep(0x82015c000)
    /usr/local/go/src/runtime/mgcsweep.go:67 +0x14d fp=0x8201147b8 sp=0x820114780
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x8201147c0 sp=0x8201147b8
created by runtime.gcenable
    /usr/local/go/src/runtime/mgc.go:200 +0x53

goroutine 18 [runnable]:
runtime.Gosched()
    /usr/local/go/src/runtime/proc.go:166 +0x14 fp=0x820780dd0 sp=0x820780dc0
main.doRequest(0x0, 0x0, 0x0)
    /Users/rhysh/Downloads/lucky.go:39 +0x144 fp=0x820780ec8 sp=0x820780dd0
main.main.func1.1(0x0, 0x0)
    /Users/rhysh/Downloads/lucky.go:57 +0x2e fp=0x820780f48 sp=0x820780ec8
main.main.func1(0x820164000)
    /Users/rhysh/Downloads/lucky.go:67 +0x6f fp=0x820780fa8 sp=0x820780f48
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x820780fb0 sp=0x820780fa8
created by main.main
    /Users/rhysh/Downloads/lucky.go:73 +0x7e

goroutine 19 [runnable]:
runtime.Gosched()
    /usr/local/go/src/runtime/proc.go:166 +0x14 fp=0x8204d3dd0 sp=0x8204d3dc0
main.doRequest(0x0, 0x0, 0x0)
    /Users/rhysh/Downloads/lucky.go:39 +0x144 fp=0x8204d3ec8 sp=0x8204d3dd0
main.main.func1.1(0x0, 0x0)
    /Users/rhysh/Downloads/lucky.go:57 +0x2e fp=0x8204d3f48 sp=0x8204d3ec8
main.main.func1(0x820164000)
    /Users/rhysh/Downloads/lucky.go:67 +0x6f fp=0x8204d3fa8 sp=0x8204d3f48
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x8204d3fb0 sp=0x8204d3fa8
created by main.main
    /Users/rhysh/Downloads/lucky.go:73 +0x7e


[snip]
$ go version
go version devel +b6ead9f Tue Jul 7 21:53:11 2015 +0000 darwin/amd64
$ uname -a
Darwin Rhyss-MacBook-Pro.local 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64

@rsc
Copy link
Contributor

rsc commented Jul 11, 2015

I can reproduce the crash on my Macbook Pro with the sample program. Thanks so much for boiling it down.

@rhysh
Copy link
Contributor Author

rhysh commented Jul 11, 2015

I've further reduced the doRequest function to the following:

func doRequest() (*response, error) {
    type async struct {
        resp *response
        err  error
    }
    ch := make(chan *async, 0)

    go func() {
        select {
        case ch <- &async{resp: nil, err: myError{}}:
        case <-(chan struct{})(nil):
        }
    }()

    r := <-ch
    runtime.Gosched()
    return r.resp, r.err
}

The send is still in a multi-case select block (although only one of the cases involves a non-nil channel at runtime), but the receive is now just a basic blocking receive.

I wonder if the problematic sequence of events is as follows:

  1. GC begins, scans stack of receiving goroutine
  2. sending goroutine writes the struct pointer into a sudoG
  3. sending goroutine enqueues the sudoG on the channel's sendq, does not mark (something) as dirty
  4. sender goroutine exits
  5. GC mark phase completes
  6. GC sweep reclaims struct memory
  7. receiving goroutine wakes up, grabs the sudoG from the channel's sendq, successfully reads a pointer to reclaimed memory
  8. receiving goroutine passes the dangling pointer to something that later uses it
  9. BOOM

@aclements
Copy link
Member

Thanks for the test case! I can reproduce this on linux/amd64 as well. I'm looking at it now.

@rsc
Copy link
Contributor

rsc commented Jul 13, 2015

Back online after the weekend. I have a fix pending, Austin.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/12086 mentions this issue.

@rsc rsc closed this as completed in 8c3533c Jul 13, 2015
@rhysh
Copy link
Contributor Author

rhysh commented Jul 13, 2015

I haven't seen this since updating to from go1.5beta1 to devel +0bcdffe Mon Jul 13 19:42:35 2015 +0000. Thank you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants