Description
An internal Google program runs code like the following and occasionally crashes at run time during garbage collection:
func computeIncomingDependencies() (map[string]map[string]struct{}, error) {
	gopath := os.Getenv("GOPATH")
	if gopath == "" {
		return nil, fmt.Errorf("GOPATH is not set")
	}
	dirs := strings.Split(gopath, ":")
	allDirs := map[string]struct{}{}
	for _, dir := range dirs {
		if err := collectDirs(filepath.Join(dir, "src"), "", allDirs); err != nil {
			return nil, err
		}
	}
	allDeps := map[string]map[string]struct{}{}
	for dir, _ := range allDirs {
		allDeps[dir] = map[string]struct{}{} // <<< GC during makemap on this line <<<
	}
	for dir, _ := range allDirs {
		mode := build.ImportMode(0)
		pkg, err := build.Import(dir, "", mode)
		if err != nil {
			fmt.Errorf("Import(%v, %v) failed: %v", dir, mode, err)
		}
		imports := pkg.Imports
		if includeTestsFlag {
			imports = append(imports, pkg.TestImports...)
		}
		for _, dep := range imports {
			if deps, ok := allDeps[dep]; ok {
				deps[dir] = struct{}{}
			}
		}
	}
	return allDeps, nil
}
A garbage collection happens on the marked line. While scanning the stack frame for this function, the garbage collector finds an invalid heap pointer and crashes the program. The invalid heap pointer is at 0x1f8(SP). The map iterator for the loop being executed starts at 0x1f0(SP), which makes the bad slot the second word of the iterator, it.value.
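To make the "second word" reasoning concrete, here is a minimal sketch. It assumes a 64-bit target (8-byte pointers), and iterHead is a hypothetical stand-in for just the first two words of the iterator, not the runtime's actual definition:

```go
package main

import (
	"fmt"
	"unsafe"
)

// iterHead is a hypothetical model of the first two words of the map
// iterator (an assumption for illustration, not the runtime's type).
type iterHead struct {
	key   unsafe.Pointer // first word
	value unsafe.Pointer // second word
}

// slotOffset returns the SP-relative offset of a field located
// fieldOff bytes into an iterator that starts at iterBase(SP).
func slotOffset(iterBase, fieldOff uintptr) uintptr {
	return iterBase + fieldOff
}

func main() {
	const iterBase = 0x1f0 // iterator start, taken from the disassembly
	fmt.Printf("key   at %#x(SP)\n", slotOffset(iterBase, unsafe.Offsetof(iterHead{}.key)))
	fmt.Printf("value at %#x(SP)\n", slotOffset(iterBase, unsafe.Offsetof(iterHead{}.value)))
}
```

On a 64-bit machine the second line reports 0x1f8(SP), the slot the collector complained about.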
The error I am looking at says:
runtime: garbage collector found invalid heap pointer *(0xc20805ec20+0x198)=0xc2080f7000 span=0xc2080ee000-0xc2080f7000-0xc2080f8000 state=0
fatal error: invalid heap pointer
The actual stack frame has sp=0xc20805ebc0, so the base address in the message, 0xc20805ec20, is 0x60 above SP, and the bad slot is at 0x60+0x198 = 0x1f8(SP).
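The offset arithmetic can be checked with a short sketch. The helper name spOffset is made up for illustration; the three constants come from the error message and the stack frame quoted in this report:

```go
package main

import "fmt"

// spOffset converts the (base, off) pair printed by the garbage
// collector into an offset relative to the frame's stack pointer.
func spOffset(sp, base, off uint64) uint64 {
	return (base - sp) + off
}

func main() {
	const (
		sp   = 0xc20805ebc0 // stack pointer of the frame
		base = 0xc20805ec20 // base address in the error message
		off  = 0x198        // offset in the error message
	)
	fmt.Printf("base-sp = %#x\n", uint64(base-sp))           // 0x60
	fmt.Printf("slot    = %#x(SP)\n", spOffset(sp, base, off)) // 0x1f8(SP)
}
```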
This is the generated code for the creation of allDeps and then that loop:
:118 0x45ebc1 e81a7bfaff CALL runtime.makemap(SB)
:118 0x45ebc6 488b5c2410 MOVQ 0x10(SP), BX
:118 0x45ebcb 48899c2480000000 MOVQ BX, 0x80(SP)
:119 0x45ebd3 488b4c2478 MOVQ 0x78(SP), CX
:119 0x45ebd8 488dbc24f0010000 LEAQ 0x1f0(SP), DI
:119 0x45ebe0 31c0 XORL AX, AX
:119 0x45ebe2 e865acfdff CALL 0x43984c
:119 0x45ebe7 488d1dd2fc1000 LEAQ 0x10fcd2(IP), BX
:119 0x45ebee 48891c24 MOVQ BX, 0(SP)
:119 0x45ebf2 48894c2408 MOVQ CX, 0x8(SP)
:119 0x45ebf7 488d9c24f0010000 LEAQ 0x1f0(SP), BX
:119 0x45ebff 48895c2410 MOVQ BX, 0x10(SP)
:119 0x45ec04 e8c790faff CALL runtime.mapiterinit(SB)
:119 0x45ec09 488b9c24f0010000 MOVQ 0x1f0(SP), BX
:119 0x45ec11 31ed XORL BP, BP
:119 0x45ec13 4839eb CMPQ BP, BX
:119 0x45ec16 0f84b4000000 JE 0x45ecd0
:119 0x45ec1c 488b9c24f0010000 MOVQ 0x1f0(SP), BX
:119 0x45ec24 4883fb00 CMPQ $0x0, BX
:119 0x45ec28 0f8431060000 JE 0x45f25f
:119 0x45ec2e 488b0b MOVQ 0(BX), CX
:119 0x45ec31 488b6b08 MOVQ 0x8(BX), BP
:120 0x45ec35 48898c24b8000000 MOVQ CX, 0xb8(SP)
:120 0x45ec3d 48898c2408010000 MOVQ CX, 0x108(SP)
:120 0x45ec45 4889ac24c0000000 MOVQ BP, 0xc0(SP)
:120 0x45ec4d 4889ac2410010000 MOVQ BP, 0x110(SP)
:120 0x45ec55 488d1d64fc1000 LEAQ 0x10fc64(IP), BX
:120 0x45ec5c 48891c24 MOVQ BX, 0(SP)
:120 0x45ec60 48c744240800000000 MOVQ $0x0, 0x8(SP)
:120 0x45ec69 e8727afaff CALL runtime.makemap(SB) <<< GC here <<<
:120 0x45ec6e 488b5c2410 MOVQ 0x10(SP), BX
:120 0x45ec73 48895c2470 MOVQ BX, 0x70(SP)
:120 0x45ec78 488d1dc1fa1000 LEAQ 0x10fac1(IP), BX
:120 0x45ec7f 48891c24 MOVQ BX, 0(SP)
:120 0x45ec83 488b9c2480000000 MOVQ 0x80(SP), BX
:120 0x45ec8b 48895c2408 MOVQ BX, 0x8(SP)
:120 0x45ec90 488d9c2408010000 LEAQ 0x108(SP), BX
:120 0x45ec98 48895c2410 MOVQ BX, 0x10(SP)
:120 0x45ec9d 488d5c2470 LEAQ 0x70(SP), BX
:120 0x45eca2 48895c2418 MOVQ BX, 0x18(SP)
:120 0x45eca7 e88486faff CALL runtime.mapassign1(SB)
:119 0x45ecac 488d9c24f0010000 LEAQ 0x1f0(SP), BX
:119 0x45ecb4 48891c24 MOVQ BX, 0(SP)
:119 0x45ecb8 e87392faff CALL runtime.mapiternext(SB)
:119 0x45ecbd 488b9c24f0010000 MOVQ 0x1f0(SP), BX
:119 0x45ecc5 31ed XORL BP, BP
:119 0x45ecc7 4839eb CMPQ BP, BX
:119 0x45ecca 0f854cffffff JNE 0x45ec1c
The bad pointer is 0xc2080f7000 and the span summary is 0xc2080ee000-0xc2080f7000-0xc2080f8000, meaning that s.limit == 0xc2080f7000: the pointer is exactly equal to the span's limit. The conclusion seems to be that mapiternext (or mapiterinit, which calls mapiternext) can leave it.value pointing at the end of the underlying map data array.
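The following sketch shows why a pointer equal to the limit is rejected. It assumes the three values in the span summary are the span's start, limit, and end, and that a heap pointer is acceptable only in the half-open range [start, limit). The span type and validPointer method are hypothetical simplifications, not the runtime's code:

```go
package main

import "fmt"

// span is a hypothetical, simplified model of a heap span: objects
// are assumed to live in [start, limit), and the span's memory
// extends to end.
type span struct {
	start, limit, end uint64
}

// validPointer reports whether p falls inside the object area of s
// under the half-open-range assumption stated above.
func (s span) validPointer(p uint64) bool {
	return s.start <= p && p < s.limit
}

func main() {
	s := span{start: 0xc2080ee000, limit: 0xc2080f7000, end: 0xc2080f8000}
	bad := uint64(0xc2080f7000) // the pointer from the error message

	fmt.Println(s.validPointer(bad))     // false: p == s.limit
	fmt.Println(s.validPointer(bad - 8)) // true: last word below the limit
}
```

Under these assumptions, a pointer one past the last element of the data array fails the check even though it still lies within the span's memory.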
I don't see how that can happen from reading mapiternext, but it seems to be possible somehow. This should probably be fixed for Go 1.4.1.