runtime: bad pointer due to map iteration #9384

@rsc

An internal Google program is executing code like the below and getting occasional runtime crashes during garbage collection:

func computeIncomingDependencies() (map[string]map[string]struct{}, error) {
    gopath := os.Getenv("GOPATH")
    if gopath == "" {
        return nil, fmt.Errorf("GOPATH is not set")
    }
    dirs := strings.Split(gopath, ":")
    allDirs := map[string]struct{}{}
    for _, dir := range dirs {
        if err := collectDirs(filepath.Join(dir, "src"), "", allDirs); err != nil {
            return nil, err
        }
    }
    allDeps := map[string]map[string]struct{}{}
    for dir, _ := range allDirs {
        allDeps[dir] = map[string]struct{}{} // <<< GC during makemap on this line <<<
    }
    for dir, _ := range allDirs {
        mode := build.ImportMode(0)
        pkg, err := build.Import(dir, "", mode)
        if err != nil {
            fmt.Errorf("Import(%v, %v) failed: %v", dir, mode, err)
        }
        imports := pkg.Imports
        if includeTestsFlag {
            imports = append(imports, pkg.TestImports...)
        }
        for _, dep := range imports {
            if deps, ok := allDeps[dep]; ok {
                deps[dir] = struct{}{}
            }
        }
    }
    return allDeps, nil
}

A garbage collection happens on the marked line. During the scan of the stack frame corresponding to this function, the garbage collector finds an invalid heap pointer and crashes the program. The invalid heap pointer is at 0x1f8(SP). The map iterator for the loop being executed starts at 0x1f0(SP), which makes the bad slot the second word of the iterator, it.value.

The error I am looking at says:

runtime: garbage collector found invalid heap pointer *(0xc20805ec20+0x198)=0xc2080f7000 span=0xc2080ee000-0xc2080f7000-0xc2080f8000 state=0
fatal error: invalid heap pointer

The actual stack frame is sp=0xc20805ebc0, so the base address printed in the message, 0xc20805ec20, is 0x60 above SP, and the bad slot is at 0x60+0x198 = 0x1f8(SP).

This is the generated code for the creation of allDeps and then that loop:

:118    0x45ebc1    e81a7bfaff          CALL runtime.makemap(SB)
:118    0x45ebc6    488b5c2410          MOVQ 0x10(SP), BX
:118    0x45ebcb    48899c2480000000        MOVQ BX, 0x80(SP)
:119    0x45ebd3    488b4c2478          MOVQ 0x78(SP), CX
:119    0x45ebd8    488dbc24f0010000        LEAQ 0x1f0(SP), DI
:119    0x45ebe0    31c0                XORL AX, AX
:119    0x45ebe2    e865acfdff          CALL 0x43984c
:119    0x45ebe7    488d1dd2fc1000          LEAQ 0x10fcd2(IP), BX
:119    0x45ebee    48891c24            MOVQ BX, 0(SP)
:119    0x45ebf2    48894c2408          MOVQ CX, 0x8(SP)
:119    0x45ebf7    488d9c24f0010000        LEAQ 0x1f0(SP), BX
:119    0x45ebff    48895c2410          MOVQ BX, 0x10(SP)
:119    0x45ec04    e8c790faff          CALL runtime.mapiterinit(SB)
:119    0x45ec09    488b9c24f0010000        MOVQ 0x1f0(SP), BX
:119    0x45ec11    31ed                XORL BP, BP
:119    0x45ec13    4839eb              CMPQ BP, BX
:119    0x45ec16    0f84b4000000            JE 0x45ecd0
:119    0x45ec1c    488b9c24f0010000        MOVQ 0x1f0(SP), BX
:119    0x45ec24    4883fb00            CMPQ $0x0, BX
:119    0x45ec28    0f8431060000            JE 0x45f25f
:119    0x45ec2e    488b0b              MOVQ 0(BX), CX
:119    0x45ec31    488b6b08            MOVQ 0x8(BX), BP
:120    0x45ec35    48898c24b8000000        MOVQ CX, 0xb8(SP)
:120    0x45ec3d    48898c2408010000        MOVQ CX, 0x108(SP)
:120    0x45ec45    4889ac24c0000000        MOVQ BP, 0xc0(SP)
:120    0x45ec4d    4889ac2410010000        MOVQ BP, 0x110(SP)
:120    0x45ec55    488d1d64fc1000          LEAQ 0x10fc64(IP), BX
:120    0x45ec5c    48891c24            MOVQ BX, 0(SP)
:120    0x45ec60    48c744240800000000      MOVQ $0x0, 0x8(SP)
:120    0x45ec69    e8727afaff          CALL runtime.makemap(SB) <<< GC here <<<
:120    0x45ec6e    488b5c2410          MOVQ 0x10(SP), BX
:120    0x45ec73    48895c2470          MOVQ BX, 0x70(SP)
:120    0x45ec78    488d1dc1fa1000          LEAQ 0x10fac1(IP), BX
:120    0x45ec7f    48891c24            MOVQ BX, 0(SP)
:120    0x45ec83    488b9c2480000000        MOVQ 0x80(SP), BX
:120    0x45ec8b    48895c2408          MOVQ BX, 0x8(SP)
:120    0x45ec90    488d9c2408010000        LEAQ 0x108(SP), BX
:120    0x45ec98    48895c2410          MOVQ BX, 0x10(SP)
:120    0x45ec9d    488d5c2470          LEAQ 0x70(SP), BX
:120    0x45eca2    48895c2418          MOVQ BX, 0x18(SP)
:120    0x45eca7    e88486faff          CALL runtime.mapassign1(SB)
:119    0x45ecac    488d9c24f0010000        LEAQ 0x1f0(SP), BX
:119    0x45ecb4    48891c24            MOVQ BX, 0(SP)
:119    0x45ecb8    e87392faff          CALL runtime.mapiternext(SB)
:119    0x45ecbd    488b9c24f0010000        MOVQ 0x1f0(SP), BX
:119    0x45ecc5    31ed                XORL BP, BP
:119    0x45ecc7    4839eb              CMPQ BP, BX
:119    0x45ecca    0f854cffffff            JNE 0x45ec1c

The bad pointer is 0xc2080f7000 and the span summary is 0xc2080ee000-0xc2080f7000-0xc2080f8000, meaning that s.limit == 0xc2080f7000. The conclusion seems to be that mapiternext (or mapiterinit, which calls mapiternext) can leave it.value pointing at the end of the underlying map data array.
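
The following sketch spells out why a pointer equal to s.limit is rejected. The span type is a simplified, hypothetical model, not the runtime's span structure. It assumes the three addresses in the message are the span's start, limit, and end in that order, and that a valid object pointer must satisfy start <= p < limit.

```go
package main

import "fmt"

// span is a hypothetical model of the three addresses in the crash message.
type span struct {
	start uint64 // first byte of the span (assumed meaning of the first address)
	limit uint64 // end of the data in the span, s.limit in the report
	end   uint64 // end of the span's memory (assumed meaning of the third address)
}

// contains reports whether p points into the span's data under the
// assumed rule start <= p < limit.
func (s span) contains(p uint64) bool {
	return s.start <= p && p < s.limit
}

// onePastEnd reports whether p points exactly at the end of the data.
func (s span) onePastEnd(p uint64) bool {
	return p == s.limit
}

func main() {
	s := span{start: 0xc2080ee000, limit: 0xc2080f7000, end: 0xc2080f8000}
	bad := uint64(0xc2080f7000)
	fmt.Println(s.contains(bad), s.onePastEnd(bad)) // prints: false true
}
```

Under these assumptions the pointer is not inside the data; it sits exactly one byte past it, which is consistent with it.value pointing at the end of the underlying map data array.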

I don't see how that can happen by reading mapiternext, but it seems to be possible somehow. Should probably fix for Go 1.4.1.
