The case of the subtle race condition

Yesterday I got a strange race condition reported after I wrote some additional tests for pgzip. At first I couldn’t really work out why I was getting the race reported, and I suspected that the race detector was reporting wrong line numbers. The race reported was boiled down like this:

type Writer struct {
    prevTail []byte
    pushedErr chan error
}

func (z *Writer) A() {
    go func() {
        for {
            // ...
            err := SomeCall()
            if err != nil {
                z.pushError(err)   //  <<<--- READ
                return
            }
        }
    }()
}

func (z *Writer) B() {
    // ...
    z.prevTail = c   // <<<--- WRITE
}

func (z Writer) pushError(err error) {
    z.pushedErr <- err
    close(z.pushedErr)
}

It calls A() and afterwards B(), and the goroutine in A is running while B is called.

It seemed strange. The weird thing was, that I wasn't reading anything related to the write on the line. The err variable was local for the goroutine. In case the function I called was inlined, I also checked the pushError function, but it only did "safe" channel operations.

The z.prevTail is not being accessed outside the calling goroutine, so this all seemed very strange to me.

However, the most experienced of you are probably already screaming the solution: We are calling the pushError() receiver function with Writer as a value instead of a pointer. When we call a receiver with a value we must remember that it implies a copy, which in term implies a read of all member values.

So the race detector was (of course) correct, however it was quite tricky to see the actual race, since it was hidden in a receiver copy. Unfortunately it is another blow for value receivers. I try to be 'nice' and do value receivers when i can, but sometimes it is tempting to make all of them pointer receivers.