Published on 1/4/2023

Use sync.WaitGroup cautiously in Go

sync.WaitGroup is a convenient and popular way to wait for a collection of goroutines to finish, typically when a program must let several concurrent operations complete before proceeding. Beyond simple tasks, however, it can be too limited.

sync.WaitGroup is very useful

sync.WaitGroup is a synchronization mechanism in Go that lets you wait for a collection of goroutines to finish executing. This is particularly useful when you have multiple concurrent operations running in goroutines and you need all of them to complete before proceeding with the rest of your code. WaitGroup is part of the sync package. For JS-ers, it’s loosely similar to Promise.all(), except that it only waits and does not collect results or errors.

A WaitGroup waits for a collection of goroutines to finish. The main goroutine calls Add to set the number of goroutines to wait for. Then each of the goroutines runs and calls Done when completed. At the same time, Wait can be used to block until all goroutines have finished (i.e., when the counter reaches zero).

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    // Declare a WaitGroup
    var wg sync.WaitGroup

    // Launch several goroutines and increment the WaitGroup counter for each
    for i := 0; i < 3; i++ {
        wg.Add(1) // Increment the counter
        go func(i int) {
            defer wg.Done() // Decrement the counter when the goroutine completes
            fmt.Println("Starting goroutine", i)
            time.Sleep(2 * time.Second) // Simulate some work
            fmt.Println("Goroutine", i, "has finished processing")
        }(i)
    }

    // Wait for all goroutines to complete
    wg.Wait()
    fmt.Println("All goroutines have finished processing")
}

In this example, we’re starting three goroutines, each simulating work with a 2-second sleep. The main function waits for all goroutines to finish before printing the final statement and exiting.

sync.WaitGroup is best used when…

  1. Simple synchronization requirement: You have several goroutines doing work that is independent of each other, and you need to wait until they are all complete before moving on. This is common in scenarios where parallel processing is used for efficiency, like processing multiple files, making several network requests, etc.

  2. Conciseness and clarity: You want to keep your code clean and readable without the complexity of channels or other synchronization techniques. The sync.WaitGroup provides a clear and concise way to wait for goroutines to complete, which can be more readable than using channels in situations where the main concern is just waiting for goroutines to finish.

  3. Avoiding manual synchronization: Manually using flags, condition variables, or atomic counters to wait for goroutines can make code complex and error-prone. sync.WaitGroup abstracts this complexity and ensures synchronization is handled correctly.

Overall, a WaitGroup is simpler than channels or mutexes when the sole goal is to wait for goroutines to complete.

Common pitfalls with sync.WaitGroup

Using sync.WaitGroup in Go can be very straightforward, but there are common traps that developers may fall into. Below are some of these pitfalls, along with sample code to illustrate each.

Not calling Add before launching the goroutine

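Programmers sometimes launch goroutines before adding to the wait group’s counter. This creates a race with two possible outcomes: if Add is called inside the goroutine, Wait may return before some goroutines have even registered; if Add is called right after the go statement, a fast goroutine may call Done before the matching Add and panic with a negative counter.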

var wg sync.WaitGroup

for i := 0; i < 3; i++ {
    go func(i int) {
        defer wg.Done() // May run before the matching Add below
        fmt.Println("Goroutine", i)
    }(i)
    wg.Add(1) // Too late: should be called before starting the goroutine
}

wg.Wait() // The program may already have panicked with a negative counter

The correct approach is to call Add outside and before the goroutine.

var wg sync.WaitGroup

for i := 0; i < 3; i++ {
    wg.Add(1) // This should happen before launching the goroutine
    go func(i int) {
        defer wg.Done()
        fmt.Println("Goroutine", i)
    }(i)
}

wg.Wait() // Will wait for all goroutines to complete

Misusing Done and causing a negative counter

If Done is called more times than Add accounts for, the WaitGroup’s internal counter goes negative and the program panics:

panic: sync: negative WaitGroup counter

var wg sync.WaitGroup

wg.Add(1)
go func() {
    wg.Done()
    wg.Done() // Error: second Done without matching Add
}()

wg.Wait() // The second Done above panics: sync: negative WaitGroup counter

The fix is to ensure that each Done corresponds to exactly one Add.
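
One way to keep Add and Done paired is to put both in a single small helper, so that every goroutine gets exactly one of each. The sketch below is only an illustration; runAll and countCompleted are hypothetical names, not part of the sync package.

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// runAll is a hypothetical helper: it runs every task in its own goroutine
// and returns once all of them have finished.
func runAll(tasks []func()) {
    var wg sync.WaitGroup
    for _, task := range tasks {
        wg.Add(1) // Exactly one Add per goroutine, before it starts
        go func(task func()) {
            defer wg.Done() // Exactly one Done per goroutine, even on early return
            task()
        }(task)
    }
    wg.Wait()
}

// countCompleted builds n trivial tasks, runs them through runAll, and
// reports how many of them ran.
func countCompleted(n int) int64 {
    var completed int64
    tasks := make([]func(), n)
    for i := range tasks {
        tasks[i] = func() { atomic.AddInt64(&completed, 1) }
    }
    runAll(tasks)
    return atomic.LoadInt64(&completed)
}

func main() {
    fmt.Println("completed tasks:", countCompleted(5)) // prints "completed tasks: 5"
}
```

Because the only Add and the only Done live next to each other in runAll, callers cannot accidentally call Done twice or forget it.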

Forgetting to handle panics and missing Done calls

If a goroutine panics and the panic is not recovered, any Done call placed after the panic is never reached. In practice, an unrecovered panic in any goroutine crashes the whole program, so Wait never gets the chance to return.

var wg sync.WaitGroup

wg.Add(1)
go func() {
    panic("something bad happened") // The panic is not recovered, Done is not called
    wg.Done() // Never reached
}()

wg.Wait() // Never returns normally: the unrecovered panic crashes the program

To handle this, defer the Done call so that it runs even when the goroutine panics, and recover from the panic so that it does not bring down the whole program.

var wg sync.WaitGroup

wg.Add(1)
go func() {
    defer wg.Done() // Using defer ensures Done is called even if a panic occurs
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("Recovered in goroutine:", r)
        }
    }()
    panic("something bad happened")
}()

wg.Wait() // Won't be stuck, as Done is called in the deferred function

Reusing a WaitGroup improperly

A WaitGroup has no reset method. It can be reused only once its counter is back at zero and every previous Wait call has returned (that is, all goroutines have called Done and no goroutine is still blocked in Wait). Misuse can lead to subtle race conditions, as explained in Go’s own documentation:

Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned.

var wg sync.WaitGroup

// First batch of goroutines
for i := 0; i < 3; i++ {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        fmt.Println("First batch:", i)
    }(i)
}

wg.Wait()

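// Reusing the same WaitGroup for a second batch.
// This is safe here only because the first Wait has already returned
// and no other goroutine is still blocked in Wait.
for i := 0; i < 3; i++ {
    wg.Add(1) // Would race if a Wait from the first batch could still be in progress
    go func(i int) {
        defer wg.Done()
        fmt.Println("Second batch:", i)
    }(i)
}

wg.Wait() // Waits for the second batch only

Sequential reuse like this is allowed, but the rule is easy to break once Add and Wait are called from different goroutines, for example when one goroutine starts the next batch while another is still returning from Wait. Ideally, avoid reusing a WaitGroup, or make sure every previous Wait call has returned before the next Add.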
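
A simple way to sidestep the reuse question is to give each batch its own WaitGroup. The following is a minimal sketch under that assumption; runBatch is a hypothetical helper, and squaring the input merely stands in for real work.

```go
package main

import (
    "fmt"
    "sync"
)

// runBatch is a hypothetical helper that squares each input in its own
// goroutine. The WaitGroup is local to the call, so no other batch can
// ever touch its counter.
func runBatch(inputs []int) []int {
    var wg sync.WaitGroup
    results := make([]int, len(inputs))
    for i, v := range inputs {
        wg.Add(1)
        go func(i, v int) {
            defer wg.Done()
            results[i] = v * v // Each goroutine writes only its own index
        }(i, v)
    }
    wg.Wait()
    return results
}

func main() {
    fmt.Println("first batch:", runBatch([]int{1, 2, 3}))  // prints "first batch: [1 4 9]"
    fmt.Println("second batch:", runBatch([]int{4, 5, 6})) // prints "second batch: [16 25 36]"
}
```

Since a zero-value WaitGroup is cheap to create, a fresh one per batch costs almost nothing and removes any chance of an Add racing with an earlier Wait.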

By understanding these common pitfalls and how to address them, developers can avoid bugs and race conditions associated with improper use of sync.WaitGroup.

When to avoid using sync.WaitGroup

There are certain scenarios where a sync.WaitGroup on its own is inappropriate or insufficient. Here’s when you might consider alternatives:

  1. Complex synchronization needs: If your application requires more than just waiting (for example, coordinating goroutines in a more sophisticated way, such as enforcing a certain execution order), you might need more advanced tools like channels, mutexes, or condition variables (sync.Cond).

  2. Error handling: sync.WaitGroup doesn’t provide built-in support for handling errors from multiple goroutines. If you need to propagate errors from any of the worker goroutines, you’ll need to implement your own error-handling strategy, possibly using channels or the errgroup package (golang.org/x/sync/errgroup), which handles errors in addition to synchronization.

  3. Context cancellation: If you need to be able to cancel goroutines, for example, due to a timeout or because one goroutine encountered an error, you might need to use the context package. The sync.WaitGroup doesn’t have built-in support for context-based cancellation and cannot stop the goroutines it’s waiting on.

  4. Resource constraints: In scenarios where the number of goroutines you want to run can exhaust system resources (like file handles, memory, etc.), or if the goroutines are long-lived, using a simple WaitGroup can be problematic. Instead, you might need a worker pool or semaphore to limit concurrency, which can help manage resources more effectively.

  5. Dynamic collection of goroutines: If you have a dynamic collection of goroutines where the exact number of goroutines is not known when you start waiting, managing the counter of a sync.WaitGroup correctly can become challenging. Misuse can easily lead to deadlocks or race conditions. In such a case, other signaling mechanisms like channels might be safer and more flexible.

  6. Single goroutine completion notification: If your application logic requires knowing when each individual goroutine completes, as opposed to when all complete, a sync.WaitGroup is not the right tool. You would likely need to use channels, possibly in combination with select, to receive individual completion notifications.

  7. Data aggregation: If you need to collect and aggregate data from multiple goroutines, a sync.WaitGroup doesn’t facilitate the safe sharing of state. In such a case, you’ll need additional synchronization structures (like mutexes) to protect the shared state, or you can use channels to collect the results.
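
Several of the points above (error handling, resource constraints, and data aggregation) can be covered with the standard library alone by pairing a WaitGroup with channels. The sketch below is one possible arrangement, not a definitive pattern: process, processAll, and the result type are hypothetical, and limit is assumed to be at least 1. The errgroup package offers a ready-made alternative for the error-handling and concurrency-limit parts.

```go
package main

import (
    "errors"
    "fmt"
    "sync"
)

// result pairs a worker's output with any error it produced.
type result struct {
    input int
    value int
    err   error
}

// process is a stand-in for real work; it fails on negative input.
func process(n int) (int, error) {
    if n < 0 {
        return 0, errors.New("negative input")
    }
    return n * 2, nil
}

// processAll runs process on every input with at most limit goroutines
// working at once (limit must be >= 1). It returns the sum of the
// successful values and every error that occurred.
func processAll(inputs []int, limit int) (int, []error) {
    sem := make(chan struct{}, limit)         // Buffered channel used as a counting semaphore
    results := make(chan result, len(inputs)) // Buffered so workers never block on send

    var wg sync.WaitGroup
    for _, n := range inputs {
        wg.Add(1)
        sem <- struct{}{} // Acquire a slot; blocks while limit workers are running
        go func(n int) {
            defer wg.Done()
            defer func() { <-sem }() // Release the slot
            v, err := process(n)
            results <- result{input: n, value: v, err: err}
        }(n)
    }

    wg.Wait()
    close(results) // Safe: every sender has finished

    sum := 0
    var errs []error
    for r := range results {
        if r.err != nil {
            errs = append(errs, fmt.Errorf("input %d: %w", r.input, r.err))
            continue
        }
        sum += r.value
    }
    return sum, errs
}

func main() {
    sum, errs := processAll([]int{1, 2, -3, 4}, 2)
    fmt.Println("sum:", sum)          // prints "sum: 14"
    fmt.Println("errors:", len(errs)) // prints "errors: 1"
}
```

Here the WaitGroup still does the one thing it is good at, waiting, while the channels carry the results and errors it cannot.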

Understanding your program’s requirements will help you choose the right synchronization strategy. While sync.WaitGroup is a powerful tool for its designed purpose, Go’s concurrency model provides several other primitives that can be more appropriate for scenarios requiring more than just waiting for a collection of goroutines to finish.