CHAPTER 32 · QUALITY & SHIPPING

Benchmarking & Profiling

Benchmarks are first-class citizens in Go and live right next to your tests. go test -bench . reports the average time per operation in nanoseconds, and pprof shows where time and memory are spent.

Learning objectives

  • Write and run benchmarks with testing.B.
  • Understand b.N and how Go calibrates iteration counts.
  • Report allocations per operation with b.ReportAllocs.
  • Compare two benchmark runs with benchstat.
  • Profile CPU and memory with pprof.

Writing a benchmark

func BenchmarkAdd(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = Add(2, 3)
    }
}

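Rules (they mirror the rules for tests): the file name ends in _test.go, the function is named BenchmarkXxx, and it takes a single *testing.B parameter. Add here stands for whatever function you want to measure.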

Running benchmarks

go test -bench=. ./...              # run every benchmark in every package
go test -bench=Add -benchtime=2s .  # benchmarks whose names match the regexp Add, 2s each
go test -bench=. -count=5 .         # run each benchmark 5 times (summarize with benchstat)

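Sample output:

BenchmarkAdd-10    1000000000    0.2570 ns/op
                   ^^^^^^^^^^    ^^^^^^^^^^^^
                   b.N (iters)   ns per iteration

The -10 suffix is the GOMAXPROCS value the benchmark ran with. A time this small (roughly one CPU clock cycle) usually means the compiler inlined Add and optimized the call away; see the dead-code tip below.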
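To see where those columns come from, here is a minimal self-contained sketch. It runs the same BenchmarkAdd loop outside go test through testing.Benchmark, which calibrates b.N the same way and returns a testing.BenchmarkResult. The body of Add is an assumption (the chapter never shows it), and nsPerOp is a hypothetical helper that recomputes the ns/op column as total timed duration divided by N.

```go
package main

import (
	"fmt"
	"testing"
)

// Add is a hypothetical stand-in: the chapter never shows its body.
func Add(a, b int) int { return a + b }

// BenchmarkAdd is the chapter's benchmark, unchanged.
func BenchmarkAdd(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = Add(2, 3)
	}
}

// nsPerOp recomputes the ns/op column: total timed duration T divided by N.
// It divides in floating point because BenchmarkResult.NsPerOp returns an
// int64, which truncates sub-nanosecond results to 0.
func nsPerOp(r testing.BenchmarkResult) float64 {
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

func main() {
	// testing.Benchmark grows b.N like go test -bench does and
	// returns the result of the final run.
	r := testing.Benchmark(BenchmarkAdd)
	fmt.Printf("N=%d T=%v ns/op=%.4f\n", r.N, r.T, nsPerOp(r))
}
```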

Understanding b.N

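Go calls your benchmark function several times, starting with b.N = 1 and increasing it on each call until one call's loop lasts at least the benchmark time (1s by default, set with -benchtime); the reported numbers come from that final call. Your job: do one unit of the work you want to measure per loop iteration, and keep setup outside the loop. Because the whole function runs more than once, call b.ResetTimer after expensive setup so it isn't counted: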

func BenchmarkJoin(b *testing.B) {
    words := strings.Split(strings.Repeat("hello ", 1000), " ")   // setup outside
    b.ResetTimer()                                                 // discount setup

    for i := 0; i < b.N; i++ {
        _ = strings.Join(words, "-")
    }
}
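The calibration can be observed directly. This sketch uses the same BenchmarkJoin plus a hypothetical package-level seen slice that records the b.N of every call testing.Benchmark makes: the first call uses b.N = 1, later calls use strictly larger values, and the reported N is the last one.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// seen is a hypothetical recorder for the b.N value of each call.
var seen []int

func BenchmarkJoin(b *testing.B) {
	seen = append(seen, b.N) // happens before ResetTimer, so it is not timed
	words := strings.Split(strings.Repeat("hello ", 1000), " ")
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		_ = strings.Join(words, "-")
	}
}

func main() {
	r := testing.Benchmark(BenchmarkJoin)
	// b.N grows from call to call until one timed loop lasts about 1s.
	fmt.Println("b.N per call:", seen)
	fmt.Println("reported N:  ", r.N) // equals the last recorded b.N
}
```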

Reporting allocations

func BenchmarkStringConcat(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        s := ""
        for j := 0; j < 100; j++ { s += "x" }
    }
}

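The output now includes bytes allocated per operation (B/op) and allocations per operation (allocs/op); passing -benchmem to go test turns these columns on for every benchmark. For example (your numbers will differ by machine and Go version):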

BenchmarkStringConcat-10   100000   12000 ns/op   5200 B/op   100 allocs/op

Heap allocations cost time in the allocator and create work for the garbage collector, so in hot code, reducing allocs/op is often the most effective tuning lever.
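B/op and allocs/op are plain averages: the bytes and allocation counts recorded during the timed region, divided by N. The sketch below (sink is a hypothetical package-level variable that keeps each result alive) runs the concatenation benchmark through testing.Benchmark and recomputes both columns by hand from the MemBytes and MemAllocs fields of BenchmarkResult.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps each built string reachable so the work cannot be discarded.
var sink string

func BenchmarkStringConcat(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := ""
		for j := 0; j < 100; j++ {
			s += "x"
		}
		sink = s
	}
}

func main() {
	r := testing.Benchmark(BenchmarkStringConcat)
	fmt.Println(r.String(), r.MemString())
	// The same columns computed from the raw totals.
	fmt.Println("allocs/op by hand:", r.MemAllocs/uint64(r.N))
	fmt.Println("B/op by hand:     ", r.MemBytes/uint64(r.N))
}
```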

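Sub-benchmarks

b.Run starts a named sub-benchmark with its own b.N calibration, so one function can cover several input sizes. Each case is reported as BenchmarkAppend/size-10, BenchmarkAppend/size-100, and so on (followed by the GOMAXPROCS suffix), and you can run a single case with a slash-separated pattern such as -bench='Append/size-1000$' (the $ keeps it from also matching size-10000).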

func BenchmarkAppend(b *testing.B) {
    for _, size := range []int{10, 100, 1000, 10000} {
        b.Run(fmt.Sprintf("size-%d", size), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                var s []int // starts empty, so append must grow the backing array
                for j := 0; j < size; j++ {
                    s = append(s, j)
                }
            }
        })
    }
}
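A natural follow-up to BenchmarkAppend is to compare growing the slice against preallocating it with make([]int, 0, size). This sketch assumes a hypothetical appendBench helper and sinkInts variable, and it drives each case through testing.Benchmark so it runs as a plain program; inside a _test.go file you would keep the b.Run form above and let go test print the results.

```go
package main

import (
	"fmt"
	"testing"
)

// sinkInts keeps each slice reachable so the appends are not optimized away.
var sinkInts []int

// appendBench returns a benchmark that appends size ints to a slice
// created with the given capacity (0 means it has to grow as it goes).
func appendBench(size, capacity int) func(*testing.B) {
	return func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			s := make([]int, 0, capacity)
			for j := 0; j < size; j++ {
				s = append(s, j)
			}
			sinkInts = s
		}
	}
}

func main() {
	for _, size := range []int{10, 1000} {
		grow := testing.Benchmark(appendBench(size, 0))
		pre := testing.Benchmark(appendBench(size, size))
		fmt.Printf("size-%d grow:     %s %s\n", size, grow, grow.MemString())
		fmt.Printf("size-%d prealloc: %s %s\n", size, pre, pre.MemString())
	}
}
```

With the capacity reserved up front, each operation should show a single allocation, while the growing version allocates again every time append outgrows its backing array.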

benchstat: comparing runs

go install golang.org/x/perf/cmd/benchstat@latest

# before
go test -bench=. -count=10 . > before.txt

# make your change, then:
go test -bench=. -count=10 . > after.txt

benchstat before.txt after.txt

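benchstat summarizes each benchmark across the runs in each file, shows the percent change between the files, and reports a p-value; when the difference is not statistically significant it prints ~ instead of a percentage. That is what makes it essential for real performance work: the gap between two single runs can easily be noise.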
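The value of benchstat lies in its statistics, but the basic idea of summarizing repeated runs and comparing them can be sketched by hand. This rough approximation (median, pctChange and nsPerOp are hypothetical helpers, and the two string-building variants are only examples) runs each variant three times, takes the median ns/op, and prints the percent change. It has no significance test, so unlike benchstat it cannot tell a real change from noise.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"testing"
)

// median returns the middle value of xs (the mean of the two middle values
// for an even count). It sorts a copy so the caller's slice is untouched.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// pctChange is the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// nsPerOp runs f count times (like go test -count) and collects ns/op.
func nsPerOp(f func(*testing.B), count int) []float64 {
	var out []float64
	for i := 0; i < count; i++ {
		r := testing.Benchmark(f)
		out = append(out, float64(r.T.Nanoseconds())/float64(r.N))
	}
	return out
}

var sink string

func main() {
	words := strings.Fields(strings.Repeat("hello ", 100))
	before := nsPerOp(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			s := ""
			for _, w := range words {
				s += w
			}
			sink = s
		}
	}, 3)
	after := nsPerOp(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = strings.Join(words, "")
		}
	}, 3)
	mb, ma := median(before), median(after)
	fmt.Printf("before %.0f ns/op, after %.0f ns/op, change %+.1f%%\n",
		mb, ma, pctChange(mb, ma))
}
```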

pprof

When a benchmark tells you something's slow, pprof tells you where:

# Capture CPU profile during benchmark
go test -bench=Work -cpuprofile=cpu.prof .

# Interactive explorer
go tool pprof cpu.prof
# > top
# > list FuncName
# > web          (opens a call graph in your browser; requires Graphviz)

# Memory
go test -bench=Work -memprofile=mem.prof .
go tool pprof mem.prof

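For a running HTTP server, add import _ "net/http/pprof". The blank import's side effect registers handlers under /debug/pprof/ on http.DefaultServeMux, so it works when the server uses the default mux (for example, http.ListenAndServe with a nil handler). Then fetch a CPU profile with go tool pprof http://localhost:8080/debug/pprof/profile, adjusting the address to your server; the endpoint samples for 30 seconds unless you pass ?seconds=N.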

Benchmarking tips

  • Close other apps. Background noise skews results.
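  • Account for warm-up. Caches, CPU frequency scaling, and one-time initialization can make early iterations slower; a longer -benchtime (for example 2s) spreads any one-off cost over more iterations.
  • Prevent dead-code elimination. If a call is inlined and its result is never used, the compiler can discard the work entirely, and assigning the result to _ does not prevent that. Store results in a package-level variable instead; on Go 1.24 and later, a for b.Loop() { ... } loop also keeps call arguments and results alive.
  • Change one thing at a time. Between before and after runs, change only the code, not the machine, environment, or Go toolchain.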
  • Measure, don't guess. Your intuition about perf is wrong more often than you'd like.
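The package-level sink pattern from the dead-code tip looks like this. It is a minimal sketch assuming a hypothetical Add and sink variable; the loop variable is passed in so the call cannot be folded into a constant, and testing.Benchmark runs it as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// Add is a hypothetical stand-in for the function under test.
func Add(a, b int) int { return a + b }

// sink is package-level, so the compiler keeps the stores to it
// (and therefore the calls that produce them).
var sink int

func BenchmarkAddSink(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Using i instead of constants stops the call being folded to 5.
		sink = Add(i, 3)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkAddSink)
	fmt.Println(r)
}
```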


Practice exercises

EXERCISE 1

Concat vs Builder

Write two benchmarks: one that builds a 1000-character string with +=, another with strings.Builder. Enable b.ReportAllocs() and compare.

Solution

func BenchmarkConcat(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        s := ""
        for j := 0; j < 1000; j++ { s += "x" }
        _ = s
    }
}

func BenchmarkBuilder(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        var bd strings.Builder
        for j := 0; j < 1000; j++ { bd.WriteByte('x') }
        _ = bd.String()
    }
}

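Expect the Builder to be roughly 10–100x faster with a small fraction of the allocations: every += creates a new string and copies all previous bytes into it, while the Builder only reallocates when its buffer fills up.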
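To compare the two solution benchmarks without a _test.go file, this sketch runs them through testing.Benchmark and prints both results with their allocation columns. It stores results in a hypothetical package-level sink instead of _ =, following the dead-code tip.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps each result reachable so the work cannot be discarded.
var sink string

func BenchmarkConcat(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := ""
		for j := 0; j < 1000; j++ {
			s += "x"
		}
		sink = s
	}
}

func BenchmarkBuilder(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var bd strings.Builder
		for j := 0; j < 1000; j++ {
			bd.WriteByte('x')
		}
		sink = bd.String()
	}
}

func main() {
	c := testing.Benchmark(BenchmarkConcat)
	bl := testing.Benchmark(BenchmarkBuilder)
	fmt.Println("concat: ", c, c.MemString())
	fmt.Println("builder:", bl, bl.MemString())
	fmt.Printf("speedup: %.1fx\n", float64(c.NsPerOp())/float64(bl.NsPerOp()))
}
```

Calling bd.Grow(1000) before the inner loop should bring the Builder down to a single allocation per operation, since its buffer never has to grow.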
