Allocating dynamic memory is something everyone tries to avoid because it degrades performance. Sometimes you can reorganize your logic to remove all the heap escapes from your code, but in some cases such an optimization is impossible. When that happens, you can try another approach: preallocate a sufficient amount of memory and request it on demand from a memory manager.
Standard Solution: sync.Pool
The standard library already has a primitive for saving pieces of dynamic memory allocated earlier during the application's execution and requesting them back later: sync.Pool. Unfortunately, the behaviour of that synchronization primitive is not consistent. Sometimes it does the job; other times it decides it shouldn't, because the garbage collector is free to drop pooled objects at any moment. I needed something that worked for sure, so I created my own pool.
Linked List and Memory Allocation Amortization
A linked list is a good container for a memory pool because its removal and insertion operations run in constant O(1) time. If I based the solution on an array list, inserting a new element would take O(n) in the worst case because the existing contents have to be copied to a new array when the capacity is insufficient.
Go has an out-of-the-box linked list implementation in its standard library — the container/list package. But it has two downsides. The first is that it doesn't use generics and thus accepts interface{} as an argument, which leads to an unnecessary memory allocation. The second is that it allocates nodes dynamically and discards them after the corresponding element is deleted, instead of retaining them in case they are needed later. This leads to even more unnecessary memory allocations that could be avoided.
That’s why I came up with my linked list type:
package list

// AmortizedList represents a doubly linked list
// that retains removed nodes for later reuse.
// The zero value for AmortizedList is an empty list ready to use.
type AmortizedList[T any] struct {
	root   Element[T]
	len    int
	memory *List[T]
}

// Init initializes or clears list l.
func (l *AmortizedList[T]) Init() *AmortizedList[T] {
	l.root.next = &l.root
	l.root.prev = &l.root
	l.len = 0
	l.memory = New[T]()

	return l
}

// NewAmortized returns an initialized list.
func NewAmortized[T any]() *AmortizedList[T] {
	return new(AmortizedList[T]).Init()
}

// Len returns the number of elements of list l.
// The complexity is O(1).
func (l *AmortizedList[T]) Len() int { return l.len }

// Front returns the first element of list l or nil if the list is empty.
func (l *AmortizedList[T]) Front() *Element[T] {
	if l.len == 0 {
		return nil
	}

	return l.root.next
}

// Back returns the last element of list l or nil if the list is empty.
func (l *AmortizedList[T]) Back() *Element[T] {
	if l.len == 0 {
		return nil
	}

	return l.root.prev
}

// lazyInit lazily initializes a zero AmortizedList value.
func (l *AmortizedList[T]) lazyInit() {
	if l.root.next == nil {
		l.Init()
	}
}

// insert inserts e after at, increments l.len, and returns e.
func (l *AmortizedList[T]) insert(e, at *Element[T]) *Element[T] {
	e.prev = at
	e.next = at.next
	e.prev.next = e
	e.next.prev = e
	e.alist = l
	l.len++

	return e
}

// insertValue is a convenience wrapper for
// insert(&Element{Value: v}, at) that reuses a node
// from the memory list if one is available.
func (l *AmortizedList[T]) insertValue(v T, at *Element[T]) *Element[T] {
	lastSlot := l.memory.Back()

	if lastSlot == nil {
		return l.insert(&Element[T]{Value: v}, at)
	}

	l.memory.Remove(lastSlot)
	lastSlot.Value = v

	return l.insert(lastSlot, at)
}

// remove removes e from its list, decrements l.len,
// and places the node into the memory list for reuse.
func (l *AmortizedList[T]) remove(e *Element[T]) {
	e.prev.next = e.next
	e.next.prev = e.prev
	e.next = nil  // avoid memory leaks
	e.prev = nil  // avoid memory leaks
	e.alist = nil // the node no longer belongs to the amortized list
	l.len--

	l.memory.insert(e, l.memory.root.prev)
}

// move moves e to next to at.
func (l *AmortizedList[T]) move(e, at *Element[T]) {
	if e == at {
		return
	}

	e.prev.next = e.next
	e.next.prev = e.prev

	e.prev = at
	e.next = at.next
	e.prev.next = e
	e.next.prev = e
}

// Remove removes e from l if e is an element of list l.
// It returns the element value e.Value.
// The element must not be nil.
func (l *AmortizedList[T]) Remove(e *Element[T]) T {
	if e.alist == l {
		// if e.alist == l, l must have been initialized when e was inserted
		// in l or l == nil (e is a zero Element) and l.remove will crash
		l.remove(e)
	}

	return e.Value
}

// PushFront inserts a new element e with value v at the
// front of list l and returns e.
func (l *AmortizedList[T]) PushFront(v T) *Element[T] {
	l.lazyInit()
	return l.insertValue(v, &l.root)
}

// PushBack inserts a new element e with value v at the
// back of list l and returns e.
func (l *AmortizedList[T]) PushBack(v T) *Element[T] {
	l.lazyInit()
	return l.insertValue(v, l.root.prev)
}
As you can see, it's based on the standard library's implementation (originally written at Google); the full source code is here. I added a new field, memory, to store previously produced nodes in case they are needed later. When a new value is added to the amortized list, a node for it is taken from the memory list; if the memory list is empty, a new node is created. When a value leaves the amortized list, the node that contained it goes to the memory list for reuse.
The Memory Manager
All the elements placed in the memory pool must correspond to this interface:
package mempool

// Erasable is an object
// whose data can be completely
// nullified in order to reuse it.
type Erasable interface {
	// Erase resets all the fields
	// of the object to defaults
	// (not deeply).
	Erase() error
}
The Erase method is called when an object is added to the pool. You also need to specify a constructor that creates new objects in case the pool turns out to be empty when a Get request is made.
package mempool

import (
	"sync"

	ll "github.com/zergon321/ll"
)

// Pool holds dynamically
// allocated objects that
// can be reused throughout
// the runtime.
type Pool[T Erasable] struct {
	objects     *ll.List[T]
	mut         *sync.Mutex
	constructor func() T
}

// NewPool returns a new pool
// for a certain object type.
func NewPool[T Erasable](
	constructor func() T,
	options ...PoolOption[T],
) (*Pool[T], error) {
	var pool Pool[T]
	var params poolParams

	// Apply all the options.
	for i := 0; i < len(options); i++ {
		option := options[i]
		err := option(&pool, &params)

		if err != nil {
			return nil, err
		}
	}

	// Fill the pool.
	pool.constructor = constructor
	pool.objects = ll.New[T]()

	for i := 0; i < params.initLen; i++ {
		pool.objects.PushBack(pool.constructor())
	}

	return &pool, nil
}
We need a mutex because the pool could be called from multiple goroutines. There is also an option to preallocate a certain number of objects, i.e., to fill the pool preemptively. These options are defined as follows:
package mempool

import "sync"

// PoolOption changes something
// in the newly created pool.
type PoolOption[T Erasable] func(pool *Pool[T], params *poolParams) error

// poolParams holds the
// parameters to be used
// for pool initialization.
type poolParams struct {
	initLen int // initLen is the initial length of the pool object list.
}

// PoolOptionInitialLength creates an option
// to set the initial length of the object pool.
func PoolOptionInitialLength[T Erasable](length int) PoolOption[T] {
	return func(pool *Pool[T], params *poolParams) error {
		if length < 0 {
			return &ErrorNegativeLength{
				length: length,
			}
		}

		params.initLen = length

		return nil
	}
}

// PoolOptionConcurrent enables a newly
// created pool to handle concurrent gets and puts
// from many different goroutines at the same time.
func PoolOptionConcurrent[T Erasable]() PoolOption[T] {
	return func(pool *Pool[T], params *poolParams) error {
		pool.mut = &sync.Mutex{}
		return nil
	}
}
Using the memory pool is easy: call Get any time you want to create a new object and Put when you don't need the object anymore and can dispose of it. Yeah, we kind of bring the C workflow of working with dynamic memory to Go. We request memory and then free it.
// Get extracts an empty object
// from the pool.
func (pool *Pool[T]) Get() T {
	if pool.mut != nil {
		pool.mut.Lock()
		defer pool.mut.Unlock()
	}

	// The length check must happen under the lock;
	// otherwise two goroutines could race for the last object.
	if pool.objects.Len() <= 0 {
		return pool.constructor()
	}

	object := pool.objects.Remove(pool.objects.Back())

	return object
}
// Put puts the object to the pool
// and erases its fields so it can
// be reused.
func (pool *Pool[T]) Put(object T) error {
	if pool.mut != nil {
		pool.mut.Lock()
		defer pool.mut.Unlock()
	}

	err := object.Erase()

	if err != nil {
		return err
	}

	pool.objects.PushBack(object)

	return nil
}
Quite straightforward, isn’t it? Now let’s do some benchmarks.
Measuring Performance
First, let’s define our custom data type to work with:
type ByteArray []byte

func (array ByteArray) Erase() error {
	for i := 0; i < len(array); i++ {
		array[i] = 0
	}

	return nil
}
Now we need to see how much time it takes to populate a sync.Pool and extract values from it:
func BenchmarkSyncPool(b *testing.B) {
	pool := &sync.Pool{
		New: func() any { return make([]byte, 64) },
	}

	for i := 0; i < b.N; i++ {
		pool.Put(make([]byte, 64))
	}

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		pool.Get()
	}
}

func BenchmarkSyncPoolFill(b *testing.B) {
	pool := &sync.Pool{
		New: func() any { return make([]byte, 64) },
	}

	for i := 0; i < b.N; i++ {
		pool.Put(make([]byte, 64))
	}
}
Then the same for our custom memory pool:
func BenchmarkMempool(b *testing.B) {
	pool, _ := mempool.NewPool[ByteArray](
		func() ByteArray { return make([]byte, 64) })

	for i := 0; i < b.N; i++ {
		pool.Put(make([]byte, 64))
	}

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		pool.Get()
	}
}

func BenchmarkMempoolFill(b *testing.B) {
	pool, _ := mempool.NewPool[ByteArray](
		func() ByteArray { return make([]byte, 64) })

	for i := 0; i < b.N; i++ {
		pool.Put(make([]byte, 64))
	}
}
In addition, we need to test how dynamic memory allocations are amortized on a second fill (i.e., when objects have been extracted from the pool and are then put back).
func BenchmarkMempoolRefill(b *testing.B) {
	pool, _ := mempool.NewPool[ByteArray](
		func() ByteArray { return make([]byte, 64) })

	for i := 0; i < b.N; i++ {
		pool.Put(make([]byte, 64))
	}

	data := make([]ByteArray, 0, b.N)

	for i := 0; i < b.N; i++ {
		data = append(data, pool.Get())
	}

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		pool.Put(data[i])
	}
}
Time to see the magnificent results:
goos: linux
goarch: amd64
pkg: github.com/zergon321/mempool
cpu: AMD Ryzen 5 5600H with Radeon Graphics
BenchmarkSyncPool-12 13307784 187.5 ns/op 87 B/op 1 allocs/op
BenchmarkSyncPoolFill-12 18251826 66.80 ns/op 117 B/op 2 allocs/op
BenchmarkMempool-12 95308918 30.18 ns/op 0 B/op 0 allocs/op
BenchmarkMempoolFill-12 13297759 106.2 ns/op 128 B/op 2 allocs/op
BenchmarkMempoolRefill-12 42304838 28.13 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/zergon321/mempool 29.185s
As you can see, sync.Pool chose not to work: it still allocates on every Get because the garbage collector is free to discard pooled objects, whereas our custom memory pool does the job. Moreover, it doesn't allocate any memory if we don't fill it beyond capacity (thanks to the amortized linked list behind the scenes). So if we allocate enough memory in advance, there will be no allocations onward.
Here is the package source code.
Creating a Zero-Allocation Environment in Go was originally published in Better Programming on Medium, where people are continuing the conversation by highlighting and responding to this story.