This article is about an often misunderstood system resource — CPU time. Unlike memory, which can be easily measured and attributed to processes, the execution time of those processes is not so simple to pin down. Besides explaining how CPU time is consumed, we will delve into the causes of excessive usage, how it can be mitigated, and what you can do if it is truly unavoidable.
I’m here to convince you that a seemingly trivial issue of excessive usage has practical consequences. You will see how one badly written microservice can affect the rest of your architecture components, which share resources. Worse yet, you can be one careless line of code away from selling your house to Amazon just because your app was hoarding CPU time throughout the night. At the very least, consider the damage to the environment.
Objective Facts for a Change
Processes are made up of threads. A process merely represents a program in execution; threads are what the processor actually executes. For a thread to run, it needs to be scheduled for execution on a processor core.
A CPU cycle is the time required to perform a simple processor operation. Processor clock speed is the number of cycles it can perform per second, appropriately measured in hertz. A 3 GHz core, for example, performs roughly three billion cycles per second. Threads are just that — sets of operations which are to be performed. Therefore, a higher clock speed means your threads can get more work done within a second. What’s left is to decide how many cycles each thread gets and, more importantly, which threads require these cycles in the first place. I’ll explain.
Let’s take an HTTP server as an example. When it receives a request, it handles it, requiring the processor’s attention. When our server is not busy handling requests, it is idling. There is no point in scheduling such a process when it does absolutely nothing, which brings us to our final point:
Resource usage is natural. Needless resource usage is bad. Not a hot take; we’re still talking objective facts. It does not matter whether your application performs well 100% of the time — if it’s needlessly hogging system resources, it can potentially prevent other processes from receiving them.
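To make the idling point concrete, here is a minimal sketch of such a server (not from the article's repo); while no client is connecting, it consumes virtually no CPU:
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 8080))
server.listen()

while True:
    # accept() blocks inside the OS until a client connects;
    # the thread is not scheduled at all while it waits
    connection, _ = server.accept()
    connection.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    connection.close()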
With that out of the way, let me introduce the three stages of CPU usage grief.
Stage 1: Denial (Naming the Core Issues)
You may disagree, but some cases of high CPU usage are inexcusable. To prove the point, we will look at some faulty code snippets and see how their issues can be mitigated.
The source code for the upcoming snippets can be found in the cpu-hogging repository on GitHub (https://github.com/vlad-pbr/cpu-hogging). To run the snippets and easily display their resulting behavior, we will use a Docker container based on a prebuilt image that already includes the source code.
You do not need to actually run anything. I’ve already worked hard to make this article as visually understandable as possible. With that said, if at any point you feel as if I’m taking you for an absolute buffoon, feel free to run the related commands and see the outcome for yourself.
Tight loops
Here is a code snippet for a program that performs a computation over and over:
"""
Simplest tight loop which does absolutely nothing.
"""
def do_work():
_ = 1 + 1
def problem():
while True:
do_work()
You may justify this code (hence the denial stage) — your do_work might need to poll for a change in a file, constantly update a state of some kind, or anything really. Unsurprisingly, it isn’t very CPU-friendly, as you can see below:
# runs the snippet in a container
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop import problem; problem()"
# displays resource statistics for the container (run in a separate terminal)
docker stats cpu-hogging
To clarify, docker stats reports CPU usage as a percentage of a single core, so the 99% metric means the program is saturating roughly one core (a machine with four cores could show up to 400%).
A quick primer on threads, if you will: a single thread can only be scheduled for execution on a single processor core at a time, meaning that as long as you’re not spawning multiple threads, CPU usage can only amount to a single core.
That said, because we are using Python, I have lied to your face. To regain your trust, I’ll explain why: due to Python’s Global Interpreter Lock, true parallelism will never happen within a Python process, as the interpreter can only be controlled by a single thread at a time. Even if we used multiple threads, they would never execute in parallel. This is what makes the difference between the following cases:
"""
Tight loop in two separate threads/processes.
"""
from multiprocessing import Process
from threading import Thread
def do_work():
_ = 1 + 1
def problem():
while True:
do_work()
def spawn(items):
for item in items:
item.start()
for item in items:
item.join()
def spawn_threads():
spawn([Thread(target=problem) for _ in range(2)])
def spawn_processes():
spawn([Process(target=problem) for _ in range(2)])
# two tight loops in two threads, capped at a single core by the GIL
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop_concurrent import spawn_threads; spawn_threads()"
# two tight loops in two processes, each with its own interpreter and its own core
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop_concurrent import spawn_processes; spawn_processes()"
In hopes of further redeeming myself, here is the same tight loop across two goroutines in Golang:
package main

func do_work() {
    for true {
    }
}

func main() {
    go do_work()
    do_work()
}
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:golang run samples/tight_loop.go
The conclusion is that our process receives all the CPU time it can get its greedy hands on. Looking at the code, it’s obvious — our program does pure computational work without ever sleeping or waiting for I/O. This is called a tight loop: “a loop of code that executes without releasing any resources to other programs or the operating system.” However, as is also true with people, having a title does not make it any better.
Look, I’m not saying these loops are always bad — they are inherently CPU-cache friendly and are widely used in video game rendering. What I am saying is that you should consider the following solution:
import time

def solution():
    while True:
        do_work()
        time.sleep(0.001)
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop import solution; solution()"
Simply adding a one-millisecond sleep between loop iterations fixes the issue, capping the loop at roughly a thousand iterations per second. You’ve got to ask yourself — do you really need to poll for anything more often than that?
Busy waits
Let’s talk about why the time.sleep solution works and why the following snippet does not:
"""
Loop which uses a custom `sleep` function.
"""
from datetime import datetime, timedelta
def do_work():
_ = 1 + 1
def sleep(seconds: float):
end_time = datetime.now() + timedelta(seconds=seconds)
while datetime.now() < end_time:
pass
def problem():
while True:
do_work()
sleep(0.001)
We have implemented our own sleep function, which calculates the exact time it needs to “sleep” and then sits in a loop until the time has passed. Here is how it performs:
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.busy_wait import problem; problem()"
If it wasn’t immediately obvious, our custom sleep function is effectively a tight loop in and of itself. This is what’s called a busy wait — a loop that keeps checking a condition, not letting the code advance until it is met. So what does the time.sleep function do differently that makes it all work?
Time to do that thing you dread doing. Let’s read some documentation. Here is what the Python docs say about time.sleep:
Suspend execution of the calling thread for the given number of seconds.
This is different from busy waiting — time.sleep notifies the operating system that our thread is inactive. Here is a snippet from the Wikipedia article on the “sleep” system call:
A computer program (process, task, or thread) may sleep, which places it into an inactive state for a period of time. Eventually the expiration of an interval timer, or the receipt of a signal or interrupt causes the program to resume execution.
We need to notify the operating system about our sleeping thread because the OS then excludes it from being scheduled for execution. As long as our thread is not being scheduled, it does not receive any CPU time, which can then be allocated to other processes that need it.
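The same principle powers other blocking primitives. If your loop is really waiting for a signal from another thread, something like threading.Event blocks inside the OS the same way time.sleep does. A minimal sketch, not part of the article's repo:
from threading import Event

stop_requested = Event()

def do_work():
    _ = 1 + 1

def polite_loop():
    # wait() blocks in the operating system until set() is called elsewhere
    # (or until the timeout passes), so no CPU time is burned while waiting
    while not stop_requested.wait(timeout=1.0):
        do_work()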
Other ways of releasing resources
So, the core issue here is that greedy processes need to release resources for others to use. Sleeping is one way of doing so, but there are others.
Your do_work might need to interact with local files or remote servers, asking the underlying operating system to perform these actions. This does not happen instantaneously, as the operating system must coordinate these actions with the actual machine hardware.
Reading from a hard drive, for example, requires the hard disk itself to move its read-write head to the appropriate position and read the bits off the platters. Until the action is performed, your thread cannot progress and, therefore, will not be scheduled.
Do not simply take my word for it this time; instead, let the behavior of the following snippet convince you:
def do_disk_io_work():
    result = 1 + 1
    with open("temp.txt", "w") as _file:
        _file.write(str(result))

def disk_io():
    while True:
        do_disk_io_work()
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop import disk_io; disk_io()"
The same goes for network I/O, as you can tell:
import http.client

def do_network_io_work():
    client = http.client.HTTPSConnection("google.com")
    client.request("GET", "/")
    client.getresponse()

def network_io():
    while True:
        do_network_io_work()
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop import network_io; network_io()"
What’s funky about these examples is that their behavior will differ on your system, as it depends greatly on your hardware and physical location. Your hard drive might be faster than mine, completing write operations within a smaller time frame and therefore spending less time waiting on I/O. Similarly, if you are geographically closer to Google’s servers than I am (and you probably are), the response will arrive faster, leading to less time spent in a waiting state.
To state the obvious, this is not a complete solution. You might be tempted to leave this code as-is since it does not guzzle CPU time as much, but the underlying issue is still there. Just because the loop is “looser” does not mean it cannot be improved upon.
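If nothing else, the one-millisecond sleep from earlier applies here just as well. A sketch reusing do_disk_io_work from above (the function name better_disk_io is mine, not a repo sample):
import time

def better_disk_io():
    while True:
        do_disk_io_work()
        # explicitly release the CPU between writes instead of
        # relying on I/O waits alone
        time.sleep(0.001)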
Avoiding CPU hogging in other common scenarios
When using queues, you might be tempted to do the following:
"""
Waiting for results from a queue.
"""
from time import sleep
from threading import Thread
from queue import Queue
from random import choice
queue = Queue(maxsize=1)
def producer_thread():
while True:
queue.put(choice(["apple", "banana", "cherry"]))
sleep(1)
def problem():
Thread(target=producer_thread).start()
while True:
if not queue.empty():
print(queue.get())
As long as there are no items in our queue, we’ve got ourselves a tight loop that looks like this:
docker run --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.queue import problem; problem()"
Queues have a mechanism for that — by default, get() is a blocking call, meaning it will not return until an item is added to the queue. The underlying notification mechanism is based on locks and condition variables and does not waste precious CPU time.
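The fix, then, is to drop the empty() check and let get() block. Continuing the snippet above (this solution function is my sketch, not a sample from the repo):
def solution():
    Thread(target=producer_thread).start()
    while True:
        # get() blocks on an internal lock until an item arrives,
        # consuming no CPU time in the meantime
        print(queue.get())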
As a matter of fact, any self-respecting, god-fearing data structure will handle our use case and not bother the processor when it is not actually required. If you encounter a problem with another data structure, take a deep breath and consult the relevant documentation.
Stage 2: Bargaining (What You Shouldn’t Do)
At this point, you could argue that the issue is that a process is using too many resources and should be throttled. I’m a relatively reasonable person, so I will show you how it can be done, and only then will I dunk on it.
You can always set a hard limit on how much CPU a process can use. Such solutions usually do not come built-in, but external ones do exist — BES for Windows and cpulimit for Linux, to name a few. Docker comes with the ability to set a precise CPU limit for any container via cgroups:
docker run --cpus 0.5 --rm --name cpu-hogging -it vladpbr/cpu-hogging:python -c "from samples.tight_loop import problem; problem()"
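For a bare process outside of a container, cpulimit achieves a similar effect. A sketch, assuming a hypothetical PID:
# cap the process with PID 12345 at 50% of a single core
cpulimit --limit=50 --pid=12345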
Two problems with this “solution”:
- The core issue still needs to be resolved. Congratulations, your code still smells, and now the thread running it will also be scheduled at inconsistent times.
- You are using an inherently infrastructural mechanism to solve an application-level problem. Hard limits are meant to act as safeguards, not solutions.
Stage 3: Acceptance (Legitimate CPU-Related Issues)
We now agree that the code is at fault here. Some of it can be refactored to release resources once in a while, but there are scenarios where your code really is optimized, yet there isn’t enough CPU time to go around for everyone. If that is the case, you may want to leverage scheduling priority in your favor.
Modern operating systems implement advanced CPU scheduling algorithms that consider how important a process is relative to other existing processes when deciding who gets execution time. This importance is based on process priority — an integer value given to a process during its creation, which can also be changed at runtime.
You could argue that, similarly to setting a hard CPU limit, this is a function of infrastructure and, therefore, cannot be a viable application-level solution. I disagree. We’re talking about a specific use case where the infrastructure is at “fault,” so to speak, for not having enough resources. When push comes to shove, application developers are the only people who can decide on their processes’ importance and priority.
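In fact, priority is reachable from application code itself. A minimal sketch (not from the article's repo) of a Python process lowering its own priority on Unix:
import os

# increase this process's niceness by 5, i.e., lower its priority;
# os.nice() is Unix-only and returns the new niceness value
print(os.nice(5))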
Process priority in practice
Let’s say our tight loop sample does actual work. Assuming you are using Linux, here is how you launch a process with a custom priority (on Linux, niceness values range from -20 to 19, where a lower value means a higher priority and the default is 0):
nice -n 15 python3.11 -c "from samples.tight_loop import problem; problem()" &
Let’s see how it performs:
top -p 34376
It seems like the priority setting had no visible effect: our machine had enough unused cores to give to all the processes, including ours.
top
This superuser.com response has put it quite elegantly:
Suppose you have a “go to the head of the line” card for the grocery store. You go to the store, fill your cart, go to the checkout counters, and find there’s no one in line. Does your card help you get checked out faster? Nope.
Priorities don’t affect processing speed, in that a higher priority process doesn’t get to run faster or even to use more CPU time… not if it’s the only thing that wants to use the CPU.
Do not forget to game-end our rogue process with this command:
kill -9 34376
Process priority in a CPU-limited environment
As we’ve stated above, the problem resides in environments that do not have enough processing time to go around. Therefore, we need an environment where CPU time is scarce to see this behavior in action and show how process priority can mitigate the issue. For this purpose, I have gone out of my way to spin up a single-core TinyCore virtual machine (as I was too lazy to download another heavy Linux distro) using VirtualBox.
Perhaps a bit out of scope for this article, but here is how you get Python and the relevant code in TinyCore:
tce-load -wi python3.9 git
git clone https://github.com/vlad-pbr/cpu-hogging.git
cd cpu-hogging
- TinyCore comes with its own package management tools, which can be used to install prepackaged software via tce-load
- As of writing, the highest Python version available in official TinyCore package repositories is 3.9
- TinyCore also does not come with SSH installed, so cloning over HTTPS will do
For starters, let’s see how two tight loops behave when being executed with the same priority:
python3.9 -c "from samples.tight_loop import problem; problem()" &
python3.9 -c "from samples.tight_loop import problem; problem()" &
top
Now, let’s lower the priority for one of our tight loops:
renice -n 5 2591
top
Priority matters when there isn’t enough CPU time to spare. If adding more resources to your environment is not an option, prioritizing more important processes can be.
kill -9 2592 2591
Conclusion
As far as the main topic is concerned, case closed. But, in truth, an additional goal of this article was to widen your scope when it comes to such infrastructural issues. To investigate the whole CPU time shebang, we happened to do a whole bunch of other things:
- Gave names to bad coding practices
- Brushed up on Python concurrency
- Delved into process scheduling mechanics
- Ran Docker containers and viewed their statistics
- Learned quite a bit about Linux and some common bash commands
- Booted an entire virtual machine and simulated a lack of CPU time!?
As much as we like to separate infrastructure from the application (judging by the memes), IT is inescapably interconnected. Knowing the inner workings of your underlying infrastructure lets you write better code, making you a better programmer. Knowing the quirks of programming makes you a somewhat less angry system administrator.
Finally, talking about a wide range of topics is way more fun than something super specific and close-ended. I mean, do you really, truly learn something by reading an article that only shows cause and effect? Neither do I.