In my social bubble, using debuggers regularly is popular. During development, we use them to see how the code behaves, what the call stack is, how arguments mutate, and how objects look mid-program. I think this is a bad practice, for several reasons.
My stance has a lot to do with how I learned to program. One of my favourite memories of early university years was the ACM International Collegiate Programming Contest.
You and two teammates spend about eight hours solving a set of algorithmic programming challenges, on a single computer, without seeing the inputs your code is evaluated on. The typical way of working is to divide the problems among the team; each person then cycles between three states: ideating a solution, implementing it, and figuring out what went wrong after a submission fails.
Given that there is a single computer, only one person can do the implementing part — so you have to do the ideating and the post-submission analysis on paper. A printer is available, so typically, after your submission fails, you print your code and start “debugging.”
From today’s point of view, it looks quite archaic, but I’m happy I was able to develop this “in-my-head” debugging ability, or empathy with the code, as I call it. This is not just about being unable to step in or set a breakpoint; it is also about coming up with reasons why input you can’t see could break your code.
(Note for programmers: the general population uses “empathy” in much the same way, except that humans play the role of the entities whose behaviour you try to understand, and you can’t lift the lid with some sort of debugger or fully control their inputs.)
Why is it useful? The thing is that you can’t always use debuggers.
If you are troubleshooting some server or client app that failed already, it’s outside of your control. You can’t turn back time.
All you have is the logs (if you are smart or lucky), the code, and your head. The same applies when you work on a distributed or multi-threaded application. You can execute it locally, under your control, but debuggers are not well-behaved there. I used to write a lot of Spark jobs, and reasoning plus logs were the only convenient tools at my disposal.
Again, the only thing you can do is to empathise and reason about the code and its states.
Here’s the last case to drive the nail home: I happen to do support for colleagues from time to time. They try to access some database, execute some utility, run a container … and it crashes, and I try to help them.
Can I run a debugger at their workstation? Hardly. Can I ask them to send me output, access the logs of the pods they submitted, look at the code in the branch they are using, etc.? Of course.
If you can empathise with the code, you can often solve the problem faster. Staring at the logs and coming up with the solution has a lower overhead than trying to simulate the situation locally with a debugger. Obtaining the state of an object from a debugger does not tell you why it ended up in that state, whereas understanding the situation does.
You can counterargue that while debuggers are hard to use in remote settings, they are still useful during development. For example, in a language like Python, the absence of a compiler lets you do many dynamic things, so it may not be obvious, when reading the code, what gets called.
I’ve even seen a guy write the comment `# step in here to see what gets called`. It haunts me to this day, like a particularly malicious or inept attempt to document code. My answer is that the existence of a debugger should not be an excuse for bad code structure and abuse of the language’s looseness.
Another counterargument could be about needing to inspect objects during the program flow when they “can’t be logged.” The quotes give away my answer: I don’t think there is a natural reason for most objects to be unloggable.
Multiple programming paradigms push you toward using plain objects (data classes, POJOs, structs, etc.), where the string representation and serde code can be automatically derived. Using that has many more benefits in tests, refactorability, and contract clarity.
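In Python, for instance, a dataclass gives you a derived `__repr__` and near-free serialization, so such objects are loggable by construction. A minimal sketch; the `PaymentRequest` type and its fields are made up for illustration:

```python
import json
import logging
from dataclasses import asdict, dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

# Hypothetical plain-data object; field names are illustrative.
@dataclass
class PaymentRequest:
    request_id: str
    amount_cents: int
    currency: str

req = PaymentRequest(request_id="r-42", amount_cents=1999, currency="EUR")

# __repr__ is derived automatically, so the object is loggable as-is...
log.info("processing %r", req)

# ...and serialization comes almost for free via asdict().
log.info("payload=%s", json.dumps(asdict(req)))
```

The same derivation makes the object easy to assert against in tests, which is part of the “many more benefits” mentioned above.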
Does it make debuggers completely useless? No. I do use them, e.g., in particularly quirky situations such as deadlocks that don’t manifest in locally simulable single-pod environments but happen only on clusters.
But I don’t “step into” anything; I just get an overview of the stack trace, identify where the lock is, and so on. Using a debugger this way is an important skill not to be shunned, but it shouldn’t become a regular programming helper. Additionally, I do use a REPL (or Jupyter) quite often, depending on the language, the quality of the documentation, and the task at hand, but that happens before I write the final code: when I’m exploring a library, or when I’m solving an ad hoc task such as data analysis.
The transition towards relying chiefly on logs may initially hurt and has its own dangers. You need to know what to log and when, and also when not to log. I had a cute experience with Argo Workflows recently. It turns out that while Kubernetes logging scales quite finely (the logs pretty much reside on the node), Argo logging does not; it relies on etcd for message passing, and etcd is quite limited when it comes to message size (at least compared to a text stream written to a local hard disk).
So I ended up thinking my job hadn’t finished, when in reality I hadn’t trimmed my failure log message, and it was too large to get through. Only after I checked that the pod was already dead did I peek into Argo’s own logs and reason out where the problem must be.
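A cheap guard against that failure mode is trimming messages before they enter a size-constrained channel. A sketch, assuming a byte budget; the 16 KiB figure is an arbitrary placeholder, not etcd’s actual limit:

```python
MAX_LOG_BYTES = 16 * 1024  # placeholder budget; check your cluster's real limit

def trim_for_log(message: str, limit: int = MAX_LOG_BYTES) -> str:
    """Keep the head of an oversized message and record how much was cut."""
    data = message.encode("utf-8")
    if len(data) <= limit:
        return message
    # Decode the head leniently in case the cut lands mid-character.
    head = data[:limit].decode("utf-8", errors="ignore")
    return head + f" ... [trimmed {len(data) - limit} bytes]"
```

Since the tail is lost, it pays to put the most diagnostic part of a failure message first.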
Knowing what to log is something I also tend to cover with this vaguely esoteric term of empathising with the code. Estimate where the code will hang/fail/misbehave due to network, business logic, lack of memory, etc., and log before it enters that place.
To do this, include the necessary context in the message and keep the message reasonably small, so that the logging itself doesn’t become the thing that fails. Then log again after the risky operation happens.
Add in the context: process number, time (or rather, all the time information you have), batch id, request id, correlation id, and the volume of the operation that is about to commence. Log when unknown or dangerous input is about to be processed, and log again once it has been handled safely. Breaking this into steps can help:
1. `about to process unknown input`
2. `the unknown input has size {byte_length_computation}`
3. `the unknown input starts with {convert_to_string_prefix}`
Add as many as you want. These detailed messages will help you pinpoint where the problem lies, even if a later log message fails to get through.
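The steps above can be sketched as ordinary log calls. `describe_input` and `request_id` are hypothetical names, and the 64-byte peek is an arbitrary choice to keep the log line itself small:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

def describe_input(raw: bytes, request_id: str) -> list[str]:
    """Log a dangerous input in progressively more detailed, still-bounded steps."""
    msgs = [
        # 1. announce before touching the input at all
        f"request_id={request_id} about to process unknown input",
        # 2. something cheap and safe to compute: the size
        f"request_id={request_id} the unknown input has size {len(raw)}",
        # 3. a bounded peek, so this message cannot blow up the log channel
        f"request_id={request_id} the unknown input starts with {raw[:64]!r}",
    ]
    for msg in msgs:
        log.info(msg)
    return msgs
```

If the process dies between step 2 and step 3, the size line alone already narrows down where and why it failed.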
Debuggers and Empathy was originally published in Better Programming on Medium.