When teams build software and run it in production, they need to divide their time between building new functionality and keeping what already exists running. How teams spend their time is a good indicator of team and system health. Most teams overestimate their capacity for feature development, causing tech debt to grow. Monitoring how time is spent helps to address this problem.
Categorizing time spent
When I look at how teams spend their time, I mostly care about two dimensions: planned versus unplanned work and functional versus non-functional work. Combining these creates a quadrant with four categories for what a product development team spends time on.
The sweet spot is the planned functional work. This is the work that is taken into a sprint or planning cycle deliberately and that will change the behavior of the product. It includes feature development and project support, with all the activities that fall under the team’s Definition of Done, such as writing tests and updating documentation.
The team will also spend planned time on non-functional work. This work does not directly change the behavior of the product, but it is important for its quality. Examples are upgrades, refactoring, pruning alerts, and addressing performance and security concerns.
When running services in production, unplanned work is unavoidable. Healthy teams protect their planning cycle by dedicating engineers to operational work. This improves the predictability of committed deliverables.
The functional unplanned work includes tasks like helping a new client onboard to a service, providing third-line support on an operational issue, and unblocking other teams.
Unplanned non-functional work is dealing with urgent bugs, outages and other production issues. It should be kept low, but we may not want to drive it down to zero. It becomes hard to innovate if we do not accept risk. The risk appetite depends on the business, but too much risk avoidance will stifle innovation and make the product team ineffective.
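The four categories can be written down as a small data structure. What follows is a minimal sketch in Python, not a prescribed implementation: the names `Quadrant`, `classify` and `EXAMPLES` are my own illustration, and the example tasks are taken from the descriptions above.

```python
from enum import Enum


class Quadrant(Enum):
    """The four categories that result from combining the two dimensions."""
    PLANNED_FUNCTIONAL = "planned functional"
    PLANNED_NON_FUNCTIONAL = "planned non-functional"
    UNPLANNED_FUNCTIONAL = "unplanned functional"
    UNPLANNED_NON_FUNCTIONAL = "unplanned non-functional"


def classify(planned: bool, functional: bool) -> Quadrant:
    """Map the two dimensions onto one of the four quadrant categories."""
    if planned:
        if functional:
            return Quadrant.PLANNED_FUNCTIONAL
        return Quadrant.PLANNED_NON_FUNCTIONAL
    if functional:
        return Quadrant.UNPLANNED_FUNCTIONAL
    return Quadrant.UNPLANNED_NON_FUNCTIONAL


# Illustrative tasks only; a real team would decide per ticket.
EXAMPLES = {
    "feature development": classify(planned=True, functional=True),
    "refactoring": classify(planned=True, functional=False),
    "onboarding a new client": classify(planned=False, functional=True),
    "fixing an outage": classify(planned=False, functional=False),
}
```

Keeping the two dimensions as separate booleans means a ticket only needs two yes/no answers to land in a category.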
Who’s in control?
I see planned work as the team being in control of their fate. Unplanned work means the team is being driven by external factors. Only a team that is in control can deliver durable business value.
Adding value
Another way to look at the quadrant is in terms of business value.
A team exists only because we expect it to deliver value to the business. The most direct way to do so is to plan and deliver new, needed functionality. Addressing operational and non-functional concerns delivers value indirectly. Dealing with bugs and outages prevents loss of value: that work is always urgent, but it does not contribute to business growth.
Time spent and team health
A healthy team has time for feature development and maintains this situation by structurally looking after product quality. Not much time is spent on unplanned work.
An unhealthy team incurs technical debt, which over time increases the time spent on unplanned work.
Losing health
Most teams and most managers heavily overestimate their capacity for feature development. This comes from a good place: optimism and a desire to deliver value. But it sets the team up for a hard time.
I think that underestimating unplanned work creates technical debt. The mechanism behind this is:
- The team overestimates its capacity for feature development and over-commits to its stakeholders.
- When reality inevitably happens, the team tries to keep its deliverables on track. Because it is not in control of unplanned work, it will compromise on the only thing left to compromise on: technical quality. Now the team is increasing its technical debt.
- When technical debt increases, the time that needs to be spent on unplanned work increases. Because teams will try to continue to deliver business value, it becomes even harder to improve technical quality.
We’ll fix it later
It may be a good strategy for the team to go into crunch mode for a quarter to deliver something critical and accept that tech debt grows, as long as it is paid off later. But in reality this rarely happens. If our organization is designed properly, teams have an important mission, and a quarter where there is less to do will never occur.
I need more people
Engineering managers who face increasing tech debt generally ask for more people. But the fundamental problem here is prioritization, not capacity. If the prioritization problem is not solved, adding more people will not bring the team to a healthy state. New capacity is easily absorbed by more feature development. And growing the team creates more overhead, more context switching and more effort to align.
What good looks like
If I know nothing else about a team, I will assume it spends no more than 50% of its time on feature development. A team that is very healthy might drive it up to 60% while still spending a good chunk of time on technical quality.
Teams differ, and so the picture of what a healthy team looks like differs as well. At the end of this write-up I’ve listed some of the things that drive those differences.
Creating visibility
I think that lack of visibility is the core of the problem. Tech debt builds up gradually; what we add over a few sprints is really not going to make much of a difference in the team’s velocity. This makes the problem hard to detect. If we want to address it effectively we need to begin by making it visible. For that we need to know our current state and we need to understand how it develops over time.
First we’ll need to build our technical quality backlog. We could monitor how it develops over time and leave it at that, but I think that is not enough.
Using the quadrant
The fact that we have tech debt is normal and not directly relevant to the business. But the business will care about its effect on reliability and our ability to get things done.
It can be hard to explain why spending time on refactoring is a good investment. When we have the data to demonstrate how much time the team loses on bug fixing and support, the conversation about technical quality becomes easier. RFO and SLO data should further support this conversation.
The quadrant will also help us determine which investments have the most impact and track their effect. If we want to reduce operational toil, we may have to invest in tooling. And when we address technical quality, we should see unplanned non-functional work decrease.
Understanding the amount of unplanned work will also allow the team to better understand the capacity it has for planned work and give more reliable commitments.
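The arithmetic behind this is simple. Here is a rough sketch in Python, assuming the team has recorded the share of unplanned work for a few past sprints. The function name `planned_capacity`, the use of a plain average, and all numbers are my own assumptions for illustration.

```python
def planned_capacity(total_capacity, past_unplanned_shares):
    """Estimate capacity left for planned work in the next cycle.

    total_capacity: the team's capacity in its own unit (story points, dev days).
    past_unplanned_shares: fraction of time (0.0 to 1.0) that went to
        unplanned work in earlier cycles.
    """
    if not past_unplanned_shares:
        raise ValueError("need at least one past cycle to estimate from")
    # Assumption: a plain average of past cycles predicts the next one.
    expected_unplanned = sum(past_unplanned_shares) / len(past_unplanned_shares)
    return total_capacity * (1 - expected_unplanned)


# Made-up example: 40 points of capacity, and unplanned work took
# 25%, 35% and 30% of the last three sprints.
commitment = planned_capacity(40, [0.25, 0.35, 0.30])
```

With these made-up numbers the team should commit to roughly 28 points of planned work rather than the full 40. A team with spiky unplanned work may prefer a more conservative estimate than the average.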
Measuring time spent
Measuring time spent should not create another administrative burden. When there are tickets for all tasks and tasks are estimated, it can come nearly for free. All that is needed is to add categorization, which can be done by adding a field to a ticketing system, assigning tickets to epics, or using tags.
You could argue that this tracks time estimated rather than time spent, but for our purpose that really doesn’t matter much. Being off by a couple of story points or dev days, or whatever the team uses to estimate, is not going to radically skew the picture.
It also does not matter much if teams estimate differently. Comparing absolutes between teams is pretty useless anyway and the percentages are a much better indicator of what is going on than absolute values.
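To make this concrete, here is a minimal sketch in Python of turning categorized, estimated tickets into percentages. It assumes each ticket can be exported as a (category, estimate) pair. The function name `time_share` and the sprint data are hypothetical and not tied to any particular ticketing system.

```python
from collections import defaultdict


def time_share(tickets):
    """Return each category's share of the total estimate, as a percentage.

    tickets: iterable of (category, estimate) pairs. The estimate can be in
    any unit, as long as it is the same unit for every ticket.
    """
    totals = defaultdict(float)
    for category, estimate in tickets:
        totals[category] += estimate
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        category: 100 * value / grand_total
        for category, value in totals.items()
    }


# Made-up sprint, estimated in story points.
sprint = [
    ("planned functional", 6),
    ("planned functional", 4),
    ("planned non-functional", 4),
    ("unplanned functional", 3),
    ("unplanned non-functional", 3),
]
shares = time_share(sprint)

# A team that estimates the same work with twice as many points
# ends up with the same percentages.
doubled = time_share([(category, 2 * points) for category, points in sprint])
```

For this made-up sprint the shares come out at 50% planned functional, 20% planned non-functional, and 15% for each of the unplanned categories. Doubling every estimate leaves the percentages unchanged, which is why differences in how teams estimate do not get in the way.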
Appendix: Driving the difference
Teams differ, and so the picture of what a healthy team looks like differs. Here are some of the drivers for those differences.
Use cases
The number and complexity of use cases the team supports determine much of the unplanned functional work. Every use case has its edge cases, exceptions and potential issues, and each one drives the need for support.
Service age
Services accumulate business logic and operational complexity over time. Old business logic, even when it is well maintained, is harder to support because the team will be more remote from its implementation and will have less context. Support for older code will be more time-consuming.
Tech stack
Every technology that the team works with will require planned, non-functional work to maintain. A more varied tech stack creates more non-functional overhead.
Varied stakeholders
Serving a varied group of stakeholders requires more support capacity than serving a homogeneous stakeholder group.
Tech debt
Tech debt makes bugs and production issues more likely. A lack of tooling and client documentation should also be seen as tech debt, and it drives up the time needed for support.
Clients
Having a large number of clients creates a lot of functional unplanned work if proper automation and tooling are not in place.
Stakeholder pressure
High pressure from stakeholders can push the team to build up tech debt. If pressure exists because the team has critical deliveries, the expected business value should be used as an argument to get more funding.
Tenure
Identifying the most impactful non-functional work requires a deep understanding of the implementation. This is harder for a team with less tenure, so such a team is more likely to focus on feature development.
Team size
If a team running services in production is too small, nearly all of its time will go into keeping things up and running. The size and scope of such teams should be increased to create a better balance.
How Teams Spend Their Time was originally published in Better Programming on Medium.