Defining Technical Debt

Oliver Jack Dean May 19

I've been reflecting on the nature of technical debt, a subject that often seems to remain in the background for many key decision-makers, yet it strikes me as profoundly important for a company's operational health.

The term "technical debt" itself appears to be somewhat multi-dimensional, which perhaps contributes to the challenge of addressing it. There are, of course, various related concepts like UI/UX debt, culture debt, process debt, and people debt. I won't delve into them here, but they are worth acknowledging as part of a broader ecosystem of "technical debt".

Towards a definition:

It seems many people have their own understanding of what technical debt truly is. One definition I've found useful over time is to think of technical debt as the implied cost of future rework that we accept when choosing an easier, quicker solution over a better, though more time-consuming, approach in the present.

It's somewhat like taking on a loan; there's an immediate benefit, but a deferred cost, with "interest" accumulating in the form of future complexities or rework.

I've observed that pinning down a precise definition can be elusive, though. My own understanding is still imperfect.

Sometimes, "product debt" and "technical debt" are used almost as synonyms.

It's also common for technical leads or product managers to develop an intuitive sense for it - a kind of "technical debt spidey sense" - and to flag it as soon as they see it. Translating this intuition into a clear message for their teams, however, often proves to be a challenge.

Another way to look at it, I suppose, is as a trade-off between immediate velocity and long-term maintainability. This often brings to mind the 80/20 principle - the idea of achieving a significant portion of the outcome with a smaller portion of the total effort. It's a continuous balancing act, it seems, between delivering new features, addressing bugs, and then allocating specific effort to refactor earlier decisions to enable greater speed later on.

What's interesting about the 80/20 rule in this context is its inherent imprecision. I've noticed that the more effectively engineering-driven initiatives are articulated, the more they tend to be recognized as valuable product initiatives in their own right.

A further observation is that technical debt often seems to cast a longer shadow than other forms of operational debt. Consider customer debt, or CX debt - the issues that directly impact customer satisfaction. These require ongoing attention. Addressing this might mean prioritizing critical bug fixes to ease the burden on customer support or to reduce churn, rather than embarking on a large-scale codebase refactor.

However, these customer-facing types of debt often get overshadowed by engineering-driven initiatives, especially in technical companies. As expected, this leads to what I see as a fundamental challenge in almost every organization: how to conduct a productive discussion about the allocation of capital and resources when faced with these competing priorities?

Technical debt doesn't always stem from a consciously "easy" choice made in the present. Sometimes, it's an accumulation of decisions from the past - decisions that may have been entirely appropriate at the time, or at least appeared so based on the information then available, but which now act as a drag on future development.

Other times, it's simply the result of technological evolution, like the necessity of upgrading a shared library in the code repo. This isn't inherently a good or bad past choice, but rather a consequence of the ongoing maintenance inherent in using shared components.

Common approaches to managing Technical Debt:

Often, companies tend to adopt one of a few general approaches to managing technical debt:

Reactive Management: An ad-hoc approach, where technical debt is addressed primarily when issues become too pressing or disruptive to ignore.

Integration into Delivery Schedules: A more structured method of reserving a certain capacity within each sprint or development cycle specifically for working on technical debt.

Dedicated Resources: Some organizations establish dedicated teams, perhaps an infrastructure team or even a specific "technical debt team," or allocate specific budgets to tackle larger, more complex debt projects.

In larger companies with more available resources, I've encountered internal "Customer Product Engineering" (CPE) teams: small, focused groups of stakeholders tackling technical debt issues.

Signals that might indicate a Technical Debt problem:

Identifying when technical debt is becoming problematic can be subtle. I've noticed a few signals, though I'd consider them more as symptoms that might point to technical debt, among other potential causes.

One indicator that I believe warrants more attention is the onboarding time for new engineers. If it takes a new developer an extended period of time to become proficient and understand the system or code repo, this could suggest a significant amount of underlying technical debt. They might be struggling to grasp the reasoning behind past architectural choices or the existence of multiple, inconsistent patterns for similar tasks. This often goes beyond just missing documentation; it points to the implicit knowledge and accumulated workarounds that existing team members have internalized.

It's also useful, I think, to distinguish between intentional and unintentional technical debt. Intentional debt arises from a conscious decision - perhaps due to time constraints - with an awareness that it will need to be addressed later. Unintentional debt, however, is often more insidious because it accumulates unnoticed, only revealing itself when it causes significant problems.

In my experience, if the ramp-up time for a new engineer to become competent - able to independently manage a request - extends much beyond a month, it's worth asking why. While some systems are genuinely complex, a period of 2-3 months often suggests that a considerable amount of undocumented technical debt requires extensive explanation by existing team members.

And this isn't solely an engineering issue. I've seen similar challenges for UI/UX designers trying to navigate a cumbersome internal design system, or for new product owners and customer support personnel struggling to understand technical concepts outlined in internal knowledge bases, all potentially exacerbated by underlying technical debt.

Tracking and logging Technical Debt:

Returning to the 80/20 idea, I've found it can be a useful framework for discussions with product managers. There's always pressure for new features, but if a case can be made for how addressing technical debt will yield long-term business benefits, it helps build the necessary trust for allocating resources. The precise ratio might vary - sometimes 90/10, other times 95/5 - but establishing that understanding seems to be a key starting point.

It also seems useful to classify technical debt by its nature. Is it primarily a coding issue, an architectural concern, a UX design problem, or even process-related? Where is the debt coming from precisely? Each of these debt types may be tracked and managed differently. While various tools (Jira, ClickUp, Asana, etc.) are used, engineering-focused debt (coding or architecture) is often managed within the team's primary planning tool, with clear categorization.

Process-debt issues, for example, often require more cross-team liaison and discussion. Some teams even go a step further and distinguish in their tracking whether the debt was intentional or unintentional.
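To make the classification idea a little more concrete, here is a minimal sketch of what a debt register might look like if it lived in code rather than in a planning tool. Everything in it is hypothetical: the `DebtKind` categories simply mirror the types mentioned above, and the `DebtItem` fields and the example entries are my own assumptions, not a schema taken from Jira, ClickUp, or Asana.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DebtKind(Enum):
    """Hypothetical categories, mirroring the debt types named in the text."""
    CODE = "code"
    ARCHITECTURE = "architecture"
    UX = "ux"
    PROCESS = "process"


@dataclass(frozen=True)
class DebtItem:
    """One logged piece of debt; the field names are illustrative assumptions."""
    title: str
    kind: DebtKind
    intentional: bool
    owner: str = "unassigned"


def summarize(items):
    """Count items per (kind, intentional) pair."""
    return Counter((item.kind, item.intentional) for item in items)


# Made-up entries, purely for illustration.
register = [
    DebtItem("Duplicate date-parsing helpers", DebtKind.CODE, intentional=False),
    DebtItem("Hard-coded tenant ID shipped for launch", DebtKind.CODE, intentional=True),
    DebtItem("Release sign-off needs three teams", DebtKind.PROCESS, intentional=False),
]

summary = summarize(register)
for (kind, intentional), count in sorted(summary.items(), key=lambda kv: kv[0][0].value):
    label = "intentional" if intentional else "unintentional"
    print(f"{kind.value:<12} {label:<14} {count}")
```

Even a rough breakdown like this can show whether most of the logged debt was taken on knowingly or crept in unnoticed, which is often a useful starting point for the periodic review.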

I've also heard of product managers maintaining a simple document, like a Word file, where anyone can log issues. These are then reviewed periodically, prioritized, and moved into a more formal system. A common challenge with this, it appears, is that much of this work doesn't directly align with immediate product team deliverables, making it difficult to convey its importance and potential impact to those teams.

The product roadmap:

This feels like one of the most critical aspects. It seems to come down to "speaking their language" - i.e., the product team's language. If engineers, or those in networking, platform, or UI/UX elsewhere, need product teams to prioritize technical debt, they must be able to articulate the need in terms that resonate with how product teams think.

Perhaps it's even more effective to speak in business terms.

If one can quantify the impact of addressing technical debt in a way the business understands - beyond just saying "we'll ship things quicker" - the argument becomes much stronger.

What does "quicker" mean in tangible terms? Can it be translated into cost savings, or a specific number of hours gained per sprint?
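As a rough illustration of that translation, the sketch below turns "hours gained per sprint" into a cost figure. All of the inputs (team size, hours saved, hourly cost, and 26 two-week sprints per year) are placeholder assumptions of mine, not figures from any real team; the point is only that the arithmetic is simple enough to put in front of a business audience.

```python
def sprint_savings(engineers, hours_saved_per_engineer, hourly_cost, sprints_per_year=26):
    """Translate time saved per sprint into hours and cost.

    All parameters are assumptions supplied by whoever is making the case;
    sprints_per_year=26 assumes two-week sprints.
    """
    hours_per_sprint = engineers * hours_saved_per_engineer
    cost_per_sprint = hours_per_sprint * hourly_cost
    return {
        "hours_per_sprint": hours_per_sprint,
        "cost_per_sprint": cost_per_sprint,
        "cost_per_year": cost_per_sprint * sprints_per_year,
    }


# Hypothetical example: 6 engineers each regain 4 hours per sprint at a cost of 75 per hour.
example = sprint_savings(engineers=6, hours_saved_per_engineer=4, hourly_cost=75)
print(example)
# {'hours_per_sprint': 24, 'cost_per_sprint': 1800, 'cost_per_year': 46800}
```

The estimate is only as good as the "hours saved" input, so it is probably best presented as a range rather than a single number.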

When the facts are presented in this way, it becomes a more compelling case. If a decision is then made not to proceed, at least it's an informed decision.

Data happens to be very important here:

What is the current volume of bugs and their measurable impact? What is the engineering cost associated with managing these ongoing issues?

How is time-to-market being affected? This is usually a key concern for product teams. If this debt is addressed, what additional features or value could be delivered with the freed-up capacity?

I recall a situation where an engineering team was spending nearly 70% of its time on escalations and handling urgent issues, leaving only 30% for new feature development. By explaining how refactoring could shift that balance, with the aim of reversing the percentages over, say, 3-6 months, it was possible to gain alignment.
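A simple way to present that kind of target is a month-by-month projection. The sketch below assumes a straight-line shift from 70% escalation work down to 30% over six months; the linear shape is purely my assumption for illustration, since real progress is rarely that smooth.

```python
def project_escalation_share(start=0.70, target=0.30, months=6):
    """Linearly interpolate the share of time spent on escalations.

    Returns one value per month, including month 0 and the final month.
    The linear path is an illustrative assumption, not a forecast.
    """
    step = (target - start) / months
    return [round(start + step * month, 3) for month in range(months + 1)]


shares = project_escalation_share()
for month, share in enumerate(shares):
    print(f"month {month}: {share:.0%} escalations, {1 - share:.0%} features")
```

Laying the trajectory out this way also gives both sides a checkpoint: if the share has not moved by month three, that is a prompt to revisit the plan rather than wait for the end.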

A difficulty, however, can be the lack of sophisticated forecasting tools within engineering teams to accurately project these benefits. In many growth-stage companies, architectural choices, such as migrating from a monolith to microservices, are made. While engineering may understand the long-term advantages of this migration, the business often prioritizes immediate feature delivery.

If engineering estimates for technical debt reduction prove inaccurate, it can lead to friction. It's also important to acknowledge, I think, that technical debt is rarely, if ever, 100% eliminated. The more realistic goal is to manage it to an acceptable level, and what's "acceptable" can vary significantly depending on the product's industry vertical (e.g., finance has different risk tolerances than e-commerce or healthcare).

Roadmapping and prioritization hacks:

One surprisingly effective, albeit unsophisticated, tool I've seen engineers use is a simple roadmap they design themselves and share with product.

If there's a desire to undertake a significant refactor, introduce a new database, or adopt a new framework, creating a brief roadmap in a document or slideshow can be very clarifying. It should explain what will happen once the change is implemented - somewhat akin to Amazon's PR/FAQ approach where the benefits are articulated upfront.

It might outline phases - a few weeks for task A, then a few weeks for task B, and so on, culminating in a specific improved capability. This can then be juxtaposed with the existing product roadmap to analyze the potential futures - with and without the initiative.

I like the simplicity because it frequently surfaces other pros and cons that the person championing the initiative might not have initially considered. While I'm not personally a strong advocate for highly specific developer productivity metrics, a straightforward document has often proven remarkably effective in my experience.

Other approaches I've seen involve using OKRs. When these are transparent, every team or line of business (LOB) can ask itself: how will this technical debt initiative improve a specific OKR? Will it enhance usability or address a customer-impacting area targeted by an OKR?

Such a self-reflective approach via OKRs helps teams define the scope and impact, perhaps using T-shirt sizing for a high-level estimate. This information can then inform discussions about resource allocation, like the 80/20 or 60/40 splits, from one release to the next.

Some very large tech companies, like Netflix or Google, are said to occasionally perform complete rewrites of codebases every 2-3 years, which is a significant undertaking only feasible with substantial resources.

More commonly, I've observed top-down approaches which can work for smaller companies. Sometimes this means leaving it to individual teams or LOB to identify and tackle their most pressing debt and be responsible for "Proxy Metrics".

Other organizations take this further by dedicating a specific 2-week sprint each quarter to technical debt. When intentional debt is being introduced, engineers sometimes comment on it or leave "technical debt" flags in the code repo itself, earmarking it for a future quarterly tech debt cleanup.
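For teams that leave flags in the code, collecting them ahead of a cleanup sprint can be automated with very little effort. The sketch below assumes a made-up comment convention, `# TECH-DEBT(tag): note`; the marker name, the optional tag, and the `find_debt_flags` and `scan_repo` helpers are all hypothetical, so substitute whatever convention a team actually uses.

```python
import re
from pathlib import Path

# Hypothetical convention: "# TECH-DEBT(optional-tag): free-text note"
DEBT_FLAG = re.compile(r"#\s*TECH-DEBT(?:\((?P<tag>[^)]*)\))?:\s*(?P<note>.+)")


def find_debt_flags(text):
    """Return (line_number, tag, note) for each flagged line in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = DEBT_FLAG.search(line)
        if match:
            hits.append((number, match.group("tag") or "", match.group("note").strip()))
    return hits


def scan_repo(root, pattern="*.py"):
    """Walk a directory tree and collect flags per file (files without flags are skipped)."""
    results = {}
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        flags = find_debt_flags(path.read_text(encoding="utf-8", errors="ignore"))
        if flags:
            results[str(path)] = flags
    return results


# A small inline sample so the sketch runs without touching the filesystem.
sample = """\
def price(order):
    # TECH-DEBT(intentional): flat tax rate until the tax service ships
    return order.total * 1.2
"""

print(find_debt_flags(sample))
```

Running `scan_repo(".")` from the root of a repository would produce a per-file list that can be pasted into the planning tool before the quarterly sprint.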

Another top-down strategy is to allocate a fixed percentage of budget, say 20%, to technical debt. This can work, but I've also seen situations where teams then try to fit items into this allocation that aren't truly technical debt, simply to utilize the budgeted time. This can result in cycles being spent on, for example, retiring old functionality that isn't causing active problems, rather than addressing more pressing underlying issues.

Governance is key:

To make the management of technical debt a sustainable practice, it is crucial for that practice to become ingrained in the overall operating and engineering culture.

Engineering guilds, or hybrid Change Advisory Boards (CABs) of some flavour, appear to be successful in this regard. Such setups provide a safe space where people can present the status of various initiatives or discuss the "why" on a broader level with interested engineers or related peers.

I've also heard of more informal meeting cadence initiatives, like "Refactor Fridays", where individuals can contribute to tackling a large, ongoing problem and do standups.

Ultimately, navigating technical debt seems to be an ongoing process of observation, discussion, and adaptation, rather than a problem that can be definitively "solved" at the root cause level. It requires a sustained commitment from many parts of an organization, and it requires teams to recognise the different strategic approaches available.