Technical Debt in ML systems — A summary

Preeti Hemant
5 min read · Mar 10, 2022

The field of ML and AI is moving quickly, and the paper this article summarizes, "Hidden Technical Debt in Machine Learning Systems" (Sculley et al., NIPS 2015), was published a few years ago, but the topics it discusses are still relevant and important. So, it is time to get this old draft of my article out.

Technical debt in software engineering refers to the long-term costs incurred by moving quickly on implementation and deployment. This debt significantly slows down maintenance and improvement activities.

ML systems are in part software systems and inherit many of the same problems, including technical debt. And while ML solutions are relatively easy to develop and deploy, monitoring and maintaining them is not.

On top of the issues they inherit from software engineering, ML systems carry ML-specific issues and constraints. Software-related issues are easier to spot and rectify; ML-specific issues live at the system level and are difficult to detect, giving rise to 'hidden technical debt' in ML systems.

[Figure: a real-world ML system]

Sources of technical debt in ML systems

Difficulty in enforcing abstraction boundaries

In software engineering, making code maintainable is easier than it is in ML: software systems lend themselves well to abstraction boundaries.

ML systems, in contrast, do not. They use signals or features that are inherently entangled: changing the distribution of one feature may change the weights or the importance of the other features. This is referred to as the CACE principle: Changing Anything Changes Everything.
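A minimal sketch of CACE, using a toy logistic regression (scikit-learn assumed; the data, features and noise levels are invented for illustration):

```python
# Toy illustration of CACE: the features are entangled, so an upstream
# change to x1 alone shifts the learned weight of the untouched x2.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

def make_data(x1_noise):
    signal = rng.normal(size=n)
    x1 = signal + x1_noise * rng.normal(size=n)   # x1 measures `signal`...
    x2 = signal + 0.5 * rng.normal(size=n)        # ...and so does x2: entangled
    y = (signal > 0).astype(int)
    return np.column_stack([x1, x2]), y

before = LogisticRegression().fit(*make_data(x1_noise=0.1))
after = LogisticRegression().fit(*make_data(x1_noise=1.0))  # only x1 degraded

print("weights before:", before.coef_)   # weight moves from x1 onto x2,
print("weights after: ", after.coef_)    # even though x2 never changed
```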

Models are sometimes cascaded, where a model for a new problem is learned on top of an existing one. Although this approach yields a quick solution compared to building a new model from scratch, it creates a system-level dependency: analysis of improvements becomes expensive, and improving any one model may decrease performance at the system level.
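A hypothetical sketch of such a cascade (toy data and tasks invented for illustration):

```python
# Model cascade: instead of solving task B from scratch, model_b learns
# on top of model_a's score. Quick to build, but now any retraining of
# model_a silently changes model_b's inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3))
y_a = (X[:, 0] + X[:, 1] > 0).astype(int)   # original task
y_b = (X[:, 0] - X[:, 2] > 0).astype(int)   # new, related task

model_a = LogisticRegression().fit(X, y_a)
score_a = model_a.predict_proba(X)[:, 1]

# model_b depends on model_a at both training and serving time; improving
# model_a in isolation can now degrade model_b and the system as a whole.
model_b = LogisticRegression().fit(np.column_stack([X, score_a]), y_b)
```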

There may also be consumers that silently use the outputs of a model; this is referred to as visibility debt. Changes made to the model may then impact these consumers in ways that are unintended or poorly understood.

Data dependencies

Data features are often signals produced by other systems. Some of these signals may change behaviour over time, either because of improvements made upstream or because the signals are themselves outputs of ML systems that update over time.
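The mitigation the paper suggests is to keep frozen, versioned copies of unstable signals, so consumers pick up upstream changes deliberately. A toy in-memory sketch (the store, feature names and values are all hypothetical):

```python
# Toy sketch of signal versioning: the model pins "v1", a frozen snapshot,
# and only moves to "v2" as a deliberate, tested change.
feature_store = {
    ("user_score", "v1"): {101: 0.42, 102: 0.13},  # snapshot used at training time
    ("user_score", "v2"): {101: 0.87, 102: 0.55},  # upstream system's new behaviour
}

def get_feature(name: str, version: str, uid: int) -> float:
    # Consumers must name an explicit version; nothing changes under them.
    return feature_store[(name, version)][uid]

print(get_feature("user_score", "v1", uid=101))    # stable until we opt in to v2
```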

ML models may also have data features that contribute little to prediction quality, and in some cases are simply unnecessary. These features make the system vulnerable to change. If there are legacy features, bundled features, incremental features or correlated features in the model, a re-examination is likely due.
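One way to surface such features, suggested in the paper, is a regularly scheduled leave-one-feature-out evaluation. A minimal sketch with scikit-learn (toy data; which features are noise is invented):

```python
# Leave-one-feature-out check: retrain without each feature in turn; a
# negligible score drop flags an underutilized dependency.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # features 2 and 3 are pure noise

baseline = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
for i in range(X.shape[1]):
    reduced = np.delete(X, i, axis=1)
    score = cross_val_score(LogisticRegression(), reduced, y, cv=5).mean()
    print(f"without feature {i}: {score:.3f} (baseline {baseline:.3f})")
```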

Direct and Hidden feedback loops

A model may feed back into the selection of its own training data. Any degradation in performance then becomes part of the feedback loop, resulting in a vicious circle.
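A toy simulation of such a loop, where the model only obtains labels for the candidates it chooses to show (all numbers and the selection rule are illustrative):

```python
# Direct feedback loop: each round, the model picks which candidates get
# shown (and hence labelled), so its own bias filters the next training
# set and the data drifts toward what the model already prefers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

for step in range(5):
    candidates = rng.normal(size=(1000, 2))
    shown = model.predict_proba(candidates)[:, 1] > 0.5   # model selects its own data
    labels = (candidates[shown, 0] > 0).astype(int)       # labels only for shown items
    X_train = np.vstack([X_train, candidates[shown]])
    y_train = np.concatenate([y_train, labels])
    model.fit(X_train, y_train)
    print(f"round {step}: {len(y_train)} rows, {y_train.mean():.0%} positive")
```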

While the direct feedback loop described above is relatively easy to investigate, hidden feedback loops, between systems that indirectly influence each other, are trickier. Changes in one system may lead to undesirable effects in the other, often going unnoticed because nobody knows the dependency exists.

High-debt design patterns

The prediction component of an ML system is only a fraction of the entire system; the rest is tooling and supporting software. Code that connects these two worlds together is referred to as glue code. It provides no prediction functionality of its own, yet it can be hidden and massive, making testing and developing alternatives difficult.

As new data sources are added to a data ecosystem, the number of ingestion pipelines also increases. Without an architecture that looks at data collection holistically, adding new sources can quickly become messy. Add a model to the mix, and you now have a complex, interdependent system of scrapes, joins and sampling steps (the paper calls this a 'pipeline jungle'). Managing these pipelines, detecting errors and recovering from failures is difficult and costly, and makes further innovation costlier still.

Experimental code paths that are no longer needed also contribute to the growth of debt. These codepaths make backward compatibility difficult to maintain, testing the interactions between them is hard, and they can cause undesired effects in production.
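A small illustration of how 'temporary' experiment flags compound (the flags and logic here are invented):

```python
# Three "temporary" flags already create 2**3 = 8 possible codepaths to
# test; in practice most branches are dead but nobody dares delete them.
USE_NEW_TOKENIZER = False      # experiment abandoned, branch now dead
USE_RERANKER_V2 = True
KEEP_LEGACY_FALLBACK = True

def score(item: str) -> float:
    text = item.lower() if USE_NEW_TOKENIZER else item
    base = float(len(text))                 # stand-in for a real model call
    if USE_RERANKER_V2:
        base *= 0.9
    if KEEP_LEGACY_FALLBACK and base == 0:  # unreachable in practice
        base = 1.0
    return base

print(score("example"))
```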

Configuration of ML systems

ML systems have numerous parameters for data, features and algorithms, and these are configured and re-configured until we get the desired performance. Configurations are sensitive, and the messiness of real data can make modifying them difficult and error-prone. Incorrect configurations can prove costly in lost time and computing resources, and, worse, in production issues.
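One common defence, sketched here with hypothetical parameter names, is to validate configuration invariants once at load time rather than discovering violations in production:

```python
# Validate configuration invariants the moment the config is constructed.
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    n_features: int
    train_window_days: int

    def __post_init__(self):
        assert 0 < self.learning_rate < 1, "learning_rate out of range"
        assert self.n_features > 0, "need at least one feature"
        assert self.train_window_days >= 7, "training window too short"

cfg = TrainConfig(learning_rate=0.05, n_features=120, train_window_days=30)
```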

Changing external world

An ML system often interacts with the external world, and that world is unstable. The data, or the mapping between inputs and outputs that the system relies on, can change. This implies a need for constant monitoring and testing of the system, which creates ongoing maintenance costs.
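A minimal sketch of such monitoring, comparing the live prediction distribution against a training-time baseline with the Population Stability Index (the 0.2 threshold is a common rule of thumb; the data is simulated):

```python
# Alert when the live score distribution drifts away from the baseline.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two score distributions."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(4)
baseline_scores = rng.beta(2, 5, size=10_000)   # scores at training time
live_scores = rng.beta(3, 4, size=10_000)       # the world has shifted

if psi(baseline_scores, live_scores) > 0.2:     # common rule-of-thumb threshold
    print("ALERT: prediction distribution drifted; investigate before retraining")
```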

Measuring tech debt

Although useful as a concept, there is no established metric for measuring technical debt over time. There are, however, questions we can ask to assess the extent and nature of the debt. The paper suggests, among others: How easily can an entirely new algorithmic approach be tested at full scale? What is the transitive closure of all data dependencies? How precisely can the impact of a new change to the system be measured?

In conclusion

Technical debt is an issue in both engineering and research. Solutions that offer tiny improvements at the cost of a significant increase in system complexity, or the addition of one or two data sources without due diligence, can lead to an accumulation of debt.

ML tech debt is becoming increasingly important to address, and the authors hope the paper encourages development in areas of maintainable ML. These improvements alone, however, will not be sufficient. The authors note the need for a culture that supports recognizing, prioritizing and rewarding efforts that contribute to the long-term health of ML systems.

And finally… my thoughts on the ideas presented in the paper

Not all the sources of tech debt identified here actually contribute to it. Glue code, for example, is a great way to add abstraction and link layers together. It is insufficient documentation and patchy design that allow glue code to turn into tech debt; the debt accumulates as a result of poor practices around these components and tools.

Costs in ML arising from the changing external world (retraining, changing thresholds, testing, etc.) cannot be categorized as tech debt. These costs would present themselves irrespective of good practices; they are 'out of our control' factors.

The paper is, however, a good overview of the factors that add to the overall ML cycle time and affect the quality of the solutions.

Further reading

  1. https://matthewmcateer.me/blog/machine-learning-technical-debt/
  2. https://blog.metaobject.com/2021/06/glue-dark-matter-of-software.html
  3. https://convincedcoder.com/2019/04/27/Software-architecture-boundaries/
