The problem with AI model failure and how to avoid it [Q&A]

It’s tempting to look at the hype surrounding AI and see it as a solution to all problems. But AI isn’t perfect: there have been some notable failures, often due to poorly defined models.

What are the consequences of getting it wrong, and how can businesses ensure their AI projects stay on track? We spoke to Alessya Visnjic, CEO of AI observability specialist WhyLabs, to find out.

BN: What are the major challenges developers face when developing AI models and applications?

AV: For many years, the main challenge was deploying AI models to production. The AI developer community struggled with reproducing AI experiments in production environments, and with scaling models originally built in Jupyter notebooks to run in high-throughput environments. Last year, however, we observed a major phase shift in the needs of AI developers. Thanks to the AI deployment infrastructure built at AWS, Databricks, GCP, Azure, and many startup vendors, most AI developers were able to successfully deploy models to production.

But today, post-deployment maintenance is a bigger challenge than model deployment itself. And organizations are sharply focused on creating culture, process, and toolchains for operating the models they were able to deploy to production.

Here’s why. The AI development process is surprisingly similar to familiar software development approaches. The problem with this is that we treat software as code and do not account for how data, which is the foundation of an ML model, affects the behavior of ML-powered software. Data is complex and high-dimensional, and its behavior is unpredictable. And the metrics we use to measure the performance of software don’t adequately measure the performance of ML models unless they account for data quality.

The big issue with AI models isn’t that the software has bugs; it’s that the results they deliver will be wrong if the data they’re getting is drastically different from the training data or isn’t healthy. AI models drift and become inaccurate when the incoming data behaves differently or is not robust. In other words, models fail when they’re not ‘ready’ for the data they receive, and developers need the right tools to detect, observe, and understand why this is happening.
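To make the idea of drift concrete, here is a minimal Python sketch of one common way to compare incoming data against training data: the Population Stability Index (PSI). This is an illustration only, not a method described in the interview or WhyLabs’ own approach. The function name, the bin count, and the 0.1 and 0.25 thresholds are assumptions; the thresholds are a widely used rule of thumb rather than a standard.

```python
import math
import random


def population_stability_index(reference, incoming, bins=10):
    """Compare two numeric samples; larger values mean the distributions differ more."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_frac = bin_fractions(reference)
    new_frac = bin_fractions(incoming)
    return sum((n - r) * math.log(n / r) for r, n in zip(ref_frac, new_frac))


if __name__ == "__main__":
    rng = random.Random(0)
    training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    production = [rng.gauss(1.5, 1.0) for _ in range(5000)]  # simulated shift
    score = population_stability_index(training, production)
    # Rule-of-thumb thresholds (an assumption, tune for your own data).
    label = "stable" if score < 0.1 else "moderate drift" if score < 0.25 else "significant drift"
    print(f"PSI = {score:.3f} ({label})")
```

In practice a check like this would run per feature on each batch of production data, with an alert when the score crosses the chosen threshold.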

BN: What are the perils of model failure in AI? What’s at stake here?

AV: As I mentioned, most model failures originate in the data that models consume, not just in software defects. In business, AI failures can cost companies millions; in healthcare and the sciences, bad AI can pose a serious consumer risk.

What’s at stake depends on the application. We could start with human life. A classic example was IBM’s Watson for Oncology, which was supposed to give the best cancer treatment recommendations to doctors. It turned out this $62 million system was trained on synthetic patients, not real patients’ data, and it gave bad, potentially fatal, treatment recommendations. With proper monitoring and observability tools, such issues can be uncovered very early, helping ensure that unreliable models don’t make it to production.

In business, the consequences of poor AI execution usually include revenue losses and customer dissatisfaction — for example, when chatbots give useless answers and frustrate shoppers to the point they leave without making a purchase.

AI is finding its way into energy management systems and power grids, which are highly complex and involve very large volumes of data. The UK has experimented with Google DeepMind to manage parts of its power grid, for example, and a failure of AI in energy infrastructure could result in a very expensive mess that harms thousands of people.

BN: MLOps seems to be growing as a trend. What does that mean exactly and how is it helping advance AI?

AV: I’d describe MLOps as a reliable set of techniques, practices, tools and culture that, working together, help AI models keep performing as they should. MLOps helps solve a real-world problem: that AI model data is ever changing. MLOps would include whatever processes you set up to bring in the necessary new data to update the AI model.

Why does the data change? Because life changes, really. To use a voice recognition example, when COVID arrived, voice recognition technology suddenly became less effective at understanding people in public settings. Why? People were often speaking through masks, and the models were not trained to understand speech through a mask. The data had drifted, big time, and it challenged AI/ML-based voice recognition systems to adjust.

A robust MLOps framework includes ongoing observability to preempt model failure. Model observability is fundamental to keeping AI applications at peak performance.

BN: What are some best practices that ML practitioners can put in place to ensure model robustness and overall AI responsibility?

AV: One starting point is to ensure that data health is a major, fundamental aspect of the original plan for any new AI application.

That is a paradigm shift for most software developers, and it’s very important to begin establishing MLOps for the organization that will maintain the model. In other words, the development process has to include ways to deal with changes in the data stream and in customer behavior.

We already talked about observability, which should be included in every production ML stack.
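As a rough illustration of what a basic data health check in a production stack might look like, the Python sketch below validates a batch of incoming records against simple expectations before they reach a model. Everything here is hypothetical: the `ColumnExpectation` class, the `check_batch` function, the field names, and the 5% missing-value limit are assumptions chosen for the example, not part of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class ColumnExpectation:
    """Expected range and tolerated share of missing values for one field."""
    min_value: float
    max_value: float
    max_missing_rate: float = 0.05  # assumed limit, adjust per field


def check_batch(rows, expectations):
    """Return a list of human-readable problems found in a batch of records."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for name, exp in expectations.items():
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > exp.max_missing_rate:
            problems.append(
                f"{name}: missing rate {missing_rate:.0%} exceeds {exp.max_missing_rate:.0%}"
            )
        out_of_range = [v for v in present if not exp.min_value <= v <= exp.max_value]
        if out_of_range:
            problems.append(
                f"{name}: {len(out_of_range)} value(s) outside [{exp.min_value}, {exp.max_value}]"
            )
    return problems


if __name__ == "__main__":
    expectations = {"age": ColumnExpectation(min_value=0, max_value=120)}
    batch = [{"age": 34}, {"age": None}, {"age": 400}]
    for problem in check_batch(batch, expectations):
        print(problem)
```

A check like this only catches obvious problems such as missing or impossible values; it complements, rather than replaces, distribution-level monitoring.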

Finally, as a best practice, practitioners can prepare for the ongoing care and nurturing of their AI application. We have long perceived developing software as one-and-done; you create it, test it, and deliver it. Apart from occasional bug fixes and feature updates, that’s ‘done’. But AI models are dynamic. If they’re used in the real world, they must be continually adjusted to new and evolving data. They actually are somewhat like physical machines, such as vehicles, that are continuously wearing out and becoming less efficient; they need repair, if you will. If your car gets out of alignment, it pulls to one side, and ML applications do that too.

However, it’s encouraging that the toolset needed to help AI software developers build applications and data infrastructure that sustain accuracy is coming to market, and the barriers to entry are dropping.

Photo Credit: NicoElNino/Shutterstock

Author: Martha Meyer