It’s an increasingly rare application these days that doesn’t claim to incorporate some form of artificial intelligence or machine learning capability.
But while this may be great from a marketing standpoint, it poses a real challenge for developers. We spoke to Luis Ceze, CEO and co-founder of OctoML, to find out more.
BN: We know that developing AI/ML differs from traditional software development in some important ways. What are some of the challenges developers run into?
LC: AI is the new ‘brains’ of the app — much like databases before it. But unlike databases (and virtually all other types of software) AI is largely inaccessible to application developers and operations teams. Why? Because the process of actually getting the machine learning models that power AI into these smart applications is unnecessarily complex and cumbersome. Not only that, but models are dependent on specific hardware and software infrastructure to perform well enough to be viable.
What we have today is a fundamental disconnect between the process of creating these models and actually getting them to production. As a case in point, the average time-to-production for an ML model is 12 weeks, and 47 percent of models are dead on arrival — which is unsustainable if AI is going to make a dent in the world.
The crux of the issue is that ML models are treated as this bespoke thing that requires specialized workflows and tools — when in the end, after being created, they are just code and data, and should be treated as ‘regular’ software modules.
The deployment requirements of ML are reminiscent of the 'matrix from hell' that Docker helped exorcise from software development. What's even worse this time around is that the ML software stack is fully dependent on the hardware where it runs. On top of that, you must hand-tune models to meet performance SLAs for the application. We're now living in the era of the 'tensor from hell'.
Let me break down just how painful the process is today. Almost every model created ends up with its own unique pipeline. These are generally hand-crafted and tolerate very little variation, and even the best data scientists and AI practitioners often get them wrong. They're very fragile: one change in the environment, training framework, software library, or integration stack, and the whole thing needs to be debugged or, worse, completely rebuilt.
This is the major reason why the hand-off from the data scientists to the app developers and ops teams is so messy — it’s plagued with trial-and-error. In order to break this complex cycle and accelerate AI application development, machine learning must align with DevOps workflows and best practices.
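As a minimal sketch of what aligning ML with DevOps practices can look like in code (the model, golden outputs, and thresholds here are hypothetical illustrations, not OctoML's actual workflow): treat a serialized model like any other build artifact and gate its promotion behind an automated regression check, exactly as a CI pipeline would gate a library release.

```python
import json
import math

# Hypothetical serialized model: a logistic-regression weight vector
# stored as plain JSON, versioned alongside the application code.
MODEL_ARTIFACT = json.dumps({"version": "1.3.0", "weights": [0.8, -0.4], "bias": 0.1})

def predict(model: dict, features: list) -> float:
    """Pure-Python inference: no training-framework dependency at serving time."""
    z = sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]
    return 1.0 / (1.0 + math.exp(-z))

def regression_check(model: dict, cases: list, tol: float = 0.05) -> bool:
    """CI gate: the candidate model must reproduce known-good outputs
    within tolerance before it is promoted to production."""
    return all(abs(predict(model, x) - expected) <= tol for x, expected in cases)

model = json.loads(MODEL_ARTIFACT)
# Golden outputs recorded from the last model version known to meet the SLA.
golden = [([1.0, 1.0], 0.62), ([0.0, 0.0], 0.52)]
print(regression_check(model, golden))  # prints True
```

The point is not the toy model but the workflow: a failed check blocks the hand-off, so the messy trial-and-error happens in CI rather than in production.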
BN: There are some terms like MLOps which are still not widely understood. Can you tell us what MLOps means and if it’s helpful for developers?
LC: MLOps started as a set of best practices to deploy and maintain machine learning models, but it is slowly evolving into an independent approach to ML lifecycle management — from integrating with model generation, orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To be clear, model creation and DevOps are each established disciplines, but it's the co-mingling of the two that tries to be all things to all people. MLOps claims to encompass everything from model creation to deployment and every step in between. And that's simply not feasible, because each of those steps requires a substantial amount of specialized resources and expertise, which, as we all know, are hard to come by these days.
The bottom line is that we shouldn't draw a distinction between a model and other kinds of software. They are all components of one smart application. We're complicating our lives for no good reason when DevOps has already paved the way. Rant over!
BN: As you mentioned earlier, AI/ML application building suffers from bottlenecks, which can end up hindering innovation in the long run. What can developers do now to prevent this?
LC: Not to sound like a broken record, but first: come to terms with the fact that models are code. By accepting this, you unlock the potential for app developers and DevOps teams to become an integral part of ML deployment.
Second, practitioners need an easier way to work with models-as-software. Abstract out the complexity, strip out dependencies, and automatically generate and sustain trained models as agile, portable, and reliable software.
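One way to picture models-as-software, as a hedged sketch (the model name, version scheme, and target tag below are invented for illustration): wrap the trained artifact in the same metadata any other software package carries, so ops teams can verify and deploy it like a regular dependency.

```python
import hashlib
import json

def package_model(weights_blob: bytes, version: str, target: str) -> dict:
    """Wrap a trained model artifact in package-style metadata: a semantic
    version, a target platform tag, and a content hash so operations can
    verify exactly which model they are deploying."""
    return {
        "name": "churn-classifier",  # hypothetical model name
        "version": version,          # semantic versioning, like any library
        "target": target,            # e.g. the hardware the model was tuned for
        "sha256": hashlib.sha256(weights_blob).hexdigest(),
    }

blob = b"\x00\x01\x02\x03"  # stand-in for serialized model weights
manifest = package_model(blob, "2.0.1", "x86_64-avx512")
print(json.dumps(manifest, indent=2))
```

With a manifest like this, a model upgrade becomes a dependency bump that existing DevOps tooling already knows how to review, roll out, and roll back.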
BN: Your argument for treating models as software makes sense. Look ahead five years and assume this shift has taken place. How do you see the AI application landscape?
LC: Looking into the future is always fun. I envision a much richer ecosystem of what I call ‘AI-Native’ companies. These are companies that can’t exist without AI — as opposed to companies who might apply AI here and there to power a single function.
I think we'll also see a lot more low-code/no-code solutions for building AI. This is a world where software engineers will be able to deploy models without needing to become 'experts' in machine learning. As I'm sure you know, software development is extremely complex and requires deep knowledge of programming languages, libraries, and APIs. Low-code/no-code solutions abstract away these layers of complexity, enabling developers at all levels of proficiency to build AI applications. In short, AI will be much more accessible.
Photo Credit: NicoElNino/Shutterstock