A crumbly snowball made entirely of Dockerfiles, Python, and YAML (and a tiny shard of AI)
To most people, artificial intelligence systems look compact and self-contained. You open your web browser and navigate to https://chat.openai.com/, for example, and there’s ChatGPT in your little rectangular browser window. Ah, he must live here, you think.
Perhaps you say “Hey Alexa!” to your Amazon Echo Spot, and when it answers, you think, Whoa, there’s a special AI circuit board in this little plastic ball! All of us interact with AI through these pretty, simple user interfaces, so we naturally assume that AI itself is also somehow modular, small & simple, like this Google Nest:
Having worked on the development of a number of AI systems, I can tell you definitively that they are nothing of the sort. Even the simplest system takes thousands of lines of code.
But you know what’s really crazy? The amount of code that is actually producing the output that you and I consume (responses from Alexa, sentences from ChatGPT) is really just a tiny part of the whole system. To quote from a Google tutorial on machine learning systems:
[AI-related code] is at the heart of a real-world…system, but that [code] often represents only 5% or less of the overall code…
Having taken part in the building of an end-to-end AI system at a company I worked for in the past, I wanted to see if this was in fact the case. Below, I will
- describe the old system we had used for making AI predictions, and the problem with it, then
- outline the structure and specifics of the system we created to replace it (a true MLOps end-to-end system). And finally, I’ll
- break down the code base (by number of lines) into a few categories, to see if this “AI-related code accounts for 5% or less of the overall code” statement is actually true or not.
Old System: Making an Initial Model
So, we had a database full of data on barley production, and we were set to go! We knew when the barley was planted, the geographic boundaries of each field, the soil profile, et cetera. Most importantly, we had actual ground-truth yield data associated with every field, as shown below. We knew we could build some sort of yield-prediction model from this data.
{
  field: 1,
  actual_yield: "3.457 tons/m2",
  field_boundaries: [
    [99.23432546, 110.23423003],
    ...
  ],
  planting_date: "2018-10-12",
  nitrogen: 0.33,
  phosphorous: 1.23,
  soil_texture: "loamy"
},
{
  field: 2,
  etc...
}
The people in our research department soon found that a scikit-learn MLPRegressor model worked best. In testing, and even under cross-validation, the model predicted yield pretty accurately.
Here’s the code, then. After fitting the model to the data, we saved it to a pickle file (*.pkl) and used that file to start making predictions on new data. That’s when the problem became apparent.
import pickle
from sklearn.neural_network import MLPRegressor

# Build the feature matrix and target vector from our field records
X = [[val for val in feature] for feature in best_input_features]
y = list(actual_yield_figures)

# Fit the regressor, then save it as the pickle we predicted from
model = MLPRegressor().fit(X, y)
with open("yield_model.pkl", "wb") as f:  # illustrative filename
    pickle.dump(model, f)
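Incidentally, the cross-validation mentioned above takes only a couple of lines in scikit-learn. Here’s a minimal sketch reusing the X and y built above; the five-fold split and R² scoring are my illustrative choices, not necessarily what our research team used:

from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

# Five-fold cross-validation on the same X and y built above;
# the fold count and scoring metric are illustrative choices
scores = cross_val_score(MLPRegressor(), X, y, cv=5, scoring="r2")
print(f"mean R^2: {scores.mean():.3f} (+/- {scores.std():.3f})")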
The Problem: Models Degrade Over Time
We went forward with our One Infallible Pickle File, making predictions for next year’s barley crop, then the year after, and on and on. About a year after release, though, it became apparent that the model was predicting worse than random: it had been trained on a limited slice of historical data, and as new data drifted away from that history, its predictions got steadily worse.
The Solution: A Completely Transparent, Train-And-Retrain AI Pipeline
Rather than a train-once-and-forget system, what we needed was a system that trained models (and saved each as a new pickle file) on a continuous basis, so they would never fall out of sync with reality the way our barley-yield model and its One Infallible Pickle File did. We also needed a way to see up-to-date stats on model accuracy, so we could know, in near real-time, how well our models were predicting on real-world input data.
Here’s what we deployed (well, the shiny front end, anyway). It allowed us to see, by model version, a scatter plot of predicted vs. actual values, as well as an error histogram and details of each model. Multiple versions could also be displayed simultaneously to compare accuracy.
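Under the hood, the numbers behind those plots are simple to compute. Here’s a minimal sketch of the kind of per-version stats such a dashboard aggregates; the function name and choice of metrics are my assumptions, not our actual dashboard code:

import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def model_version_stats(y_actual, y_predicted):
    """Per-version accuracy stats of the sort a dashboard might plot."""
    errors = np.asarray(y_predicted) - np.asarray(y_actual)
    return {
        "mae": mean_absolute_error(y_actual, y_predicted),  # average error size
        "r2": r2_score(y_actual, y_predicted),  # predicted-vs-actual fit
        "error_histogram": np.histogram(errors, bins=20),  # histogram panel data
    }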
The system “woke up” every month, gathered all the data needed for training, trained a new version of every model, and saved a new pickle file in the cloud. The accuracy of each version of a given model could be compared via the dashboard above (hosted on Google App Engine).
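In skeleton form, each monthly run boiled down to something like the sketch below. This is a simplified illustration rather than our production code: the bucket name, file layout, and load_training_data helper are all hypothetical, and the google-cloud-storage client is shown as one plausible way to push the pickle to the cloud.

import pickle
from datetime import date

from google.cloud import storage
from sklearn.neural_network import MLPRegressor

def retrain_and_publish(model_name, load_training_data, bucket_name="yield-models"):
    """Train a fresh model version and upload its pickle to cloud storage."""
    X, y = load_training_data(model_name)  # gather the latest training data
    model = MLPRegressor().fit(X, y)  # same estimator family as before

    # Version each artifact by month so old pickles stay comparable
    blob_path = f"{model_name}/{date.today():%Y-%m}.pkl"
    storage.Client().bucket(bucket_name).blob(blob_path).upload_from_string(
        pickle.dumps(model)
    )
    return blob_path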
A Five-Percent Heart? Let’s Break Down the Code!
Looking at the number of lines of code in the system, it’s apparent that the size of each part of the code base varied quite a bit depending on the function it performed. Here’s a diagram of the system’s functions and how many lines of code were dedicated to each (represented by area):
Of our 9,193-line code base, only 8 lines were used for actually making predictions with AI (that’s the fuchsia-pink block in the bottom right). That means our AI-related code represented only 0.087 percent of the overall code.
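For reference, a prediction path that fits in about eight lines can look like the sketch below. This is illustrative rather than our actual eight lines; the filename and the new_field_features variable are hypothetical:

import pickle

# Load the current pickled model and run it on incoming field features
with open("yield_model.pkl", "rb") as f:  # illustrative filename
    model = pickle.load(f)

predicted_yields = model.predict(new_field_features)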
Summary: A Fragment in the Slush
As mentioned in the opening, AI systems can seem like small & contained things, with whatever generates the output at center stage, but in reality, they’re nothing of the sort. AI systems are more like a slushy, half-melted snowball, cobbled together from a number of different programming languages & dependent on (sometimes multiple) cloud-computing providers. Take a look at the back end of one of these systems, and I’m sure you’ll be amazed that it functions at all.
And the kicker is that the vast majority of this snowball is simply devoted to gathering training data for the system and making sure the system stays up-to-date and accurate. Whatever part is actually generating the output that you and I consume is but a mere fragment embedded in that snowball. As the Google tutorial mentioned above stated (and as my code breakdown proved) “[actual AI-related code] often represents only 5% or less of the overall code [in a system]” (in my case it was actually 0.087%).
The takeaway here is that, even if you’re not a Data Scientist, you still have an immense and important role to play in this AI-crazy modern world. The Data Scientists on your team, after much research & fine-tuning, may present you with a tiny piece of gold. Now, as a non-Data-Scientist, it’s gonna be your job to wrap that thing in a bunch of dirty, crumbly snow, throw it high overhead, and just pray it gets lodged securely in the cloud.