An Introduction to Machine Learning Model Deployment

Aug 17, 2022

In my previous articles, we discussed the entire workflow in a machine learning project and how to train your models using machine learning algorithms. This article will review how to deploy our trained models into production.

The joy of any ML project is seeing the value it creates. Whether it is a personal project or a team of data scientists solving a particular problem, we all hope to deploy our models into production one day.

There are many ways to deploy your model: as a web service, as an API, on a Raspberry Pi, or on a mobile device. This article will cover deployment methods, tricks, tools, best practices, and some of the MLOps architecture.

What is model deployment in machine learning?

Imagine you and your team have worked for months on a model that predicts fraudulent products in your e-commerce store, and it gives near-perfect results. Of course, your job does not end there. Ideally, you would want your model to predict fraud in real time so that your e-commerce store is free of fraudulent stores and products.

Taking an ML model into a live environment (an existing production environment), where the model predicts outcomes based on real-world data, is known as model deployment.

Most organizations, irrespective of their size, will have an existing DevOps process with which you need to integrate your MLOps process. MLOps vs. DevOps is a topic that Aryan has beautifully explained in this article.

If you browse online resources, you will find many articles about data collection, preparation, training, and maintenance. But very few cover deployment in detail, because deployment is a complicated process.

Steps to take before deploying an ML model

DevOps has evolved over the years, as described by Patrick Debois. For example, take a look at the image below.

MLOps is following the same path. So why am I talking about MLOps in model deployment? Similar to DevOps (whose practices have been honed over the years), MLOps is the process of serving your model to the world.

As our R&D head Yash once said while explaining the role of MLOps to me, MLOps is at the intersection of DevOps (a linear action) and the evolution of software deployment. So, while DevOps work is done once the code is pushed, MLOps work continues after the model is deployed, through monitoring and retraining.

What we do from here is also known as ML architecture.

Here are some tips before you go about deploying your models. First, a caution: we often load our models with features to reach 95%+ accuracy, but taking such a feature-heavy model into production might crash your system. Beyond that, keep the following properties in mind.

Portability: The ability to transfer the code from one machine to another. A portable model will help you decrease the load time and make rewriting less stressful.

Scalability: A model is useless when it does not scale. It must adapt to real-world situations, take in business data, predict outcomes, and scale as the problem scales.

Reproducibility: You should be able to reproduce any component of the model, and the results it gave, in any scenario that might occur in the future.

Testing: The ability to test different versions of the model post-deployment.

Automation: The ability to automate steps in the MLOps pipeline.
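To make the reproducibility point a little more concrete, here is a minimal sketch of a "training manifest" that records what you would need to rerun a training job. The function names (`fingerprint`, `build_manifest`, `sample_rows`) and the manifest fields are my own assumptions for illustration, not part of any particular MLOps tool.

```python
import hashlib
import json
import platform
import random


def fingerprint(rows):
    """Return a stable SHA-256 digest of the training rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_manifest(rows, params, seed):
    """Collect what is needed to rerun this training job later (hypothetical fields)."""
    return {
        "data_sha256": fingerprint(rows),
        "params": params,
        "seed": seed,
        "python": platform.python_version(),
    }


def sample_rows(rows, k, seed):
    """Draw a repeatable sample: the same seed always gives the same rows."""
    rng = random.Random(seed)
    return rng.sample(rows, k)
```

If two runs produce the same manifest, they used the same data, parameters, and seed, which is a reasonable first check when someone asks "can we rebuild last month's model?"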


Docker revolutionized how software is deployed, and the benefits of containers are equally crucial in MLOps. They make it quick to recreate the exact same environment on any machine. Combine containers with CI/CD workflows, and scaling becomes far less of a worry. Make sure every stage of the machine learning workflow is containerized.


Ask any DevOps person, and they will stress the importance of Continuous Integration and Continuous Delivery (CI/CD). When you put every aspect of your ML workflow under automated testing, your deployments become smoother and more reliable.


There are hundreds of ways to test your model before going to production. However, not all of them would suit our ML pipeline. We will go over some of them. For more details about testing, refer to the testing chapter of the Google SRE book: https://sre.google/sre-book/testing-reliability/

Differential Tests: This is where you compare the performance and predictions of the current model against the previous one. It is useful when a model seems healthy but has been trained on outdated datasets or features.

Benchmark Tests: This lets you benchmark the performance metrics of the current version of the model against the previous version, and helps you decide when to stop adding new features to the model.

Load/Stress Test: Used mainly on CPUs and GPUs to determine how they perform under the load of large ML projects.
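The differential and benchmark tests above can be sketched in a few lines. This is only an illustration under my own assumptions: each model is treated as a plain Python function that returns a label, and the helper names and the `max_disagreement` limit are hypothetical, not taken from a testing framework.

```python
import time


def disagreement_rate(old_predict, new_predict, inputs):
    """Differential test: fraction of inputs where the two versions disagree."""
    if not inputs:
        raise ValueError("need at least one input to compare")
    differing = sum(1 for x in inputs if old_predict(x) != new_predict(x))
    return differing / len(inputs)


def accuracy(predict, inputs, labels):
    """Benchmark metric: share of inputs predicted correctly."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def mean_latency_seconds(predict, inputs, repeats=5):
    """Rough load check: average wall-clock time per prediction."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            predict(x)
    return (time.perf_counter() - start) / (repeats * len(inputs))


def release_gate(old_predict, new_predict, inputs, labels, max_disagreement=0.25):
    """Allow a release only if the new model is no less accurate
    and does not disagree with the old one more than the chosen limit."""
    no_worse = accuracy(new_predict, inputs, labels) >= accuracy(old_predict, inputs, labels)
    similar = disagreement_rate(old_predict, new_predict, inputs) <= max_disagreement
    return no_worse and similar
```

A check like `release_gate` is the kind of thing you could run in a CI/CD job before a new model version is promoted. The right limit depends entirely on your problem.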

Deploy machine learning models into production

Here’s a high-level overview of ML architecture.

Data component: Provides access to the data sources that a model needs.

Feature component: Generates feature data, preferably in a usable and scalable way.

Serving component: It is responsible for serving models and scoring predictions.

Observation and Evaluation: Evaluating the model, monitoring post-deployment, and comparing with training data.
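Here is a rough sketch of how these four components could fit together. Everything in it is an assumption made for illustration: the class names, the two toy features (`price` and `seller_age_days`), and the tiny linear model are hypothetical, and a real system would back each component with proper storage and serving infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class DataComponent:
    """Gives the rest of the system access to a data source (a list here)."""
    records: list

    def fetch(self):
        return list(self.records)


class FeatureComponent:
    """Turns a raw record into the numeric features the model expects."""

    def transform(self, record):
        return [record["price"] / 1000.0, float(record["seller_age_days"] < 30)]


@dataclass
class ServingComponent:
    """Scores features with a tiny linear model and returns a 0/1 prediction."""
    weights: list
    threshold: float = 0.5

    def predict(self, features):
        score = sum(w * f for w, f in zip(self.weights, features))
        return int(score >= self.threshold)


@dataclass
class Observer:
    """Records every prediction so it can be evaluated after deployment."""
    log: list = field(default_factory=list)

    def record(self, features, prediction):
        self.log.append((features, prediction))

    def positive_rate(self):
        return sum(p for _, p in self.log) / len(self.log) if self.log else 0.0


def run_pipeline(data, features, serving, observer):
    """Wire the four components together for one pass over the data."""
    predictions = []
    for record in data.fetch():
        x = features.transform(record)
        y = serving.predict(x)
        observer.record(x, y)
        predictions.append(y)
    return predictions
```

Keeping the components separate like this means you can swap the data source or the model without touching the monitoring code.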

Different ways to deploy:

With the ML architecture in place and the model production-ready, there are several ways in which you can deploy your ML model.

Batch Deployment with REST API: Batch training allows you to always have an up-to-date version of the model. It is also a scalable method that removes the need to use the entire model set. Here, training is done offline, while prediction is made online.

Batch Deployment with shared database: Similar to batch deployment with REST API, training is done offline, but predictions are made in a queue, almost like a real-time prediction.

Batch Deployment on mobile: Similar to batch with REST, the only difference is that the deployment is made on a customer device.

Stream Deployment: Training and deployment are performed on completely different yet connected streams.

One-off Model Deployment: This is very rare in ML, but we might argue that some models do not need to be deployed continuously and are better served by one-off or periodic deployments. In such cases, ad-hoc training is the only option, and retraining can wait until the model performance decreases.
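To give a feel for the shared-database option, here is a small sketch that uses SQLite as the shared database. It assumes a hypothetical `requests` table and a `predict` function that stands in for a model trained offline. One process enqueues requests, and a worker scores whatever is still pending.

```python
import sqlite3


def create_queue(conn):
    """Create the shared table; prediction stays NULL until a worker scores the row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "id INTEGER PRIMARY KEY, price REAL NOT NULL, prediction INTEGER)"
    )
    conn.commit()


def enqueue(conn, price):
    """Called by the application: add a request and return its id."""
    cur = conn.execute("INSERT INTO requests (price) VALUES (?)", (price,))
    conn.commit()
    return cur.lastrowid


def score_pending(conn, predict):
    """Called by the worker: score every unscored request, oldest first."""
    rows = conn.execute(
        "SELECT id, price FROM requests WHERE prediction IS NULL ORDER BY id"
    ).fetchall()
    for row_id, price in rows:
        conn.execute(
            "UPDATE requests SET prediction = ? WHERE id = ?",
            (predict(price), row_id),
        )
    conn.commit()
    return len(rows)


def result(conn, row_id):
    """Return the stored prediction, or None if the row is unscored or missing."""
    row = conn.execute(
        "SELECT prediction FROM requests WHERE id = ?", (row_id,)
    ).fetchone()
    return None if row is None else row[0]
```

If the worker runs `score_pending` every few seconds, the application sees results "almost like real time" without ever calling the model directly.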

Let’s look at one example of batch deployment with REST API.

The image shows that the entire process is split into two halves: dev and production. Training, feature extraction, and model building happen offline in dev, and the result is then deployed into production.

Training Data: Exactly what it sounds like. It is the data your model fetches and learns from in order to predict outcomes. This process can be simple or complex based on your database, datasets, where you store them, and how you prepare them.

Feature Extractor: This is where the essential features for the needed model are selected or generated. Some of the popular libraries for feature extraction are TensorFlow and scikit-learn.

Model Builder: Where models are versioned, formatted, and prepared for model deployment.

Production (Trained Model): Where the trained model is deployed and its output is served via a REST API.
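Here is a minimal sketch of that flow, using only the Python standard library so it stays self-contained. The "model" is just a price threshold stored as JSON, and the `/predict` endpoint, the `price` field, and all function names are my own assumptions for illustration. A real service would more likely use a proper web framework and a real model format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def train(prices, labels):
    """Offline (dev) step: use the midpoint between the class means as a threshold."""
    fraud = [p for p, y in zip(prices, labels) if y == 1]
    legit = [p for p, y in zip(prices, labels) if y == 0]
    return {"threshold": (sum(fraud) / len(fraud) + sum(legit) / len(legit)) / 2}


def save_model(model, path):
    """Model builder step: write the trained model to a file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh)


def load_model(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def make_handler(model):
    """Build a request handler class bound to one trained model."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404, "unknown endpoint")
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                price = float(json.loads(self.rfile.read(length))["price"])
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected a JSON object with a numeric price")
                return
            body = json.dumps({"fraud": int(price > model["threshold"])}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet; a real service would log requests

    return PredictHandler


def make_server(model_path, host="127.0.0.1", port=8000):
    """Online (production) step: load the saved model and expose it over HTTP."""
    return HTTPServer((host, port), make_handler(load_model(model_path)))
```

In dev you would call `train` and `save_model`. In production you would call `make_server("model.json").serve_forever()` and send a POST request with a body like `{"price": 950}` to `/predict`.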

Post Deployment:

Machine learning deployment is more than just pushing the models into production. Ongoing monitoring is needed to make sure the model is performing efficiently. Putting a solid monitoring process into place is tiring work, but it is essential for catching data drift and outliers.

Once you have a solid model monitoring setup, it is far easier to detect data drift and performance degradation, and to fix them with a sense of urgency. It also helps you retrain your models with new data sets to avoid drifting.
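One simple way to put a number on data drift is the Population Stability Index (PSI), which compares how a feature was distributed in training with how it looks in production. The sketch below is one possible take, not a standard implementation: the equal-width binning, the small floor for empty bins, and the 0.2 retraining threshold (a commonly quoted rule of thumb) are all assumptions you should tune for your own data.

```python
import math


def psi(expected, actual, bins=5):
    """Population Stability Index between a training sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all training values identical; avoid dividing by zero

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range fall into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.2):
    """Flag drift when PSI exceeds the chosen threshold (an assumed rule of thumb)."""
    return psi(expected, actual) > threshold
```

You could run a check like this on a schedule for each important feature and trigger retraining, or at least an alert, when it fires.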

Time to deploy an ML model:

The honest answer is that it depends on the scale of the problem, the size of your model, and the setup of your ML architecture. According to Algorithmia's "2020 State of Enterprise ML" research, it takes between 30 and 90 days on average to deploy a model.

Challenges in Deployment:

Any ML project requires intensive planning from all parts of the organization, and the complexity involved in deploying a model outweighs any bureaucracy involved. Therefore, data scientists, ML engineers, DevOps, and developers must work together to close the knowledge gap.

Some of the main challenges when it comes to deployment are:

  • Communication gaps between the data science team and the deployment team.

  • Choosing the proper infrastructure for the machine learning project.

  • Monitoring and retraining are easier said than done in real-world situations.

  • Scaling models as the business problem scales.

  • Convincing the stakeholders that the model's predictions perform as promised.

Best Practices in machine learning deployment

As you plan, evaluate, train, and deploy your model, there are certain best practices you need to keep in mind regarding ML architecture.

  • Always keep the ROI in mind. For you, all that matters is development and deployment, but the business is more about time, resources, and ROI. Remember that when you go bonkers on a model, a person is going crazy about the cloud bills that come with it. And if the prediction does not justify the ROI, there are some serious discussions to be had.

  • Research, research, and research. The most crucial aspect of any ML project, especially in architecture, is choosing a state-of-the-art platform. Research is your best bud.

  • I cannot stress enough the importance of model-ready data. I have been talking about it in my previous articles as well. High-quality, clean, and processed data is almost half the work.

  • Another critical aspect of data is preprocessing it. Cleaning the data to suit the model and splitting it into training, validation, and test sets both help you in your ML journey.

  • Keep track of experiments that you add to the models. It's in our nature to get carried away with solutions so much that we forget the number of features we loaded in our models. Experiment tracking helps you get your model production ready and lets you know the performance of those experiments.
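Experiment tracking does not need to start with a heavyweight tool. As a minimal sketch, you can append one JSON line per experiment to a file and query it later. The file layout, field names, and helper functions below are assumptions of mine for illustration; dedicated tracking tools offer much more.

```python
import json
import time


def log_experiment(path, name, features, params, metrics):
    """Append one experiment record to a JSON Lines file."""
    record = {
        "name": name,
        "features": sorted(features),
        "params": params,
        "metrics": metrics,
        "logged_at": time.time(),
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def load_experiments(path):
    """Read every logged experiment back as a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def best_experiment(path, metric):
    """Return the run with the highest value for the given metric, if any."""
    runs = [r for r in load_experiments(path) if metric in r["metrics"]]
    return max(runs, key=lambda r: r["metrics"][metric]) if runs else None
```

Because each record lists the features that went into the model, you can see exactly what you loaded into each version and how it performed.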


I have taken you through what needs to be done regarding deployment. Keep in mind that deployment is never a one-time process. It is an iterative process that should evolve as your business problem evolves.

Now, if you want to learn about the previous steps involved in a machine learning workflow, you can do so here, and if you want to learn specifically about model training in machine learning, you can do so here.

Written By

Thinesh Sridhar

Technical Content Writer

Copyright © 2023 NimbleBox, Inc.