MLOps, or Machine Learning Operations, is a set of practices for collaboration and communication among data scientists, AI specialists, and machine learning engineers to streamline and automate the machine learning life cycle. The ultimate goal of MLOps is to deploy machine learning models into production at scale. As in DevOps, MLOps development cycles are repeated until the desired quality or performance is achieved.
Following MLOps best practices addresses many production hurdles businesses face: it shortens development cycles (decreasing time to market), improves the reliability, performance, scalability, and security of ML systems, and increases the return on investment of ML projects. There are six key stages for successful MLOps: data gathering and analysis, data preparation, model training, validation, serving and monitoring, and model retraining. For the purposes of this blog, however, we will focus on the MLOps processes involved in model training, serving, and monitoring.
Model Training
In the model training stage, MLOps helps with reproducibility and with automating model training and evaluation. Two concepts important for model training are experiment tracking and training pipelines. Experiment tracking refers to the “structured manner in which data scientists identify the factors that affect model performance, compare the results, and select the optimal version.” This is especially crucial for model evaluation. There are many tools MLOps teams can use for tracking experiments. SDT provides an automated end-to-end model generation and optimization tool called CobiOps as part of the SDT Cloud ecosystem, for plug-and-play model training in everyday industrial applications. Other popular tools such as MLflow offer manual A/B testing of models and detailed experiment tracking for developers.
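To make this concrete, here is a minimal experiment-tracking sketch using MLflow's Python API; the dataset, model, and parameter choices are illustrative placeholders rather than a prescribed setup.

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for a real training set
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    n_estimators = 100
    mlflow.log_param("n_estimators", n_estimators)  # a factor that may affect performance
    model = RandomForestClassifier(n_estimators=n_estimators).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)  # result to compare across runs
```

Each run is logged with its parameters and metrics, so comparing versions and selecting the optimal one becomes a query over past runs rather than guesswork.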
However, experiment tracking does not solve all the problems that arise with model training (e.g. manual errors). A training pipeline chains together the steps that take raw data through transformation and correlation into a model that can be tested and evaluated; its main benefit is automating the machine learning workflow. Automating the processes of gathering and cleaning data lowers the risk of human error. Furthermore, MLOps teams benefit from ML pipelines to speed up development and operationalization. To summarize, experiment tracking helps automate model evaluation, and training pipelines help automate model training.
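As one small illustration, scikit-learn's Pipeline expresses this idea in a few lines: each step runs in a fixed, repeatable order, so no cleaning or transformation step can be forgotten or applied inconsistently. The steps and data below are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # automated data cleaning
    ("scale", StandardScaler()),                 # automated transformation
    ("model", LogisticRegression()),             # model training
])
pipeline.fit(X, y)  # one call runs the whole workflow in order
```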
Deployment
The final step in putting machine learning model training to use is deploying a developed model to a production environment. Model deployment is arguably the most difficult part of MLOps. The first deployment decision businesses will likely make is coming up with a “model serving strategy”: in other words, deciding how the model will make predictions.
Types of Model Serving
Model serving refers to how trained machine learning models are served, i.e. how their predictions are made available. One of its challenges is that businesses implementing machine learning often employ two different groups to handle the separate responsibilities of model training and model serving, which means the groups use different tools and have different concerns. Nevertheless, as these responsibilities converge within machine learning engineering, it is important to be familiar with the various deployment strategies needed for MLOps. The serving methods that MLOps developers should be most familiar with are offline serving, online model-as-a-service, and edge deployment.
Offline serving, or batch inference, is a relatively simple way to generate predictions by running models at timed intervals. In this method, inputs are processed in “batches,” and the resulting predictions are stored in a database where business applications can retrieve them. This allows data scientists and machine learning engineers to take advantage of scalable computing resources to generate many predictions at once.
The biggest disadvantage of offline serving appears when handling new data. Predictions generated in batches are not available in real time, which means predictions may not yet exist for newly arrived data. When new data is added, a new version of the system must be retrained from scratch on the whole dataset, not just the new data, before updated predictions become available.
Batch inference is well suited to ML models for product recommendations on e-commerce sites. Rather than having the ML model generate new predictions each time a user logs on to the store, data scientists may decide to generate recommendations for all users in batch and then cache them for easy retrieval when needed.
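A minimal sketch of that pattern follows; the model file, users table, and recommendations cache are hypothetical stand-ins for a real pipeline.

```python
import json
import pickle
import sqlite3

def run_batch_recommendations(model_path: str, db_path: str) -> None:
    """Score every user in one pass and cache the results for fast lookup."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # hypothetical previously trained model
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT user_id, features FROM users").fetchall()
    for user_id, features_json in rows:
        score = float(model.predict([json.loads(features_json)])[0])
        conn.execute(
            "INSERT OR REPLACE INTO recommendations (user_id, score) VALUES (?, ?)",
            (user_id, score),
        )
    conn.commit()
    conn.close()

# Typically triggered on a schedule (e.g. a nightly cron job) rather than per request.
```

When a user logs on, the application simply reads the cached row instead of invoking the model.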
Online serving, or online inference, generates machine learning predictions in real time and can learn continuously from user inputs. Rather than waiting hours for predictions to be generated in batch, the ML model can generate predictions as soon as they are needed. Online inference also allows data scientists and machine learning engineers to make predictions for any new data. A simple example of this deployment model is consumer F&B (food and beverage) service applications that require predictions in real time.
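A minimal online-serving sketch, assuming a FastAPI service and a pre-trained model saved as model.pkl (both illustrative), might look like this:

```python
import pickle
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
with open("model.pkl", "rb") as f:  # hypothetical pre-trained model
    model = pickle.load(f)

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(request: PredictionRequest):
    # Score a single request immediately instead of waiting for a batch run
    return {"prediction": float(model.predict([request.features])[0])}
```

Served with e.g. `uvicorn serve:app`, each POST to /predict returns a fresh prediction as soon as the request arrives.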
There are some applications where online serving of models is critical. For instance, when a user orders food through a food delivery service, the user can see the expected delivery status with good accuracy, accounting for traffic and weather conditions. More advanced examples in which online serving makes sense are augmented reality, virtual reality, human-computer interfaces, self-driving cars, and any consumer-facing apps that allow users to query models in real time. There are also technical challenges that data scientists must consider when deploying their models: for one, online learning algorithms are susceptible to catastrophic forgetting, or catastrophic interference, which can lead to long-term performance regression. In short, while offline serving processes data in batches at regular intervals, online serving processes data immediately.
The biggest disadvantage of online serving is the cost of data communication and cloud computing resources. “Real time” online serving generally refers to data being communicated at a frequency of under a minute (or sometimes up to 5 minutes, depending on complexity or necessity), which sharply increases the volume of data transmitted; costs also depend on whether you are using a cellular LTE or LoRa network. Cloud resources must run continuously, and with cloud services such as AWS generally invoiced by resource amount multiplied by minutes of usage, this type of deployment is difficult to scale cost-effectively. By contrast, offline serving may transmit once a day and only require a large amount of cloud resources for a few hours a day.
Edge deployment moves away from using a public server, as offline and online serving do, and moves the model processing and computation off the server and onto the edge. Real-world edge deployment for ML models is currently very limited due to a history of low hardware availability. However, strides are being made towards mass edge deployment, and edge computing and edge AI were named among the hottest IoT applications by Gartner in 2022.
SDT provides a complete HaaS (Hardware-as-a-Service) platform for MLOps by locally installing and networking the SDT ECN and NodeQ, edge processing computers and data collection devices that can connect to most industrial and legacy machines. Combined with SDT CobiOps machine learning software, every site can be transformed into an edge MLOps deployment for major cost and operations savings. Other companies support edge MLOps platforms with software-only packages, such as Google’s TensorFlow.js SDK for model inference.
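To show the general on-device pattern in Python (rather than TensorFlow.js), here is a hedged sketch using the TensorFlow Lite runtime, which is commonly used on edge computers; the model file and input shape are illustrative.

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime for edge devices

interpreter = tflite.Interpreter(model_path="model.tflite")  # hypothetical model file
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a sample input matching the model's expected shape and score it
# locally: no round trip to a public server is needed.
sample = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```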
Model Monitoring
Monitoring helps data scientists detect changes in model behavior. Once a model is deployed in production, monitoring is required to make sure the model continues to work properly. Consider the typical DevOps metrics that need to be reviewed and resolved should they exceed a certain threshold: CPU utilization, memory, network usage, and others. Model monitoring likewise helps MLOps teams identify potential issues early and mitigate downtime.
Model monitoring does not end there. MLOps teams also need to make sure that the predictions of the model do not grow stale. If a model’s performance does deteriorate, the team can trigger a training pipeline and retrain the model on the new data. Recent years have seen the rise of many ML model monitoring platforms, including the likes of Amazon SageMaker, Qualdo, and Neptune. SDT is also expected to release a DevOps monitoring and infrastructure management tool by the end of 2022, which can work in conjunction with CobiOps to provide model feedback.
The importance of the model monitoring framework is that it sets up an all-important feedback loop that tells MLOps teams whether to update a model or continue with the existing one.
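A minimal sketch of that feedback loop, assuming a hypothetical live-accuracy metric and retraining hook, might look like this:

```python
ACCURACY_THRESHOLD = 0.90  # illustrative threshold; tune per application

def trigger_training_pipeline() -> None:
    # Hypothetical hook: in practice this would call your orchestrator's API
    # to re-run the training pipeline described earlier.
    print("Retraining triggered.")

def check_model_health(live_accuracy: float) -> None:
    """Decide whether to update the model or continue with the existing one."""
    if live_accuracy < ACCURACY_THRESHOLD:
        print(f"Accuracy {live_accuracy:.2f} is below threshold.")
        trigger_training_pipeline()
    else:
        print(f"Accuracy {live_accuracy:.2f} is healthy; keeping current model.")

check_model_health(0.87)  # e.g. accuracy computed from recent labeled production data
```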
Conclusion
Most organizations struggle to implement, manage, and deploy machine learning models at scale, which has given rise to MLOps, a discipline that has embraced every step of the ML software life cycle since its inception. While this blog only scratched the surface of MLOps as a growing discipline, it is important to understand how teams can converge to solve problems with ML models. MLOps.org offers more resources over at GitHub to continue learning about this emerging field. Follow the SDT blog for future blogs on the topic!
Read more about SDT machine learning developments on the SDT Naver blog or follow our SDT LinkedIn to stay informed on our upcoming product releases.
About the Author: Karen is a passionate B2B technology blogger. While studying at Georgia Tech, Karen first grew interested in cybersecurity and has since worked for several security and cloud companies as a global marketer. When she’s not freelance writing, Karen loves to explore new food trends.