Artificial intelligence (AI) adoption continues to grow. According to a McKinsey survey, more than 50% of organizations were using AI in at least one business function in 2020. A PwC survey found that the pandemic accelerated AI uptake and that 86% of companies say AI is becoming a mainstream technology in their organizations.
Significant advances in open-source AI over the past few years, such as the groundbreaking TensorFlow framework, have opened AI up to a broad audience and made the technology more accessible. Relatively friction-free use of the new technology has greatly accelerated adoption and driven an explosion of new applications. Tesla Autopilot, Amazon Alexa and other familiar applications have captured our imagination and stirred controversy, but AI is finding applications in almost every part of our world.
The parts that make up the AI puzzle
Historically, machine learning (ML) – the path to AI – was reserved for academics and specialists with the mathematical skills required to develop complex algorithms and models. Today, the data scientists working on these projects need both the right knowledge and the right tools to effectively productize their ML models for consumption at scale – which can often be a hugely complex task involving sophisticated infrastructure and multiple steps in ML workflows.
Another key piece is model lifecycle management (MLM), which governs the complex AI pipeline and helps ensure results. The proprietary MLM systems of the past were expensive, yet still lagged far behind the latest technological advances in AI.
Effectively closing this operational capability gap is critical to the long-term success of AI programs, because training models that make good predictions is only a small part of the overall challenge. Building ML systems that bring value to an organization is about more than this. Rather than the ship-and-forget pattern typical of traditional software, an effective strategy requires regular iteration cycles with continuous monitoring, care and improvement.
Enter MLOps (machine learning operations), which enables data scientists, engineering and IT operations teams to work together to deploy ML models into production, manage them at scale and continuously monitor their performance.
Key challenges for AI in production
MLOps typically aims to address six key challenges around taking AI applications into production. These are: repeatability, availability, maintainability, quality, scalability and consistency.
In addition, MLOps can help simplify the consumption of AI so that applications can use ML models for inference (i.e., to make data-driven predictions) in a scalable, maintainable way. This capability is, above all, the core value that AI initiatives must deliver. To go deeper:
Repeatability: a process that ensures the ML model will run successfully in a repeatable manner.
Availability: the ML model is deployed in a way that makes it sufficiently available to provide inference services to consuming applications, and offers an appropriate level of service.
Maintainability: the processes that make it possible to maintain the model over the long term; for example, when retraining the model becomes necessary.
Quality: the ML model is continuously monitored to ensure it delivers predictions of tolerable quality.
Scalability: the scalability of both the inference services and of the people and processes that are needed to retrain the ML model when required.
Consistency: a consistent approach to ML is essential to ensure success with the other measures outlined above.
We can think of MLOps as a natural extension of agile devops applied to AI and ML. Typically, MLOps covers the major aspects of the machine learning lifecycle – data preprocessing (ingesting, analyzing and preparing data, and making sure the data is suitable for the model to be trained on), model development, model training and validation, and finally deployment.
The following six proven MLOps techniques can improve the effectiveness of AI initiatives in terms of time to market, outcomes and long-term sustainability.
1. ML pipelines
ML pipelines typically consist of multiple steps arranged in a directed acyclic graph (DAG) that coordinate the flow of training data, as well as the creation and delivery of trained ML models.
The steps within an ML pipeline can be complex. For example, a single data-fetching step may itself require several sub-tasks to gather datasets, perform checks and execute transformations. Data may need to be extracted from a variety of source systems – perhaps data marts in a corporate database, web crawlers, geospatial stores and APIs. The extracted data may then be subjected to quality and integrity checks using sampling techniques, and adapted in various ways – dropping unneeded data points, aggregating or windowing others, and so on.
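The DAG idea can be sketched in a few lines of plain Python. This is an illustrative toy, not any real pipeline framework's API: the step functions are hypothetical stand-ins for the fetching, validation, transformation, training and deployment work described above.

```python
# Minimal sketch of an ML pipeline as a directed acyclic graph (DAG).
# Step names and bodies are illustrative placeholders only.
from graphlib import TopologicalSorter

def ingest():     return "raw data"      # e.g., pull from data marts / APIs
def validate():   return "checked data"  # sampling-based quality checks
def transform():  return "features"      # aggregation, windowing, etc.
def train():      return "model"         # fit the model on the features
def deploy():     return "endpoint"      # publish the trained model

steps = {"ingest": ingest, "validate": validate,
         "transform": transform, "train": train, "deploy": deploy}

# Edges: each step maps to the set of steps it depends on.
dag = {"validate": {"ingest"}, "transform": {"validate"},
       "train": {"transform"}, "deploy": {"train"}}

# Run the steps in dependency order, as a pipeline orchestrator would.
order = list(TopologicalSorter(dag).static_order())
results = {name: steps[name]() for name in order}
print(order)  # ingest runs first, deploy last
```

Platforms like Kubeflow express the same structure declaratively, running each step in its own container rather than in-process.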
Transforming data into a format that can be used to train the ML model – a process known as feature engineering – may benefit from additional adaptation steps.
Model training and testing often require a grid search to find the optimal hyperparameters, where numerous experiments are run in parallel until the best set of hyperparameters is identified.
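A grid search can be sketched as follows. The scoring function here is a placeholder for what would, in practice, be a full train-and-validate run per hyperparameter combination (often executed in parallel); the parameter names are illustrative.

```python
# Hedged sketch of a hyperparameter grid search over two parameters.
from itertools import product

grid = {"learning_rate": [0.01, 0.1], "max_depth": [3, 5, 7]}

def score(params):
    # Placeholder objective: in a real pipeline this would train and
    # cross-validate a model for the given hyperparameters.
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 5)

# Enumerate every combination in the grid and keep the best scorer.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
best = max(candidates, key=score)
print(best)  # {'learning_rate': 0.1, 'max_depth': 5}
```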
Storing models requires an effective approach to versioning and a method for capturing relevant metadata and metrics about the model.
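One minimal way to picture this is a content-addressed model registry on disk. The layout and field names below are invented for illustration and do not reflect any particular registry product's schema.

```python
# Sketch of versioned model storage with metadata and metrics capture.
import hashlib
import json
import pickle
import tempfile
import time
from pathlib import Path

def save_model(model, registry: Path, name: str, metrics: dict) -> str:
    blob = pickle.dumps(model)
    version = hashlib.sha256(blob).hexdigest()[:12]  # content-addressed version
    target = registry / name / version
    target.mkdir(parents=True, exist_ok=True)
    (target / "model.pkl").write_bytes(blob)         # the artifact itself
    (target / "meta.json").write_text(json.dumps({   # metadata alongside it
        "name": name, "version": version,
        "saved_at": time.time(), "metrics": metrics,
    }))
    return version

registry = Path(tempfile.mkdtemp())
v = save_model({"weights": [0.1, 0.2]}, registry, "churn", {"auc": 0.91})
meta = json.loads((registry / "churn" / v / "meta.json").read_text())
print(meta["metrics"]["auc"])
```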
MLOps platforms such as Kubeflow, an open-source machine learning toolkit for Kubernetes, provide a cloud-native but platform-agnostic interface for these component steps, transforming the complex steps that make up a data science workflow into ML pipelines of jobs running in Docker containers on Kubernetes.
2. Inference services
Once an appropriately trained and validated model has been selected, the model needs to be deployed to a production environment where live data is available for making predictions.
The good news is that the model-as-a-service architecture has made this aspect of ML much easier. This approach separates the application from the model through an API, further simplifying processes such as model versioning, redeployment and reuse.
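The separation can be illustrated with a toy in-process sketch: the application only ever calls a stable predict() API, so the model behind it can be versioned and swapped without application changes. The class, the lambda "models" and the version labels are all hypothetical.

```python
# Toy model-as-a-service: the app depends on the API, not the model.
class InferenceService:
    def __init__(self):
        self._models = {}   # version -> callable model
        self._live = None   # version currently serving traffic

    def register(self, version, model):
        self._models[version] = model

    def promote(self, version):
        self._live = version  # redeployment = flipping a pointer

    def predict(self, features):
        return self._models[self._live](features)

svc = InferenceService()
svc.register("v1", lambda x: sum(x))            # stand-in model
svc.register("v2", lambda x: sum(x) / len(x))   # retrained stand-in
svc.promote("v1")
print(svc.predict([2, 4]))  # 6
svc.promote("v2")           # application code is untouched by the swap
print(svc.predict([2, 4]))  # 3.0
```

In production this boundary is usually an HTTP/gRPC endpoint rather than an in-process call, but the versioning and promotion logic is the same idea.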
3. Continuous deployment
When significant model drift is detected, it's important to be able to automatically retrain and redeploy ML models.
In the cloud-native world, Knative offers a powerful open-source platform for building serverless applications, and it can be used to trigger MLOps pipelines running on an open-source workflow orchestrator such as Kubeflow or Apache Airflow.
4. Blue-green deployments
With solutions like Seldon Core, it can be useful to create an ML deployment with two predictors – for example, allocating 90% of traffic to the existing ("champion") predictor and 10% to a new ("challenger") predictor. The MLOps team can then (ideally automatically) monitor the quality of the predictions. Once proven, the deployment can be updated to shift all traffic to the new predictor. If, on the other hand, the new predictor performs worse than the existing one, 100% of the traffic can be shifted back to the old predictor instead.
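The 90/10 split can be sketched with a simple traffic router. Routing on a request counter keeps the toy deterministic; real systems such as Seldon Core configure the split declaratively and usually randomize per request. All names here are illustrative.

```python
# Deterministic sketch of champion/challenger traffic splitting.
class BlueGreenRouter:
    def __init__(self, champion, challenger, challenger_share=0.1):
        self.champion, self.challenger = champion, challenger
        self.period = round(1 / challenger_share)  # every Nth request
        self.count = 0

    def route(self, request):
        self.count += 1
        # One request in `period` goes to the challenger; the rest to
        # the champion. Prediction quality is monitored per predictor.
        model = self.challenger if self.count % self.period == 0 else self.champion
        return model(request)

router = BlueGreenRouter(champion=lambda r: "champion",
                         challenger=lambda r: "challenger")
answers = [router.route(req) for req in range(100)]
print(answers.count("challenger"))  # 10: one request in ten hits the challenger
```

Promotion is then just a configuration change: make the challenger the new champion and route all traffic to it.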
5. Automatic drift detection
As production data changes over time, model performance can veer from its baseline due to substantial deviations of the new data from the data used to train and validate the model. This can severely degrade prediction quality.
Drift detectors like Seldon Alibi Detect can be used to automatically assess model performance over time and to trigger the model retraining process and automated redeployment.
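To make the idea concrete, here is a deliberately simple mean-shift detector using only the standard library: it flags drift when a window of live feature values strays too far from the training baseline, measured in training standard deviations. Production tools such as Alibi Detect use far more robust statistical tests; this is only a toy.

```python
# Toy drift detector: compares live data against the training baseline.
from statistics import mean, stdev

class DriftDetector:
    def __init__(self, training_data, threshold=3.0):
        self.mu = mean(training_data)        # baseline mean
        self.sigma = stdev(training_data)    # baseline spread
        self.threshold = threshold           # z-score alarm level

    def drifted(self, window):
        z = abs(mean(window) - self.mu) / self.sigma
        return z > self.threshold  # True -> trigger retrain + redeploy

baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8]
detector = DriftDetector(baseline)
print(detector.drifted([10.1, 9.9, 10.3]))   # False: looks like training data
print(detector.drifted([15.2, 16.1, 15.8]))  # True: distribution has shifted
```

A True result is exactly the signal that would kick off the continuous-deployment loop described above.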
6. Feature stores
Feature stores are databases optimized for ML. They enable data scientists and data engineers to reuse and collaborate on datasets prepared for machine learning – so-called "features." Preparing features can be labor-intensive, and feature datasets shared across data science teams can significantly accelerate time to market while improving the quality and consistency of ML models overall. Feast is one such open-source feature store, describing itself as "the fastest path to operationalizing analytic data for model training and online inference."
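The core abstraction can be sketched as a lookup table of feature values keyed by entity. Real feature stores like Feast add offline/online storage, point-in-time-correct joins and serving infrastructure; this minimal in-memory version only shows the reuse idea, and all names are illustrative.

```python
# Minimal in-memory feature store: curated values keyed by entity,
# reusable across teams and models.
class FeatureStore:
    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> value

    def put(self, entity_id, name, value):
        self._features[(entity_id, name)] = value

    def get_vector(self, entity_id, names):
        # Assemble a feature vector for one entity, e.g., for inference.
        return [self._features[(entity_id, n)] for n in names]

store = FeatureStore()
store.put("user:42", "avg_basket_value", 37.5)  # written by one team
store.put("user:42", "days_since_signup", 120)  # written by another
vec = store.get_vector("user:42", ["avg_basket_value", "days_since_signup"])
print(vec)  # [37.5, 120]
```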
By adopting the MLOps paradigm in the data science lab, and taking into account the six measures of repeatability, availability, maintainability, quality, scalability and consistency, organizations and departments can sustain data team productivity, ensure the long-term success of their AI projects and effectively maintain their competitive advantage.
Rob Gibbon is product manager for data platform and MLOps at Canonical, the publishers of Ubuntu.