A Guide to CI/CD for Machine Learning | Nitor Infotech
Artificial intelligence | 13 Jan 2023 |   14 min

A Guide to CI/CD for Machine Learning


Before jumping into CI/CD, let’s look at the motivation behind it. Suppose your company is building an ML-powered service. As a data scientist, you continuously try to improve the existing ML model.

If you train a better model, how can you be sure that the service won’t break when you deploy it?

Wouldn’t it help to have a workflow that automatically tests a pull request from a team member, merges it when all tests pass, and deploys the new model to the existing service?


You can build such a workflow as a CI/CD pipeline with GitHub Actions.

Why is CI/CD important?

  • It automates the repetitive build > test > deploy cycle of the ML pipeline, so data scientists no longer have to run each stage by hand.
  • A working solution can be delivered to users quickly and frequently, with continuous improvements.
  • CI/CD creates a continuous feedback loop that keeps models up to date without constant manual monitoring, intervention, or attention.
  • Data scientists can focus on core implementation rather than on DevOps work.

CI/CD/CT

CI/CD automates the development stages so that changes can be built, tested, and released reliably. In ML, however, a model depends not only on code but also on data and hyperparameters, and deploying it to production brings its own concerns. For conventional software, CI/CD is fairly straightforward, and the pipeline rarely changes much during development. The same is not true for ML: the pipeline itself tends to change considerably over time as it is built out.

In ML, CI is not only about testing and validating code and components, but also about testing and validating data and models.
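As a minimal sketch of what testing data and models can look like in CI, the checks below could run in the pipeline before and after training. The function names (`validate_data`, `validate_model_output`) and the required column set are hypothetical, chosen to match a few fields of a stroke-prediction dataset; substitute your own schema.

```python
import pandas as pd

# Hypothetical schema expectations for the preprocessed stroke data.
REQUIRED_COLUMNS = {"age", "avg_glucose_level", "bmi", "stroke"}


def validate_data(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in the data (an empty list means it passed)."""
    problems = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # the remaining checks need these columns
    if df[list(REQUIRED_COLUMNS)].isna().any().any():
        problems.append("null values in required columns")
    if not df["stroke"].isin([0, 1]).all():
        problems.append("label 'stroke' must be 0 or 1")
    if (df["age"] < 0).any():
        problems.append("negative ages")
    return problems


def validate_model_output(predictions) -> list[str]:
    """Check that a binary classifier only emits the labels 0 and 1."""
    bad = {p for p in predictions if p not in (0, 1)}
    return [f"unexpected labels: {sorted(bad)}"] if bad else []
```

A CI step could fail the build whenever either function returns a non-empty list.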

CD is about a system that automatically deploys a model prediction service, rather than a single software package or service.
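A CD pipeline for ML usually needs a gate that decides whether a newly trained model may replace the one in production. The sketch below is one simple, hypothetical policy (`should_deploy`, with an assumed recall floor of 0.7): the candidate must reach a minimum recall and must not regress on any metric that the production model also reports, both measured on the same held-out test set.

```python
def should_deploy(candidate: dict, production: dict,
                  min_recall: float = 0.7, tolerance: float = 0.0) -> bool:
    """Decide whether a candidate model may replace the production model.

    Both dicts map metric names (e.g. "recall", "accuracy") to values.
    The recall floor and tolerance are illustrative assumptions.
    """
    # Hard floor: never ship a model that misses too many positive (stroke) cases.
    if candidate["recall"] < min_recall:
        return False
    # Regression check: the candidate must not be worse on any shared metric.
    for name, prod_value in production.items():
        if name in candidate and candidate[name] + tolerance < prod_value:
            return False
    return True
```

A small `tolerance` lets a candidate through when it is only marginally worse on a secondary metric, which can be useful when test-set noise is comparable to the differences being measured.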

CT (continuous training) is a new property unique to ML systems; it is concerned with automatically retraining and serving the models.
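Continuous training needs a rule for when to retrain. The policy below is a hypothetical illustration (the `RetrainPolicy` name and both thresholds are assumptions): retrain when enough new labeled rows have arrived, or when recall monitored in production falls below a floor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrainPolicy:
    """Hypothetical continuous-training trigger; both thresholds are assumptions."""
    min_new_rows: int = 1000      # retrain once this much new labeled data has arrived
    min_live_recall: float = 0.7  # ...or when monitored recall drops below this floor

    def should_retrain(self, new_rows_since_training: int,
                       live_recall: Optional[float]) -> bool:
        if new_rows_since_training >= self.min_new_rows:
            return True
        # live_recall is None when ground-truth labels have not arrived yet.
        return live_recall is not None and live_recall < self.min_live_recall
```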

An ML project usually starts in notebooks. Once base code for the different stages of the ML pipeline has been written, it is pushed to a version control system, and the pipeline is moved to a CI/CD tool such as CML, Jenkins, or Travis CI.

Let’s look at the basic tools used to run an ML pipeline.

GitHub Actions and Continuous Machine Learning (CML)

Continuous Machine Learning (CML), running on GitHub Actions, addresses the concerns raised above.

GitHub Actions is GitHub’s built-in automation platform. It makes building, testing, and deploying code faster by letting you define workflows that run automatically. Workflows are started by events such as a push, a pull request, or a release, which act as triggers.

CML can generate reports with metrics and plots directly in pull requests, helping the team make sound, data-driven decisions. CML can be added to a GitHub Actions workflow through its setup action. The CI/CD pipeline is defined in a YAML file that is committed to GitHub along with the code. Refer to the CML documentation to create a simple pipeline.

This blog demonstrates the setup for running some stages of an ML pipeline that combines the best models into a Voting Classifier, which predicts whether a person is likely to have a stroke based on their medical history. The models are built with scikit-learn, and the dataset is the Healthcare Stroke Prediction dataset from Kaggle. The aim is to run the stages of the ML pipeline with CML and observe the pipeline’s outcome.

Demonstration of a CI/CD Pipeline

First, the data is explored and preprocessed with NumPy, Pandas, and Seaborn, and the preprocessed data is stored as another CSV file in the same GitHub repository. Next, the data is split 80/20 into training and test sets, transformed with StandardScaler, and used to build models with algorithms such as Random Forest, Gradient Boosting, KNN, Decision Tree, and Logistic Regression. These models are evaluated on accuracy, specificity, and sensitivity. Finally, after every commit, you can observe these metrics and the distribution of the errors. In this use case, false negatives (missed stroke cases) should be kept low, i.e., sensitivity or recall should be high.
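The modeling steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the author’s code: it combines all five algorithms (the blog combines the best-performing ones), uses hard voting, stratifies the 80/20 split because stroke cases are rare, and substitutes synthetic data from `make_classification` for the Kaggle dataset. `train_and_evaluate` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(X, y, seed: int = 42):
    # 80-20 split; stratify keeps the rare positive class proportion in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    voting = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
            ("knn", KNeighborsClassifier()),
            ("dt", DecisionTreeClassifier(random_state=seed)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="hard",  # majority vote of predicted labels
    )
    # The scaler is fitted on the training split only, which avoids test-set leakage.
    model = make_pipeline(StandardScaler(), voting)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "sensitivity": tp / (tp + fn),  # recall: share of actual positives caught
        "specificity": tn / (tn + fp),  # share of actual negatives correctly cleared
    }
    return model, metrics


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the stroke data (about 20% positives).
    X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
    _, metrics = train_and_evaluate(X, y)
    print(metrics)
```

To use the real data, pass the feature columns and the `stroke` column of the preprocessed CSV instead of the synthetic `X, y`. Putting the scaler inside the pipeline means the same scaling is applied automatically at prediction time.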

Let’s jump into the demonstration.

Create a YAML file in the “.github/workflows” directory so that GitHub Actions recognizes that there is a pipeline to run. When a commit contains a workflow file, GitHub Actions runs that workflow automatically on its trigger events. CML provides a Docker image with Python 3 and the CML library preinstalled, but you can instead use actions from the GitHub Marketplace to set up Python 3 and CML.

In this example, the preprocessing script runs first; its output data is saved in the repository and then passed to the training script, which builds a model from it. The YAML file runs these scripts one after another.
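The hand-off between the two scripts can be sketched as follows. All file paths and function names are hypothetical, and the cleaning steps (dropping an `id` column, median-imputing `bmi`, one-hot encoding text columns) are common choices for this kind of data rather than the author’s confirmed preprocessing.

```python
from pathlib import Path

import pandas as pd

# Hypothetical paths; the original repository layout is not known.
RAW_CSV = Path("data/healthcare-dataset-stroke-data.csv")
PROCESSED_CSV = Path("data/processed.csv")


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.drop(columns=["id"], errors="ignore")    # identifiers carry no signal
    df["bmi"] = df["bmi"].fillna(df["bmi"].median())  # impute missing BMI values
    # One-hot encode non-numeric columns so every feature is numeric for scikit-learn.
    categorical = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    return pd.get_dummies(df, columns=categorical, drop_first=True, dtype=int)


def run_preprocess(raw_path: Path = RAW_CSV, out_path: Path = PROCESSED_CSV) -> None:
    """Step 1 (preprocessing script): read raw data, write the processed CSV."""
    processed = preprocess(pd.read_csv(raw_path))
    out_path.parent.mkdir(parents=True, exist_ok=True)
    processed.to_csv(out_path, index=False)


def load_training_data(path: Path = PROCESSED_CSV, target: str = "stroke"):
    """Step 2 (training script): start from the CSV written by step 1."""
    df = pd.read_csv(path)
    return df.drop(columns=[target]).to_numpy(), df[target].to_numpy()
```

Because the only contract between the steps is the processed CSV, each script can be run and tested on its own, and the workflow simply runs them in order.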


name: – The name of the workflow.

on: [push] – The trigger of the workflow. Our example uses the push event, so a workflow run is triggered whenever someone pushes a change to the repository or merges a pull request.

jobs: – Groups together all the jobs that run in the model-stroke workflow.

runs-on: – The runner the job executes on. Here the job is configured to run on the latest version of a GitHub-hosted OS image, so each run starts on a fresh virtual machine.

steps: – Groups together all the steps that run in the “run” job.

uses: actions/checkout@v2 – Runs v2 of the actions/checkout action, which checks out your repository onto the runner so that the workflow can run scripts or build and test tools against it.

uses: iterative/setup-cml@v1 – Installs a specific version of CML (here, v1) on the runner.

run: – The specified commands are executed on the runner.

cml publish [options] <image file> – Publishes an image so that it can be displayed in a CML report. Appending “>> report.md” writes the resulting image reference into a Markdown file. Options used: “--md” produces Markdown output, and “--title=<…>” sets the title of the Markdown output.

cml send-comment [options] <markdown report file> – Post a Markdown report as a comment on a commit or pull/merge request.
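Putting these keys together, a workflow file could look like the YAML embedded below. To keep the examples in one language, the sketch writes the file from a small Python helper; creating `.github/workflows/cml.yaml` by hand works just as well. Several details are assumptions: `ubuntu-latest` as the runner, the CML setup action published as `iterative/setup-cml@v1`, the `REPO_TOKEN` variable (the convention the CML documentation uses for `cml send-comment`), and the file names `requirements.txt`, `preprocess.py`, `train.py`, `metrics.txt`, and `confusion_matrix.png`.

```python
from pathlib import Path

# Sketch of the model-stroke workflow. Script and output file names are
# hypothetical placeholders; REPO_TOKEN lets CML post comments via the GitHub API.
WORKFLOW_YAML = """\
name: model-stroke
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - name: Preprocess, train, and report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install -r requirements.txt
          python preprocess.py
          python train.py
          cat metrics.txt >> report.md
          cml publish confusion_matrix.png --md --title="Confusion matrix" >> report.md
          cml send-comment report.md
"""


def write_workflow(repo_root: Path, filename: str = "cml.yaml") -> Path:
    """Place the workflow where GitHub Actions looks for it: .github/workflows/."""
    target = repo_root / ".github" / "workflows" / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(WORKFLOW_YAML)
    return target
```

The `run: |` block executes its lines in order, so the report is only assembled after both Python scripts have produced their outputs.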

The metrics code is responsible for computing the offline evaluation metrics of the Voting Classifier.
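A minimal, dependency-free sketch of such metrics code follows. It is not the author’s snippet: `binary_metrics`, `save_metrics`, and the `metrics.txt` file name are hypothetical. It treats label 1 as "stroke", so sensitivity (recall) measures how many actual stroke cases the model catches, which is the figure this pipeline should keep high.

```python
from pathlib import Path


def binary_metrics(y_true, y_pred) -> dict:
    """Accuracy, sensitivity (recall), and specificity for labels 0/1, where 1 = stroke."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against a missing class so a tiny test split cannot divide by zero.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def save_metrics(metrics: dict, path: Path = Path("metrics.txt")) -> None:
    """Write one 'name: value' line per metric so the workflow can append it to report.md."""
    lines = [f"{name}: {value:.3f}" for name, value in metrics.items()]
    path.write_text("\n".join(lines) + "\n")
```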


Once these code changes are made, the files are committed to the repository. You also need to save the evaluated metrics to a file so that CML can display them. GitHub Actions detects the YAML configuration and starts a run of the workflow.


When you click a workflow run, you’ll see the steps it executed: setting up the job, checking out the latest code, and running the setup actions that install CML and everything else the pipeline needs. Then the preprocessing and training tasks of the pipeline run. Finally, once these tasks complete, all resources are cleaned up.


When the job completes, GitHub Actions, together with CML, posts a comment containing the metrics and the metrics image to the pull request’s Conversation tab. This output corresponds to the CML commands written in the workflow YAML file, which run after the Python scripts have executed.


With this setup, whenever we change the dataset, pipeline, or ML logic, we get a similar report in the pull request’s Conversation tab, provided all checks pass. If the dataset is not stored in the GitHub repository, the workflow can retrieve it with DVC (Data Version Control) or pull it from cloud storage.
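If the data lives in DVC remote storage, the training script can fall back to DVC’s Python API. The sketch below assumes the `dvc` package is installed and that the file is tracked by DVC with a configured remote; `load_dataset` is a hypothetical helper. `dvc.api.open` streams a tracked file from a Git repository (`repo`) at a given revision (`rev`). Running `dvc pull` in the workflow before the Python step is an alternative.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def load_dataset(path: str, repo: Optional[str] = None,
                 rev: Optional[str] = None) -> pd.DataFrame:
    """Read a CSV from the working tree if present, otherwise fetch it through DVC."""
    local = Path(path)
    if repo is None and local.exists():
        return pd.read_csv(local)
    import dvc.api  # imported lazily: only needed when the file is not checked in

    # Resolves the DVC-tracked file from its remote (e.g. S3, GCS, Azure) and streams it.
    with dvc.api.open(path, repo=repo, rev=rev) as f:
        return pd.read_csv(f)
```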

The pipeline demonstrated here can be extended to deploy a Docker container to a virtual machine and serve the model through an API.
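As a sketch of the "serve an API" part, the code below exposes a trained model at `POST /predict` using only Python’s standard library, which keeps the example self-contained; Flask or FastAPI are more common choices in practice. The endpoint path, the JSON shape `{"features": [[...]]}`, the `model.pkl` file name, and all function names are assumptions. Packaging it into a Docker image and deploying it to a virtual machine are not shown.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_from_payload(model, payload: dict) -> dict:
    """Turn {"features": [[...], ...]} into {"predictions": [...]} with any fitted model."""
    rows = payload["features"]
    return {"predictions": [int(p) for p in model.predict(rows)]}


def make_handler(model):
    """Build a request handler class bound to a loaded model."""
    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                body = predict_from_payload(model, json.loads(self.rfile.read(length)))
                status = 200
            except (KeyError, ValueError) as exc:  # bad JSON or bad feature shape
                body, status = {"error": str(exc)}, 400
            data = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return PredictHandler


def serve(model_path: str = "model.pkl", port: int = 8000) -> None:
    """Load a pickled model and serve POST /predict until interrupted."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # only unpickle files your own pipeline produced
    HTTPServer(("0.0.0.0", port), make_handler(model)).serve_forever()
```

A container’s entry point could call `serve("model.pkl")`. Keeping `predict_from_payload` separate from the HTTP handler makes the prediction logic easy to unit-test in the same CI pipeline.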

That’s why it makes a lot of sense to have a workflow that automatically tests a pull request from a team member and deploys the ML model to the existing service!

Write to us at Nitor Infotech with your thoughts about the blog you just read, or if you’d like to learn more about our cognitive engineering services.


Shubham Kamble

Trainee Software Engineer

Shubham is a Trainee Software Engineer at Nitor Infotech. As an AI/ML enthusiast, he is eager to learn how every new AI technology works behind the scenes. Coding with coffee is the only thing he loves. Working on projects related to images, video, and audio with a deep understanding of algorithms and cloud services enabled him to be proficient in the AI/ML space. "Learning never exhausts the mind" is his mantra and that's what makes him versatile. He is passionate about strength-training and travelling.
