AI Engineering

MLOps: From Model to Production

Baljeet Dogra
22 min read

Getting a model to work in a Jupyter notebook is just the beginning. MLOps—Machine Learning Operations—is the discipline of deploying, monitoring, and maintaining ML models in production. This guide covers the essential practices for taking models from development to production reliably.

What is MLOps?

MLOps is the practice of applying DevOps principles to machine learning. It combines ML development (data science, model training) with operations (deployment, monitoring, maintenance) to create reliable, scalable ML systems.

The goal of MLOps is to:

  • Deploy models reliably—automated, repeatable, and safe deployments
  • Monitor model performance—detect drift, degradation, and issues in real-time
  • Version everything—models, data, code, and configurations
  • Enable rapid iteration—test, deploy, and rollback quickly
  • Maintain model quality—ensure models continue to perform well over time

The MLOps Lifecycle

MLOps follows a continuous cycle:

1. Develop

Data preparation, feature engineering, model training, experimentation

2. Test

Unit tests, integration tests, model evaluation, validation

3. Deploy

Model packaging, containerisation, deployment to staging/production

4. Monitor

Performance tracking, drift detection, error monitoring, cost tracking

Continuous loop: Monitoring triggers retraining, which leads back to development

1. Model Versioning

Version control isn't just for code—it's essential for models, data, and configurations. Versioning enables reproducibility, rollbacks, and audit trails.

What to Version

Models

Store model artifacts (weights, architecture, metadata) with unique versions

Tools: MLflow, Weights & Biases, DVC, S3 with versioning

Data

Version training datasets, feature stores, and data pipelines

Tools: DVC, Git LFS, data versioning systems

Code

Training scripts, preprocessing code, inference code, configurations

Tools: Git, with proper branching strategies

Configurations

Hyperparameters, feature flags, environment configs

Store alongside code or in dedicated config management systems

Best Practices for Model Versioning

  • Use semantic versioning: Major.Minor.Patch (e.g., v1.2.3) for model releases
  • Link versions: tie each model version to its code commit, data version, and config version
  • Store metadata: Training metrics, dataset info, hyperparameters, environment details
  • Tag production models: Clearly mark which versions are in production
  • Enable rollback: Keep previous versions accessible for quick rollback
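These practices can be sketched as a minimal in-memory registry. In practice a tool like MLflow or Weights &amp; Biases handles this for you; the `ModelVersion` fields, `promote` helper, and example version numbers below are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    """Metadata linking a model release to the code, data, and config it came from."""
    version: str                # semantic version, e.g. "1.2.0"
    code_commit: str            # git SHA of the training code
    data_version: str           # dataset snapshot id (e.g. a DVC tag)
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    production: bool = False    # tag marking the live version

registry: dict[str, ModelVersion] = {}

def register(mv: ModelVersion) -> None:
    registry[mv.version] = mv

def promote(version: str) -> None:
    """Mark one version as production; demote all others so rollback is trivial."""
    for mv in registry.values():
        mv.production = (mv.version == version)

register(ModelVersion("1.0.0", "a1b2c3d", "data-2024-01", metrics={"auc": 0.91}))
register(ModelVersion("1.1.0", "e4f5a6b", "data-2024-02", metrics={"auc": 0.93}))
promote("1.1.0")   # rollback is just promote("1.0.0")
```

Because every version keeps its commit, data tag, and metrics, any production model can be traced back to exactly how it was built.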

2. Model Deployment Strategies

How you deploy models depends on your requirements. Here are the main strategies:

2.1 Batch Inference

Process predictions in batches, typically on a schedule (hourly, daily, weekly).

  • Use cases: Recommendations, scoring, reporting, ETL pipelines
  • Pros: Efficient resource usage, easier to scale, cost-effective
  • Cons: Not real-time, requires batch infrastructure

Example: Daily batch job that scores all customers for churn risk, then updates the database.
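A minimal sketch of such a batch job. The `churn_score` logic, field names, and threshold are invented for illustration; a real job would load a versioned model artifact and write results back to a database.

```python
def churn_score(customer: dict) -> float:
    """Stand-in model: in production this would be a loaded, versioned artifact."""
    return min(1.0, 0.1 * customer["support_tickets"] + 0.5 * customer["inactive_days"] / 30)

def run_batch(customers: list[dict], threshold: float = 0.7) -> list[dict]:
    """Score every customer in one pass and flag those at risk of churning."""
    results = []
    for c in customers:
        score = churn_score(c)
        results.append({
            "id": c["id"],
            "churn_risk": round(score, 2),
            "at_risk": score >= threshold,
        })
    return results

customers = [
    {"id": 1, "support_tickets": 5, "inactive_days": 25},
    {"id": 2, "support_tickets": 0, "inactive_days": 2},
]
scores = run_batch(customers)
```

The batch shape is what makes this cheap: the model loads once, scores everyone, and the infrastructure can be torn down until the next run.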

2.2 Real-Time Inference (Online)

Serve predictions on-demand via API endpoints, typically with low latency requirements.

  • Use cases: Chatbots, fraud detection, recommendations, personalisation
  • Pros: Immediate results, interactive applications
  • Cons: Higher serving costs, strict latency requirements, must run continuously

Example: REST API that returns product recommendations within 100ms of a user request.
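A real-time endpoint can be sketched with nothing but the standard library. In practice you would use FastAPI or one of the serving frameworks covered below; the `predict` logic and request fields here are stand-ins.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features: dict) -> dict:
    """Stand-in model; a real service would load a versioned artifact at startup."""
    score = 0.8 if features.get("item_views", 0) > 3 else 0.2
    return {"recommend": score > 0.5, "score": score}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        payload = json.dumps(predict(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"item_views": 5}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
```

The key property is that the model loads once at startup and each request is answered independently, which is the shape all the serving frameworks below impose.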

2.3 Streaming Inference

Process predictions on streaming data in near real-time.

  • Use cases: Real-time fraud detection, anomaly detection, live recommendations
  • Pros: Real-time insights, handles high-volume streams
  • Cons: Complex infrastructure, requires stream processing expertise

Example: Kafka stream processing that scores transactions for fraud as they occur.

3. Deployment Patterns

Choose deployment patterns that minimise risk and enable safe rollouts:

3.1 Blue-Green Deployment

Run two identical production environments. Deploy new model to "green", test, then switch traffic. If issues occur, instantly switch back to "blue".

Best for: Zero-downtime deployments, easy rollback, critical production systems.

3.2 Canary Deployment

Gradually roll out new model to a small percentage of traffic (e.g., 5%), monitor performance, then gradually increase if successful. Roll back if issues detected.

Best for: Testing new models safely, gradual rollouts, A/B testing.
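One common way to implement the traffic split is deterministic hashing, so each user consistently sees the same model for the duration of the rollout. A sketch, with the 5% split and user-id scheme chosen for illustration:

```python
import hashlib

def canary_route(user_id: str, canary_percent: int = 5) -> str:
    """Deterministically assign a user to the canary or stable model.

    Hashing the user id keeps each user on the same model across requests,
    so their experience stays consistent during the rollout.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

assignments = [canary_route(f"user-{i}") for i in range(1000)]
canary_share = assignments.count("canary") / len(assignments)
```

Increasing the rollout is just raising `canary_percent`; rolling back is setting it to zero, with no user ever flip-flopping between models mid-rollout.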

3.3 Shadow Mode

Run new model in parallel with production model, but don't use its predictions. Compare outputs to validate performance before switching.

Best for: Validating new models, comparing performance, risk-free testing.
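Shadow mode can be sketched as a thin wrapper around the production call: the shadow model's output is logged for offline comparison but never returned, and a shadow failure must never affect users. The model functions below are stand-ins.

```python
def shadow_predict(features: dict, prod_model, shadow_model, log: list):
    """Serve the production prediction; run the shadow model for comparison only."""
    prod_pred = prod_model(features)
    try:
        shadow_pred = shadow_model(features)     # failures here must not reach users
        log.append({"prod": prod_pred, "shadow": shadow_pred,
                    "agree": prod_pred == shadow_pred})
    except Exception as exc:
        log.append({"prod": prod_pred, "shadow_error": str(exc)})
    return prod_pred                             # users only ever see production output

log = []
prod = lambda f: "approve" if f["score"] > 0.5 else "reject"
shadow = lambda f: "approve" if f["score"] > 0.6 else "reject"

result = shadow_predict({"score": 0.55}, prod, shadow, log)
```

Aggregating the `agree` field over a few days of traffic tells you how the candidate would have behaved in production before it serves a single user.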

4. Model Serving Infrastructure

Choose the right infrastructure for serving your models:

4.1 Containerisation

Package models in containers (Docker) for consistent deployment across environments:

  • Include model, dependencies, and inference code
  • Ensures consistency between dev, staging, and production
  • Enables easy scaling and deployment
  • Works with Kubernetes, ECS, or any container orchestration

4.2 Model Serving Frameworks

Use specialised frameworks for efficient model serving:

  • TensorFlow Serving: For TensorFlow models, optimised for production
  • TorchServe: For PyTorch models
  • MLflow Models: Framework-agnostic model serving
  • Triton Inference Server: NVIDIA's multi-framework serving platform
  • Custom APIs: FastAPI, Flask, or gRPC for custom serving logic

4.3 Serverless Options

For variable or low-volume workloads, consider serverless:

  • AWS SageMaker, Google Cloud AI Platform, Azure ML
  • AWS Lambda, Google Cloud Functions (for lightweight models)
  • Pay per request, auto-scaling, no infrastructure management

5. Model Monitoring

Monitoring is critical. Models degrade over time, and you need to detect issues before they impact users.

5.1 Performance Metrics

Track model performance in production:

Prediction Metrics

  • Accuracy, precision, recall, F1
  • Prediction confidence scores
  • Prediction distribution

Operational Metrics

  • Latency (P50, P95, P99)
  • Throughput (requests/second)
  • Error rates
  • Resource usage (CPU, memory, GPU)
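Latency percentiles are easy to compute from raw request timings. This sketch uses a simple nearest-rank percentile over simulated latencies; the distributions are invented, and real values would come from your serving logs.

```python
import random

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

random.seed(0)
# Simulated request latencies in ms: mostly fast, with a small slow tail.
latencies = [random.gauss(80, 15) for _ in range(1000)]
latencies += [random.uniform(200, 400) for _ in range(20)]

p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
```

The gap between P50 and P99 is the point of tracking all three: a healthy median can hide a tail of slow requests that users definitely notice.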

5.2 Data Drift Detection

Monitor for data drift—when production data differs from training data:

  • Feature drift: Distribution of input features changes
  • Concept drift: Relationship between features and target changes
  • Detection methods: Statistical tests (KS test, PSI), distribution comparisons, model confidence monitoring

Tools: Evidently AI, Fiddler, Aporia, or custom drift detection pipelines.
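PSI is simple enough to implement directly if you are not ready for a dedicated tool. This sketch bins the training (expected) distribution, compares production (actual) bin fractions against it, and applies the usual rule of thumb: below 0.1 is stable, 0.1 to 0.25 is moderate shift, above 0.25 is significant drift. The simulated distributions are illustrative.

```python
import math
import random

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between training and production distributions."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > e for e in edges)     # which bin v falls into
            counts[idx] += 1
        # Small epsilon avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(1)
train = [random.gauss(0, 1) for _ in range(5000)]
prod_same = [random.gauss(0, 1) for _ in range(5000)]
prod_shifted = [random.gauss(1.0, 1) for _ in range(5000)]

psi_stable = psi(train, prod_same)      # same distribution: low PSI
psi_drift = psi(train, prod_shifted)    # mean shifted by 1 std: high PSI
```

Run this per feature on a schedule and alert when any feature crosses the 0.25 line; that is the core of most drift-detection pipelines.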

5.3 Model Health Monitoring

Monitor overall model health:

  • Prediction quality: Compare predictions to ground truth (when available)
  • Anomaly detection: Flag unusual prediction patterns
  • Business metrics: Track downstream business impact (conversion rates, revenue, etc.)
  • Alerting: Set up alerts for performance degradation, drift, or errors

6. CI/CD for ML

Automate your ML pipeline with continuous integration and deployment:

CI/CD Pipeline Stages

1. Code Quality Checks

Linting, type checking, code formatting, security scans

2. Unit & Integration Tests

Test data processing, feature engineering, model training logic

3. Model Training & Validation

Train model, run evaluation tests, check performance thresholds

4. Model Packaging

Package model, dependencies, and metadata

5. Deployment

Deploy to staging, run smoke tests, deploy to production
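Stage 3's performance-threshold check is usually the gate that stops a bad model from ever reaching the deploy stage. A minimal sketch; the metric names and threshold values are illustrative.

```python
# Thresholds the candidate model must clear before the pipeline proceeds.
THRESHOLDS = {"accuracy": 0.85, "precision": 0.80, "recall": 0.75}

def validate(metrics: dict) -> list[str]:
    """Return a list of failed checks; an empty list means the model may ship."""
    return [
        f"{name}: {metrics.get(name, 0.0):.3f} < {minimum:.3f}"
        for name, minimum in THRESHOLDS.items()
        if metrics.get(name, 0.0) < minimum
    ]

candidate_metrics = {"accuracy": 0.91, "precision": 0.87, "recall": 0.72}
failures = validate(candidate_metrics)

if failures:
    print("Validation failed:", "; ".join(failures))
    # In a CI job, exit non-zero here (raise SystemExit(1)) to block deployment.
```

Because the check runs in CI rather than in someone's notebook, a regression in any tracked metric fails the pipeline the same way a failing unit test would.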

7. Model Retraining & Updates

Models need regular updates. Plan for retraining:

7.1 Retraining Triggers

  • Scheduled: Retrain weekly, monthly, or on a fixed schedule
  • Performance-based: Retrain when metrics drop below threshold
  • Drift-based: Retrain when data drift is detected
  • Data-based: Retrain when new labelled data is available

7.2 Automated Retraining Pipelines

Automate the retraining process:

  • Fetch latest data
  • Run data validation checks
  • Train new model version
  • Evaluate against validation set
  • Compare to current production model
  • Deploy if better, or alert if worse
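The compare-and-deploy decision at the end of that pipeline can be sketched as follows. The metric name, improvement margin, and callback structure are illustrative; the margin guards against promoting a model whose apparent gain is within evaluation noise.

```python
def should_deploy(new_metrics: dict, prod_metrics: dict,
                  primary: str = "auc", min_gain: float = 0.005) -> bool:
    """Deploy the retrained model only if it beats production by a real margin."""
    return new_metrics[primary] >= prod_metrics[primary] + min_gain

def retraining_step(train, evaluate, prod_metrics, deploy, alert) -> str:
    """One pass of the automated retraining loop described above."""
    model = train()                     # fetch data + train a new version
    new_metrics = evaluate(model)       # score on a held-out validation set
    if should_deploy(new_metrics, prod_metrics):
        deploy(model)
        return "deployed"
    alert(f"Retrained model not better: {new_metrics} vs {prod_metrics}")
    return "kept-production"

events = []
outcome = retraining_step(
    train=lambda: "model-v2",
    evaluate=lambda m: {"auc": 0.93},
    prod_metrics={"auc": 0.91},
    deploy=lambda m: events.append(("deploy", m)),
    alert=lambda msg: events.append(("alert", msg)),
)
```

The alert branch matters as much as the deploy branch: a retrain that fails to improve is a signal worth investigating, not something to silently discard.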

8. Best Practices Summary

Version Everything

Models, data, code, and configurations. Enable reproducibility and rollback.

Automate Everything

Training, testing, deployment, and monitoring. Reduce manual errors and speed up iteration.

Monitor Continuously

Track performance, drift, errors, and costs. Set up alerts for anomalies.

Test Thoroughly

Unit tests, integration tests, model evaluation, and staging environment validation.

Deploy Safely

Use canary or blue-green deployments. Enable quick rollback. Test in staging first.

Document Everything

Model cards, deployment runbooks, monitoring dashboards, and incident response procedures.

Common MLOps Tools

Model Management

  • MLflow
  • Weights & Biases
  • DVC
  • Kubeflow

Model Serving

  • TensorFlow Serving
  • TorchServe
  • Triton Inference Server
  • Seldon Core

Monitoring

  • Evidently AI
  • Fiddler
  • Aporia
  • Custom dashboards

Orchestration

  • Airflow
  • Prefect
  • Kubeflow Pipelines
  • MLflow Pipelines

Conclusion

MLOps is essential for deploying and maintaining ML models in production. The key principles are:

  • Version control: Track models, data, code, and configs for reproducibility
  • Automation: CI/CD pipelines for testing, training, and deployment
  • Monitoring: Track performance, detect drift, and alert on issues
  • Safe deployment: Use canary or blue-green patterns for risk-free rollouts
  • Continuous improvement: Retrain models regularly and iterate based on monitoring

Start simple—version your models, set up basic monitoring, and automate deployment. Then gradually add more sophisticated MLOps practices as your needs grow. The goal is reliable, maintainable ML systems that deliver value consistently.

Need Help Setting Up MLOps?

If you're looking to deploy ML models to production or improve your MLOps practices, I can help with model versioning, deployment pipelines, monitoring setup, and CI/CD automation. Let's discuss your requirements.

Get in Touch