Deploying a scikit-learn Model as a REST API Using FastAPI and Docker

Feb 16, 2026

Training a machine learning model is only part of the job. To use that model in real applications, it needs to be deployed in a reliable and reproducible way. Docker solves this by packaging your application, dependencies, and environment into a single container.

In this article, we’ll deploy a scikit-learn model as a REST API using FastAPI and Docker, making it easy to run anywhere without dependency issues.

Here is what we'll do:

Train a simple scikit-learn model
Serve it using FastAPI
Dockerize the application
Run the API inside a container
Test predictions through the API

Prerequisites

You’ll need:

Basic Python knowledge
Python 3.8+
Docker installed

Libraries used:

scikit-learn
FastAPI
Uvicorn
joblib

Project Structure

Our final project structure will look like this:

ml-api/
├── main.py
├── train_model.py
├── model.joblib
├── requirements.txt
└── Dockerfile

Step 1: Train and Save the Model

Create train_model.py:

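from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from joblib import dump

# Load the Iris dataset: 4 numeric features, 3 classes
data = load_iris()
X, y = data.data, data.target

# Train a random forest with default hyperparameters
model = RandomForestClassifier()
model.fit(X, y)

# Serialize the trained model so the API can load it
dump(model, "model.joblib")
print("Model saved successfully")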

Run it once:

python train_model.py

This creates model.joblib, which will be loaded by the API.

Step 2: Create the FastAPI App

Create main.py:

from fastapi import FastAPI
from pydantic import BaseModel
from joblib import load

app = FastAPI(title="Dockerized ML API")

# Load the trained model once at startup, not on every request
model = load("model.joblib")


class IrisInput(BaseModel):
    # Request body: the four Iris measurements (in cm)
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


@app.get("/")
def health_check():
    return {"status": "API is running"}


@app.post("/predict")
def predict(data: IrisInput):
    # Feature order must match the column order used in training
    features = [[
        data.sepal_length,
        data.sepal_width,
        data.petal_length,
        data.petal_width,
    ]]
    prediction = model.predict(features)[0]
    # Cast the NumPy integer to a plain int so it serializes to JSON
    return {"prediction": int(prediction)}
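The endpoint returns the raw class index. As an optional extension, that index can be mapped to a species name. The sketch below assumes the standard class order of scikit-learn's Iris dataset (0 = setosa, 1 = versicolor, 2 = virginica); IRIS_LABELS and label_for are hypothetical names that are not part of the app above.

```python
# Hypothetical helper: map the integer class returned by /predict to a species name.
# Assumes the class order of scikit-learn's load_iris() target_names.
IRIS_LABELS = ("setosa", "versicolor", "virginica")


def label_for(prediction: int) -> str:
    """Return the species name for a class index, or raise on an unknown index."""
    if not 0 <= prediction < len(IRIS_LABELS):
        raise ValueError(f"unexpected class index: {prediction}")
    return IRIS_LABELS[prediction]
```

Inside predict, the response could then include the name as well, for example {"prediction": int(prediction), "label": label_for(int(prediction))}.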

Step 3: Define Dependencies

Create requirements.txt:

fastapi
uvicorn
scikit-learn
joblib
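Because the versions above are unpinned, the container may install a different scikit-learn release than the one that trained model.joblib, and scikit-learn does not guarantee that a model saved under one version loads correctly under another. One way to reduce that risk is to pin the versions from the training environment. The sketch below is one possible approach using the standard library; PACKAGES and pinned_lines are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages listed in requirements.txt (taken from this article)
PACKAGES = ["fastapi", "uvicorn", "scikit-learn", "joblib"]


def pinned_lines(packages):
    """Build 'name==version' lines for every package installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed here, so there is no version to pin; skip it
            continue
    return lines


# Print lines that can be pasted into requirements.txt
for line in pinned_lines(PACKAGES):
    print(line)
```

Run this in the same environment where train_model.py was executed, so the pinned versions match the ones that produced the model file.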

Step 4: Create the Dockerfile

Create a file called Dockerfile (no extension):

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

What this Dockerfile does

Uses a lightweight Python image
Installs dependencies
Copies application files
Runs the FastAPI app using Uvicorn

Step 5: Build the Docker Image

From the project directory:

docker build -t ml-fastapi .

Step 6: Run the Container

docker run -p 8000:8000 ml-fastapi

The API is now running inside Docker.

Step 7: Test the API

Open your browser:

http://localhost:8000

You should see:

{"status":"API is running"}

Swagger UI

Visit:

http://localhost:8000/docs

Test the /predict endpoint with this request body:

{
  "sepal_length": 6.1,
  "sepal_width": 2.8,
  "petal_length": 4.7,
  "petal_width": 1.2
}

You’ll receive a prediction response from the containerized model.
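The same request can also be sent from a script instead of Swagger UI. Below is a minimal sketch using only the Python standard library. It assumes the container is reachable at http://localhost:8000 as in Step 6; build_predict_request, request_prediction, and SAMPLE are hypothetical names chosen for illustration.

```python
import json
import urllib.request

# Sample measurements taken from the Swagger UI example in this article
SAMPLE = {
    "sepal_length": 6.1,
    "sepal_width": 2.8,
    "petal_length": 4.7,
    "petal_width": 1.2,
}


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build a JSON POST request for the /predict endpoint (nothing is sent yet)."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(features, base_url="http://localhost:8000"):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, request_prediction(SAMPLE) should return a dictionary with a "prediction" key holding the predicted class index.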

Why Docker Is Important for ML APIs

Docker helps solve common ML deployment problems:

Dependency conflicts
Environment differences
Inconsistent runtime behavior
Hard-to-reproduce bugs

With Docker, the same image runs locally, on servers, and on cloud platforms without changes.

Common Issues and Fixes

Container can’t find the model file

Ensure model.joblib exists in the project directory before building (run train_model.py first)
Verify COPY . . exists in the Dockerfile so the file is copied into the image

API not accessible

Confirm --host 0.0.0.0 is set
Check port mapping -p 8000:8000
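When the API seems unreachable, a quick first check is whether anything is listening on the published port at all. The sketch below uses only the standard library; is_port_open is a hypothetical helper, and the default host and port assume the docker run command from Step 6.

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False
```

A False result suggests the container is not running or the -p 8000:8000 mapping is missing. A True result only shows that something accepted the connection, so still confirm with the health check at http://localhost:8000.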

Next Steps

To move closer to production, you can:

Push the Docker image to a registry
Deploy to a cloud VPS
Add logging and monitoring
Implement authentication
Version models inside containers
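For the model versioning item, one lightweight option is to fingerprint the model file so that each image can report exactly which artifact it contains. This is only a sketch of one possible approach, not a full versioning scheme; file_sha256 is a hypothetical helper name.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest of model.joblib could, for example, be computed once at startup and returned by the health check endpoint, which makes it easy to tell which model a running container is serving.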

Conclusion

Dockerizing a FastAPI-based ML service makes deployment reliable and portable. With this setup, a scikit-learn model can be packaged into a single container and run consistently across environments.

This approach is ideal for ML prototypes, internal services, and scalable deployments when combined with cloud infrastructure.

Tesla’s Substack
