Sandra Wagner Writing Sample
Technical Documentation

Get Started with MLflow Tracking

Train two models, log both to MLflow, and compare the results in the UI — using autologging and manual logging side by side.

30–60 min · Python · scikit-learn · MLflow

In this tutorial, we'll train two models, log both to MLflow, and explore what was captured in the MLflow UI.

In Part 1, we'll use autologging to capture a classification model automatically. In Part 2, we'll log a regression model by hand so we can see exactly what gets recorded and why. In Part 3, we'll compare both runs side by side in the UI.

Part 1 uses the Iris dataset, a sample dataset included with scikit-learn, and trains a logistic regression model — a standard algorithm for classification tasks. Part 2 uses the diabetes dataset, also included with scikit-learn, and trains a random forest regressor.

By the end, we'll have two logged training runs in the MLflow UI, a clear understanding of the difference between autologging and manual logging, and a comparison view showing how the two approaches differ in what they capture.

Before you begin

⏱ Time required: approximately 30 to 60 minutes
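
You'll also need MLflow and scikit-learn installed. If they aren't yet, the usual route is pip:

pip install mlflow scikit-learn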

Part 1: Log a run with autologging

Autologging is the fastest way to get a training run into MLflow. A single call to mlflow.autolog() before training is all it takes: MLflow captures parameters, metrics, and the trained model automatically.

Step 1: Write and run the script

MLflow organizes training runs into experiments, and we'll name one to hold our runs. If the experiment doesn't exist yet, mlflow.set_experiment() creates it; if it already exists, MLflow connects to it. Either way, the call produces no output.

Create a new .py file and add the following code.

Note: The set_tracking_uri line is required when logging to a local MLflow server. Update the port number if your server is running on something other than 5000.

import mlflow

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Point MLflow at the local tracking server and choose an experiment.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("MLflow Quick Start")

# Load the Iris dataset and hold out 20% of it for testing.
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {"solver": "lbfgs", "max_iter": 1000, "random_state": 8888}

# Enable autologging before training so MLflow can instrument fit().
mlflow.autolog()

lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# With autologging enabled, this scikit-learn metric call is also
# recorded on the run as a post-training metric.
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

mlflow.autolog() enables automatic logging for scikit-learn and any other supported library you import. When lr.fit() runs, MLflow captures the hyperparameters, training metrics, and the trained model without any additional logging code; the accuracy_score() call at the end is picked up as a post-training metric.
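
To see what autologging captured without opening the UI, we can ask MLflow for the run it just created. A minimal sketch, assuming these lines are appended to the script above:

# Fetch the run autologging created and print what it recorded.
run = mlflow.last_active_run()
print(f"run_id: {run.info.run_id}")
print(f"params: {run.data.params}")
print(f"metrics: {run.data.metrics}")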

Expected result: The script completes without errors. MLflow logs the run in the background. We'll confirm this in the next step.

Step 2: Start the MLflow UI

The MLflow UI lets us explore logged runs, compare metrics, and inspect saved models.

Note: Keep the terminal running while you use the UI. Closing it stops the server.

If the server isn't already running, open a terminal and run:

mlflow server --port 5000

Then open http://127.0.0.1:5000 in a browser.

Expected result: The MLflow UI loads and shows Experiments in the left panel.

Step 3: Find your run

Now we'll locate the run we logged in Step 1 and review what MLflow captured automatically.

  1. Select MLflow Quick Start from the Experiments list.
  2. In the Runs table, select the run listed there.
  3. On the Overview page, scroll down to review the Parameters, Metrics, and Logged model sections.
Expected result: The Parameters section shows solver, max_iter, and random_state, along with the model's other hyperparameters, since autologging records every estimator parameter. The Metrics section shows training metrics such as training accuracy, plus the test accuracy computed by accuracy_score(). The Logged model section shows the saved model artifact.

Part 2: Log a run manually

Manual logging gives us control over exactly what gets captured and when. We'll log a second run using the diabetes dataset, this time writing each logging call instead of relying on autologging. Each call records a specific aspect of the run. See MLflow Tracking API for full details.

Step 1: Write and run the script

Create a new .py file in the same directory as your Part 1 script and add the following code.

Note: The set_tracking_uri line is required when logging to a local MLflow server. Update the port number if your server is running on something other than 5000.

import mlflow

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Point MLflow at the local tracking server and reuse the same experiment.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("MLflow Quick Start")

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Wrap the arrays in an MLflow dataset so the run records its input data.
dataset = mlflow.data.from_numpy(db.data, targets=db.target, name="diabetes")

params = {"n_estimators": 100, "max_depth": 6, "max_features": 3}

with mlflow.start_run():
    mlflow.log_input(dataset, context="training")
    mlflow.log_params(params)

    rf = RandomForestRegressor(**params)
    rf.fit(X_train, y_train)

    predictions = rf.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    mlflow.log_metric("mse", mse)

    # The `name` argument is the MLflow 3 form; older releases use
    # artifact_path="diabetes_model" instead.
    model_info = mlflow.sklearn.log_model(sk_model=rf, name="diabetes_model")

print("script completed")

Expected result: The terminal prints script completed. The run appears in the MLflow UI under the MLflow Quick Start experiment with parameters, a metric, a dataset, and a saved model.
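
As an optional sanity check, we can load the saved model back and predict with it. A minimal sketch, assuming these lines are appended to the script above (log_model returns a ModelInfo object whose model_uri points at the artifact we just saved):

# Load the logged model by URI and confirm it produces predictions.
loaded = mlflow.sklearn.load_model(model_info.model_uri)
print(loaded.predict(X_test[:3]))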

Step 2: View the run in the UI

If you closed the browser, open http://127.0.0.1:5000. If it's still open, refresh the page. Navigate to the MLflow Quick Start experiment.

  1. Select the most recent run in the table.
  2. Review the Metrics section: mse should appear.
  3. Review the Parameters section: n_estimators, max_depth, and max_features should appear.
  4. Review the Datasets section: the diabetes dataset should appear.
  5. Scroll to the Model section and confirm the diabetes_model artifact is present.
Expected result: All four elements — metrics, parameters, datasets, and model — appear in the run Overview page. Unlike the autologged run from Part 1, every value here was captured by code that we wrote.

Part 3: Compare runs in the UI

We already have two runs in the MLflow Quick Start experiment: one logged automatically in Part 1, one logged manually in Part 2.

Step 1: Select both runs

Open http://127.0.0.1:5000 (or refresh if it's already open) and navigate to the MLflow Quick Start experiment.

  1. Check the box next to the Part 1 run.
  2. Check the box next to the Part 2 run.
  3. Select Compare above the runs table.
Expected result: The comparison view opens with both runs displayed side by side.

Step 2: Review the comparison

The comparison view shows parameters, metrics, and datasets for each run in parallel. We'll use it to see exactly what each logging approach captured.

  1. Review the Parameters section. The Part 1 run shows solver, max_iter, and random_state, alongside the other hyperparameters autologging recorded. The Part 2 run shows n_estimators, max_depth, and max_features. The parameter sets are different because the two runs used different models.
  2. Review the Metrics section. Part 1 logged training metrics such as accuracy; Part 2 logged mse. Again, different models, different metrics.
  3. Review the Dataset section. Part 1 shows two datasets (training and test splits, captured automatically). Part 2 shows one dataset, logged explicitly with mlflow.log_input().

The comparison view also includes several visualization options. The default table view shows parameters and metrics side by side. The parallel coordinates plot maps each run as a line across parameter and metric axes, useful for spotting patterns across many runs. Charts are also available for plotting individual metrics.

Expected result: The comparison view reflects the difference between autologging and manual logging — not just different values, but different structures.

What you built

We trained and logged two models to MLflow: one classification model using autologging, and one regression model using manual logging. We then compared both runs in the MLflow UI to see how the two logging approaches differ in what they capture and how they structure it.

This means we can now track every training run in one place, control exactly what gets captured, and evaluate the differences between runs without leaving the UI.
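
The UI isn't the only place to run that comparison. As a programmatic sketch, assuming the tracking server from the tutorial is still running, mlflow.search_runs() pulls every run in the experiment into a pandas DataFrame, with parameters and metrics expanded into columns:

import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")

# Each row is one run; params and metrics appear as params.* and metrics.* columns.
runs = mlflow.search_runs(experiment_names=["MLflow Quick Start"])
print(runs[["run_id", "status", "start_time"]])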

What's next

Now that you've completed this tutorial, you're ready to: