Vertex AI Model Monitoring for batch predictions

This page describes how to configure batch prediction job requests to include one-time Model Monitoring analysis. For batch predictions, Model Monitoring supports feature skew detection for categorical and numerical input features.

To create a batch prediction job with Model Monitoring skew analysis, you must include both your batch prediction input data and original training data for your model in the request. You can only add Model Monitoring analysis when creating new batch prediction jobs.

For more information about skew, see Introduction to Model Monitoring.

For instructions on how to set up Model Monitoring for online (real-time) predictions, see Using Model Monitoring.

Prerequisites

To use Model Monitoring with batch predictions, complete the following:

  1. Have an available model in Vertex AI Model Registry that is either a tabular AutoML or tabular custom training type.

  2. Upload your training data to Cloud Storage or BigQuery and obtain the URI of the data (see the sketch after this list).

    • For models trained with AutoML, you can use the dataset ID of your training dataset instead.
  3. Model Monitoring compares the training data to the batch prediction output. Make sure you use supported file formats for the training data and batch prediction output:

    Model type     | Training data                                      | Batch prediction output
    Custom-trained | CSV, JSONL, BigQuery, TFRecord (tf.train.Example)  | JSONL
    AutoML tabular | CSV, JSONL, BigQuery, TFRecord (tf.train.Example)  | CSV, JSONL, BigQuery, TFRecord (Protobuf.Value)
  4. Optional: For custom-trained models, upload the schema for your model to Cloud Storage. Model Monitoring requires the schema to calculate the baseline distribution for skew detection.
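
If your training data is still on your local machine, the following is a minimal sketch of prerequisite 2, assuming the google-cloud-storage client library; the bucket name, object path, and local file name are placeholders:

# A minimal sketch for prerequisite 2, assuming the google-cloud-storage client
# library. The bucket name, object path, and local file are placeholders.
from google.cloud import storage

BUCKET_NAME = "my-monitoring-bucket"        # hypothetical bucket you own
LOCAL_TRAINING_FILE = "training_data.csv"   # hypothetical local CSV file

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
blob = bucket.blob("datasets/training_data.csv")

# Upload the training data and build the Cloud Storage URI to use as
# TRAINING_DATASET in the Model Monitoring configuration later on this page.
blob.upload_from_filename(LOCAL_TRAINING_FILE)
training_dataset_uri = f"gs://{BUCKET_NAME}/{blob.name}"
print(training_dataset_uri)  # gs://my-monitoring-bucket/datasets/training_data.csv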

Request a batch prediction

You can use the following methods to add Model Monitoring configurations to batch prediction jobs:

Console

In the Google Cloud console, follow the instructions to make a batch prediction request with Model Monitoring enabled.

REST API

Follow the instructions to make a batch prediction request using the REST API:

When you create the batch prediction request, add the following Model Monitoring configuration to the request JSON body:

"modelMonitoringConfig": {
 "alertConfig": {
   "emailAlertConfig": {
     "userEmails": "EMAIL_ADDRESS"
   },
  "notificationChannels": [NOTIFICATION_CHANNELS]
 },
 "objectiveConfigs": [
   {
     "trainingDataset": {
       "dataFormat": "csv",
       "gcsSource": {
         "uris": [
           "TRAINING_DATASET"
         ]
       }
     },
     "trainingPredictionSkewDetectionConfig": {
       "skewThresholds": {
         "FEATURE_1": {
           "value": VALUE_1
         },
         "FEATURE_2": {
           "value": VALUE_2
         }
       }
     }
   }
 ]
}

where:

  • EMAIL_ADDRESS is the email address where you want to receive alerts from Model Monitoring. For example, example@example.com.

  • NOTIFICATION_CHANNELS: a list of Cloud Monitoring notification channels where you want to receive alerts from Model Monitoring. Use the resource names for the notification channels, which you can retrieve by listing the notification channels in your project (see the sketch after this list). For example, "projects/my-project/notificationChannels/1355376463305411567", "projects/my-project/notificationChannels/1355376463305411568".

  • TRAINING_DATASET is the Cloud Storage URI of the training dataset.

    • To use a training dataset stored in BigQuery, replace the gcsSource field with the following:
    "bigquerySource": {
      "inputUri": "TRAINING_DATASET"
    }

    • To use the training dataset of an AutoML model, replace the gcsSource field with the following, where TRAINING_DATASET is the dataset ID:
    "dataset": "TRAINING_DATASET"
  • FEATURE_1:VALUE_1 and FEATURE_2:VALUE_2 are the alerting thresholds for each feature you want to monitor. For example, if you specify Age=0.4, Model Monitoring logs an alert when the statistical distance between the input and baseline distributions of the Age feature exceeds 0.4. By default, every categorical and numerical feature is monitored with a threshold value of 0.3.
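
To look up the notification channel resource names for NOTIFICATION_CHANNELS, you can list the channels in your project. The following is a minimal sketch, assuming the google-cloud-monitoring client library; the project ID is a placeholder:

# A minimal sketch, assuming the google-cloud-monitoring client library, for
# listing notification channel resource names. The project ID is a placeholder.
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # hypothetical project ID

client = monitoring_v3.NotificationChannelServiceClient()
for channel in client.list_notification_channels(name=f"projects/{PROJECT_ID}"):
    # channel.name is the resource name to use in NOTIFICATION_CHANNELS, for example
    # projects/my-project/notificationChannels/1355376463305411567
    print(channel.name, channel.display_name)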

For more information about Model Monitoring configurations, see the Monitoring job reference.

Python

See the example notebook to run a batch prediction job with Model Monitoring for a custom tabular model.
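
If you want to adapt code directly instead of running the notebook, the following is a minimal sketch, assuming your installed version of the Vertex AI SDK (google-cloud-aiplatform) exposes the model_monitoring helpers and the model-monitoring parameters on BatchPredictionJob.create; all resource names, URIs, thresholds, and the target field are placeholders, and the linked notebook remains the authoritative example:

# A minimal sketch, assuming the Vertex AI SDK (google-cloud-aiplatform) version
# you have installed provides the model_monitoring helpers and the monitoring
# parameters on BatchPredictionJob.create. All names and URIs are placeholders.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")

# Baseline (training) data and per-feature alert thresholds for skew detection.
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="gs://my-monitoring-bucket/datasets/training_data.csv",  # TRAINING_DATASET
    skew_thresholds={"Age": 0.4, "Country": 0.3},
    target_field="label",  # assumed name of the training target column
    data_format="csv",
)
objective_config = model_monitoring.ObjectiveConfig(skew_detection_config=skew_config)
alert_config = model_monitoring.EmailAlertConfig(user_emails=["example@example.com"])

# Create the batch prediction job with one-time Model Monitoring skew analysis.
job = aiplatform.BatchPredictionJob.create(
    job_display_name="batch-prediction-with-monitoring",
    model_name="projects/my-project/locations/us-central1/models/1234567890",
    instances_format="jsonl",
    predictions_format="jsonl",
    gcs_source="gs://my-monitoring-bucket/input/instances.jsonl",
    gcs_destination_prefix="gs://my-monitoring-bucket/output",
    model_monitoring_objective_config=objective_config,
    model_monitoring_alert_config=alert_config,
    # For custom-trained models, also pass analysis_instance_schema_uri pointing
    # to the schema you uploaded to Cloud Storage (see the prerequisites).
)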

Model Monitoring automatically notifies you of job updates and alerts through email.

Access skew metrics

You can use the following methods to access skew metrics for batch prediction jobs:

Console (Histogram)

Use the Google Cloud console to view the feature distribution histograms for each monitored feature and learn which changes led to skew over time:

  1. Go to the Batch predictions page:

    Go to Batch predictions

  2. On the Batch predictions page, click the batch prediction job you want to analyze.

  3. Click the Model Monitoring Alerts tab to view a list of the model's input features, along with pertinent information, such as the alert threshold for each feature.

  4. To analyze a feature, click the name of the feature. A page shows the feature distribution histograms for that feature.

    Visualizing data distribution as histograms lets you quickly focus on the changes that occurred in the data. Afterward, you might decide to adjust your feature generation pipeline or retrain the model.

    Figure: Histograms showing example input data distribution and training data distribution for skew detection.

Console (JSON file)

Use the Google Cloud console to access the metrics in JSON format:

  1. Go to the Batch predictions page:

    Go to Batch predictions

  2. Click the name of the batch prediction monitoring job.

  3. Click the Monitoring properties tab.

  4. Click the Monitoring output directory link, which directs you to a Cloud Storage bucket.

  5. Click the metrics/ folder.

  6. Click the skew/ folder.

  7. Click the feature_skew.json file, which directs you to the Object details page.

  8. Open the JSON file using either option:

    • Click Download and open the file in your local text editor.

    • Use the gsutil URI path to run gcloud storage cat gsutil_URI in Cloud Shell or your local terminal.

The feature_skew.json file includes a dictionary where the key is the feature name and the value is the feature skew. For example:

{
  "cnt_ad_reward": 0.670936,
  "cnt_challenge_a_friend": 0.737924,
  "cnt_completed_5_levels": 0.549467,
  "month": 0.293332,
  "operating_system": 0.05758,
  "user_pseudo_id": 0.1
}
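
To read feature_skew.json programmatically instead of through the console, the following is a minimal sketch, assuming the google-cloud-storage client library; the bucket name and object path are placeholders copied from the Monitoring output directory of your job:

# A minimal sketch, assuming the google-cloud-storage client library. The bucket
# name and object path are placeholders from the Monitoring output directory.
import json

from google.cloud import storage

BUCKET_NAME = "my-monitoring-bucket"                               # hypothetical
SKEW_OBJECT = "monitoring-output/metrics/skew/feature_skew.json"   # hypothetical

client = storage.Client()
blob = client.bucket(BUCKET_NAME).blob(SKEW_OBJECT)
feature_skew = json.loads(blob.download_as_text())

# Print features in descending order of skew and flag any that exceed the
# default 0.3 alerting threshold.
for feature, skew in sorted(feature_skew.items(), key=lambda item: item[1], reverse=True):
    flag = "ALERT" if skew > 0.3 else "ok"
    print(f"{feature}: {skew} ({flag})")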

Python

See the example notebook to access skew metrics for a custom tabular model after running a batch prediction job with Model Monitoring.

Debug batch prediction monitoring failures

If your batch prediction monitoring job fails, you can find debugging logs in the Google Cloud console:

  1. Go to the Batch predictions page.

    Go to Batch predictions

  2. Click the name of the failed batch prediction monitoring job.

  3. Click the Monitoring properties tab.

  4. Click the Monitoring output directory link, which directs you to a Cloud Storage bucket.

  5. Click the logs/ folder.

  6. Click either of the .INFO files, which directs you to the Object details page.

  7. Open the logs file using either option:

    • Click Download and open the file in your local text editor.

    • Use the gsutil URI path to run gcloud storage cat gsutil_URI in Cloud Shell or your local terminal.
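
To pull the logs programmatically instead, the following is a minimal sketch, assuming the google-cloud-storage client library; the bucket name and logs/ prefix are placeholders copied from the Monitoring output directory of your job:

# A minimal sketch, assuming the google-cloud-storage client library. The bucket
# name and logs/ prefix are placeholders from the Monitoring output directory.
from google.cloud import storage

BUCKET_NAME = "my-monitoring-bucket"       # hypothetical
LOGS_PREFIX = "monitoring-output/logs/"    # hypothetical path to the logs/ folder

client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=LOGS_PREFIX):
    if ".INFO" in blob.name:
        print(f"==== {blob.name} ====")
        print(blob.download_as_text())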

Notebook tutorials

Learn more about how to use Vertex AI Model Monitoring to get visualizations and statistics for models with these end-to-end tutorials.

  • AutoML

  • Custom

  • XGBoost models

  • Vertex Explainable AI feature attributions

  • Batch prediction

  • Setup for tabular models

What's next