About resize requests in a MIG


This document describes how resize requests in a managed instance group (MIG) work and their limitations. Use resize requests to create virtual machine (VM) instances with GPUs all at once in a MIG.

Creating VMs all at once in a MIG through a resize request is useful in the following scenarios:

  • When you want GPU VMs for a specific time only, a resize request increases the chances of obtaining GPUs, which are in high demand.

  • When you want an exact number of VMs to run a job, a resize request helps you to create VMs all at once. By using resize requests, you can also avoid unnecessary charges for the partial capacity that Compute Engine creates while you wait for all the resources to be available.

How resize requests work

When creating a resize request, you must specify the following:

  • resizeBy: the number of VMs that you want to create all at once as part of the request.

  • requestedRunDuration: the duration for which the VMs created as part of the request must run. The run duration must be between 10 minutes and 7 days. At the end of the run duration, the MIG deletes the created VMs.
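The two required fields and the run-duration bounds above can be sketched as a small validation helper. This is a minimal illustration in Python, not the Compute Engine client library: the function name `build_resize_request` and the layout of the returned dictionary (in particular the nested `seconds` key) are assumptions made for this example.

```python
from datetime import timedelta

# Bounds stated above: the run duration must be between 10 minutes and 7 days.
MIN_RUN_DURATION = timedelta(minutes=10)
MAX_RUN_DURATION = timedelta(days=7)


def build_resize_request(resize_by: int, run_duration: timedelta) -> dict:
    """Validate the inputs and return a request-like dictionary.

    Hypothetical helper: the dictionary layout is an assumption for
    illustration, not a confirmed API payload.
    """
    if resize_by < 1:
        raise ValueError("resizeBy must be a positive number of VMs")
    if not MIN_RUN_DURATION <= run_duration <= MAX_RUN_DURATION:
        raise ValueError(
            "requestedRunDuration must be between 10 minutes and 7 days"
        )
    return {
        "resizeBy": resize_by,
        "requestedRunDuration": {"seconds": int(run_duration.total_seconds())},
    }
```

For example, `build_resize_request(4, timedelta(hours=2))` passes validation, while a run duration of 5 minutes or 8 days raises `ValueError` before any request is sent.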

When you create a resize request in a MIG, Compute Engine sets the state of the request to CREATING, and then transitions it to ACCEPTED when the request is created. Dynamic Workload Scheduler (DWS), the underlying scheduling mechanism, schedules resize requests created across Compute Engine based on requested durations and resource availability.

After DWS schedules the creation of the requested number of VMs, the MIG increases its target size by the number of requested VMs and creates managed instances that are in a CREATING status. These managed instances represent the VMs that the MIG will create when the resize request succeeds. You can't delete the managed instances that are in CREATING status unless you cancel the resize request.

If you lack quota for the requested resources or the resources are temporarily unavailable, DWS persists the request until you have sufficient quota and the resources become available.

An accepted resize request remains as such until Compute Engine sets its state to one of the following:

  • SUCCEEDED: the MIG created the requested number of VMs all at once. The VMs run until the MIG deletes them after the specified run duration ends, or until you delete the VMs.

  • FAILED: the resize request failed due to a technical error and Compute Engine decreased the target size of the MIG by the number of requested VMs.

  • CANCELLED: a user canceled the resize request and Compute Engine decreased the target size of the MIG by the number of requested VMs. If you want to stop an accepted resize request from creating VMs, you must cancel the resize request, and then you can optionally delete it. If you don't delete a canceled resize request, Compute Engine automatically deletes it 14 days after it's canceled.
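The lifecycle described in this section can be summarized as a small state machine. The following Python sketch models only the transitions named in the text (CREATING to ACCEPTED, then ACCEPTED to one of the three final states). The `State` enum, the `TRANSITIONS` table, and the helper functions are hypothetical names for illustration, and the service might allow transitions that this document doesn't mention.

```python
from enum import Enum


class State(Enum):
    CREATING = "CREATING"
    ACCEPTED = "ACCEPTED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Only the transitions described in this section (an assumption of the sketch).
TRANSITIONS = {
    State.CREATING: {State.ACCEPTED},
    State.ACCEPTED: {State.SUCCEEDED, State.FAILED, State.CANCELLED},
    State.SUCCEEDED: set(),
    State.FAILED: set(),
    State.CANCELLED: set(),
}


def advance(current: State, new: State) -> State:
    """Return the new state, or raise if the text describes no such transition."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def is_terminal(state: State) -> bool:
    """A state is terminal when no further transition is described for it."""
    return not TRANSITIONS[state]
```

A client that polls a resize request could stop polling as soon as `is_terminal` returns `True` for the observed state.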

To check the state of an accepted resize request or troubleshoot it, view the details of the resize request.

If you delete a MIG containing resize requests, this operation also deletes any resize requests and VMs in the MIG. However, if you delete a MIG when the MIG is creating VMs to fulfill a resize request, Compute Engine waits until the MIG has finished creating the requested number of VMs and the state of the resize request transitions to SUCCEEDED before deleting the MIG.

Limitations

The following sections outline the limitations for creating resize requests in a MIG.

Limitations for resize requests

For resize requests, the following limitations apply:

  • You can use resize requests to obtain GPU VMs only.

  • You can create resize requests only in zonal MIGs.

  • You can cancel a resize request only while it is in the ACCEPTED state.

  • You can only delete a resize request after it succeeds (SUCCEEDED), fails (FAILED), or a user cancels it (CANCELLED).
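The cancel and delete rules in this list can be expressed as two predicates over the request state. The sketch below is a plain Python illustration under the assumption that states are compared as strings. The function names, including `cleanup_steps`, are hypothetical and don't correspond to any client library call.

```python
# States in which each operation is allowed, per the limitations above.
CANCELLABLE_STATES = frozenset({"ACCEPTED"})
DELETABLE_STATES = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"})


def can_cancel(state: str) -> bool:
    """Only accepted resize requests can be canceled."""
    return state in CANCELLABLE_STATES


def can_delete(state: str) -> bool:
    """A request can be deleted only after it succeeds, fails, or is canceled."""
    return state in DELETABLE_STATES


def cleanup_steps(state: str) -> list[str]:
    """Return the operations needed to remove a request in the given state."""
    if can_delete(state):
        return ["delete"]
    if can_cancel(state):
        # An accepted request must be canceled before it can be deleted.
        return ["cancel", "delete"]
    # For example CREATING: neither operation is allowed yet.
    return []
```

Checking these predicates before issuing a cancel or delete call lets a script report a clear reason locally instead of relying on an error from the service.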

Limitations for the instance template

For the instance template used in the MIG in which you want to create resize requests, the following limitations apply:

Limitations for the MIG

For the MIG in which you want to create resize requests, the following limitations apply:

Quota for GPU VMs with requested run duration

GPU VMs that are configured to be automatically deleted after a predefined run time of 7 days or less can consume either preemptible or standard allocation quotas. This behavior is intended to help you improve the obtainability of allocation quota for temporary-but-uninterrupted workloads. For more information about this behavior, see GPU VMs and preemptible allocation quotas.

Pricing

There are no costs associated with creating, canceling, or deleting resize requests. You incur charges only for the VMs created through a resize request, from the moment the MIG creates the VMs until the MIG automatically deletes them at the end of their run duration or you manually delete them.

If a MIG creates only some of the requested VMs and fails to create the remaining ones, then you might still incur charges for the created VMs until the MIG automatically deletes them.
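The charging window described in this section, from VM creation until automatic or manual deletion, can be sketched as a short calculation. This Python example computes only the length of that window for a single VM. It doesn't model prices or billing granularity, and the function name `billable_duration` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def billable_duration(
    created_at: datetime,
    run_duration: timedelta,
    manually_deleted_at: Optional[datetime] = None,
) -> timedelta:
    """Return how long a VM created through a resize request accrues charges.

    The window ends at the earlier of the automatic deletion time
    (creation time plus run duration) and the manual deletion time, if any.
    """
    auto_delete_at = created_at + run_duration
    end = auto_delete_at
    if manually_deleted_at is not None:
        if manually_deleted_at < created_at:
            raise ValueError("a VM can't be deleted before it is created")
        end = min(auto_delete_at, manually_deleted_at)
    return end - created_at
```

For a VM created at 12:00 with a 2-hour run duration, the window is 2 hours if the MIG deletes the VM automatically, and 30 minutes if you delete the VM manually at 12:30.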

What's next