
All the examples I have seen of using the command_component decorator to create custom components in Azure ML pipelines use an image from Microsoft's container registry, as seen on this Microsoft tutorial page and pasted below.

However, I was wondering if, and how, we can use a custom-built image here.

My goal is to use an image with the MS-SQL ODBC drivers installed. I tried specifying a locally built image (on the CPU compute) in the image argument, but it errored out looking for a registry. I also tried an image I had pushed to Azure Container Registry, but that didn't work either, since I don't know how to pass credentials to this decorator.

# Converts MNIST-formatted files at the input path into CSV files at the training and test data output paths
import os
from pathlib import Path
from mldesigner import command_component, Input, Output


@command_component(
    name="prep_data",
    version="1",
    display_name="Prep Data",
    description="Convert data to CSV file, and split to training and test data",
    environment=dict(
        conda_file=Path(__file__).parent / "conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    convert(
        os.path.join(input_data, "train-images-idx3-ubyte"),
        os.path.join(input_data, "train-labels-idx1-ubyte"),
        os.path.join(training_data, "mnist_train.csv"),
        60000,
    )

1 Answer


You can pass an Environment object with a custom Docker build context, like below:

import os

from azure.ai.ml.entities import Environment, BuildContext
from mldesigner import command_component, Input, Output

environment = Environment(build=BuildContext(path="docker_context"))

@command_component(
    name="prep_data",
    version="1",
    display_name="Prep Data",
    description="Convert data to CSV file, and split to training and test data",
    environment=environment,
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    convert(
        os.path.join(input_data, "train-images-idx3-ubyte"),
        os.path.join(input_data, "train-labels-idx1-ubyte"),
        os.path.join(training_data, "mnist_train.csv"),
        60000,
    )
    # Convert the test set as well, so the declared test_data output is populated
    convert(
        os.path.join(input_data, "t10k-images-idx3-ubyte"),
        os.path.join(input_data, "t10k-labels-idx1-ubyte"),
        os.path.join(test_data, "mnist_test.csv"),
        10000,
    )

def convert(imgf, labelf, outf, n):
    # Read n images and labels from the IDX-format MNIST files and write
    # them to a CSV file, one "label,pixel0,...,pixel783" row per image.
    with open(imgf, "rb") as f, open(labelf, "rb") as lbl, open(outf, "w") as o:
        f.read(16)   # skip the 16-byte IDX header of the image file
        lbl.read(8)  # skip the 8-byte IDX header of the label file

        for _ in range(n):
            row = [ord(lbl.read(1))]
            for _ in range(28 * 28):
                row.append(ord(f.read(1)))
            o.write(",".join(str(pix) for pix in row) + "\n")

Here, in environment = Environment(build=BuildContext(path="docker_context")), docker_context is the folder containing the Dockerfile and its dependencies, such as conda_dependencies.yaml and requirements.txt. Azure ML builds this image for you in the workspace's container registry, so you don't need to pass registry credentials to the decorator.
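For example, the build context folder might be laid out like this (the dependency files are only needed if your Dockerfile references them):

docker_context/
├── Dockerfile
├── conda_dependencies.yaml
└── requirements.txt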

In the Dockerfile, add your custom Docker commands to install the MS-SQL ODBC driver.
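As a minimal sketch, assuming the same Ubuntu 20.04 Azure ML base image as the tutorial and Microsoft's ODBC Driver 18 for SQL Server, the Dockerfile could look like this:

FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04

# Add Microsoft's package repository and install the SQL Server ODBC driver
RUN apt-get update && \
    apt-get install -y curl gnupg2 && \
    curl -sSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    curl -sSL https://packages.microsoft.com/config/ubuntu/20.04/prod.list \
        > /etc/apt/sources.list.d/mssql-release.list && \
    apt-get update && \
    ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev && \
    rm -rf /var/lib/apt/lists/*

Once the environment is built, you can check from inside the component that the driver is visible (this assumes pyodbc is listed in your requirements.txt or conda dependencies):

import pyodbc
print(pyodbc.drivers())  # should include "ODBC Driver 18 for SQL Server"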

For more information about building the custom image, refer to this Stack Overflow solution, and see the command_component class for details on the environment argument.
