Azure Machine Learning dataset creation hangs forever

Question

I'm trying to create a Dataset from a datastore using Azure ML, however, the execution hangs forever and never finishes.

This is the code I'm running which I've adapted from the Msft documentation:

import azureml.core
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

datastore = Datastore.get(ws, datastore_name='blobs')
data_path=[(datastore,"contacts.csv")]
Dataset.File.from_files(path=data_path) # <-- This method never finishes

Here we can see that the command never completes:

There is only 1 test file contacts.csv in the storage. The storage is a Blob container, but I've tested with a data lake (DSL) container and got the same issue. It looks like a similar problem shared in this other question.

I must add that outbound rules are configured to use private endpoints.

As part of my troubleshooting steps, I've confirmed that network connectivity to the storage looks OK - not only by testing via SSH inside the Azure ML instance and it resolves to a private IP, but also using other SDKs such as with the Datastore.download() method.

Here I show how using a download approach I can reach the file from the same datastore. This tells me that network and authentication are properly configured, and something is wrong with my Dataset code? Same infrastructure, just changed the code a bit.

import os
import azureml.core
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

datastore = Datastore.get(ws, datastore_name='blobs')
datastore.download(target_path="./output", prefix="contacts.csv", overwrite=False)

arr = os.listdir('./output')
print(arr)

file = open("./output/contacts.csv", "r").read()
print(file)

@JayashankarGS I might as well try that and report how it goes — Evandro Pomatti, Commented Mar 18 at 13:41
@JayashankarGS I think the issue is related to other dependencies that ML requires, such as .NET SDK. It fails to get them because my workspace is using an isolated managed VNET without outbound access. — Evandro Pomatti, Commented Mar 18 at 16:21

Evandro Pomatti · Accepted Answer · 2024-03-18 17:05:25Z

0

An outbound restricted AML architecture will not allow for outbound connections that are not approved.

AML will try to dynamically download dependencies and that will fail without adding an FQDN outbound rule.

answered Mar 18 at 17:05

Evandro Pomatti

14.6k19 gold badges113 silver badges174 bronze badges

Add a comment |

Collectives™ on Stack Overflow

Azure Machine Learning dataset creation hangs forever

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
python
azure
azure-machine-learning-service
azureml-python-sdk
azuremlsdk
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Not the answer you're looking for? Browse other questions tagged pythonazureazure-machine-learning-serviceazureml-python-sdkazuremlsdk or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
azure
azure-machine-learning-service
azureml-python-sdk
azuremlsdk
or ask your own question.