
I'm trying to create a Dataset from a datastore using Azure ML, but the call hangs and never finishes.

This is the code I'm running, adapted from the Microsoft documentation:

import azureml.core
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

datastore = Datastore.get(ws, datastore_name='blobs')
data_path = [(datastore, "contacts.csv")]
Dataset.File.from_files(path=data_path)  # <-- This method never finishes

Here we can see that the command never completes:

[Screenshot: the command still running]
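While debugging, one way to turn a silent hang like this into an explicit error is to run the call in a daemon thread and give up after a deadline. The sketch below is only an illustration: `run_with_timeout` is a hypothetical helper, not part of the Azure ML SDK, and the 120-second limit in the usage comment is arbitrary. It does not stop the stuck call; it only hands control back to the caller.

```python
import threading


def run_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and wait at most timeout_s seconds.

    Hypothetical debugging helper. If the call is still running at the
    deadline, TimeoutError is raised; the worker thread itself keeps running
    in the background until the process exits.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller's thread
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Usage with the code from the question (not executed here):
# dataset = run_with_timeout(Dataset.File.from_files, 120, path=data_path)
```

A daemon thread is used so that a call that never returns cannot also block interpreter shutdown.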

There is only one test file, contacts.csv, in the storage. The storage is a Blob container, but I've tested with a data lake (DSL) container and got the same issue. It looks similar to the problem shared in this other question.

I must add that outbound rules are configured to use private endpoints.

As part of my troubleshooting, I've confirmed that network connectivity to the storage looks OK: from an SSH session inside the Azure ML instance the storage hostname resolves to a private IP, and other SDK calls, such as the Datastore.download() method, work.
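A minimal sketch of the name-resolution check described above, using only the Python standard library. The helper names are hypothetical, and the hostname in the usage comment is a placeholder that follows the usual `<account>.blob.core.windows.net` pattern; the real storage account name is not given in the question.

```python
import ipaddress
import socket


def resolve_addresses(hostname, port=443):
    """Return the sorted set of IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private(address):
    """True if the address lies in a private (non-public) range."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_private


# Usage from the compute instance (placeholder hostname, not executed here):
# for addr in resolve_addresses("<account>.blob.core.windows.net"):
#     print(addr, "private" if is_private(addr) else "PUBLIC")
```

A private address here suggests that the private endpoint's DNS record is being used, but it says nothing about other hosts the SDK may need to reach.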

Here I show that, using a download approach, I can reach the file in the same datastore. This tells me that network and authentication are properly configured, so is something wrong with my Dataset code? The infrastructure is the same; I only changed the code a bit.

import os
import azureml.core
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

datastore = Datastore.get(ws, datastore_name='blobs')
datastore.download(target_path="./output", prefix="contacts.csv", overwrite=False)

arr = os.listdir('./output')
print(arr)

# Read the downloaded file; the with-block closes it afterwards
with open("./output/contacts.csv", "r") as f:
    contents = f.read()
print(contents)


  • Have you tried using sdk v2? Commented Mar 18 at 11:13
  • @JayashankarGS I might as well try that and report how it goes Commented Mar 18 at 13:41
  • @JayashankarGS I think the issue is related to other dependencies that ML requires, such as .NET SDK. It fails to get them because my workspace is using an isolated managed VNET without outbound access. Commented Mar 18 at 16:21

1 Answer

An outbound-restricted AML architecture blocks any outbound connection that has not been explicitly approved.

AML tries to download dependencies dynamically, and that fails unless you add an FQDN outbound rule that allows it.
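To see which outbound destinations are actually blocked, a small probe run from the compute instance can help. The sketch below is an assumption-laden illustration: `can_connect` and `probe_hosts` are hypothetical helpers, and the host in the usage comment is a placeholder, because this thread does not say which FQDNs AML needs. A successful TCP connection only shows that the destination is reachable on that port, not that the rule set is complete.

```python
import socket


def can_connect(host, port=443, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False


def probe_hosts(hosts, port=443, timeout_s=5.0):
    """Map each host name to whether it was reachable."""
    return {host: can_connect(host, port, timeout_s) for host in hosts}


# Usage (placeholder host, replace with the FQDNs you want to verify):
# print(probe_hosts(["<fqdn-to-check>"]))
```

Hosts that report False are candidates for an FQDN outbound rule, provided they are destinations you intend to allow.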
