I'm trying to create a Dataset from a datastore using Azure ML, but the execution hangs forever and never finishes.
This is the code I'm running, which I've adapted from the Microsoft documentation:
import azureml.core
from azureml.core import Workspace, Datastore, Dataset
ws = Workspace.from_config()
datastore = Datastore.get(ws, datastore_name='blobs')
data_path = [(datastore, "contacts.csv")]
Dataset.File.from_files(path=data_path) # <-- This method never finishes
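While debugging, I wrapped the call in a generic timeout so the hang turns into an error instead of blocking the notebook forever (a sketch using only the standard library; call_with_timeout and the 60-second limit are my own names and choices, and the from_files call is the one from above):

```python
import concurrent.futures

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    # Run fn on a worker thread so a hanging SDK call surfaces as a
    # TimeoutError instead of blocking forever.
    # Note: the worker thread keeps running in the background after the
    # timeout fires; this is for diagnosis only, not a clean cancellation.
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = ex.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    finally:
        ex.shutdown(wait=False)

# e.g. call_with_timeout(Dataset.File.from_files, 60, path=data_path)
```

With this wrapper the call raises TimeoutError after 60 seconds every time, so it really is stuck rather than just slow.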
The command simply never returns; there is no error or output.
There is only one test file, contacts.csv, in the storage. The storage is a Blob container, but I've tested with a Data Lake Storage (ADLS) container and got the same issue. It looks similar to the problem described in this other question.
I must add that outbound rules are configured to use private endpoints.
As part of my troubleshooting, I've confirmed that network connectivity to the storage looks OK: resolving the storage hostname from an SSH session inside the Azure ML compute instance returns a private IP, and other SDK calls such as the Datastore.download() method work.
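For reference, this is the kind of check I used to confirm the private endpoint DNS is in effect (a sketch; mystorageaccount.blob.core.windows.net is a placeholder for the real storage account hostname, and only the standard library is used):

```python
import ipaddress
import socket

def looks_private(ip_str):
    # True when the address falls in a private (RFC 1918) range,
    # i.e. the private endpoint DNS zone is resolving the name.
    return ipaddress.ip_address(ip_str).is_private

# Resolve the storage endpoint (placeholder hostname) and inspect the result:
# ip = socket.getaddrinfo("mystorageaccount.blob.core.windows.net", 443)[0][4][0]
# print(ip, looks_private(ip))
print(looks_private("10.1.2.3"))   # True  - a private endpoint range
print(looks_private("20.60.0.1"))  # False - a public Azure range
```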
The snippet below shows that, using the download approach, I can reach the file in the same datastore. This tells me that network and authentication are properly configured and that something is wrong with my Dataset code: same infrastructure, only the code changed.
import os
import azureml.core
from azureml.core import Workspace, Datastore, Dataset
ws = Workspace.from_config()
datastore = Datastore.get(ws, datastore_name='blobs')
datastore.download(target_path="./output", prefix="contacts.csv", overwrite=False)
arr = os.listdir('./output')
print(arr)
with open("./output/contacts.csv", "r") as f:
    print(f.read())