Read shapefile from Azure Blob Storage in Azure Machine Learning

Question

I have several shapefiles stored in Azure Blob Storage within my Azure Machine Learning workspace, each comprising the files: file.fix, file.shp, file.dbf, file.prj, and file.shx. I need to directly access and read these shapefiles within my Azure Machine Learning environment.

So far, I've successfully read Parquet files using the following code:

Dataset.Tabular.from_parquet_files(path=[(datastore, file_path)]).to_pandas_dataframe()

and CSV files using:

table = Dataset.Tabular.from_delimited_files(path=[(datastore, file_path)]).to_pandas_dataframe()

While I did come across a solution for reading shapefiles in Azure Databricks, I haven't found a direct method for accomplishing this within Azure Machine Learning.

I understand that one workaround could be to download the files and read them locally within the code. However, I'm unsure about the implementation details for this approach.

Any help would be greatly appreciated.

JayashankarGS · Accepted Answer · 2024-02-29 07:15:12Z

You can follow the approach below.

Code to download all the required files:

from azure.storage.blob import BlobServiceClient
import os

def download_blob_folder(blob_service_client, container_name, folder_name, local_directory):
    """Download every blob under folder_name into local_directory, keeping subfolders."""
    container_client = blob_service_client.get_container_client(container_name)
    # Trailing slash so that "map" does not also match "map2/..." or "mapping.shp"
    prefix = folder_name.rstrip("/") + "/"
    blobs_list = container_client.list_blobs(name_starts_with=prefix)

    for blob in blobs_list:
        # Path of the blob relative to the folder being downloaded
        blob_relative_path = blob.name[len(prefix):]
        if not blob_relative_path:
            continue  # nothing to write for an entry that is just the folder itself
        local_file_path = os.path.join(local_directory, blob_relative_path)

        # Create any intermediate local directories
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)

        blob_client = container_client.get_blob_client(blob.name)
        with open(local_file_path, "wb") as file:
            download_stream = blob_client.download_blob()
            file.write(download_stream.readall())
        print(f"Blob '{blob.name}' downloaded to '{local_file_path}'")


# Placeholder values; substitute your own storage account's connection string
connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net"
container_name = "sample"
folder_name = "map"


blob_service_client = BlobServiceClient.from_connection_string(connection_string)


local_directory = "./downloaded_files/"

download_blob_folder(blob_service_client, container_name, folder_name, local_directory)

Here, I am downloading all the files needed to read a shapefile from the folder map. The code above downloads every file under the given folder, so make sure that folder contains only the files you need.
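
If the folder also holds unrelated files, one option is to filter the blob names before downloading. The following is only a rough sketch: select_shapefile_blobs is a hypothetical helper, not part of the Azure SDK, and the extension set is an assumption that you can extend (for example with the .fix file from the question). It takes plain blob names, such as the blob.name values returned by list_blobs.

```python
import posixpath

# Assumed set of shapefile component extensions; .shp, .shx and .dbf are the
# mandatory parts, .prj and .cpg are common optional sidecar files.
SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".cpg"}


def select_shapefile_blobs(blob_names, stem, extensions=SHAPEFILE_EXTENSIONS):
    """Return the blob names that belong to the shapefile called `stem`.

    Hypothetical helper: blob names use "/" as separator, so posixpath is
    used regardless of the local operating system.
    """
    selected = []
    for name in blob_names:
        base, ext = posixpath.splitext(posixpath.basename(name))
        # Compare extensions case-insensitively (.SHP and .shp both match)
        if base == stem and ext.lower() in extensions:
            selected.append(name)
    return selected
```

Inside download_blob_folder you could then skip any blob whose name is not in the returned list.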

After downloading, install geopandas through pip:

pip install geopandas

Code to read:

import geopandas as gpd

file_name="./downloaded_files/gadm41_IND_0.shp"
data = gpd.read_file(file_name)
data

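
A shapefile is only readable as a whole when its mandatory parts (.shp, .shx and .dbf) sit next to each other, so it can help to check the download before reading. This is a minimal sketch using only the standard library; missing_shapefile_parts is a hypothetical helper name, and the check assumes the sidecar files share the same base name and use lowercase extensions.

```python
import os

# The three mandatory components of a shapefile
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path, required=REQUIRED_EXTENSIONS):
    """Return the required extensions that have no file next to shp_path.

    Hypothetical helper: looks for files with the same base name as
    shp_path and each extension in `required` (exact, lowercase match).
    """
    base, _ = os.path.splitext(shp_path)
    return [ext for ext in required if not os.path.exists(base + ext)]
```

For example, calling missing_shapefile_parts("./downloaded_files/gadm41_IND_0.shp") before gpd.read_file and stopping when the returned list is not empty shows which file the download left out.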
