I have several shapefiles stored in Azure Blob Storage within my Azure Machine Learning workspace, each comprising the files: file.fix, file.shp, file.dbf, file.prj, and file.shx. I need to directly access and read these shapefiles within my Azure Machine Learning environment.

So far, I've successfully read Parquet files using the following code:

Dataset.Tabular.from_parquet_files(path=[(datastore, file_path)]).to_pandas_dataframe()

and CSV files using:

table = Dataset.Tabular.from_delimited_files(path=[(datastore, file_path)]).to_pandas_dataframe()

While I did come across a solution for reading shapefiles in Azure Databricks, I haven't found a direct method for accomplishing this within Azure Machine Learning.

I understand that one workaround could be to download the files and read them locally within the code. However, I'm unsure about the implementation details for this approach.

Any help would be greatly appreciated.

1 Answer

You can follow the approach below.

Code to download all the required files:

from azure.storage.blob import BlobServiceClient
import os

def download_blob_folder(blob_service_client, container_name, folder_name, local_directory):
    """Download every blob under folder_name into local_directory."""
    container_client = blob_service_client.get_container_client(container_name)
    # Match on "folder/" so a sibling such as "map_old" is not picked up by the prefix filter.
    prefix = folder_name.rstrip("/") + "/"
    blobs_list = container_client.list_blobs(name_starts_with=prefix)

    for blob in blobs_list:
        # Path of the blob relative to the folder being downloaded.
        blob_relative_path = blob.name[len(prefix):]
        if not blob_relative_path:
            continue  # Skip an entry for the folder itself, if one is listed.
        blob_client = container_client.get_blob_client(blob.name)
        local_file_path = os.path.join(local_directory, blob_relative_path)

        # Create any nested local folders before writing the file.
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)

        with open(local_file_path, "wb") as file:
            download_stream = blob_client.download_blob()
            file.write(download_stream.readall())
        print(f"Blob '{blob.name}' downloaded to '{local_file_path}'")


# Placeholder values; replace them with your storage account's connection string.
connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net"
container_name = "sample"
folder_name = "map"


blob_service_client = BlobServiceClient.from_connection_string(connection_string)


local_directory = "./downloaded_files/"

download_blob_folder(blob_service_client, container_name, folder_name, local_directory)

Here, I am downloading all the files required to read a shapefile from the folder map. Make sure that folder contains only the files the shapefile needs, because the code above downloads every file in it.
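
If the folder cannot be kept clean, one option is to filter blob names by extension before downloading. The following is only a sketch: the helper name is_shapefile_part and the extension list are my own assumptions (the list covers the files named in the question plus .cpg), so adjust both to your data.

```python
import os

# Assumed set of extensions that make up one shapefile; adjust as needed.
SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".cpg", ".fix"}

def is_shapefile_part(blob_name, extensions=SHAPEFILE_EXTENSIONS):
    """Return True if the blob name ends with one of the given extensions."""
    _, ext = os.path.splitext(blob_name)
    return ext.lower() in extensions

# Example with made-up blob names: only the shapefile parts are kept.
names = ["map/gadm41_IND_0.shp", "map/gadm41_IND_0.dbf", "map/readme.txt", "map/notes.csv"]
wanted = [n for n in names if is_shapefile_part(n)]
print(wanted)
```

Inside the download loop you could then skip any blob for which is_shapefile_part(blob.name) is False.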

After downloading, install geopandas through pip:

pip install geopandas

Code to read:

import geopandas as gpd

# Point at the .shp file; the other downloaded parts must sit next to it.
file_name = "./downloaded_files/gadm41_IND_0.shp"
data = gpd.read_file(file_name)
data
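
If the read fails, it is often because one of the companion files did not get downloaded. A small check like the one below may help confirm that. It is a sketch: the helper name missing_sidecars is hypothetical, and it only looks for .shx and .dbf, the two files the shapefile format requires alongside the .shp.

```python
from pathlib import Path

# Companion files the shapefile format requires next to the .shp file.
REQUIRED_SIDECARS = (".shx", ".dbf")

def missing_sidecars(shp_path, required=REQUIRED_SIDECARS):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in required if not base.with_suffix(ext).exists()]

if __name__ == "__main__":
    missing = missing_sidecars("./downloaded_files/gadm41_IND_0.shp")
    if missing:
        print("Missing companion files:", missing)
    else:
        print("All required companion files are present.")
```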

Output:

[Image: output of the code above]
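
If you would rather not handle a connection string, the same Dataset class used in the question for Parquet and CSV files also has a file-based variant that can download a folder from a registered datastore. The sketch below assumes the v1 SDK (azureml-core) is installed and that the shapefile parts live under a folder called map. The function name download_shapefile_folder is my own. I have not verified the exact local folder layout it produces, so inspect the returned paths.

```python
def download_shapefile_folder(datastore, folder="map", target="./downloaded_files"):
    """Download every file under `folder` on the datastore and return the local paths."""
    # Imported lazily so this module still loads where azureml-core is not installed.
    from azureml.core import Dataset

    # A FileDataset references files as-is instead of parsing them into a table.
    dataset = Dataset.File.from_files(path=[(datastore, f"{folder}/*")])
    return dataset.download(target_path=target, overwrite=True)

# Usage inside an Azure ML notebook or job (assumes a workspace config is available):
#
#   from azureml.core import Workspace
#   import geopandas as gpd
#
#   ws = Workspace.from_config()
#   datastore = ws.get_default_datastore()
#   paths = download_shapefile_folder(datastore)
#   shp_path = next(p for p in paths if p.lower().endswith(".shp"))
#   data = gpd.read_file(shp_path)
```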
