I have a storage account with Azure Container Storage configured, containing multiple PDF, Word, and Excel files. I would like to use Azure Document Intelligence to semantically chunk these files.
Is it possible to load the files directly from Container Storage into Azure Document Intelligence using langchain? According to the langchain docs, it seems that the file either has to be available locally or a public URL has to be provided.
Attempt:
# Prerequisite: An Azure AI Document Intelligence resource in one of the 3 preview regions: East US, West US2, West Europe
import os
from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
file_path = "storage-path-to-file"
endpoint = os.getenv("DOCUMENTINTELLIGENCE_ENDPOINT")
key = os.getenv("DOCUMENTINTELLIGENCE_API_KEY")
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, file_path=file_path, api_model="prebuilt-layout"
)
documents = loader.load()
# Returns:
# Message: Invalid request.
# Inner error: {
#     "code": "InvalidManagedIdentity",
#     "message": "The managed identity configuration is invalid: Managed identity is not enabled for the current resource."
# }
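
One direction that might be worth trying, sketched here under assumptions and not verified against a live resource: the loader also accepts a `url_path` argument, so instead of a bare storage path it could be handed a blob URL with a read-only SAS token appended, which should let the service fetch the file without a managed identity. The helper names `build_blob_sas_url` and `load_blob_with_document_intelligence` are hypothetical, the URL assumes the standard `blob.core.windows.net` endpoint, and the SAS token is assumed to have been generated separately (for example in the Azure portal).

```python
from urllib.parse import quote


def build_blob_sas_url(account: str, container: str, blob_name: str, sas_token: str) -> str:
    """Compose a blob URL with a SAS token appended as the query string.

    Hypothetical helper: assumes the standard public blob endpoint.
    """
    # Percent-encode the blob name but keep "/" so virtual folders survive.
    path = quote(blob_name, safe="/")
    # Accept tokens pasted with or without the leading "?".
    token = sas_token.lstrip("?")
    return f"https://{account}.blob.core.windows.net/{container}/{path}?{token}"


def load_blob_with_document_intelligence(blob_url: str, endpoint: str, key: str):
    """Hypothetical wrapper: analyze a blob by URL instead of a local path."""
    # Imported lazily so the URL helper works without langchain installed.
    from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

    loader = AzureAIDocumentIntelligenceLoader(
        api_endpoint=endpoint,
        api_key=key,
        url_path=blob_url,  # url_path instead of file_path: the service fetches the blob
        api_model="prebuilt-layout",
    )
    return loader.load()
```

With placeholder values, `build_blob_sas_url("myaccount", "docs", "reports/q1 2024.pdf", "?sv=...")` would produce the URL to pass as `blob_url`; whether the service accepts it depends on the token's permissions and expiry.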