DocumentAI – Internal Server Error when training on pre-labeled documents

We want to automate training of our custom extraction processor by sending PDFs to the OCR processor, adding entities to the document for the values that we know of, uploading it to GCS as .json, and training the processor on our document. 
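
A rough sketch of what the labeling step in such a pipeline might look like is shown here. It is an illustration under assumptions, not the exact code used: the helper name `add_entity` and the sample text are hypothetical, only the text anchor is filled in (page anchors and bounding polygons are omitted), and offsets are plain Python string offsets, which should be double-checked for non-ASCII text.

```python
import json


def add_entity(document, label, value):
    """Append an entity for the first occurrence of `value` in document["text"].

    Hypothetical helper. Returns False and leaves the document unchanged
    when the value does not appear in the OCR text.
    """
    text = document.get("text", "")
    start = text.find(value)
    if start == -1:
        return False
    entity = {
        "type": label,
        "mentionText": value,
        "textAnchor": {
            "textSegments": [
                # proto3 JSON writes int64 fields as strings.
                {"startIndex": str(start), "endIndex": str(start + len(value))}
            ]
        },
    }
    document.setdefault("entities", []).append(entity)
    return True


# Made-up OCR output standing in for the Document returned by the OCR processor.
doc = {"text": "Invoice INV-001\nTotal: 42.00 USD\n"}
found = add_entity(doc, "invoice_id", "INV-001")
missing = add_entity(doc, "due_date", "2023-06-01")
print(json.dumps(doc["entities"], indent=2))
```

Returning `False` for values that are not present in the OCR text makes it easy to skip or flag documents whose labels cannot be anchored before they are uploaded.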

On our initial tries we get the following error:

{
  "name": "projects/818666290880/locations/us/operations/8089109272130466059",
  "done": true,
  "result": "error",
  "response": {},
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.documentai.uiv1beta3.ImportDocumentsMetadata",
    "commonMetadata": {
      "state": "FAILED",
      "createTime": "2023-05-04T21:36:28.268708Z",
      "updateTime": "2023-05-04T21:36:29.172396Z",
      "resource": "projects/818666290880/locations/us/processors/a57d00c6e1c2727/dataset"
    },
    "individualImportStatuses": [
      {
        "inputGcsSource": "gs://mybucket.appspot.com/path/to/00d45d9f-c6e3-4937-aa5c-59f395cfb5f0.json",
        "status": {
          "code": 13,
          "message": "Internal error encountered."
        }
      }
    ],
    "totalDocumentCount": 1
  },
  "error": {
    "code": 13,
    "message": "Internal error encountered.",
    "details": []
  }
}
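
Code 13 in this response corresponds to `INTERNAL` in the `google.rpc.Code` enum. When an import covers more than one file, the per-file statuses can be pulled out of the operation JSON with a few lines of Python. This is only a sketch: the function name `failed_imports` is hypothetical, and the sample below is a trimmed copy of the response above.

```python
# Names for a few google.rpc.Code values; 13 is INTERNAL.
RPC_CODE_NAMES = {
    0: "OK",
    3: "INVALID_ARGUMENT",
    5: "NOT_FOUND",
    7: "PERMISSION_DENIED",
    13: "INTERNAL",
}


def failed_imports(operation):
    """Return (gcs_uri, code_name, message) for every import that did not succeed."""
    statuses = operation.get("metadata", {}).get("individualImportStatuses", [])
    failures = []
    for item in statuses:
        status = item.get("status", {})
        code = status.get("code", 0)  # a missing code means OK (0)
        if code != 0:
            failures.append(
                (
                    item.get("inputGcsSource", ""),
                    RPC_CODE_NAMES.get(code, str(code)),
                    status.get("message", ""),
                )
            )
    return failures


# Trimmed copy of the operation response shown above.
operation = {
    "done": True,
    "metadata": {
        "individualImportStatuses": [
            {
                "inputGcsSource": "gs://mybucket.appspot.com/path/to/00d45d9f-c6e3-4937-aa5c-59f395cfb5f0.json",
                "status": {"code": 13, "message": "Internal error encountered."},
            }
        ],
        "totalDocumentCount": 1,
    },
}

failures = failed_imports(operation)
for uri, code_name, message in failures:
    print(f"{uri}: {code_name} ({message})")
```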

 

There is no log in Cloud Logging related to Document AI. The operation is projects/818666290880/locations/us/operations/8089109272130466059.

Is there a way for us to determine what is causing the error? Thank you.
2 REPLIES

Good day @ldiqual,

Welcome to Google Cloud Community!

There are several possible reasons for this error. Here are a few things you can check to see whether they resolve the problem:

1. Check whether any bounding boxes overlap or intersect. You can also try relabeling the document to see whether that solves the problem. See this documentation for more information on labeling and the JSON file:
https://cloud.google.com/document-ai/docs/workbench/label-documents#manual-label

2. Review the labeling best practices and the minimum requirements for custom processors. The training set and the test set each need at least 10 documents, with at least 10 instances of each label per set. The recommendation is 50 documents per set, with 50 instances of each label per set. You can learn more here: https://cloud.google.com/document-ai/docs/workbench/build-custom-processor#import_pre-labeled_data_t...

3. Verify that the service account has the permissions it needs for these actions. The issue might be related to authentication, GCS permissions, or bucket permissions. You can learn more here:
https://cloud.google.com/document-ai/docs/workbench/create-dataset
https://cloud.google.com/document-ai/docs/access-control/iam-permissions
https://cloud.google.com/storage/docs/access-control
https://cloud.google.com/storage/docs/access-control/iam-roles

4. Make sure that the Document AI API and the Cloud Storage API are enabled.

5. Check that the document language is supported and that the processor is in a supported region, using this link: https://cloud.google.com/document-ai/docs/processors-list#processor_cde
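
The first two checks can be approximated locally before importing. The following is only a sketch under assumptions: the function names are hypothetical, each bounding polygon is reduced to its axis-aligned box (so rotated labels may be reported as overlapping when they are not), and the documents are assumed to be the Document JSON files loaded as Python dictionaries.

```python
from collections import Counter
from itertools import combinations


def entity_boxes(document):
    """Yield (label, page, (x_min, y_min, x_max, y_max)) for each entity page ref."""
    for entity in document.get("entities", []):
        for ref in entity.get("pageAnchor", {}).get("pageRefs", []):
            vertices = ref.get("boundingPoly", {}).get("normalizedVertices", [])
            if not vertices:
                continue
            # proto3 JSON omits zero values, so a missing x or y means 0.
            xs = [v.get("x", 0.0) for v in vertices]
            ys = [v.get("y", 0.0) for v in vertices]
            page = int(ref.get("page", 0))
            yield entity.get("type", ""), page, (min(xs), min(ys), max(xs), max(ys))


def overlapping_pairs(document):
    """Return label pairs whose axis-aligned boxes intersect on the same page."""
    pairs = []
    for (la, pa, a), (lb, pb, b) in combinations(list(entity_boxes(document)), 2):
        same_page = pa == pb
        intersects = a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
        if same_page and intersects:
            pairs.append((la, lb))
    return pairs


def label_counts(documents):
    """Count instances of each label across a set of documents."""
    return Counter(e.get("type", "") for d in documents for e in d.get("entities", []))


def make_entity(label, x0, y0, x1, y1):
    """Build a made-up entity with a rectangular bounding polygon on page 0."""
    vertices = [{"x": x0, "y": y0}, {"x": x1, "y": y0}, {"x": x1, "y": y1}, {"x": x0, "y": y1}]
    return {"type": label, "pageAnchor": {"pageRefs": [{"boundingPoly": {"normalizedVertices": vertices}}]}}


# Made-up example: "total" and "date" overlap, "name" does not.
doc = {
    "entities": [
        make_entity("total", 0.1, 0.1, 0.4, 0.2),
        make_entity("date", 0.3, 0.15, 0.6, 0.25),
        make_entity("name", 0.7, 0.7, 0.9, 0.8),
    ]
}
overlaps = overlapping_pairs(doc)
counts = label_counts([doc])
print(overlaps)
print({label: n for label, n in counts.items() if n < 10})  # labels below the minimum of 10
```

Running `label_counts` separately over the training and test sets shows which labels fall short of the minimum of 10 instances per set.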

You can also reach out to Google Cloud Support. Here is the link: https://cloud.google.com/support

Same issue here. Were you able to solve it?