Hello.
I am trying to improve the "Invoice Parser" processor so that it recognizes some additional labels. The major problem is that it does not see some numbers. I tested the "OCR processor" and it extracted these numbers without issue, but on the training screen, when I select these numbers, as in the screenshot, I get nothing in the value. Even if I enter the correct value, it will not allow me to train the model, because it says the selected labels are empty in the documents. How can I fix this issue? I have multiple documents and all of them have this issue. It does not skip all numbers, but it skips some of them (especially "0" in this table).
Hi oleks_vasyliev,
There are several reasons why Document AI might be struggling to recognize zeros in your training data. Here's a consolidated view of the potential causes and solutions:
Labeling Issues:
Label documents: labeled documents are required to train, up-train, or evaluate a processor version.
Data Quality and Preprocessing:
Data considerations and recommendations: the quality and the amount of your data determine the quality of the training, uptraining, and evaluation.
In addition, you may refer to the below items:
By addressing these factors and implementing the appropriate solutions, you should be able to improve Document AI's ability to recognize zeros in your training data and successfully train your model.
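One way to narrow this down is to check whether the processor's own OCR output contains a token for each missing number. Below is a minimal sketch, assuming you have exported a processed document as Document JSON (with the `text`, `pages[].tokens[]`, and `layout.textAnchor.textSegments` fields) and assuming the labeling screen can only select text that has a token. The helper names `segment_text` and `token_counts` are hypothetical, not part of any Google library.

```python
from collections import Counter


def segment_text(full_text, text_anchor):
    """Return the text that a textAnchor points to inside document['text']."""
    parts = []
    for seg in text_anchor.get("textSegments", []):
        # Indices may be JSON strings, and startIndex is omitted when it is 0.
        start = int(seg.get("startIndex", 0))
        end = int(seg.get("endIndex", 0))
        parts.append(full_text[start:end])
    return "".join(parts)


def token_counts(document):
    """Count how often each token string appears across all pages."""
    full_text = document.get("text", "")
    counts = Counter()
    for page in document.get("pages", []):
        for token in page.get("tokens", []):
            anchor = token.get("layout", {}).get("textAnchor", {})
            value = segment_text(full_text, anchor).strip()
            if value:
                counts[value] += 1
    return counts


# Tiny hand-made sample (not real processor output): the "0" is present
# in the text but has no token, which mimics the reported symptom.
sample = {
    "text": "Qty 0 5\n",
    "pages": [
        {
            "pageNumber": 1,
            "tokens": [
                {"layout": {"textAnchor": {"textSegments": [{"endIndex": "4"}]}}},
                {"layout": {"textAnchor": {"textSegments": [
                    {"startIndex": "6", "endIndex": "8"}]}}},
            ],
        }
    ],
}

counts = token_counts(sample)
print("tokens for '0':", counts["0"])
print("tokens for '5':", counts["5"])
```

For a real file you would load it with `json.load` and compare the count for "0" against the number of zeros visible in the table. If the count is lower, the numbers are already being dropped at the OCR stage of that processor, and relabeling alone is unlikely to help.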
I hope this helps.
Here are videos with an example of trying to use both tools:
Bounding boxes: https://youtube.com/shorts/kTBfqhKMT4A?feature=share
Text annotations (you can even see that these numbers have no grey background boxes and cannot be selected): https://youtube.com/shorts/CCm7uCVpSnA?feature=share
I hope the videos explain what the issue is.
I tried a different parser, the Expense Parser to be specific. Here's the result in the Evaluate and Test tab:
Hope this helps.
Thanks for the reply. Sorry, but I have already spent weeks training the "Invoice Parser" on new labels.
Do you mean that the "Expense Parser" is better at recognizing numbers during training than the "Invoice Parser"? Why is the training playground different for these parsers, if both are used by humans, not robots? Also, does this mean that even if I need to parse bills and the Invoice Parser is the logical choice, I still need to use the "Expense Parser" because it is somehow better? I just need to know which one is best to select in the future.
No worries oleks_vasyliev, I am here to help you with this matter. I understand that switching parsers may take a lot of effort and time. Here is some documentation on the Invoice Parser that might help you understand why zeros are not recognized by the processor, as well as the limits of Document AI depending on the processor you are using.
If none of these suggestions resolve the issue, consider reaching out to Google Cloud support for further assistance. Thank you.
Thanks.
One last question, @McMaco. If we switch to the "Custom Extractor", can it handle such cases (with these numbers) better than the Invoice/Expense parsers (even though it will cost us 3 times more for such functionality)? Thanks
You're right @oleks_vasyliev,
As I replicated it on my end, the Custom Extractor has better recognition of numbers such as 0.
For the pricing you may refer to this Document AI Pricing for more information.
Hope this helps.
Thanks. It's a pity that even in your example it misses the zero in the second row (MAR/23) 😞