Gemini AI Flash model for OCR

I'm using the Gemini Flash model for OCR extraction of headers and line items from document images. When I specify the structure for extracting line item descriptions, I run into problems with descriptions that span two lines. For instance, when I provide only the second line of a description, such as "CLUTCH DAMPER DOUBLE WHEEL", as the example input, the model outputs either the entire description or only the first line, seemingly depending on word length.

# Imports (package names assumed from the class names used below):
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_vertexai import ChatVertexAI

# Model:
vision_model = ChatVertexAI(
    model_name="gemini-1.5-flash-001",
    project=PROJECT_ID,
    location="asia-south1",
    max_output_tokens=8192,
)

# Invoking the model for response:
response = vision_model.invoke(
    [
        SystemMessage(content="Examine the image provided to extract comprehensive details from each line item and generate a well-structured JSON output having only the corresponding indices for the extracted data."),
        HumanMessage(
            content=[
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
            ]
        ),
    ]
)

Is there a solution, or an adjustment to how I prompt the model, that would make it extract only the needed line from such two-line descriptions?

My prompt for the model includes the instruction:

prompt = f'''
Extract the line item description based on the user-provided example: {example_set['Description']}, using rules that consider content from the first or second line depending on the example provided. Adjust the description's length by expanding or trimming it to match the formatting in the examples.
'''
-> But I got the complete description: "DOUBLE VOLANT AMORTISSEUR EMBR CLUTCH DAMPER DOUBLE WHEEL"

prompt = f'''
"DES": "Provide the description content of the line item. The extracted description may be a long sentence or a long context, but based on the user-provided example: {example_set["DES"]}, the extracted description result needs to be trimmed. Additionally, if there are different languages with the same contextual meaning, then extract the English context only."
'''

-> Sometimes the output is "DOUBLE VOLANT AMORTISSEUR EMBR", and other times it is "DOUBLE VOLANT AMORTISSEUR EMBR CLUTCH DAMPER DOUBLE WHEEL". The required output depends on the example given as input, such as "CLUTCH DAMPER DOUBLE WHEEL" or "DOUBLE WHEEL".
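
To make the behaviour I am after concrete, here is a minimal sketch of the selection done in plain Python instead of inside the prompt. It assumes the model can be made to return each description as a list of printed lines (I have not confirmed that it does so reliably), and the helper names find_example_line_index and pick_line are hypothetical. It only decides which printed line to keep; trimming within a line (for example down to "DOUBLE WHEEL") is not covered.

```python
def find_example_line_index(reference_lines, example):
    """Return the index of the printed line containing the example text, or None."""
    # Normalise whitespace and case so minor OCR spacing differences do not matter.
    needle = " ".join(example.split()).casefold()
    for index, line in enumerate(reference_lines):
        if needle in " ".join(line.split()).casefold():
            return index
    return None


def pick_line(description_lines, line_index):
    """Pick the same printed line from another line item."""
    if not description_lines:
        return ""
    if line_index is None:
        # No match for the example: keep the full description.
        return " ".join(description_lines)
    # Items with fewer lines fall back to their last line.
    return description_lines[min(line_index, len(description_lines) - 1)]


# Hypothetical model output for the line item the example was taken from.
reference = ["DOUBLE VOLANT AMORTISSEUR EMBR", "CLUTCH DAMPER DOUBLE WHEEL"]
idx = find_example_line_index(reference, "CLUTCH DAMPER DOUBLE WHEEL")
print(pick_line(reference, idx))
```

With this split, the model only has to transcribe the lines, and the choice between the first and second line no longer depends on how it interprets the example.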

Required:
I need a prompt that analyses the user-provided example and extracts the matching part of the description accurately.
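
One prompt variant I am considering, as a sketch only: instead of asking the model to expand or trim the text, ask it to return every printed line of the description separately, and include the user's example purely as a reference. The function name build_description_prompt and the "description_lines" field are my own assumptions, and I have not verified that Gemini Flash follows this instruction consistently.

```python
import json


def build_description_prompt(example: str) -> str:
    """Build a prompt asking for each printed description line as its own string."""
    return (
        "For every line item, return the description as a JSON array named "
        '"description_lines", with one string per printed line, top to bottom. '
        "Do not merge, translate, or trim the lines.\n"
        # json.dumps quotes and escapes the example so it cannot break the prompt text.
        f"Reference example supplied by the user: {json.dumps(example)}"
    )


prompt = build_description_prompt("CLUTCH DAMPER DOUBLE WHEEL")
print(prompt)
```

The resulting string would be passed as the "text" part of the HumanMessage, in place of the prompts shown above.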

 
