Solved: streamGenerateContent Method of Gemini Rest APIs g...

jsnc · 02-20-2024 09:50 PM

According to the Gemini API documentation for the streamGenerateContent method, the response should be a single JSON object with a 'candidates' field, like the one below.

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": string
          }
        ]
      },
      "finishReason": enum (FinishReason),
      "safetyRatings": [
        {
          "category": enum (HarmCategory),
          "probability": enum (HarmProbability),
          "blocked": boolean
        }
      ],
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "uri": string,
            "title": string,
            "license": string,
            "publicationDate": {
              "year": integer,
              "month": integer,
              "day": integer
            }
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "promptTokenCount": integer,
    "candidatesTokenCount": integer,
    "totalTokenCount": integer
  }
}

But the actual response is a JSON array of multiple JSON objects, each containing just a part of the generated text. We have to combine all the parts to get the whole generated text.
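A minimal sketch of combining the parts, assuming the whole response body has already been received as one string. The helper name `combineStreamedText` and the sample body are made up for illustration; only the `candidates` / `content` / `parts` / `text` field names come from the structure above.

```javascript
// Hypothetical helper (not part of any Google SDK): joins the text of every
// chunk in a streamGenerateContent response body into one string.
function combineStreamedText(responseBody) {
  // The body is a JSON array in which each element is one partial response.
  const chunks = JSON.parse(responseBody);
  let text = '';
  for (const chunk of chunks) {
    // Only the first candidate is read here; a chunk may carry no content
    // at all, so missing fields fall back to an empty list.
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      text += part.text ?? '';
    }
  }
  return text;
}

// Made-up sample body shaped like the array described above.
const sampleBody = JSON.stringify([
  {candidates: [{content: {parts: [{text: 'Hello, '}]}}]},
  {candidates: [{content: {parts: [{text: 'world'}]}, finishReason: 'STOP'}]},
]);
console.log(combineStreamedText(sampleBody)); // prints "Hello, world"
```

This only works once the full array has arrived, so it gives up the latency benefit of streaming; it is meant for callers who just want the final text.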

Is there any alternative to this? Why does it return a JSON array for a single call? Is there any mechanism for making multiple calls, like pagination?

https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini?_ga=2.124180611.-849081...

nimrah-waqar

The response structure is not consistent with the API documentation, and you'll need to combine the parts of the generated text locally. Check the documentation for pagination mechanisms or contact the API provider for assistance.

jsnc

I couldn't find any pagination mechanism in the documentation for streamGenerateContent.

There is also a generateContent method in MakerSuite (Google AI Studio), which returns a single object (ref: https://ai.google.dev/api/rest/v1/models/generateContent).
If there's no pagination for streamGenerateContent and its response differs from what is documented, what is the advantage of streamGenerateContent over generateContent for people consuming this REST API?

nimrah-waqar

The streamGenerateContent method's response structure doesn't match the documentation, requiring developers to combine parts of the generated text locally. Without pagination mechanisms, it's less clear why streamGenerateContent is preferred over generateContent, which provides a single object response. Clarification from the API provider may be necessary to understand the advantages of using streamGenerateContent.

nlarry

Do you have a requirement to use streamGenerateContent instead of just generateContent? The stream version returns a stream of responses. Below is example code:

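// Assumes `generativeModel` is a model object already created with the SDK (setup not shown).
async function streamGenerateContent() {
  const request = {
    contents: [{role: 'user', parts: [{text: 'How are you doing today?'}]}],
  };
  const streamingResp = await generativeModel.generateContentStream(request);
  // Each item is one partial response, logged as soon as it arrives.
  for await (const item of streamingResp.stream) {
    console.log('stream chunk: ', JSON.stringify(item));
  }
  // The chunks merged into a single response, available once the stream ends.
  console.log('aggregated response: ', JSON.stringify(await streamingResp.response));
}

streamGenerateContent();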

Below is the example for generateContent:

// Assumes the same `generativeModel` object as in the streaming example.
async function generateContent() {
  const request = {
    contents: [{role: 'user', parts: [{text: 'How are you doing today?'}]}],
  };
  // Resolves only after the whole answer has been generated.
  const resp = await generativeModel.generateContent(request);

  console.log('aggregated response: ', JSON.stringify(await resp.response));
}

generateContent();

jsnc

Hello,
I understand that calling it from code like this makes a difference between streamGenerateContent and generateContent. But if we're using the REST API, it's not really streaming, right? It just returns one response, a single time, but with multiple response objects. With the REST API there's no difference like the one you've shown above, other than that we have to manually combine all the generated text in the response.

Also, there's no documentation for a generateContent method in the Vertex AI Gemini REST API, so we can only use streamGenerateContent.

nlarry

Hi,

My previous post has the link to the generateContent REST API.

For long responses, you still need to use streaming. Below is the example from the public documentation.

!curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:streamGenerateContent?key=${API_KEY}" \
        -H 'Content-Type: application/json' \
        --no-buffer \
        -d '{ "contents":[{"parts":[{"text": "Write long a story about a magic backpack."}]}]}' \
        2> /dev/null | grep "text"

Response

"text": "In the bustling city of Meadow brook, lived a young girl named Sophia with an"
            "text": " insatiable curiosity that knew no bounds. One sunny morning, as she walked to school, she came across a charming little shop tucked away on a side street. Int"
            "text": "rigued, she stepped inside and was immediately captivated by the mystical aura that enveloped the room. Amidst the shelves lined with ancient books and curious trinkets, she stumbled upon a peculiar backpack. Crafted from deep emerald leather and adorned with shimmering crystals, it seemed to pulse with a hidden energy. Unable to resist its allure"
            "text": ", Sophia reached out and touched the backpack, and in that moment, a powerful connection was forged. As if awakening from a long slumber, the backpack emitted a soft glow, revealing its extraordinary properties.\n\nSophia discovered that this was no ordinary backpack. It possessed the magical ability to grant her wishes, but only if they were made with a pure heart and genuine intentions. Overwhelmed with excitement and responsibility, Sophia pondered deeply about how she would use this extraordinary gift. She decided that her first wish would be to make her grandmother, who lived far away, feel close to her. With a heartfelt desire, she whispered her wish to the backpack"

vikiai

Google responds to a streaming request with a single JSON array, so the individual parts of the response cannot be parsed as standalone JSON while they arrive.

The workaround is to parse line by line: trim the opening and closing '[' and ']' brackets of the array, and ignore the lines in between that contain only ",".

It would have been simpler if Google had sent each stream object as a valid JSON document on its own, but that has not been done.
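The line-by-line workaround depends on how the array happens to be pretty-printed. A possible alternative, sketched here under the assumption that the body is one JSON array whose elements are objects, is to track brace depth and emit each top-level object as soon as it closes. `createArrayStreamParser` is a made-up name, and the sample pieces are invented to mimic arbitrary network boundaries.

```javascript
// Hypothetical incremental parser: feed it raw text as it arrives and it
// returns every top-level object of the JSON array completed by that text.
function createArrayStreamParser() {
  let buffer = '';      // text of the object currently being collected
  let depth = 0;        // nesting depth of { } inside the outer array
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string

  return function feed(text) {
    const objects = [];
    for (const ch of text) {
      // Between objects, skip "[", "]", "," and whitespace.
      if (depth === 0 && ch !== '{') continue;
      buffer += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === '{') {
        depth += 1;
      } else if (ch === '}') {
        depth -= 1;
        if (depth === 0) {
          // A complete top-level object has been collected.
          objects.push(JSON.parse(buffer));
          buffer = '';
        }
      }
    }
    return objects;
  };
}

// Made-up input, split in the middle of a string to mimic chunk boundaries.
const pieces = [
  '[{"candidates":[{"content":{"parts":[{"text":"a{',
  'b"}]}}]}\n,\n{"candidates":[{"content":{"parts":[{"text":"}c"}]}}]}]',
];
const feed = createArrayStreamParser();
const received = [];
for (const piece of pieces) {
  for (const obj of feed(piece)) {
    received.push(obj.candidates[0].content.parts[0].text);
  }
}
console.log(received.join('')); // prints "a{b}c"
```

Braces inside string values are ignored because the parser tracks whether it is inside a string, which is the case a naive line or brace count would get wrong.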

lloydzhou

https://github.com/google-gemini/cookbook/blob/main/quickstarts/rest/Streaming_REST.ipynb
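As far as I can tell, that notebook adds `alt=sse` to the streamGenerateContent URL, which makes the API send server-sent events where each `data:` line is a complete JSON object instead of a fragment of one big array. A rough sketch of reading such output follows; `parseSseEvents` is a hypothetical helper, the sample text is invented, and it assumes every event carries its JSON on a single `data:` line.

```javascript
// Hypothetical helper: extracts the JSON payloads from server-sent-event text.
// Assumption: each event has exactly one "data:" line holding complete JSON.
function parseSseEvents(sseText) {
  const events = [];
  for (const line of sseText.split(/\r?\n/)) {
    if (line.startsWith('data:')) {
      events.push(JSON.parse(line.slice('data:'.length).trim()));
    }
  }
  return events;
}

// Made-up sample shaped like an SSE stream of Gemini response chunks.
const sampleSse = [
  'data: {"candidates":[{"content":{"parts":[{"text":"Once "}]}}]}',
  '',
  'data: {"candidates":[{"content":{"parts":[{"text":"upon a time"}]}}]}',
  '',
].join('\n');

const story = parseSseEvents(sampleSse)
  .map((event) => event.candidates[0].content.parts[0].text)
  .join('');
console.log(story); // prints "Once upon a time"
```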
