
I want to build a telegram chat bot with llama 3. I'm looking for various ways to do this. Somebody suggested me to use VertexAI on Google, but I can't understand the prices when it comes to llama 3. I only found information on prices for Gemini models deployed on VertexAI. Another approach would be to create a Google Cloud VM and build a docker with llama there, but also in this case I'm not familiar with the prices. Can you help me to understand which would be the best approach?

I've found some information from Google VertexAI documentation about their prices but they were very obscure

  • You can find a sample in Vertex Model Garden to do so.
    – gogasca
    Commented Jul 9 at 18:03

1 Answer


Vertex AI Model Garden is the best option for finding and deploying Llama 3. Use this link to get to the Llama model garden. It is a model repository from which you can directly deploy (and run inference on) Llama 3, with two options:

  • Vertex AI: fully managed platform with auto-configured endpoint
  • GKE (Kubernetes): you manage the cluster yourself

Both options require your GCP project to have sufficient accelerator (TPU or GPU) quota. For example, a Vertex AI deployment of Llama3-8B-chat-001 can run on ct5lp-hightpu-4t (TPU) or g2-standard-12 (GPU) machines.
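Once the model is deployed to a Vertex AI endpoint, calling it from your bot's backend is a few lines with the `google-cloud-aiplatform` SDK. A minimal sketch, assuming a project ID, region, and numeric endpoint ID of your own (the placeholders below are not real), and assuming the Model Garden serving container accepts a `{"prompt": ..., "max_tokens": ...}` instance schema, which can vary by container version:

```python
def build_instance(prompt: str, max_tokens: int = 256, temperature: float = 0.7) -> dict:
    """Build a prediction instance in the shape the Llama serving container expects.

    NOTE: the exact field names depend on the serving container; check the
    sample request shown in Model Garden for your deployment.
    """
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def predict(prompt: str) -> str:
    """Send one prompt to a deployed Vertex AI endpoint and return the prediction."""
    # Imported here so build_instance() stays usable without the SDK installed.
    from google.cloud import aiplatform

    # Placeholders: substitute your own project, region, and endpoint ID.
    aiplatform.init(project="my-gcp-project", location="us-central1")
    endpoint = aiplatform.Endpoint("1234567890")  # numeric ID from the console
    response = endpoint.predict(instances=[build_instance(prompt)])
    return response.predictions[0]
```

The per-request cost then follows from how long the endpoint's accelerator-backed machine stays deployed, not from the number of calls.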

Deploying to GKE requires you to provision a GKE Autopilot cluster, but Model Garden provides most of the GKE configuration (manifest files) needed to deploy the model and allocate TPUs/GPUs.

The Vertex AI approach is much easier from an infrastructure standpoint, and it handles accelerator optimization for you. You could instead provision a GCE VM and install Docker, but that involves more work, primarily configuring accelerators (TPU/GPU) on bare VMs.
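For the Telegram side, whichever backend you pick, a long-polling loop against the Bot API is enough to forward messages to the model. A standard-library-only sketch, assuming a bot token from @BotFather and any `generate(text) -> str` function (for example, the endpoint call above or a local container):

```python
import json
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"


def extract_message(update: dict):
    """Pull (chat_id, text) out of a Telegram update; None if not a text message."""
    msg = update.get("message") or {}
    if "text" in msg and "chat" in msg:
        return msg["chat"]["id"], msg["text"]
    return None


def call(token: str, method: str, **params) -> dict:
    """POST one Bot API method (e.g. getUpdates, sendMessage) and return its JSON."""
    req = urllib.request.Request(
        API.format(token=token, method=method),
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run(token: str, generate):
    """Long-poll getUpdates and answer each text message with generate(text)."""
    offset = 0
    while True:
        updates = call(token, "getUpdates", offset=offset, timeout=30)
        for update in updates.get("result", []):
            offset = update["update_id"] + 1
            parsed = extract_message(update)
            if parsed:
                chat_id, text = parsed
                call(token, "sendMessage", chat_id=chat_id, text=generate(text))
```

This loop can run anywhere cheap (even a small VM or Cloud Run service); the accelerator cost lives entirely in the model deployment.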

For pricing, refer to the custom-trained models section of the Vertex AI pricing page.
