
I want to build a telegram chat bot with llama 3. I'm looking for various ways to do this. Somebody suggested me to use VertexAI on Google, but I can't understand the prices when it comes to llama 3. I only found information on prices for Gemini models deployed on VertexAI. Another approach would be to create a Google Cloud VM and build a docker with llama there, but also in this case I'm not familiar with the prices. Can you help me to understand which would be the best approach?

I've found some information from Google VertexAI documentation about their prices but they were very obscure

  • You can find a sample in Vertex Model Garden to do so.
    – gogasca
    Commented Jul 9 at 18:03

1 Answer


Vertex AI Model Garden is the best option for finding and deploying Llama 3. Use this link to get to the Llama model garden. It is a model repository from which you can directly deploy (and run inference on) Llama 3, with two options:

  • Vertex AI: fully managed platform with auto-configured endpoint
  • GKE (Kubernetes): you manage the cluster yourself

Both options require your GCP project to have sufficient accelerator (TPU or GPU) quota. For example, a Vertex AI deployment of Llama3-8B-chat-001 can run on ct5lp-hightpu-4t (TPU) or g2-standard-12 (GPU) machines.
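Once the model is deployed to a Vertex AI endpoint, calling it from your bot's backend is a few lines with the `google-cloud-aiplatform` SDK. A minimal sketch, assuming a project ID, region, and numeric endpoint ID of your own (the placeholders below are not real), and assuming the Model Garden serving container accepts a `{"prompt": ..., "max_tokens": ...}` instance schema, which can vary by container version:

```python
def build_instance(prompt: str, max_tokens: int = 256, temperature: float = 0.7) -> dict:
    """Build a prediction instance in the shape the Llama serving container expects.

    NOTE: the exact field names depend on the serving container; check the
    sample request shown in Model Garden for your deployment.
    """
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def predict(prompt: str) -> str:
    """Send one prompt to a deployed Vertex AI endpoint and return the prediction."""
    # Imported here so build_instance() stays usable without the SDK installed.
    from google.cloud import aiplatform

    # Placeholders: substitute your own project, region, and endpoint ID.
    aiplatform.init(project="my-gcp-project", location="us-central1")
    endpoint = aiplatform.Endpoint("1234567890")  # numeric ID from the console
    response = endpoint.predict(instances=[build_instance(prompt)])
    return response.predictions[0]
```

The per-request cost then follows from how long the endpoint's accelerator-backed machine stays deployed, not from the number of calls.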

Deploying to GKE requires you to provision a GKE Autopilot cluster, but Model Garden provides most of the GKE configuration (manifest files) needed to deploy the model and allocate TPUs/GPUs.

The Vertex AI approach is much easier from an infrastructure standpoint, and it handles accelerator optimization for you. You could instead provision a GCE VM and install Docker, but that involves more work, primarily configuring accelerators (TPU/GPU) on bare VMs.
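For the Telegram side, whichever backend you pick, a long-polling loop against the Bot API is enough to forward messages to the model. A standard-library-only sketch, assuming a bot token from @BotFather and any `generate(text) -> str` function (for example, the endpoint call above or a local container):

```python
import json
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"


def extract_message(update: dict):
    """Pull (chat_id, text) out of a Telegram update; None if not a text message."""
    msg = update.get("message") or {}
    if "text" in msg and "chat" in msg:
        return msg["chat"]["id"], msg["text"]
    return None


def call(token: str, method: str, **params) -> dict:
    """POST one Bot API method (e.g. getUpdates, sendMessage) and return its JSON."""
    req = urllib.request.Request(
        API.format(token=token, method=method),
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run(token: str, generate):
    """Long-poll getUpdates and answer each text message with generate(text)."""
    offset = 0
    while True:
        updates = call(token, "getUpdates", offset=offset, timeout=30)
        for update in updates.get("result", []):
            offset = update["update_id"] + 1
            parsed = extract_message(update)
            if parsed:
                chat_id, text = parsed
                call(token, "sendMessage", chat_id=chat_id, text=generate(text))
```

This loop can run anywhere cheap (even a small VM or Cloud Run service); the accelerator cost lives entirely in the model deployment.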

For pricing, refer to the custom-trained models section of the Vertex AI pricing page.
