Newest 'large-language-model' Questions - Page 4

-1 votes

0 answers

30 views

Embedding and Vector Search with Milvus

I am trying to make a RAG-based chatbot application that lets the user prompt in natural language and receive relevant information retrieved from a collection of multiple tables, all of ...

Calvin Nguyen

1

asked Jul 17 at 4:44

-2 votes

0 answers

32 views

Want to run a Local LLM on Nvidia Jetson AGX Orin over GPU

I am looking to run a local LLM (Large Language Model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this? Thank you in ...

Mausam Jain

1

asked Jul 17 at 3:55

0 votes

0 answers

27 views

How to increase maximum limit for completion_tokens in AWS Sagemaker invoke endpoint

I have deployed the meta-llama/Meta-Llama-3-8B-Instruct model using HuggingFaceModel. The model responds with the full output when I make a call using HuggingFaceModel's predictor method. Here is the ...

keerti4p

1

asked Jul 16 at 9:29

0 votes

0 answers

46 views

How to improve response time of Phi-3-medium-128k serverless API?

I have deployed the Phi-3-medium-128k model using Azure AI Studio (serverless deployment). I am using the v1/chat/completions API to get chat completions and I am streaming the response. The time to ...

Rithika Chowta

1

asked Jul 16 at 7:53

0 votes

1 answer

39 views

How to get multimodal embeddings from CLIP model?

I'm hoping to use CLIP to get a single embedding for rows of multimodal (image and text) data. Say I have the following model: from PIL import Image import torch from transformers import CLIPProcessor,...

T_d

13

asked Jul 15 at 19:53

-1 votes

0 answers

5 views

Is ChatGPT and LLM killing stackoverflow [migrated]

For the last few months I have been using the ChatGPT LLM for coding, debugging, and troubleshooting. Earlier I used to google or post my questions on Stack Overflow. But now I have an instant solution to most of my coding ...

Naresh Chaurasia

437

asked Jul 15 at 18:21

2 votes

1 answer

139 views

Saving Fine-tune Falcon HuggingFace LLM Model

I'm trying to save my model so it won't need to re-download the base model every time I want to use it, but nothing seems to work for me. I would love your help with it. The following parameters are ...

Lidor Eliyahu Shelef

1,334

asked Jul 15 at 14:20

0 votes

1 answer

53 views

GPT-2 model from Hugging Face always generates the same result

Why are all the results I get from the GPT-2 model the same no matter what I feed into it? The following are my operating details. First I downloaded the needed files from the official website. These ...

zhangtianpu

1

asked Jul 15 at 6:31

0 votes

0 answers

20 views

OOM Error using PPO Trainer to LoRA-tune 4-bit Llama-3-8B Model (TRL Hugging Face Library)

As per the standard for PPO training (which is to do supervised fine-tuning before running the PPO algorithm), I did a QLoRA fine-tuning of the Llama-3-8B instruct model using my own custom data and ...

Aryaman Jaggi

1

asked Jul 15 at 2:45

0 votes

0 answers

90 views

Meta Llama-3 prompt sample

I am trying to ask the Llama-3 model to read a document and then answer my questions, but my code does not seem to generate any output. Can someone tell me what’s wrong with the code? I appreciate it. Code: ...

Joey1205

1

asked Jul 15 at 1:36

0 votes

0 answers

17 views

Chunking a Tokenized dataset

I am trying to experiment with the databricks-dolly-15k dataset to make it suitable for fine-tuning a Llama2 model according to this article by Phil Schmid. The initial part of building the dataset is ...

Arindam

312

asked Jul 14 at 11:44

0 votes

0 answers

24 views

Export a teknium/OpenHermes-2.5-Mistral-7B model to ONNX

I am trying to export teknium/OpenHermes-2.5-Mistral-7B to ONNX. This is my code: import torch from transformers import AutoModelForCausalLM, AutoTokenizer import onnx model_name = "teknium/...

mohammed yazid Berrached

1

asked Jul 14 at 10:38

0 votes

0 answers

30 views

Google Canary, Docker and Gemini Nano

Can I run the latest Google Canary with the Gemini Nano model in a Docker container in headless mode and interact with the model via Selenium (execute_script)? If so, how do I do it?

aasdf1xa

13

asked Jul 14 at 8:00

-4 votes

0 answers

25 views

Want to retrain my LLM based on user questions and answers on OpenAI

We have created a solution where users can upload their PDFs and ask questions. We have used NodeJS, Langchain, and OpenAI. Currently, the app flow is that we save all the content of the PDF in our vector ...

Muhammad Mudassir

313

asked Jul 13 at 13:51

0 votes

0 answers

20 views

Apply different learning rate for introduced tokens in the transformers library

Say I want to introduce a few new tokens into the vocabulary of an existing model, and I want these tokens to have a different learning rate compared to the rest of the model's parameters during ...

Bipolo

73

asked Jul 13 at 10:54
