Questions tagged [large-language-model]
A general tag for large language model (LLM)-related subjects. Please ALWAYS use the more specific tags if available (GPT variants, PaLM, LLaMa, BLOOM, Claude, etc.).
large-language-model
1,567
questions
-1
votes
0
answers
30
views
Embedding and Vector Search with Milvus
I am trying to make a RAG-based chatbot application that lets the user prompt in natural language and receive relevant information that can be retrieved from a collection of multiple tables, all of ...
-2
votes
0
answers
32
views
Want to run a Local LLM on Nvidia Jetson AGX Orin over GPU
I am looking to run a local LLM (Large Language Model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this?
Thank you in ...
0
votes
0
answers
27
views
How to increase maximum limit for completion_tokens in AWS Sagemaker invoke endpoint
I have deployed the meta-llama/Meta-Llama-3-8B-Instruct model using HuggingFaceModel. The model responds with the full output when I make a call using HuggingFaceModel's predictor method. Here is the ...
0
votes
0
answers
46
views
How to improve response time of Phi-3-medium-128k serverless API?
I have deployed the Phi-3-medium-128k model using Azure AI Studio (serverless deployment). I am using the v1/chat/completions API to get chat completions and I am streaming the response. The time to ...
0
votes
1
answer
39
views
How to get multimodal embeddings from CLIP model?
I'm hoping to use CLIP to get a single embedding for rows of multimodal (image and text) data.
Say I have the following model:
from PIL import Image
import torch
from transformers import CLIPProcessor,...
-1
votes
0
answers
5
views
Are ChatGPT and LLMs killing Stack Overflow? [migrated]
For the last few months I have been using the ChatGPT LLM for coding, debugging, and troubleshooting.
Earlier, I used to Google or post my question on Stack Overflow.
But now I have an instant solution to most of my coding ...
2
votes
1
answer
139
views
Saving a Fine-tuned Falcon HuggingFace LLM Model
I'm trying to save my model so it won't need to re-download the base model every time I want to use it, but nothing seems to work for me. I would love your help with it.
The following parameters are ...
0
votes
1
answer
53
views
GPT-2 model from Hugging Face always generates the same result
Why were all the results I got from the GPT-2 model the same no matter what I fed into it?
The following are my operating details.
First I download the needed files from the official website. These ...
0
votes
0
answers
20
views
OOM Error using PPO Trainer to LoRA-tune 4-bit Llama-3-8B Model (TRL Hugging Face Library)
As per the standard for PPO training (which is to do supervised fine-tuning before running the PPO algorithm), I did a QLoRA fine-tuning of the Llama-3-8B instruct model using my own custom data and ...
0
votes
0
answers
90
views
Meta Llama-3 prompt sample
I am trying to ask the Llama-3 model to read a document and then answer my questions, but my code does not seem to generate any output. Can someone tell me what’s wrong with the code? I appreciate it.
Code:
...
0
votes
0
answers
17
views
Chunking a Tokenized dataset
I am trying to experiment with the databricks-dolly-15k dataset to make it suitable for fine tuning a Llama2 model according to this article by Phil Schmid. The initial part of building the dataset is ...
0
votes
0
answers
24
views
Export a teknium/OpenHermes-2.5-Mistral-7B model to ONNX
I am trying to export teknium/OpenHermes-2.5-Mistral-7B to ONNX,
This is my code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import onnx
model_name = "teknium/...
0
votes
0
answers
30
views
Google Canary, Docker and Gemini Nano
Can I run the latest Google Canary with the Gemini Nano model in a Docker container in headless mode and interact with the model via Selenium (execute_script)? If so, how do I do it?
-4
votes
0
answers
25
views
Want to retrain my LLM based on user questions and answers on OpenAI
We have created a solution where users can upload their PDFs and ask questions. We have used NodeJS, Langchain, and OpenAI. Currently, the app flow is that we save all the content of the PDF in our vector ...
0
votes
0
answers
20
views
Apply a different learning rate for newly introduced tokens in the transformers library
Say I want to introduce a few new tokens into the vocabulary of an existing model, and I want these tokens to have a different learning rate compared to the rest of the model's parameters during ...