Questions tagged [large-language-model]
A general tag for large language model (LLM)-related subjects. Please ALWAYS use the more specific tags if available (GPT variants, PaLM, LLaMA, BLOOM, Claude, etc.).
1,565 questions
0 votes, 0 answers, 10 views
CUDA Out of Memory Error Despite Having Multiple GPUs
I'm encountering a CUDA out-of-memory error while trying to run a PyTorch model, even though my system has multiple NVIDIA GPUs.
# Load the tokenizer and model
tokenizer = AutoTokenizer....
0 votes, 0 answers, 8 views
GGUF model in LM Studio returns broken answers
I am trying to run the GGUF model QuantFactory/T-lite-instruct-0.1-GGUF, specifically its quantized version T-lite-instruct-0.1.Q2_K.gguf, in LM Studio.
Sometimes it works fine. But sometimes it returns "...
0 votes, 0 answers, 6 views
RuntimeError with DeBERTaV3 Sequence Classification: Tensor Size Mismatch
I am trying to fine-tune the microsoft/deberta-v3-base model for sequence classification with three labels. I have set up my tokenizer and data preprocessing, but I encounter a RuntimeError during ...
0 votes, 0 answers, 16 views
Load Phi 3 small on Nvidia Tesla V100 - Flash Attention
I would like to ask whether it is possible to load and fine-tune Phi 3 small (8k). When I load the model, I get an error about missing Flash Attention. If I want to install the given package,...
0 votes, 0 answers, 27 views
Unable to solve dtype issue when fine-tuning the Llama 3.1 8B model with UnslothAI
I am new to fine-tuning LLMs and have been trying to run the notebooks provided by UnslothAI. For this question, I am running the code for fine-tuning the Llama 3.1 8B model as posted here
This Colab ...
0 votes, 0 answers, 25 views
Need to implement function calling for Mistral-7B-Instruct v0.2 model in SageMaker
I am trying to add function calling to my chatbot code so that it actually fetches the tools when the user query is related to a tool. I followed a format I found online, but I don't know where the error is. ...
0 votes, 0 answers, 22 views
Fine-tune an LLM on a custom schema for use with sqlcoder, an Ollama-based LLM
I am working on a POC to convert natural language to SQL. I have used phi3 and am now planning to use sqlcoder as the LLM. All of this is set up via Ollama, which I am running on Docker.
The one ...
0 votes, 0 answers, 16 views
After uploading an LLM to Google Colab, how do I use it in code?
Recently, for a project, I uploaded the Meta Llama 3 8B model from Hugging Face to Google Colab, since my PC could not meet the model's high VRAM requirements. Therefore I needed Colab's ...
0 votes, 1 answer, 36 views
AzureChatOpenAI only uses one tool at a time
LangChain with AzureChatOpenAI is only ever calling one tool at a time.
When prompting the model to multiply and add two sets of numbers, I expect two tool calls; however, only one tool is called, ...
0 votes, 0 answers, 20 views
Optimal hyperparameters for fine-tuning an LLM
Could I ask you for help? I am fine-tuning the Llama3 8B model (with LoRA) for text classification, using Trainer from Hugging Face. I am looking for the optimal ...
1 vote, 0 answers, 9 views
Where should I store my model if I want it loaded while running my application on Magic Leap?
I need my GGML model loaded when I open my app. But I used
params.model = "files\\gpt-2-117M";
std::ifstream file(params.model);
if (!file.good()) {
    return "wrong";
}
...
-1 votes, 0 answers, 14 views
LangChain agent does not use the tool every time in ReAct for pdfchat, and concludes with a wrong answer
I'm using the ReAct chat template with chat history proposed by LangChain, hwchase17/react-chat. When using this template, it sometimes fails to look into the RAG tool, so it gets results from ...
0 votes, 1 answer, 20 views
Is it possible to train large models on multiple TPUs/GPUs on Google Colab?
I am working on training a (small-scale) large language model and would like to parallelize the training on Google Colab. Specifically, I want to know if it's possible to utilize multiple TPUs or GPUs ...
0 votes, 0 answers, 38 views
Script for streaming Mistral-7B LLM output only streams on the server side; the client gets the full output
I designed a remote server-client pipeline, which is supposed to load the model on the server and stream the model's output to the client.
At the moment, the output is correctly streamed, but only inside the ...
0 votes, 0 answers, 23 views
How to include the ggml library in a native C++ Android Studio project for Magic Leap
I am trying to run GPT-2 on Magic Leap with the ggml library. So far, I have succeeded in running the ggml examples on my Windows computer. However, I don't know how to link all the libraries and the ...
0 votes, 0 answers, 22 views
Error: PydanticUserError: If you use `@root_validator` with pre=False (the default) you MUST specify `skip_on_failure=True`
Here is the code: after loading the data and creating the index, I am trying to set up the chat engine:
index = load_and_index_data()
chat_engine = index.as_chat_engine(chat_mode="condense_question", ...
-1 votes, 0 answers, 15 views
Use the Hugging Face API correctly
I'm working on a simple LLM project, here is my code:
import chromadb
import os
import chromadb.utils.embedding_functions as embedding_functions
import gradio as gr
import requests
import json
from ...
-4 votes, 0 answers, 31 views
Developing an AI assistant [closed]
I want to develop an AI assistant that is highly scalable, meaning that every time I need to build a model, I just retrain my AI assistant on new data. It can generate text and content, it has a feedback ...
-2 votes, 0 answers, 20 views
How does AI understand tree diagrams in text format?
How does an AI/LLM understand and interpret tree diagrams represented in text format, such as the following example? (Ex: prompt given to ChatGPT)
Is the AI trained on specific datasets that explicitly ...
-4 votes, 0 answers, 18 views
I want the model to generate an exact number of tokens, no more, no less [closed]
Are there any tips or best practices to achieve this?
Are there any open-source models that can do this?
I have tried few-shot prompting, but it was not giving the best ...
-1 votes, 0 answers, 18 views
How to Modify and Replace Embeddings in a Large Language Model (LLM)? [closed]
I am a beginner in large language models (LLMs) and I am working on a project. I have a question regarding embeddings in an LLM. How can I modify the embeddings of an LLM? Are they stored in a ...
1 vote, 0 answers, 28 views
TRL SFTTrainer clarification on truncation
I am currently fine-tuning Llama models using SFTTrainer from Hugging Face. However, I have a question I cannot answer from the documentation (at least, it is a bit ambiguous).
My dataset ...
-2 votes, 0 answers, 30 views
What is the best language model for fine-tuning on a Persian-language dataset? [closed]
I am trying to fine-tune the llama2 language model on a dataset that I created in Persian. But when I tokenized this dataset, I noticed that the llama2 tokenizer tokenized it at the character level, not the word ...
-1 votes, 0 answers, 26 views
How to Estimate GPU Memory for training and inference, Data Requirements, and Training Time for Large Language Models?
This is a very concrete and well-defined computer engineering question. I don't understand why someone would want to close it.
Today, I faced this question during an interview for an ML Engineer ...
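For the memory part of this question, a common back-of-the-envelope approach is sketched below under stated assumptions: weights-only inference needs roughly (bytes per parameter) × (parameter count), and mixed-precision training with Adam needs about 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 4 for fp32 master weights, and 4 + 4 for Adam's two moment buffers). The sketch deliberately ignores activations, the KV cache, the CUDA context, and allocator fragmentation, which can add substantially on top; the function names are hypothetical helpers, not part of any library.

```python
# Rough rule-of-thumb GPU memory estimate for a dense transformer.
# Assumption: activations, KV cache, CUDA context and fragmentation are NOT included.

BYTES_PER_PARAM_INFERENCE = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

# Mixed-precision training with Adam:
# half-precision weights (2) + half-precision grads (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4) = 16 bytes per parameter.
BYTES_PER_PARAM_ADAM_MIXED = 2 + 2 + 4 + 4 + 4


def inference_weights_gib(n_params: float, dtype: str = "fp16") -> float:
    """Memory for the model weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM_INFERENCE[dtype] / 2**30


def training_states_gib(n_params: float) -> float:
    """Memory for weights, gradients and Adam states, in GiB (no activations)."""
    return n_params * BYTES_PER_PARAM_ADAM_MIXED / 2**30


if __name__ == "__main__":
    n = 7e9  # hypothetical 7B-parameter model
    print(f"inference weights (fp16): {inference_weights_gib(n):.1f} GiB")
    print(f"training states (Adam):   {training_states_gib(n):.1f} GiB")
```

Because activation memory scales with batch size, sequence length, and whether gradient checkpointing is used, treat these numbers as a lower bound rather than a prediction.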
0 votes, 0 answers, 100 views
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I am trying to output the response of the llama2 model that I installed locally, but when I try to execute the following lines:
output = model.generate(**inputs, streamer=streamer,
use_cache=True, ...
0 votes, 0 answers, 20 views
Training LLM uses unexpected amount of GPU memory
I'm training a model with a self-implemented training loop. A 1.5B Qwen2 model occupies 40 GB of GPU memory. When I did the same training using LLaMA-Factory, it only took about 24 GB.
I tried to delete some ...
0 votes, 0 answers, 33 views
How to evaluate LLM response [closed]
I am generating responses using the Qwen 72B model. I want to validate the responses, but I don't have ground-truth answers. How can I evaluate the responses without ground-truth answers? I want to use ...
-1 votes, 0 answers, 20 views
How to resolve the ``` backticks error that occurs when generating SQL queries with the Gemini LLM for an NL2SQL chatbot
I am using an LLM to fetch data from my Postgres DB table.
This is the output being generated, even though I have said in the prompt not to add backticks when generating SQL queries:
This is ...
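Prompt instructions alone do not guarantee that a chat model omits Markdown code fences, so a common workaround is to post-process the output before sending it to Postgres. A minimal sketch, assuming the model wraps the query in a fence of three backticks optionally followed by a language tag such as `sql`; the helper name `extract_sql` is hypothetical, and only the standard `re` module is used:

```python
import re

# A Markdown fence: three backticks, an optional language tag (e.g. "sql"),
# a newline, the query body (captured lazily), then the closing three backticks.
_FENCE_RE = re.compile(r"`{3}[\w+-]*[ \t]*\r?\n(.*?)\s*`{3}", re.DOTALL)


def extract_sql(llm_output: str) -> str:
    """Return the body of the first fenced block, or the trimmed text if there is no fence."""
    match = _FENCE_RE.search(llm_output)
    if match:
        return match.group(1).strip()
    return llm_output.strip()
```

Running the cleaned string through the database driver (rather than trusting it blindly) is still advisable, since the model may also add prose outside the fence.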
0 votes, 0 answers, 29 views
Unable to import SentenceTransformer
I am using Colab and trying to import SentenceTransformer:
from sentence_transformers import SentenceTransformer
However, I got this error:
AttributeError Traceback (most ...
-2 votes, 0 answers, 20 views
Training help: hybrid model that integrates contextual and numerical features for a classification problem [closed]
I have a critical production risk-analysis problem: based on each record's data, I want to assign it a risk rank from 0 to 5. The training set is fairly imbalanced.
0.0    964
1.0    393
...
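For the class imbalance mentioned above, one standard option is to weight the training loss by inverse class frequency. A minimal sketch of the formula scikit-learn uses for `class_weight="balanced"`, namely n_samples / (n_classes × count_c); the helper name is hypothetical, and the example uses only the two class counts visible in the excerpt, since the rest of the table is truncated:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Rarer classes get larger weights, so each class contributes equally
    to a weighted loss in aggregate.
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Only the two visible rows (0.0: 964, 1.0: 393); the remaining classes are unknown.
weights = balanced_class_weights([0.0] * 964 + [1.0] * 393)
print(weights)
```

The resulting per-class weights can then be passed to a weighted loss (for example, the `weight` argument of a cross-entropy loss), which is an alternative to resampling the imbalanced training set.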