Newest 'large-language-model' Questions

0 votes

0 answers

9 views

CUDA Out of Memory Error Despite Having Multiple GPUs

I'm encountering a CUDA out-of-memory error while trying to run a PyTorch model, even though my system has multiple NVIDIA GPUs. # Load the tokenizer and model tokenizer = AutoTokenizer....

Flying-Meta

1

asked 4 hours ago

0 votes

0 answers

8 views

GGUF model in LM Studio returns broken answer

I am trying to run the GGUF model QuantFactory/T-lite-instruct-0.1-GGUF, specifically its quantized version T-lite-instruct-0.1.Q2_K.gguf, in LM Studio. Sometimes it works fine, but sometimes it returns "...

pav

99

asked yesterday

0 votes

0 answers

6 views

RuntimeError with DeBERTaV3 Sequence Classification: Tensor Size Mismatch

I am trying to fine-tune the microsoft/deberta-v3-base model for sequence classification with three labels. I have set up my tokenizer and data preprocessing, but I encounter a RuntimeError during ...

suri

21

asked yesterday

0 votes

0 answers

16 views

Load Phi 3 small on Nvidia Tesla V100 - Flash Attention

I would like to ask whether it is possible to load and fine-tune Phi 3 small 8k. When I load the model, I get an error about missing Flash Attention. If I want to install the given package,...

Roman Frič

1

asked yesterday

0 votes

0 answers

27 views

Unable to solve dtype issue using UnslothAI fine tuning for Llama 3.1 8B model

I am new to fine-tuning LLMs and have been trying to run the notebooks provided by UnslothAI. For this question, I am running the code for fine-tuning the Llama 3.1 8B model as posted here. This Colab ...

adhok

411

asked yesterday

0 votes

0 answers

25 views

Need to implement function calling for Mistral 7B-Instruct v0.2 model in SageMaker

I am trying to add function calling to my chatbot code so that it fetches the tools when the user query is related to a tool. I based my attempt on a format I found on the internet, but I don't know where the error is. ...

vinoth kumar

111

asked yesterday

0 votes

0 answers

22 views

Fine-tune LLM on custom schema to be used in sqlcoder, an Ollama-based LLM

I am working on a POC to convert natural language to SQL. I have used phi3 and am now planning to use sqlcoder as part of the LLM. All of this is set up via Ollama, which I am running in Docker. The one ...

Srikant Sahu

839

asked 2 days ago

0 votes

0 answers

16 views

After uploading an LLM to Google Colab, how do I use it in code?

Recently, for a project, I uploaded the Meta Llama 3 8B model from Hugging Face to Google Colab, since my PC could not meet the model's high VRAM requirements. Therefore I needed Colab's ...

Anuvab Das

1

asked 2 days ago

0 votes

1 answer

35 views

AzureChatOpenAI only uses one tool at a time

LangChain with AzureChatOpenAI only ever calls one tool at a time. When prompting the model to multiply and add two sets of numbers, I expect two tool calls; however, only one tool is called, ...

Julian

1

asked 2 days ago

0 votes

0 answers

20 views

Optimal hyperparameters for fine-tuning an LLM

Could I ask for your help? I am fine-tuning the Llama 3 8B model (with LoRA) for text classification. I am using the Trainer from Hugging Face. I am looking for the optimal ...

Roman Frič

1

asked 2 days ago

1 vote

0 answers

9 views

Where should I store my model if I want it loaded while running my application on Magic Leap?

I need my GGML model loaded when I open my app. But I used params.model = "files\\gpt-2-117M"; std::ifstream file(params.model); if (!file.good()) { return "wrong"; } ...

Xinyu Liu

11

asked 2 days ago

-1 votes

0 answers

14 views

LangChain agent not using tool every time in ReAct for PDF chat, and concludes with wrong answer

I'm using the ReAct template proposed by LangChain, hwchase17/react-chat, with chat history. When using this template, it sometimes fails to consult the RAG tool, so it gets results from ...

afsal ali

1

asked Jul 24 at 5:03

0 votes

1 answer

20 views

Is it possible to train large models on multiple TPUs/GPUs on Google Colab?

I am working on training a (small-scale) large language model and would like to parallelize the training on Google Colab. Specifically, I want to know if it's possible to utilize multiple TPUs or GPUs ...

aiaiai

31

asked Jul 24 at 4:07

0 votes

0 answers

38 views

Script for streaming Mistral-7B LLM output only streams on server side. Client gets full output

I designed a remote server - client pipeline, which is supposed to load the model on the server and stream the output of the model. At the moment, the output is correctly streamed, but only inside the ...

Phys

518

asked Jul 23 at 16:16

0 votes

0 answers

23 views

How to include ggml library in native C++ Android Studio Project for Magic Leap

I am trying to run GPT-2 on Magic Leap with the ggml library. So far, I have succeeded in running the ggml examples on my Windows computer. However, I don't know how to link all the libraries and the ...

Xinyu Liu

11

asked Jul 23 at 8:34

0 votes

0 answers

22 views

Error : PydanticUserError: If you use `@root_validator` with pre=False (the default) you MUST specify `skip_on_failure=True`

Here is the code. After loading the data and creating the index, I am trying to set up the chat engine: index = load_and_index_data() chat_engine = index.as_chat_engine(chat_mode="condense_question", ...

sri vidya

15

asked Jul 23 at 7:11

-1 votes

0 answers

15 views

Use Hugging Face API correctly

I'm working on a simple LLM project. Here is my code: import chromadb import os import chromadb.utils.embedding_functions as embedding_functions import gradio as gr import requests import json from ...

3cr1sp3l

1

asked Jul 22 at 22:30

-4 votes

0 answers

31 views

Developing an AI assistant [closed]

I want to develop an AI assistant that is highly scalable, meaning that every time I need to build a model I just retrain my AI assistant on new data. It can generate text and content, it has a feedback ...

Khoubaib Bourbia

1

asked Jul 22 at 11:53

-2 votes

0 answers

20 views

How does AI understand tree diagrams in text format?

How does an AI/LLM understand and interpret tree diagrams represented in text format, such as the following example (e.g., a prompt given to ChatGPT)? Is the AI trained on specific datasets that explicitly ...

Jayanth

193

asked Jul 22 at 9:47

-4 votes

0 answers

18 views

I want the model to generate an exact number of tokens, no more, no less [closed]

Are there any tips or best practices to achieve this? I have tried few-shot prompting. Are there any open-source models that can do this? I have tried few-shot prompting, but it was not giving the best ...

Rohit Behera

1

asked Jul 22 at 4:29

-1 votes

0 answers

18 views

How to Modify and Replace Embeddings in a Large Language Model (LLM)? [closed]

I am a beginner in large language models (LLMs) and I am working on a project. I have a question regarding embeddings in an LLM. How can I modify the embeddings of an LLM? Are they stored in a ...

Steven Thorn

1

asked Jul 22 at 3:09

1 vote

0 answers

28 views

TRL SFTTrainer clarification on truncation

I am currently fine-tuning Llama models using SFTTrainer from Hugging Face. However, I have a question that I cannot answer from the documentation (at least, it is a bit ambiguous). My dataset ...

iiiiiiiiiiiiiiiiiiii

335

asked Jul 20 at 20:46

-2 votes

0 answers

30 views

What is the best language model for fine-tuning with a dataset in the Persian language? [closed]

I am trying to fine-tune the llama2 language model with a dataset that I created in Persian. But when I tokenized this dataset, I noticed that the llama2 tokenizer tokenized it at the character level, not word ...

user23446017

1

asked Jul 20 at 11:46

-1 votes

0 answers

26 views

How to Estimate GPU Memory for training and inference, Data Requirements, and Training Time for Large Language Models?

This is a very concrete and well-defined computer engineering question. I don't understand why someone would want to close it. Today, I faced this question during an interview for an ML Engineer ...

maplemaple

1,435

asked Jul 20 at 7:32

0 votes

0 answers

100 views

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I am trying to output the response of the llama2 model that I installed locally, but when I try to execute the following lines: output = model.generate(**inputs, streamer=streamer, use_cache=True, ...

noureddine

3

asked Jul 19 at 10:55

0 votes

0 answers

20 views

Training LLM uses unexpected amount of GPU memory

I'm training a model with a self-implemented training loop. A 1.5B Qwen2 model occupies 40 GB of GPU memory. When I ran the same training using LLaMA Factory, it took only about 24 GB. I tried to delete some ...

StaEx_G

13

asked Jul 19 at 10:02

0 votes

0 answers

33 views

How to evaluate LLM response [closed]

I am retrieving responses using the Qwen 72B model. I want to validate my responses but don't have ground-truth answers. How can I evaluate my responses without the help of ground-truth answers? I want to use ...

Prashanth Kolaneru

15

asked Jul 19 at 9:32

-1 votes

0 answers

20 views

How to resolve the ``` backticks error that occurs while generating SQL queries with the Gemini LLM when building an NL2SQL chatbot

I am using an LLM to fetch data from my Postgres DB table. This is the output that is being generated, even though I have stated in the prompt not to add backticks while generating SQL queries. This is ...

Lad99

1

asked Jul 19 at 6:27

0 votes

0 answers

29 views

Unable to import SentenceTransformer

I am using Colab and trying to import SentenceTransformer: from sentence_transformers import SentenceTransformer However, I got this error: AttributeError Traceback (most ...

A1iMansour

11

asked Jul 18 at 22:24

-2 votes

0 answers

20 views

Help training a hybrid model that integrates contextual and numerical features for a classification problem [closed]

I am working on a critical production risk analysis problem. Based on each record, I want to assign a risk rank from 0 to 5. The training set is fairly imbalanced. > "0.0 964 > 1.0 393 &...

wayne halks

5

asked Jul 18 at 21:51

0 votes

0 answers

23 views

Huggingface Trainer CUDA Out Of Memory for 500M Model

I'm training MobiLLama for classification. This model has just 500 million parameters, and when I fine-tune it for downstream tasks, the Trainer keeps giving me the CUDA out of memory error. I faced ...

Hoangdz

187

asked Jul 18 at 16:28

0 votes

0 answers

20 views

Defining an agent in LlamaIndex with Mistral 7B throws an AttributeError

I am using LlamaIndex and a locally downloaded Mistral model (mistral-7b-instruct-v0.2.Q4_K_M.gguf). I have created the Python binding for this model using "llama-cpp". On defining the agent ...

ritwikv

1

asked Jul 18 at 15:25

0 votes

0 answers

47 views

'LlamaForCausalLM' object has no attribute 'max_seq_length'

I'm fine-tuning llama3 using unsloth. I trained my model and saved it successfully, but when I tried loading it using AutoPeftModelForCausalLM.from_pretrained and then used TextStreamer from transformer ...

Sarra Ben Messaoud

1

asked Jul 18 at 10:47

-1 votes

0 answers

13 views

Measuring relevance of the knowledge base to user questions

I have a document that explains the finance policies and processes of a company. The goal is to build a chatbot using a RAG framework on that document to serve employees who have queries related to ...

Mohamed Abd ElBaset

1

asked Jul 18 at 10:07

0 votes

1 answer

87 views

Error when tracing llm calls with Langsmith (Failed to get info from https://eu.smith.langchain.com) (Failed to batch ingest runs: LangSmithError))

I have an issue with my LangSmith setup. I searched the web for this issue but could not find a solution. I followed these steps: Created a fresh environment: conda create --name ...

Andrea Neri

105

asked Jul 18 at 9:13

0 votes

1 answer

127 views

Convert safetensors model format (LLaVA model) into GGUF format

I want to do LLaVA inference in Ollama, so I need to convert the model to the GGUF file format. My model is in the safetensors file format (trained with LoRA). It seems that Ollama supports only llama, but not ...

Jiyong Jeong

1

asked Jul 18 at 8:47

-3 votes

0 answers

33 views

Integrating web scraping and LLMs [closed]

I want to extract some information about a specific drug (let's say Rolvedon) from this site. I tried using BeautifulSoup and Scrapy, but they seem to be very format-dependent. I want the code to be ...

Mandvi Shukla

1

asked Jul 18 at 8:28

-1 votes

0 answers

29 views

Implementing Few-Shot Learning without Prompts for Llama2

I am working with the Llama2 model. I have successfully started and fine-tuned the model, and I have also used Few-Shot Prompting with and without LangChain. However, now I am looking for a method ...

user26411748

1

asked Jul 17 at 18:56

0 votes

0 answers

14 views

How does the transformer model's attention mechanism deal with differing sequence lengths?

I am going through the architecture of the transformer and its attention mechanism. The thing I don't get about this mechanism is how it handles sequences of different lengths. For example: How does ...

Syed Mustaqhim

491

asked Jul 17 at 17:29

0 votes

1 answer

11 views

Preparing text data for RAFT implementation

I want to use RAFT (Retrieval Augmented Fine-Tuning) to build a smart chatbot. My data consists of scraped text from multiple websites. Should I transform it all to QAD format? If so, is there a way to ...

yasmina hachhouch

1

asked Jul 17 at 15:56

0 votes

0 answers

50 views

How should I use Llama-3 properly?

I downloaded the Meta-Llama-3-70B-Instruct model using download.sh and the URL provided in Meta's email, and these are all the files in the folder. When I tried to use ...

Joey1205

1

asked Jul 17 at 13:29

0 votes

0 answers

36 views

messages with role 'tool' must have a 'tool_call_id'

I wrote a multi-agent program based on AutoGen to let two agents play chess. But an error is randomly triggered when I execute my program: openai.BadRequestError: Error code: 400 - {'error': {'message': &...

Captain_Lee

1

asked Jul 17 at 9:09

0 votes

0 answers

12 views

ImageBind LLM checkpoint

I want to use the ImageBind-LLM model for my task, but I cannot import llama or find the checkpoints for ImageBind-LLM. I installed all required packages and followed ...

Duy Nhất

1

asked Jul 17 at 8:18

2 votes

0 answers

33 views

DSPy can't retrieve passage with text embeddings in ChromaDB

I am working on a RAG application using DSPy and ChromaDB for PDF files. First I extracted the text from the PDF and added it to ChromaDB as chunks. I also added the embeddings of the chunks. And ...

Anandu Aji

41

asked Jul 17 at 8:03

0 votes

0 answers

15 views

AnswerRelevancyMetric not showing results on LLM evaluation

I've got this code: from langchain_openai import AzureChatOpenAI from deepeval.models.base_model import DeepEvalBaseLLM class AzureOpenAI(DeepEvalBaseLLM): def __init__( self, ...

Sara

422

asked Jul 17 at 7:59

-1 votes

0 answers

30 views

Embedding and Vector Search with Milvus

I am trying to make a RAG-based chatbot application that lets the user prompt in natural language and receive relevant information retrieved from a collection of multiple tables, all of ...

Calvin Nguyen

1

asked Jul 17 at 4:44

-2 votes

0 answers

32 views

Want to run a Local LLM on Nvidia Jetson AGX Orin over GPU

I am looking to run a local LLM (large language model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this? Thank you in ...

Mausam Jain

1

asked Jul 17 at 3:55

0 votes

0 answers

27 views

How to increase maximum limit for completion_tokens in AWS Sagemaker invoke endpoint

I have deployed the meta-llama/Meta-Llama-3-8B-Instruct model using HuggingFaceModel. The model responds with the full output when I make a call using HuggingFaceModel's predictor method. Here is the ...

keerti4p

1

asked Jul 16 at 9:29

0 votes

0 answers

46 views

How to improve response time of Phi-3-medium-128k serverless API?

I have deployed the Phi-3-medium-128k model using Azure AI Studio (serverless deployment). I am using the v1/chat/completions API to get chat completions and I am streaming the response. The time to ...

Rithika Chowta

1

asked Jul 16 at 7:53

0 votes

1 answer

39 views

How to get multimodal embeddings from CLIP model?

I'm hoping to use CLIP to get a single embedding for rows of multimodal (image and text) data. Say I have the following model: from PIL import Image import torch from transformers import CLIPProcessor,...

T_d

13

asked Jul 15 at 19:53
