Questions tagged [large-language-model]
A general tag for subjects related to large language models (LLMs). Please ALWAYS use a more specific tag if one is available (GPT variants, PaLM, LLaMA, BLOOM, Claude, etc.).
1,566 questions
0 votes · 0 answers · 5 views
How to view the final prompt in a MultiQueryRetriever pipeline using LangChain?
I am currently working on a project using the LangChain library where I want to retrieve relevant documents from a vector database and then generate answers based on these documents using the Ollama ...
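A common way to see what MultiQueryRetriever actually generates, sketched below assuming a standard langchain install (retriever construction itself is omitted): the retriever logs its LLM-generated query variants at INFO level, so enabling that logger exposes them.

import logging

# MultiQueryRetriever logs the alternative queries it generates at INFO level;
# enabling this logger prints them on every retrieval call.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
# ...build the retriever as usual; each invoke(question) now logs the
# generated query variants before they are sent to the vector store.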
0 votes · 0 answers · 10 views
CUDA Out of Memory Error Despite Having Multiple GPUs
I'm encountering a CUDA out-of-memory error while trying to run a PyTorch model, even though my system has multiple NVIDIA GPUs.
# Load the tokenizer and model
tokenizer = AutoTokenizer....
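A frequent cause here is that PyTorch does not spread a model across GPUs by itself; everything lands on GPU 0. A minimal sketch, assuming a Hugging Face causal LM and the accelerate package installed (the model name is a placeholder):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"   # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves memory versus float32
    device_map="auto",           # shard layers across all visible GPUs
)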
0 votes · 0 answers · 8 views
GGUF model in LM Studio returns broken answer
I am trying to run the GGUF model QuantFactory/T-lite-instruct-0.1-GGUF, specifically its quantized version T-lite-instruct-0.1.Q2_K.gguf, in LM Studio.
Sometimes it works fine. But sometimes it returns "...
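Q2_K is the most aggressive common GGUF quantization and can degrade output on its own. One sanity check, sketched below with llama-cpp-python rather than LM Studio (file path and prompt are placeholders), is to compare against a higher-precision quant such as Q4_K_M of the same model:

from llama_cpp import Llama

# Load a less aggressively quantized file of the same model for comparison.
llm = Llama(model_path="T-lite-instruct-0.1.Q4_K_M.gguf", n_ctx=2048)
out = llm("Question: What is 2+2?\nAnswer:", max_tokens=32)
print(out["choices"][0]["text"])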
0 votes · 0 answers · 6 views
RuntimeError with DeBERTaV3 Sequence Classification: Tensor Size Mismatch
I am trying to fine-tune the microsoft/deberta-v3-base model for sequence classification with three labels. I have set up my tokenizer and data preprocessing, but I encounter a RuntimeError during ...
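Tensor-size mismatches in this setup often come from a classification head that does not match the label count, or from unpadded batches. A minimal sketch, assuming integer labels 0, 1, 2:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=3,          # head size must match the number of classes
)
enc = tok(["an example sentence"], padding=True, truncation=True,
          max_length=256, return_tensors="pt")
logits = model(**enc).logits   # shape: (batch_size, 3)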
0 votes · 0 answers · 16 views
Load Phi 3 small on Nvidia Tesla V100 - Flash Attention
I would like to inquire about the possibility of loading and fine-tuning Phi-3 small (8k). When I load the model, I get an error about missing Flash Attention. If I want to install the given package,...
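Installing flash-attn will not help here: FlashAttention-2 requires Ampere (sm80) or newer GPUs, and the Tesla V100 is sm70. For checkpoints that allow it, requesting plain attention avoids the dependency entirely; a sketch below (Phi-3-mini supports "eager", while whether a given Phi-3-small checkpoint does depends on its model card):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",   # placeholder checkpoint
    attn_implementation="eager",          # plain PyTorch attention, no flash-attn
    trust_remote_code=True,
)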
0 votes · 0 answers · 27 views
Unable to solve dtype issue using UnslothAI fine-tuning for Llama 3.1 8B model
I am new to fine-tuning LLMs and have been trying to run the notebooks provided by UnslothAI. For this question, I am running the code for fine-tuning the Llama 3.1 8B model as posted here
This Colab ...
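dtype errors in these notebooks frequently come from forcing bfloat16 on a GPU that lacks it (e.g. T4 or V100); the Unsloth loader can auto-detect instead. A minimal sketch, with the model name and settings taken from the public Unsloth notebooks:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    dtype=None,           # None = auto-detect: float16 on T4/V100, bfloat16 on Ampere+
    load_in_4bit=True,    # 4-bit quantization keeps the 8B model within memory
)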
0 votes · 0 answers · 25 views
Need to implement function calling for Mistral 7B-Instruct v0.2 model in SageMaker
I am trying to add function calling to my chatbot code so that it actually fetches the tools when the user query is related to a tool. I was following a format I found on the internet, but I don't know where the error is. ...
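One complication worth knowing: Mistral-7B-Instruct v0.2 has no native function-calling tokens (v0.3 added them), so tool use must be prompt-based. A sketch below; the endpoint name, payload format, and the tool itself are illustrative and depend on your SageMaker container:

import json
import boto3

TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}   # hypothetical tool

prompt = (
    "You can call one tool: get_weather(city). If the user asks about weather, "
    'reply ONLY with JSON like {"tool": "get_weather", "args": {"city": "..."}}.'
    "\nUser: What's the weather in Paris?"
)
client = boto3.client("sagemaker-runtime")
resp = client.invoke_endpoint(EndpointName="mistral-7b-instruct",   # placeholder
                              ContentType="application/json",
                              Body=json.dumps({"inputs": prompt}))
reply = json.loads(resp["Body"].read())[0]["generated_text"]   # TGI-style output
try:
    call = json.loads(reply)                   # did the model emit a tool call?
    print(TOOLS[call["tool"]](**call["args"]))
except (ValueError, KeyError):
    print(reply)                               # the model answered directly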
0 votes · 0 answers · 24 views
Fine-tune an LLM on a custom schema for use with sqlcoder, an Ollama-based LLM
I am working on a POC to convert natural language to SQL. I have used phi3 and am now planning to use sqlcoder as part of the LLM setup. All of this is set up via Ollama, which I am running in Docker.
The one ...
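Before fine-tuning, it is often enough to inject the schema into the prompt at request time. A sketch against Ollama's REST API (assumes a local container publishing port 11434 and that the sqlcoder model has been pulled; schema and task are placeholders):

import requests

schema = "CREATE TABLE orders (id INT, customer TEXT, total DECIMAL);"
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "sqlcoder",
        "prompt": f"### Schema\n{schema}\n### Task\nTotal sales per customer\n### SQL\n",
        "stream": False,    # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])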
0 votes · 0 answers · 16 views
After uploading LLM to Google Colab, how to use it in a code?
Recently, for a project, I uploaded the Meta Llama 3 8B model from Hugging Face to Google Colab, since my PC could not meet the model's high VRAM requirements. Therefore I needed Colab's ...
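from_pretrained accepts a local directory just like a Hub id, so files uploaded to the Colab runtime can be loaded directly. A sketch, with the upload path as a placeholder:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

local_dir = "/content/llama-3-8b"   # placeholder: wherever the files were uploaded
tok = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir, torch_dtype=torch.float16, device_map="auto"   # use the Colab GPU
)
gen = pipeline("text-generation", model=model, tokenizer=tok)
print(gen("Hello, my name is", max_new_tokens=20)[0]["generated_text"])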
0 votes · 1 answer · 36 views
AzureChatOpenAI only uses one tool at a time
LangChain with AzureChatOpenAI is only ever calling one tool at a time.
When prompting the model to multiply and add two sets of numbers, I expect two tool calls; however, only one tool is called, ...
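One thing to rule out first is whether the model ever emits two tool calls in a single turn; parallel tool calling also requires a sufficiently recent Azure OpenAI api_version. A sketch for inspecting the raw tool calls (assumes langchain-openai, with endpoint and key in the usual environment variables; deployment name and api_version are placeholders):

from langchain_openai import AzureChatOpenAI
from langchain_core.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-06-01")
msg = llm.bind_tools([add, multiply]).invoke("What is 3+4 and 5*6?")
print(msg.tool_calls)   # ideally two entries, one per tool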
0 votes · 0 answers · 20 views
Optimal hyperparameters for fine-tuning an LLM
Could I ask for help? I am fine-tuning the Llama 3 8B model (with LoRA) for text classification, using the Trainer from Hugging Face. I am looking for the optimal ...
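There is no single optimum, but the values below are common starting points from the PEFT/QLoRA literature, sketched as a config rather than a recommendation:

from peft import LoraConfig
from transformers import TrainingArguments

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_CLS",                # sequence classification head
)
args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-4,                 # typical LoRA range: 1e-4 to 3e-4
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size of 16
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
)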
1 vote · 0 answers · 9 views
Where should I store my model if I want it loaded while running my application on Magic Leap?
I need my GGML model to be loaded when I open my app. I used:
params.model = "files\\gpt-2-117M";   // relative Windows-style path to the bundled model
std::ifstream file(params.model);     // check that the model file actually exists
if (!file.good()) {
    return "wrong";
}
...
-1 votes · 0 answers · 14 views
LangChain agent not using the tool every time in ReAct for PDF chat, and concludes with a wrong answer
I'm using the hwchase17/react-chat ReAct template proposed by LangChain, with chat history. When using this template, it sometimes fails to look into the RAG tool, so it gets results from ...
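A sharper tool description often makes a ReAct agent consult the retrieval tool more reliably, since tool selection is driven by that text. A sketch (assumes langchain's create_react_agent and the hwchase17/react-chat hub prompt; the vector store and llm are assumed to be defined elsewhere):

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import Tool

rag_tool = Tool(
    name="document_search",
    func=lambda q: vector_store.similarity_search(q),   # hypothetical store
    description=(
        "ALWAYS use this tool to answer questions about the uploaded PDF. "
        "Do not answer from memory if the question concerns the document."
    ),
)
prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(llm, [rag_tool], prompt)
executor = AgentExecutor(agent=agent, tools=[rag_tool],
                         handle_parsing_errors=True)   # retry on malformed output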
0 votes · 1 answer · 20 views
Is it possible to train large models on multiple TPUs/GPUs on Google Colab?
I am working on training a (small-scale) large language model and would like to parallelize the training on Google Colab. Specifically, I want to know if it's possible to utilize multiple TPUs or GPUs ...
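Worth noting: a standard Colab runtime exposes only a single GPU, while TPU runtimes expose multiple cores. The usual way to fan a training function out over those cores is accelerate's notebook_launcher, sketched below (the training body is a placeholder):

from accelerate import notebook_launcher

def train_fn():
    # build model, dataloader, and optimizer here, then wrap them with Accelerator
    from accelerate import Accelerator
    accelerator = Accelerator()
    print(f"process {accelerator.process_index} of {accelerator.num_processes}")

notebook_launcher(train_fn, num_processes=8)   # 8 cores on a TPU v2/v3 runtime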
0 votes · 0 answers · 38 views
Script for streaming Mistral-7B LLM output only streams on the server side; the client gets the full output
I designed a remote server-client pipeline that is supposed to load the model on the server and stream the model's output.
At the moment, the output is streamed correctly, but only inside the ...
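The usual culprit is that the server iterates the token stream locally but returns only the final string, so the client sees one blob. Each chunk has to be yielded through a streaming HTTP response. A sketch with FastAPI and transformers (model and tokenizer are assumed to be loaded elsewhere):

from threading import Thread
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from transformers import TextIteratorStreamer

app = FastAPI()

@app.get("/generate")
def generate(prompt: str):
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # run generation in the background so chunks can be yielded as they arrive
    Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256)).start()
    return StreamingResponse(streamer, media_type="text/plain")

# Client side: iterate the chunks instead of reading the whole body at once.
# import requests
# with requests.get(url, params={"prompt": "Hi"}, stream=True) as r:
#     for chunk in r.iter_content(chunk_size=None):
#         print(chunk.decode(), end="", flush=True)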