All Questions
Tagged with large-language-model huggingface-transformers
223 questions
0 votes · 0 answers · 9 views
CUDA Out of Memory Error Despite Having Multiple GPUs
I'm encountering a CUDA out-of-memory error while trying to run a PyTorch model, even though my system has multiple NVIDIA GPUs.
# Load the tokenizer and model
tokenizer = AutoTokenizer....
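A common cause, hedged since the full code is truncated here: loading the whole model onto a single device. A minimal sketch of letting Accelerate shard the weights across all visible GPUs instead; the checkpoint name is a placeholder:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your/checkpoint"  # hypothetical placeholder for the asker's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" spreads layers over every visible GPU instead of
# loading the full model onto cuda:0, which is the usual cause of OOM
# on multi-GPU machines.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)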
0 votes · 0 answers · 6 views
RuntimeError with DeBERTaV3 Sequence Classification: Tensor Size Mismatch
I am trying to fine-tune the microsoft/deberta-v3-base model for sequence classification with three labels. I have set up my tokenizer and data preprocessing, but I encounter a RuntimeError during ...
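A frequent cause of this kind of mismatch is the classification head not matching the label set. A minimal sketch, assuming three integer-encoded labels:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
# num_labels must equal the number of classes in the label column;
# otherwise the head's logits and the label tensor disagree in size
# when the loss is computed.
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=3,
)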
0 votes · 0 answers · 20 views
Training LLM uses unexpected amount of GPU memory
I'm training a model with self-implemented training loops. A 1.5B Qwen2 occupies 40 GB of GPU memory, but when I did the same training with LLaMA-Factory it only took about 24 GB.
I tried to delete some ...
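Hedged, since the custom loop isn't shown: two settings that training frameworks commonly enable and a hand-written loop often omits, sketched below.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B", torch_dtype=torch.bfloat16
)
# Recompute activations in the backward pass instead of storing them all.
model.gradient_checkpointing_enable()
# The KV cache only helps generation; during training it just holds
# extra memory.
model.config.use_cache = False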
0 votes · 0 answers · 23 views
Huggingface Trainer CUDA Out Of Memory for 500M Model
I'm training MobiLLama for classification. The model has only 500 million parameters, yet when I fine-tune it for downstream tasks the Trainer keeps giving me a CUDA out-of-memory error.
I faced ...
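For reference, a minimal sketch of TrainingArguments settings that shrink the per-step memory footprint; the exact values are assumptions, not taken from the question:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # smallest per-step activation cost
    gradient_accumulation_steps=16,  # recovers an effective batch of 16
    gradient_checkpointing=True,     # trade compute for memory
    fp16=True,                       # half-precision activations and grads
)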
0 votes · 0 answers · 14 views
How does the transformer model's attention mechanism deal with differing sequence lengths?
I am going through the architecture of the transformer and its attention mechanism. The thing I don't get about this mechanism is how it handles sequences of different lengths. For example:
How does ...
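Short answer: within a batch, sequences are padded to a common length and an attention mask neutralizes the padded positions (a large negative bias is added to their scores before the softmax). A minimal sketch with a Hugging Face tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["a short sentence", "a noticeably longer sentence than the first one"],
    padding=True,        # pad to the longest sequence in the batch
    return_tensors="pt",
)
# 1 marks real tokens, 0 marks padding; inside the model the zeros
# become a large negative bias before the softmax, so padded positions
# receive effectively zero attention weight.
print(batch["attention_mask"])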
2 votes · 1 answer · 139 views
Saving a Fine-tuned Falcon HuggingFace LLM Model
I'm trying to save my model so it won't need to re-download the base model every time I want to use it, but nothing seems to work for me. I would love your help with this.
The following parameters are ...
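A minimal sketch of the usual round trip, assuming model and tokenizer are the fine-tuned objects (if LoRA/PEFT adapters are involved, they may need to be merged into the base weights first):

# Save everything needed for reloading into one local directory.
model.save_pretrained("falcon-finetuned")
tokenizer.save_pretrained("falcon-finetuned")

# Later runs load from disk, so the base checkpoint is not re-downloaded.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("falcon-finetuned")
tokenizer = AutoTokenizer.from_pretrained("falcon-finetuned")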
0 votes · 1 answer · 53 views
GPT-2 model from Hugging Face always generates the same result
Why are all the results I get from the GPT-2 model the same, no matter what I feed into it?
The following are the details of my setup.
First I downloaded the needed files from the official website. These ...
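If the output is identical even across different prompts, the first thing to verify is that the downloaded weights actually loaded. A separate common cause of repeated text, hedged since the code is truncated: generation defaults to greedy decoding, which is deterministic. A minimal sampling sketch:

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# Greedy decoding (the default) always picks the most likely token, so
# identical settings produce identical text; sampling breaks the tie.
out = generator(
    "Hello, my name is",
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    max_new_tokens=40,
)
print(out[0]["generated_text"])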
0 votes · 0 answers · 20 views
OOM Error using PPO Trainer to LoRa-tune 4-bit Llama-3-8B Model (TRL Hugging Face Library)
As per the standard practice for PPO training (supervised fine-tuning before running the PPO algorithm), I did a QLoRA fine-tune of the Llama-3-8B Instruct model using my own custom data and ...
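Worth noting: PPO keeps a reference model and a value head in memory alongside the policy, which often roughly doubles the footprint compared with plain SFT. A minimal sketch of the 4-bit + LoRA loading step; the hyperparameters are assumptions:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)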
0 votes · 0 answers · 26 views
Bus Error and Resource Tracker Warning When Training PyTorch Model on GPU with MPS
I've built a vanilla Transformer in PyTorch for machine translation and am running into issues while trying to train it on an Apple Mac M3 with a 12-core CPU and an 18-core GPU (18 GB RAM) ...
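Hedged, since the training script isn't shown: on macOS, bus errors and resource-tracker warnings frequently come from DataLoader worker processes rather than from the model itself. A minimal sketch of the usual first test:

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
dataset = TensorDataset(torch.randn(64, 10))  # stand-in for the real data
# num_workers=0 keeps loading in the main process; if the crash goes
# away, the multiprocessing workers were the culprit.
loader = DataLoader(dataset, batch_size=16, num_workers=0)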
0 votes · 0 answers · 53 views
Llama-3-70B with pipeline cannot generate new tokens (texts)
I have successfully downloaded Llama-3-70B, but when I test its "text-generation" ability it always outputs my prompt and nothing else.
Here is my demo code (copied from ...
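A likely explanation, hedged since the demo code is truncated: the default generation length budget includes the prompt, so a long prompt can leave zero tokens for new text. A minimal sketch:

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-70B",
    device_map="auto",
)
out = pipe(
    "Tell me something about large language models.",
    max_new_tokens=128,      # reserve an explicit budget for new tokens
    return_full_text=False,  # drop the echoed prompt from the output
)
print(out[0]["generated_text"])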
0 votes · 1 answer · 97 views
Size mismatch for embed_out.weight: copying a param with shape torch.Size([0]) from checkpoint - Huggingface PyTorch
I want to fine-tune an LLM. The fine-tuning itself succeeds, but when I reload the model after saving it, I get an error. Below is the code:
import argparse
import numpy as np
import torch
from datasets ...
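A zero-sized tensor in a checkpoint usually means the weights were saved while still partitioned (for example under DeepSpeed ZeRO-3), so the file holds placeholders instead of real parameters; with ZeRO-3, setting stage3_gather_16bit_weights_on_model_save in the DeepSpeed config makes saving write consolidated weights. A minimal save/reload sketch for a plain Trainer run, assuming trainer and tokenizer come from the training script:

# Save the consolidated model and tokenizer ...
trainer.save_model("finetuned-model")
tokenizer.save_pretrained("finetuned-model")

# ... then reload from the same directory.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("finetuned-model")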
0 votes · 1 answer · 112 views
Deepspeed : AttributeError: 'DummyOptim' object has no attribute 'step'
I want to use DeepSpeed to train LLMs together with the Hugging Face Trainer, but when I combine them I get the error "AttributeError: 'DummyOptim' object has no attribute 'step'" ...
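This typically happens when the DeepSpeed config defines its own optimizer: Accelerate then hands the Trainer a DummyOptim placeholder and expects DeepSpeed to do the stepping. A minimal sketch of a config without an optimizer section, so the Trainer builds a real optimizer it can step; the values are assumptions:

from transformers import TrainingArguments

ds_config = {
    # No "optimizer" or "scheduler" section here: the Trainer will
    # create its own optimizer instead of receiving a DummyOptim.
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
}
training_args = TrainingArguments(
    output_dir="out",
    deepspeed=ds_config,  # accepts a dict or a path to a JSON file
)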
1 vote · 0 answers · 43 views
HuggingFace pipeline doesn't use multiple GPUs
I made a RAG app that answers user questions based on provided data. It works fine on a single GPU, but when I deploy it on multiple GPUs (4 T4s) I always get CUDA out of memory ...
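A minimal sketch of sharding the model across all four T4s at load time and handing it to the pipeline; the checkpoint name is a placeholder:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

name = "your/checkpoint"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)
# Do not also pass device= here; the pipeline inherits the sharded
# placement from the model.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)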
1 vote · 0 answers · 113 views
How to fine-tune merlinite 7B model in Python
I am new to LLM programming in Python and am trying to fine-tune the instructlab/merlinite-7b-lab model on my Mac M1. My goal is to teach the model about a new music composer, Xenobi Amilen. I have ...
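A minimal LoRA sketch for this setup; hedged, since a 7B model is a tight fit for an M1 even with adapters, and the hyperparameters are assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

name = "instructlab/merlinite-7b-lab"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
# LoRA freezes the 7B base weights and trains only small adapter
# matrices, which is usually the only practical route on a single Mac.
lora = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.to("mps")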
0 votes · 0 answers · 15 views
Huggingface Trainer logs different sample size than actual
I am trying to fine-tune a model.
Here is the train/test split of my dataset: Train - 4746 (80%), Test - 1188 (20%).
Here is my code snippet:
training_args = TrainingArguments(
bf16=True, # specify ...
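One common source of the discrepancy, hedged since the arguments are truncated: the progress bar counts optimizer steps, not samples, and each step consumes batch size times gradient accumulation times the number of GPUs. A quick sanity check with assumed values:

num_train_samples = 4746               # from the question
per_device_train_batch_size = 8        # assumed; not shown in the snippet
gradient_accumulation_steps = 4        # assumed; not shown in the snippet
num_gpus = 1                           # assumed

steps_per_epoch = num_train_samples // (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(steps_per_epoch)  # 148: what the Trainer's progress bar counts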