
All Questions

0 votes
0 answers
9 views

CUDA Out of Memory Error Despite Having Multiple GPUs

I'm encountering a CUDA out-of-memory error while trying to run a PyTorch model, even though my system has multiple NVIDIA GPUs. # Load the tokenizer and model tokenizer = AutoTokenizer....
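Worth noting for questions like this: by default PyTorch allocates everything on `cuda:0`, so extra GPUs sit idle unless tensors are placed on them explicitly (or the model is sharded, e.g. with `accelerate`'s `device_map="auto"`). A minimal sketch with a tiny stand-in model:

```python
import torch
import torch.nn as nn

# A second GPU does not help unless tensors are placed on it explicitly;
# by default everything lands on cuda:0. Pick a device up front:
device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

model = nn.Linear(8, 2).to(device)    # tiny stand-in for the real model
x = torch.randn(4, 8, device=device)  # inputs must live on the same device
out = model(x)
print(out.shape)  # torch.Size([4, 2])
```

The same idea scales up: every tensor the forward pass touches has to sit on the device you chose, or PyTorch raises a device-mismatch error instead of spreading the load.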
Flying-Meta
0 votes
0 answers
6 views

RuntimeError with DeBERTaV3 Sequence Classification: Tensor Size Mismatch

I am trying to fine-tune the microsoft/deberta-v3-base model for sequence classification with three labels. I have set up my tokenizer and data preprocessing, but I encounter a RuntimeError during ...
suri • 21
0 votes
0 answers
20 views

Training LLM uses unexpected amount of GPU memory

I'm training a model with self-implemented training loops. A 1.5B Qwen2 model occupies 40 GB of GPU memory. When I ran the same training using LLaMA Factory, it took only about 24 GB. I tried to delete some ...
StaEx_G • 13
0 votes
0 answers
23 views

Huggingface Trainer CUDA Out Of Memory for 500M Model

I'm training MobiLLama for classification. This model has just 500 million parameters, yet when I fine-tune it for downstream tasks, the trainer keeps giving me a CUDA out-of-memory error. I faced ...
Hoangdz • 187
0 votes
0 answers
14 views

How does the transformer model's attention mechanism deal with differing sequence lengths?

I am going through the architecture of the transformer and its attention mechanism. The thing I don't get about this mechanism is how it handles sequences of different lengths. For example: How does ...
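For what it's worth, the standard answer is padding plus an attention mask: shorter sequences are padded to a common length, and the padded key positions are set to -inf before the softmax so they receive zero attention weight. A minimal sketch in plain PyTorch (batch size, lengths, and scores are illustrative):

```python
import torch

# Two sequences padded to length 4; the second is really only 2 tokens long.
scores = torch.randn(2, 4, 4)           # (batch, query_len, key_len)
lengths = torch.tensor([4, 2])          # true sequence lengths

# True where a key position is padding
key_pos = torch.arange(4)
pad_mask = key_pos[None, :] >= lengths[:, None]       # (batch, key_len)

# Mask padded keys with -inf so softmax assigns them zero weight
scores = scores.masked_fill(pad_mask[:, None, :], float("-inf"))
weights = torch.softmax(scores, dim=-1)

print(weights[1, 0, 2:])  # tensor([0., 0.]) -- padded keys get no attention
```

This is exactly what the `attention_mask` returned by Hugging Face tokenizers encodes: attention itself is length-agnostic, and the mask just hides the padding.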
Syed Mustaqhim
2 votes
1 answer
139 views

Saving Fine-tune Falcon HuggingFace LLM Model

I'm trying to save my model so it won't need to re-download the base model every time I want to use it, but nothing seems to work for me; I would love your help with it. The following parameters are ...
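The usual pattern here is `save_pretrained` / `from_pretrained`, which writes the weights and config to a local directory so nothing is re-downloaded afterwards. A sketch with a tiny randomly initialized GPT-2 standing in for the fine-tuned model (no download; the config values and directory name are arbitrary):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model as a stand-in for a fine-tuned LLM
config = GPT2Config(n_layer=1, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config)

model.save_pretrained("my_finetuned_model")  # writes config + weights locally
reloaded = GPT2LMHeadModel.from_pretrained("my_finetuned_model")
print(reloaded.config.n_layer)  # 1
```

The tokenizer has its own `save_pretrained` and should be saved alongside the model so both load from the same path.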
Lidor Eliyahu Shelef
0 votes
1 answer
53 views

GPT-2 model from Hugging Face always generates the same result

Why were all the results I got from the GPT-2 model the same no matter what I fed into it? The following are my operating details. First I download the needed files from the official website. These ...
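A frequent cause of identical generations is greedy decoding: `generate()` defaults to `do_sample=False`, which is fully deterministic, so passing `do_sample=True` (optionally with `temperature` or `top_p`) restores variety. The difference, sketched with a toy next-token distribution rather than the real model:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5])  # toy next-token logits

# Greedy decoding: argmax is deterministic -- the same token every call
greedy = [int(logits.argmax()) for _ in range(3)]
print(greedy)  # [0, 0, 0]

# Sampling: draws from the softmax distribution, so results vary across runs
probs = torch.softmax(logits, dim=-1)
sampled = [int(torch.multinomial(probs, 1)) for _ in range(3)]
print(sampled)
```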
zhangtianpu
0 votes
0 answers
20 views

OOM Error using PPO Trainer to LoRa-tune 4-bit Llama-3-8B Model (TRL Hugging Face Library)

As per the standard for PPO Training (which is to do supervised-fine tuning before running the PPO Algorithm) I did a QLoRa fine-tuning of the Llama-3-8B instruct model using my own custom data and ...
Aryaman Jaggi
0 votes
0 answers
26 views

Bus Error and Resource Tracker Warning When Training PyTorch Model on GPU with MPS

I’ve built a vanilla Transformer using PyTorch for machine translation and am encountering issues while trying to train it on an Apple Mac M3 with a 12-core CPU and an 18-core GPU (18GB RAM) ...
Pratheesh Kumar
0 votes
0 answers
53 views

Llama-3-70B with pipeline cannot generate new tokens (texts)

I have successfully downloaded Llama-3-70B, but when I test its "text-generation" ability, it always outputs my prompt and nothing more. Here is my demo code (copied from ...
Martin • 11
0 votes
1 answer
97 views

Size mismatch for embed_out.weight: copying a param with shape torch.Size([0]) from checkpoint - Huggingface PyTorch

I want to fine-tune an LLM. I am able to fine-tune it successfully, but when I reload the model after saving, I get an error. Below is the code: import argparse import numpy as np import torch from datasets ...
Masthan • 685
0 votes
1 answer
112 views

Deepspeed : AttributeError: 'DummyOptim' object has no attribute 'step'

I want to use DeepSpeed for training LLMs along with the Hugging Face Trainer. But when I use DeepSpeed with the Trainer, I get the error "AttributeError: 'DummyOptim' object has no attribute 'step'" ...
Masthan • 685
1 vote
0 answers
43 views

HuggingFace pipeline doesn't use multiple GPUs

I made a RAG app that answers user questions based on provided data; it works fine on a single GPU. I want to deploy it on multiple GPUs (4 T4s), but I always get CUDA out of memory ...
Cihan Yalçın
1 vote
0 answers
113 views

How to fine-tune merlinite 7B model in Python

I am new to LLM programming in Python and I am trying to fine-tune the instructlab/merlinite-7b-lab model on my Mac M1. My goal is to teach this model about a new music composer, Xenobi Amilen. I have ...
Salvatore D'angelo
0 votes
0 answers
15 views

Huggingface Trainer logs different sample size than actual

I am trying to fine-tune a model. Here is the train-test split of my dataset: Train - 4746 (80%), Test - 1188 (20%). Here is my code snippet: training_args = TrainingArguments( bf16=True, # specify ...
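One likely explanation (assuming multiple GPUs and/or gradient accumulation are in play): the Trainer logs optimizer steps, not raw samples, and steps per epoch is the sample count divided by the effective batch size. The arithmetic, with hypothetical settings:

```python
import math

# Why the Trainer's logged step count differs from the raw sample count:
# steps_per_epoch = ceil(samples / effective_batch_size)
train_samples = 4746          # from the question
per_device_batch_size = 8     # hypothetical
num_gpus = 2                  # hypothetical
grad_accum_steps = 4          # hypothetical

effective_batch = per_device_batch_size * num_gpus * grad_accum_steps
steps_per_epoch = math.ceil(train_samples / effective_batch)
print(effective_batch, steps_per_epoch)  # 64 75
```

So 4746 training samples can legitimately show up as only 75 logged steps per epoch under these settings.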
quick_silver009
