
Questions tagged [transformer-model]

This tag refers to the Transformer model, used especially for Natural Language Understanding and Processing and popularized by the paper "Attention Is All You Need".

0 votes
0 answers
6 views

How to visualize attention for long sequences (e.g., amino acids of length 1000) in Transformer models?

I am working with Transformer models and I have a specific use case where I need to visualize the attention mechanism for long sequences. Specifically, I am dealing with amino acid sequences of length ...
Farshid B
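For the question above, here is a minimal sketch of the usual way to pull attention maps out of a model and plot them, assuming a Hugging Face encoder loaded with output_attentions=True; the model name (Rostlab/prot_bert) and the toy sequence are illustrative stand-ins, not the asker's setup. For a length-1000 sequence the full 1000×1000 heatmap is typically downsampled or sliced per head.

```python
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

model_name = "Rostlab/prot_bert"                      # placeholder protein encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

sequence = " ".join(list("MKTAYIAKQR"))               # toy stand-in for a 1000-residue sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer
attn = outputs.attentions[-1][0, 0]                   # last layer, first head
plt.imshow(attn.numpy(), cmap="viridis")              # for length ~1000, downsample or
plt.colorbar()                                        # plot a single query row instead
plt.savefig("attention_head.png", dpi=200)
```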
0 votes
0 answers
14 views

Keras Transformer regression model not predicting values beyond a threshold

I am currently working on a Keras transformer model for regression and I am getting prediction values that are cut off at some specific threshold. Code: def transformer_block(self, inputs, embed_dim, ...
srihari madhavan
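As a point of comparison for the truncated transformer_block above, here is a hedged sketch of a minimal Keras transformer block and regression head; it is not the asker's code and the hyperparameters are arbitrary. One thing it illustrates is keeping the final Dense layer linear, since a bounded activation there is a common reason predictions appear cut off at a threshold.

```python
import tensorflow as tf
from tensorflow.keras import layers

def transformer_block(inputs, embed_dim, num_heads, ff_dim, dropout=0.1):
    # self-attention sub-layer with residual connection and layer norm
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(inputs, inputs)
    x = layers.LayerNormalization(epsilon=1e-6)(inputs + layers.Dropout(dropout)(attn))
    # position-wise feed-forward sub-layer
    ff = layers.Dense(embed_dim)(layers.Dense(ff_dim, activation="relu")(x))
    return layers.LayerNormalization(epsilon=1e-6)(x + layers.Dropout(dropout)(ff))

inputs = layers.Input(shape=(128, 32))                 # (timesteps, features)
x = transformer_block(inputs, embed_dim=32, num_heads=4, ff_dim=64)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="linear")(x)      # unbounded output for regression
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```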
0 votes
0 answers
11 views

How to Track Attention Weights of a Transformer Model with Comet?

I am working on a Transformer model for a translation task and want to track attention weights using Comet. My model consists of 2 layers with 2 attention heads each. I am interested in understanding ...
Farshid B
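A minimal sketch of one way to log attention heatmaps to Comet with Experiment.log_figure, assuming the model can return its attention tensors; the log_attention helper and the tensor shapes are assumptions made for illustration, not part of the question.

```python
import matplotlib.pyplot as plt
from comet_ml import Experiment

experiment = Experiment(project_name="transformer-attention")  # API key from env/config

def log_attention(attentions, step):
    # attentions: list over layers of arrays shaped (num_heads, tgt_len, src_len)
    for layer_idx, layer_attn in enumerate(attentions):
        for head_idx in range(layer_attn.shape[0]):
            fig, ax = plt.subplots()
            ax.imshow(layer_attn[head_idx], cmap="viridis")
            ax.set_title(f"layer {layer_idx}, head {head_idx}")
            experiment.log_figure(figure_name=f"attn_l{layer_idx}_h{head_idx}",
                                  figure=fig, step=step)
            plt.close(fig)
```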
0 votes
0 answers
18 views

Transformer models for contextual word embedding in large datasets

I'm interested in using contextual word embeddings generated by a transformer-based model to explore the similarity of certain words in a large dataset. Most transformer models only allow up to 512 ...
C_B • 13
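A sketch of the common workaround for the 512-token limit mentioned above: split each document into overlapping windows with the tokenizer's return_overflowing_tokens option and embed each window separately. The model choice (bert-base-uncased) and the window/stride sizes are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def window_embeddings(text, max_len=512, stride=256):
    enc = tokenizer(text,
                    max_length=max_len,
                    truncation=True,
                    stride=stride,
                    return_overflowing_tokens=True,   # yields several 512-token windows
                    padding=True,
                    return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
    # (num_windows, max_len, hidden): map vectors back to words with
    # enc.word_ids(batch_index=i) when comparing specific word occurrences
    return out.last_hidden_state
```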
0 votes
0 answers
17 views

Transformer Model Repeating Same Codon During Inference Despite High Training Accuracy

I'm working on a transformer-based model to translate amino acids to codons. During training and validation, my model achieves 95-98% accuracy. However, during inference, I encounter an issue where ...
Farshid B
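Not a diagnosis of the asker's model, but a sketch of the inference loop this kind of symptom usually points at: high accuracy under teacher forcing does not carry over unless decoding feeds the model its own previous predictions. The model signature below is hypothetical.

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=500):
    # assumed signature: model(src, tgt) -> logits of shape (1, tgt_len, vocab_size)
    ys = torch.tensor([[bos_id]])
    for _ in range(max_len):
        logits = model(src, ys)                                  # condition on tokens generated
        next_token = logits[:, -1, :].argmax(-1, keepdim=True)   # so far, not ground-truth codons
        ys = torch.cat([ys, next_token], dim=1)
        if next_token.item() == eos_id:
            break
    return ys
```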
-1 votes
0 answers
26 views

How to Estimate GPU Memory for training and inference, Data Requirements, and Training Time for Large Language Models?

This is a very concrete and well-defined computer engineering question. I don't understand why someone would want to close it. Today, I faced this question during an interview for an ML Engineer ...
maplemaple • 1,435
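A back-of-the-envelope sketch of the kind of estimate the question asks for, using widely quoted rules of thumb (roughly 16 bytes per parameter for mixed-precision Adam training, roughly 2 bytes per parameter for fp16 inference); these are approximations, not exact figures, and activations and the KV cache come on top.

```python
def rough_training_gb(n_params, bytes_per_param=16):
    # fp16 weights (2) + fp16 grads (2) + fp32 master weights (4) + Adam moments (8)
    return n_params * bytes_per_param / 1024**3

def rough_inference_gb(n_params, bytes_per_param=2):
    # fp16 weights only
    return n_params * bytes_per_param / 1024**3

n = 7e9                                                             # a 7B-parameter model
print(f"training  ~{rough_training_gb(n):.0f} GB + activations")    # ~104 GB
print(f"inference ~{rough_inference_gb(n):.0f} GB + KV cache")      # ~13 GB
```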
0 votes
0 answers
14 views

How does the transformer model's attention mechanism deal with differing sequence lengths?

I am going through the architecture of the transformer and its attention mechanism. The thing I don't get about this mechanism is how it handles sequences of different lengths. For example: How does ...
Syed Mustaqhim
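A small sketch of the standard answer to the question above: sequences are padded to a common length and a padding mask sets the attention scores of padded key positions to -inf before the softmax, so they receive effectively zero weight. The helper below is a simplified single-head version for illustration.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, key_padding_mask):
    # q, k, v: (batch, seq_len, d); key_padding_mask: (batch, seq_len), True = padding
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5        # (batch, seq, seq)
    scores = scores.masked_fill(key_padding_mask.unsqueeze(1), float("-inf"))
    return F.softmax(scores, dim=-1) @ v                        # pads get zero weight

q = k = v = torch.randn(2, 5, 8)
mask = torch.tensor([[False, False, False, True, True],         # length 3, padded to 5
                     [False, False, False, False, False]])      # length 5
out = masked_attention(q, k, v, mask)                           # (2, 5, 8)
```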
0 votes
0 answers
24 views

Using Sparse Categorical CrossEntropy, the loss becomes negative (tensorflow/keras)

I am doing the TensorFlow transformer tutorial (https://www.tensorflow.org/text/tutorials/transformer) but with my own data. My data is not related to text, but it consists of sequences of tokens anyway, with a start ...
Dr Sokoban • 1,638
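Not a diagnosis of the bug above, but a sketch of the configuration the TensorFlow transformer tutorial assumes: a final layer with no softmax, SparseCategoricalCrossentropy(from_logits=True), and integer labels inside [0, vocab_size). Cross-entropy computed from logits is always non-negative, so a negative reported loss points at something outside this setup.

```python
import tensorflow as tf

vocab_size = 100
logits = tf.random.normal((2, 7, vocab_size))          # (batch, seq, vocab), no softmax applied
labels = tf.random.uniform((2, 7), 0, vocab_size, dtype=tf.int64)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="none")
per_token = loss_fn(labels, logits)                    # (batch, seq), always >= 0

# sanity checks worth running when the reported loss looks impossible
tf.debugging.assert_non_negative(per_token)
tf.debugging.assert_less(labels, tf.constant(vocab_size, dtype=tf.int64))
```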
0 votes
0 answers
25 views

“Bus Error and Resource Tracker Warning When Training PyTorch Model on GPU with MPS”

I’ve built a vanilla Transformer using PyTorch for machine translation and am encountering issues while trying to train it on an Apple Mac M3 with a 12-core CPU and an 18-core GPU (18GB RAM) ...
Pratheesh Kumar
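A hedged starting point rather than a fix for the report above: run on the mps device, keep the DataLoader single-process, and allow CPU fallback for ops MPS does not implement. Multiprocessing DataLoader workers are a frequent source of resource-tracker (leaked semaphore) warnings on macOS; the toy model and dataset below are only there to make the sketch runnable.

```python
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")   # set before importing torch

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, num_workers=0)   # no worker processes

model = torch.nn.Linear(10, 2).to(device)
optimizer = torch.optim.Adam(model.parameters())
for xb, yb in loader:
    xb, yb = xb.to(device), yb.to(device)
    loss = torch.nn.functional.cross_entropy(model(xb), yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```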
1 vote
0 answers
22 views

Text summarization of comments, and replacing duplicates with the first occurrence if the meaning of the comments is the same

Context: Doing an NLP project to analyze a comments column in a data frame. I want to replace the duplicates with the first occurrence if the meaning of the comments is the same. I want to compare all ...
Bhuvaneshwari D Raman Effect
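A sketch of one way to do the semantic deduplication described above with sentence-transformers; the model name (all-MiniLM-L6-v2) and the 0.85 similarity threshold are illustrative choices, not values from the question.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def dedupe(comments, threshold=0.85):
    embeddings = model.encode(comments, convert_to_tensor=True, normalize_embeddings=True)
    result = list(comments)
    for i in range(len(comments)):
        for j in range(i):
            if util.cos_sim(embeddings[i], embeddings[j]).item() >= threshold:
                result[i] = result[j]                 # keep the first occurrence
                break
    return result

print(dedupe(["Great service", "The service was great", "Delivery was late"]))
```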
1 vote
0 answers
21 views

Why am I seeing unused parameters in position embeddings when using relative_key in BertModel?

I am training a BERT model using pytorch and HuggingFace's BertModel. The sequences of tokens can vary in length from 1 (just a CLS token) to 128. The model trains fine when using absolute position ...
NW_liftoff
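A hedged illustration of why this can happen rather than a confirmed diagnosis: with position_embedding_type="relative_key", relative distances are embedded inside self-attention, so the absolute position-embedding table may receive no gradient, and gradient-tracking wrappers (e.g. DistributedDataParallel) then report it as unused.

```python
import torch
from transformers import BertConfig, BertModel

config = BertConfig(position_embedding_type="relative_key", max_position_embeddings=128)
model = BertModel(config, add_pooling_layer=False)

out = model(input_ids=torch.randint(0, config.vocab_size, (2, 16))).last_hidden_state
out.sum().backward()

for name, p in model.named_parameters():
    if p.grad is None:
        print("no gradient:", name)     # typically embeddings.position_embeddings.weight

# under DistributedDataParallel, find_unused_parameters=True is the usual workaround
```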
0 votes
0 answers
39 views

Positional encoding for explicitly timestamped transformer input data

I understand the concept of positional encoding for e.g text data when embedding a sequence for input to a transformer model - the transformer's attention mechanism processes the entire sequence at ...
TheSuperLemming
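A sketch of one common option for the situation above (an assumption, not the only approach): evaluate the sinusoidal encoding at the real-valued timestamps themselves instead of at integer positions, so irregular sampling intervals are reflected directly in the encoding.

```python
import torch

def time_encoding(timestamps, d_model):
    # timestamps: (batch, seq_len) float tensor of actual times, not integer positions
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    freqs = torch.exp(-torch.log(torch.tensor(10000.0)) * i / d_model)   # (d_model // 2,)
    args = timestamps.unsqueeze(-1) * freqs                              # (batch, seq, d//2)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)         # (batch, seq, d)

t = torch.tensor([[0.0, 0.7, 3.2, 10.5]])      # irregularly spaced timestamps
pe = time_encoding(t, d_model=64)              # add this to the input embeddings
```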
0 votes
0 answers
79 views

How to fix this error: KeyError: 'model.embed_tokens.weight'

This is the detailed error: Traceback (most recent call last): File "/home/cyq/zxc/SmartEdit/train/DS_MLLMSD11_train.py", line 769, in <module> train() File "/home/cyq/zxc/...
hshsh • 11
0 votes
1 answer
30 views

PyTorch: using a loss that doesn't return a gradient

I'm trying to develop a model that improves the quality of a given audio clip. For this task I use DAC for the latent space and run a transformer model to change the values of the latent space to improve ...
Jourdelune
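A sketch of the general constraint behind the question above, not of the asker's audio pipeline: a loss can only backpropagate if it is computed from the model output without leaving the autograd graph (detach(), numpy(), argmax, or an external metric all cut it); when the target metric is non-differentiable, a differentiable surrogate on the same tensors is the usual workaround. The tensors below are placeholders.

```python
import torch

pred = torch.randn(4, 16, requires_grad=True)             # stand-in for decoded latents
target = torch.randn(4, 16)

broken = torch.tensor(float((pred.detach() - target).abs().mean()))
print(broken.requires_grad)                                # False: calling backward() here fails

surrogate = torch.nn.functional.l1_loss(pred, target)      # stays inside the autograd graph
print(surrogate.requires_grad)                             # True
surrogate.backward()
```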
0 votes
0 answers
17 views

How do I train a transformer for pointwise inference of time series data?

My dataset is composed of different particle trajectories over time [x(t), y(t)]. Each trajectory has a different length from another and my goal is to perform pointwise regression, i.e. estimate a ...
Little Wing
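A minimal sketch under stated assumptions (2-D positions in, one regression value per timestep out, PyTorch): pad the trajectories to a common length, pass a key-padding mask so attention ignores the padding, and put a per-timestep linear head on top. The hyperparameters are arbitrary, and the same mask should also exclude padded positions from the loss.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

class PointwiseRegressor(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(2, d_model)                      # [x(t), y(t)] -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                      # one estimate per timestep

    def forward(self, padded, padding_mask):
        h = self.encoder(self.proj(padded), src_key_padding_mask=padding_mask)
        return self.head(h).squeeze(-1)                        # (batch, max_len)

trajs = [torch.randn(30, 2), torch.randn(55, 2)]               # trajectories of different lengths
padded = pad_sequence(trajs, batch_first=True)                 # (2, 55, 2)
lengths = torch.tensor([t.shape[0] for t in trajs])
mask = torch.arange(padded.shape[1])[None, :] >= lengths[:, None]   # True = padding

model = PointwiseRegressor()
preds = model(padded, mask)                                    # (2, 55)
```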
