Questions tagged [transformer-model]
This tag is for questions about the Transformer model, a neural-network architecture used especially for natural language understanding and processing, popularized by the paper Attention Is All You Need.
1,177 questions
0 votes · 0 answers · 6 views
How to visualize attention for long sequences (e.g., amino acids of length 1000) in Transformer models?
I am working with Transformer models and I have a specific use case where I need to visualize the attention mechanism for long sequences. Specifically, I am dealing with amino acid sequences of length ...
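A 1000×1000 attention map is unreadable at full resolution; one common remedy is to block-average it down to a coarser grid before plotting it as a heatmap. A minimal pure-Python sketch (the pooling factor of 20 is an arbitrary choice for illustration):

```python
# Sketch: block-average a large attention matrix so a 1000x1000 map
# becomes a 50x50 summary that is readable as a heatmap.
def pool_attention(attn, block):
    n = len(attn)
    m = n // block
    pooled = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            s = 0.0
            for bi in range(block):
                for bj in range(block):
                    s += attn[i * block + bi][j * block + bj]
            pooled[i][j] = s / (block * block)  # mean over the block
    return pooled

# Identity-like attention over a length-1000 sequence, for illustration.
attn = [[1.0 if i == j else 0.0 for j in range(1000)] for i in range(1000)]
small = pool_attention(attn, 20)  # 50x50 summary, plottable as a heatmap
```

The pooled matrix can then be passed to any heatmap plotter; each cell summarizes a 20×20 block of residue pairs.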
0 votes · 0 answers · 14 views
Keras Transformer regression model not predicting values beyond a threshold
I am currently working on a Keras transformer model for regression, and the predicted values are cut off at a specific threshold.
Code:
def transformer_block(self,inputs, embed_dim, ...
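Without the full code one can only guess, but a frequent cause of clipped regression outputs is a bounded activation (sigmoid or tanh) on the final layer. A toy sketch of the difference, with made-up values:

```python
import math

# Sketch: a bounded activation on the output head caps regression
# predictions, while a linear head leaves them unbounded.
def head_tanh(x):
    # predictions can never leave (-1, 1), however large the target is
    return math.tanh(x)

def head_linear(x):
    # unbounded; the usual choice for a regression output layer
    return x

raw = 3.7  # hypothetical pre-activation value
bounded = head_tanh(raw)
unbounded = head_linear(raw)
```

If targets exceed the activation's range, either switch the output layer to a linear activation or rescale the targets into that range.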
0 votes · 0 answers · 11 views
How to Track Attention Weights of a Transformer Model with Comet?
I am working on a Transformer model for a translation task and want to track attention weights using Comet. My model consists of 2 layers with 2 attention heads each. I am interested in understanding ...
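One workable pattern is to reduce each head's attention matrix to a scalar summary per step and log that as a metric; mean row entropy is a common choice. A minimal sketch (the actual Comet call, e.g. `Experiment.log_metric`, is shown only as a comment since it needs an API key):

```python
import math

# Sketch: summarize a head's attention matrix as mean row entropy,
# a scalar that can be logged per training step with a tracker.
def mean_row_entropy(attn):
    total = 0.0
    for row in attn:  # each row is a probability distribution
        total += -sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(attn)

uniform = [[0.25] * 4 for _ in range(4)]          # maximally spread attention
peaked = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]  # fully focused attention

# Hypothetical logging call, per layer/head:
# experiment.log_metric("layer0_head0_entropy", mean_row_entropy(attn), step=step)
```

High entropy means the head attends broadly; entropy near zero means it focuses on single positions.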
0 votes · 0 answers · 18 views
Transformer models for contextual word embedding in large datasets
I'm interested in using contextual word embeddings generated by a transformer-based model to explore the similarity of certain words in a large dataset.
Most transformer models only allow up to 512 ...
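A standard workaround for the 512-token limit is a sliding window: split the long sequence into overlapping chunks, embed each chunk, and take each token's embedding from a chunk where it has context on both sides. A sketch of the chunking step (window and overlap sizes are illustrative):

```python
# Sketch: split a long token sequence into overlapping windows no longer
# than the model's 512-token limit, so every token is embedded with at
# least `overlap` tokens of surrounding context.
def windows(tokens, size=512, overlap=64):
    step = size - overlap
    out = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        out.append(tokens[start:start + size])
    return out

chunks = windows(list(range(1200)), size=512, overlap=64)  # 3 chunks
```

Each chunk then goes through the model separately; for tokens that appear in two chunks, keep the embedding from the chunk where the token is farther from the edge.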
0 votes · 0 answers · 17 views
Transformer Model Repeating Same Codon During Inference Despite High Training Accuracy
I'm working on a transformer-based model to translate amino acids to codons. During training and validation, my model achieves 95-98% accuracy. However, during inference, I encounter an issue where ...
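High teacher-forced accuracy with degenerate inference often points at the decoding strategy: greedy argmax can lock onto one token and repeat it. One common mitigation is sampling from the softmax with a temperature. A self-contained sketch with made-up logits:

```python
import math
import random

# Sketch: greedy argmax always picks the same index for fixed logits;
# temperature sampling draws from the softmax and breaks repetition.
def sample(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                            # subtract max for stability
    probs = [math.exp(l - m) for l in scaled]
    z = sum(probs)
    probs = [p / z for p in probs]
    r = rng.random()                           # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.9, 0.1]  # hypothetical codon logits
greedy = max(range(len(logits)), key=logits.__getitem__)  # always index 0
```

It is also worth confirming that inference feeds the model its own previous outputs rather than accidentally reusing teacher forcing.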
-1 votes · 0 answers · 26 views
How to Estimate GPU Memory for Training and Inference, Data Requirements, and Training Time for Large Language Models?
This is a very concrete and well-defined computer engineering question. I don't understand why someone would want to close it.
Today, I faced this question during an interview for an ML Engineer ...
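The usual interview-level answer is a back-of-the-envelope bytes-per-parameter estimate: mixed-precision Adam training needs roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments), while fp16 inference needs about 2 bytes per parameter, both before activations and KV cache. A sketch of that rule of thumb:

```python
# Sketch of the standard rule of thumb:
#   training (mixed-precision Adam): ~16 bytes/parameter
#   inference (fp16 weights only):   ~2 bytes/parameter
# Activation memory and KV cache come on top and depend on batch size
# and sequence length.
def gpu_memory_gb(n_params, mode="train"):
    bytes_per_param = 16 if mode == "train" else 2
    return n_params * bytes_per_param / 1024**3

train_7b = gpu_memory_gb(7e9, "train")  # roughly 100+ GB for a 7B model
infer_7b = gpu_memory_gb(7e9, "infer")  # roughly 13 GB for a 7B model
```

This explains why a 7B model trains on multiple GPUs but serves on a single 16 GB card.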
0 votes · 0 answers · 14 views
How does the transformer model's attention mechanism deal with differing sequence lengths?
I am going through the architecture of the transformer and its attention mechanism. The thing I don't get about this mechanism is how it handles sequences of different lengths. For example:
How does ...
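The short answer is padding plus masking: sequences in a batch are padded to a common length, and padded positions get a large negative score before the softmax so they receive effectively zero attention weight. A minimal sketch for one attention row:

```python
import math

# Sketch: mask padded positions out of one row of attention scores.
# mask[i] is True for real tokens, False for padding.
def masked_softmax(scores, mask):
    neg = [s if m else -1e9 for s, m in zip(scores, mask)]
    mx = max(neg)                          # subtract max for stability
    exps = [math.exp(s - mx) for s in neg]
    z = sum(exps)
    return [e / z for e in exps]

# Two real tokens followed by two padding tokens:
w = masked_softmax([1.0, 2.0, 0.5, 0.5], [True, True, False, False])
```

Because the mask does the work, the same attention code handles any mix of sequence lengths in a batch.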
0 votes · 0 answers · 24 views
Using Sparse Categorical CrossEntropy, the loss becomes negative (tensorflow/keras)
I am following the TensorFlow Transformer tutorial (https://www.tensorflow.org/text/tutorials/transformer) but with my own data. My data is not related to text, but it is sequences of tokens anyway, with a start ...
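Sparse categorical cross-entropy is -log p[target], which cannot be negative when p is a valid probability distribution. A negative loss usually means the loss was fed values that are not probabilities, e.g. raw logits with `from_logits=False`. A tiny sketch of the arithmetic:

```python
import math

# Sketch: sparse categorical cross-entropy for a single example is
# -log(p[target]). It is non-negative for valid probabilities, and it
# only goes negative if a "probability" exceeds 1 (i.e. the inputs were
# not a softmax output).
def sparse_cce(probs, target):
    return -math.log(probs[target])

ok = sparse_cce([0.1, 0.7, 0.2], 1)   # valid distribution: positive loss
bad = sparse_cce([0.1, 1.9, 0.2], 1)  # "prob" > 1: negative loss
```

In Keras the usual fix is either a softmax on the final layer, or `SparseCategoricalCrossentropy(from_logits=True)` with a linear final layer, but not both.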
0 votes · 0 answers · 25 views
Bus Error and Resource Tracker Warning When Training PyTorch Model on GPU with MPS
I’ve built a vanilla Transformer using PyTorch for machine translation and am encountering issues while trying to train it on an Apple Mac M3 with a 12-core CPU and an 18-core GPU (18GB RAM) ...
1 vote · 0 answers · 22 views
Text summarization of comments: replace duplicates with the first occurrence if the comments have the same meaning
Context - I am doing an NLP project to analyze the comments column in a data frame. I want to replace duplicates with the first occurrence if the comments have the same meaning.
I want to compare all ...
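A common approach is to embed each comment (e.g. with a sentence-transformer), then keep only the first comment of each group whose embeddings are nearly identical by cosine similarity. A sketch of the dedup step; the 2-d vectors and the 0.9 threshold below are made up for illustration:

```python
import math

# Sketch: keep the first comment of each near-duplicate group, where
# "duplicate" means cosine similarity of the embeddings >= threshold.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe(comments, embeddings, threshold=0.9):
    kept, kept_vecs = [], []
    for text, vec in zip(comments, embeddings):
        # keep only if not too similar to anything already kept
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept

comments = ["great product", "product is great", "terrible service"]
vecs = [[1.0, 0.1], [0.98, 0.12], [-0.2, 1.0]]  # hypothetical embeddings
unique = dedupe(comments, vecs)
```

In practice the embeddings would come from a model such as a sentence-transformer, and the threshold needs tuning on a sample of real duplicates.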
1 vote · 0 answers · 21 views
Why am I seeing unused parameters in position embeddings when using relative_key in BertModel?
I am training a BERT model using PyTorch and HuggingFace's BertModel. The sequences of tokens can vary in length from 1 (just a CLS token) to 128. The model trains fine when using absolute position ...
0 votes · 0 answers · 39 views
Positional encoding for explicitly timestamped transformer input data
I understand the concept of positional encoding for, e.g., text data when embedding a sequence for input to a transformer model - the transformer's attention mechanism processes the entire sequence at ...
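When the data carries real timestamps, one option is to feed the timestamp itself into the standard sinusoidal formula instead of the integer token index, so irregular sampling intervals show up in the encoding. A sketch of that variant (dimension 8 is illustrative):

```python
import math

# Sketch: sinusoidal positional encoding evaluated at a continuous
# timestamp t rather than an integer position index, using the usual
# geometric frequency schedule from Attention Is All You Need.
def time_encoding(t, dim=8, base=10000.0):
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / base ** (i / dim)
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc

vec = time_encoding(12.5)  # encoding for an event at t = 12.5
```

Because the encoding is a smooth function of t, two events close in time get similar vectors regardless of how many samples lie between them.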
0 votes · 0 answers · 79 views
How to fix this error: KeyError: 'model.embed_tokens.weight'
This is the detailed error:
Traceback (most recent call last):
File "/home/cyq/zxc/SmartEdit/train/DS_MLLMSD11_train.py", line 769, in <module>
train()
File "/home/cyq/zxc/...
0 votes · 1 answer · 30 views
PyTorch: using a loss that doesn't return a gradient
I'm trying to develop a model that improves the quality of a given audio signal. For this task I use DAC for the latent space and run a transformer model to change the values of the latent space to improve ...
0 votes · 0 answers · 17 views
How do I train a transformer for pointwise inference of time series data?
My dataset is composed of different particle trajectories over time [x(t), y(t)]. Each trajectory has a different length, and my goal is to perform pointwise regression, i.e. estimate a ...
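The standard recipe for pointwise regression on variable-length sequences is to pad every trajectory in a batch to a common length and mask the padded steps out of the loss, so they contribute nothing to training. A minimal sketch with toy numbers:

```python
# Sketch: pad variable-length trajectories to one batch length and
# mask the padded steps out of a pointwise regression (MSE) loss.
def pad_batch(seqs, pad=0.0):
    n = max(len(s) for s in seqs)
    padded = [s + [pad] * (n - len(s)) for s in seqs]
    mask = [[1.0] * len(s) + [0.0] * (n - len(s)) for s in seqs]
    return padded, mask

def masked_mse(pred, target, mask):
    num = den = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            num += m * (p - t) ** 2  # padded steps contribute zero
            den += m                 # average over real steps only
    return num / den

x, m = pad_batch([[1.0, 2.0, 3.0], [4.0]])
# targets at padded positions are arbitrary; the mask removes them
loss = masked_mse(x, [[1.0, 2.0, 3.0], [5.0, 9.9, 9.9]], m)
```

The same mask is also passed to the transformer's attention so padded steps are ignored there as well.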