
Questions tagged [transformer-model]

This tag refers to the Transformer model, used especially for Natural Language Understanding and Processing and popularized by the paper "Attention Is All You Need".

0 votes
0 answers
6 views

How to visualize attention for long sequences (e.g., amino acids of length 1000) in Transformer models?

I am working with Transformer models and I have a specific use case where I need to visualize the attention mechanism for long sequences. Specifically, I am dealing with amino acid sequences of length ...
Farshid B
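For the question above, here is a minimal sketch of the usual way to pull attention maps out of a model and plot them, assuming a Hugging Face encoder loaded with output_attentions=True; the model name (Rostlab/prot_bert) and the toy sequence are illustrative stand-ins, not the asker's setup. For a length-1000 sequence the full 1000×1000 heatmap is typically downsampled or sliced per head.

```python
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

model_name = "Rostlab/prot_bert"                      # placeholder protein encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

sequence = " ".join(list("MKTAYIAKQR"))               # toy stand-in for a 1000-residue sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer
attn = outputs.attentions[-1][0, 0]                   # last layer, first head
plt.imshow(attn.numpy(), cmap="viridis")              # for length ~1000, downsample or
plt.colorbar()                                        # plot a single query row instead
plt.savefig("attention_head.png", dpi=200)
```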
0 votes
0 answers
14 views

Keras Transformer regression model not predicting values beyond a threshold

I am currently working on a Keras transformer model for regression and I am getting prediction values that are cut off at some specific threshold. Code: def transformer_block(self, inputs, embed_dim, ...
srihari madhavan
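As a point of comparison for the truncated transformer_block above, here is a hedged sketch of a minimal Keras transformer block and regression head; it is not the asker's code and the hyperparameters are arbitrary. One thing it illustrates is keeping the final Dense layer linear, since a bounded activation there is a common reason predictions appear cut off at a threshold.

```python
import tensorflow as tf
from tensorflow.keras import layers

def transformer_block(inputs, embed_dim, num_heads, ff_dim, dropout=0.1):
    # self-attention sub-layer with residual connection and layer norm
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(inputs, inputs)
    x = layers.LayerNormalization(epsilon=1e-6)(inputs + layers.Dropout(dropout)(attn))
    # position-wise feed-forward sub-layer
    ff = layers.Dense(embed_dim)(layers.Dense(ff_dim, activation="relu")(x))
    return layers.LayerNormalization(epsilon=1e-6)(x + layers.Dropout(dropout)(ff))

inputs = layers.Input(shape=(128, 32))                 # (timesteps, features)
x = transformer_block(inputs, embed_dim=32, num_heads=4, ff_dim=64)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="linear")(x)      # unbounded output for regression
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```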
0 votes
0 answers
11 views

How to Track Attention Weights of a Transformer Model with Comet?

I am working on a Transformer model for a translation task and want to track attention weights using Comet. My model consists of 2 layers with 2 attention heads each. I am interested in understanding ...
Farshid B
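A minimal sketch of one way to log attention heatmaps to Comet with Experiment.log_figure, assuming the model can return its attention tensors; the log_attention helper and the tensor shapes are assumptions made for illustration, not part of the question.

```python
import matplotlib.pyplot as plt
from comet_ml import Experiment

experiment = Experiment(project_name="transformer-attention")  # API key from env/config

def log_attention(attentions, step):
    # attentions: list over layers of arrays shaped (num_heads, tgt_len, src_len)
    for layer_idx, layer_attn in enumerate(attentions):
        for head_idx in range(layer_attn.shape[0]):
            fig, ax = plt.subplots()
            ax.imshow(layer_attn[head_idx], cmap="viridis")
            ax.set_title(f"layer {layer_idx}, head {head_idx}")
            experiment.log_figure(figure_name=f"attn_l{layer_idx}_h{head_idx}",
                                  figure=fig, step=step)
            plt.close(fig)
```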
0 votes
0 answers
18 views

Transformer models for contextual word embedding in large datasets

I'm interested in using contextual word embeddings generated by a transformer-based model to explore the similarity of certain words in a large dataset. Most transformer models only allow up to 512 ...
C_B • 13
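A sketch of the common workaround for the 512-token limit mentioned above: split each document into overlapping windows with the tokenizer's return_overflowing_tokens option and embed each window separately. The model choice (bert-base-uncased) and the window/stride sizes are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def window_embeddings(text, max_len=512, stride=256):
    enc = tokenizer(text,
                    max_length=max_len,
                    truncation=True,
                    stride=stride,
                    return_overflowing_tokens=True,   # yields several 512-token windows
                    padding=True,
                    return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
    # (num_windows, max_len, hidden): map vectors back to words with
    # enc.word_ids(batch_index=i) when comparing specific word occurrences
    return out.last_hidden_state
```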
0 votes
0 answers
17 views

Transformer Model Repeating Same Codon During Inference Despite High Training Accuracy

I'm working on a transformer-based model to translate amino acids to codons. During training and validation, my model achieves 95-98% accuracy. However, during inference, I encounter an issue where ...
Farshid B
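Not a diagnosis of the asker's model, but a sketch of the inference loop this kind of symptom usually points at: high accuracy under teacher forcing does not carry over unless decoding feeds the model its own previous predictions. The model signature below is hypothetical.

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=500):
    # assumed signature: model(src, tgt) -> logits of shape (1, tgt_len, vocab_size)
    ys = torch.tensor([[bos_id]])
    for _ in range(max_len):
        logits = model(src, ys)                                  # condition on tokens generated
        next_token = logits[:, -1, :].argmax(-1, keepdim=True)   # so far, not ground-truth codons
        ys = torch.cat([ys, next_token], dim=1)
        if next_token.item() == eos_id:
            break
    return ys
```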
-1 votes
0 answers
26 views

How to Estimate GPU Memory for training and inference, Data Requirements, and Training Time for Large Language Models?

This is a very concrete and well-defined computer engineering question. I don't understand why someone would want to close it. Today, I faced this question during an interview for an ML Engineer ...
maplemaple • 1,435
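A back-of-the-envelope sketch of the kind of estimate the question asks for, using widely quoted rules of thumb (roughly 16 bytes per parameter for mixed-precision Adam training, roughly 2 bytes per parameter for fp16 inference); these are approximations, not exact figures, and activations and the KV cache come on top.

```python
def rough_training_gb(n_params, bytes_per_param=16):
    # fp16 weights (2) + fp16 grads (2) + fp32 master weights (4) + Adam moments (8)
    return n_params * bytes_per_param / 1024**3

def rough_inference_gb(n_params, bytes_per_param=2):
    # fp16 weights only
    return n_params * bytes_per_param / 1024**3

n = 7e9                                                             # a 7B-parameter model
print(f"training  ~{rough_training_gb(n):.0f} GB + activations")    # ~104 GB
print(f"inference ~{rough_inference_gb(n):.0f} GB + KV cache")      # ~13 GB
```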
0 votes
0 answers
14 views

How does the transformer model's attention mechanism deal with differing sequence lengths?

I am going through the architecture of the transformer and its attention mechanism. The thing I don't get about this mechanism is how it handles sequences of different lengths. For example: How does ...
Syed Mustaqhim
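A small sketch of the standard answer to the question above: sequences are padded to a common length and a padding mask sets the attention scores of padded key positions to -inf before the softmax, so they receive effectively zero weight. The helper below is a simplified single-head version for illustration.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, key_padding_mask):
    # q, k, v: (batch, seq_len, d); key_padding_mask: (batch, seq_len), True = padding
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5        # (batch, seq, seq)
    scores = scores.masked_fill(key_padding_mask.unsqueeze(1), float("-inf"))
    return F.softmax(scores, dim=-1) @ v                        # pads get zero weight

q = k = v = torch.randn(2, 5, 8)
mask = torch.tensor([[False, False, False, True, True],         # length 3, padded to 5
                     [False, False, False, False, False]])      # length 5
out = masked_attention(q, k, v, mask)                           # (2, 5, 8)
```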
0 votes
0 answers
24 views

Using Sparse Categorical CrossEntropy, the loss becomes negative (tensorflow/keras)

I am doing the TensorFlow transformer tutorial (https://www.tensorflow.org/text/tutorials/transformer) but with my own data. My data is not related to text, but it consists of sequences of tokens anyway, with a start ...
Dr Sokoban • 1,638
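Not a diagnosis of the bug above, but a sketch of the configuration the TensorFlow transformer tutorial assumes: a final layer with no softmax, SparseCategoricalCrossentropy(from_logits=True), and integer labels inside [0, vocab_size). Cross-entropy computed from logits is always non-negative, so a negative reported loss points at something outside this setup.

```python
import tensorflow as tf

vocab_size = 100
logits = tf.random.normal((2, 7, vocab_size))          # (batch, seq, vocab), no softmax applied
labels = tf.random.uniform((2, 7), 0, vocab_size, dtype=tf.int64)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="none")
per_token = loss_fn(labels, logits)                    # (batch, seq), always >= 0

# sanity checks worth running when the reported loss looks impossible
tf.debugging.assert_non_negative(per_token)
tf.debugging.assert_less(labels, tf.constant(vocab_size, dtype=tf.int64))
```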
0 votes
0 answers
25 views

“Bus Error and Resource Tracker Warning When Training PyTorch Model on GPU with MPS”

I’ve built a vanilla Transformer using PyTorch for machine translation and am encountering issues while trying to train it on an Apple Mac M3 with a 12-core CPU and an 18-core GPU (18GB RAM) ...
Pratheesh Kumar
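A hedged starting point rather than a fix for the report above: run on the mps device, keep the DataLoader single-process, and allow CPU fallback for ops MPS does not implement. Multiprocessing DataLoader workers are a frequent source of resource-tracker (leaked semaphore) warnings on macOS; the toy model and dataset below are only there to make the sketch runnable.

```python
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")   # set before importing torch

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, num_workers=0)   # no worker processes

model = torch.nn.Linear(10, 2).to(device)
optimizer = torch.optim.Adam(model.parameters())
for xb, yb in loader:
    xb, yb = xb.to(device), yb.to(device)
    loss = torch.nn.functional.cross_entropy(model(xb), yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```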
1 vote
0 answers
22 views

Text summarization of comments, and replacing duplicates with the first occurrence if the meaning of the comments is the same

Context: Doing an NLP project to analyze a comments column in a data frame. I want to replace the duplicates with the first occurrence if the meaning of the comments is the same. I want to compare all ...
Bhuvaneshwari D Raman Effect
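A sketch of one way to do the semantic deduplication described above with sentence-transformers; the model name (all-MiniLM-L6-v2) and the 0.85 similarity threshold are illustrative choices, not values from the question.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def dedupe(comments, threshold=0.85):
    embeddings = model.encode(comments, convert_to_tensor=True, normalize_embeddings=True)
    result = list(comments)
    for i in range(len(comments)):
        for j in range(i):
            if util.cos_sim(embeddings[i], embeddings[j]).item() >= threshold:
                result[i] = result[j]                 # keep the first occurrence
                break
    return result

print(dedupe(["Great service", "The service was great", "Delivery was late"]))
```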
1 vote
0 answers
21 views

Why am I seeing unused parameters in position embeddings when using relative_key in BertModel?

I am training a BERT model using pytorch and HuggingFace's BertModel. The sequences of tokens can vary in length from 1 (just a CLS token) to 128. The model trains fine when using absolute position ...
NW_liftoff
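A hedged illustration of why this can happen rather than a confirmed diagnosis: with position_embedding_type="relative_key", relative distances are embedded inside self-attention, so the absolute position-embedding table may receive no gradient, and gradient-tracking wrappers (e.g. DistributedDataParallel) then report it as unused.

```python
import torch
from transformers import BertConfig, BertModel

config = BertConfig(position_embedding_type="relative_key", max_position_embeddings=128)
model = BertModel(config, add_pooling_layer=False)

out = model(input_ids=torch.randint(0, config.vocab_size, (2, 16))).last_hidden_state
out.sum().backward()

for name, p in model.named_parameters():
    if p.grad is None:
        print("no gradient:", name)     # typically embeddings.position_embeddings.weight

# under DistributedDataParallel, find_unused_parameters=True is the usual workaround
```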
0 votes
0 answers
39 views

Positional encoding for explicitly timestamped transformer input data

I understand the concept of positional encoding for e.g text data when embedding a sequence for input to a transformer model - the transformer's attention mechanism processes the entire sequence at ...
TheSuperLemming
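A sketch of one common option for the situation above (an assumption, not the only approach): evaluate the sinusoidal encoding at the real-valued timestamps themselves instead of at integer positions, so irregular sampling intervals are reflected directly in the encoding.

```python
import torch

def time_encoding(timestamps, d_model):
    # timestamps: (batch, seq_len) float tensor of actual times, not integer positions
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    freqs = torch.exp(-torch.log(torch.tensor(10000.0)) * i / d_model)   # (d_model // 2,)
    args = timestamps.unsqueeze(-1) * freqs                              # (batch, seq, d//2)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)         # (batch, seq, d)

t = torch.tensor([[0.0, 0.7, 3.2, 10.5]])      # irregularly spaced timestamps
pe = time_encoding(t, d_model=64)              # add this to the input embeddings
```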
0 votes
0 answers
79 views

How to fix this error: KeyError: 'model.embed_tokens.weight'

This is the detailed error: Traceback (most recent call last): File "/home/cyq/zxc/SmartEdit/train/DS_MLLMSD11_train.py", line 769, in <module> train() File "/home/cyq/zxc/...
hshsh • 11
0 votes
1 answer
30 views

PyTorch: using a loss that doesn't return a gradient

I'm trying to develop a model that improves the quality of a given audio clip. For this task I use DAC for the latent space and run a transformer model to change the values of the latent space to improve ...
Jourdelune
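A sketch of the general constraint behind the question above, not of the asker's audio pipeline: a loss can only backpropagate if it is computed from the model output without leaving the autograd graph (detach(), numpy(), argmax, or an external metric all cut it); when the target metric is non-differentiable, a differentiable surrogate on the same tensors is the usual workaround. The tensors below are placeholders.

```python
import torch

pred = torch.randn(4, 16, requires_grad=True)             # stand-in for decoded latents
target = torch.randn(4, 16)

broken = torch.tensor(float((pred.detach() - target).abs().mean()))
print(broken.requires_grad)                                # False: calling backward() here fails

surrogate = torch.nn.functional.l1_loss(pred, target)      # stays inside the autograd graph
print(surrogate.requires_grad)                             # True
surrogate.backward()
```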
0 votes
0 answers
17 views

How do I train a transformer for pointwise inference of time series data?

My dataset is composed of different particle trajectories over time [x(t), y(t)]. Each trajectory has a different length from another and my goal is to perform pointwise regression, i.e. estimate a ...
Little Wing
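A minimal sketch under stated assumptions (2-D positions in, one regression value per timestep out, PyTorch): pad the trajectories to a common length, pass a key-padding mask so attention ignores the padding, and put a per-timestep linear head on top. The hyperparameters are arbitrary, and the same mask should also exclude padded positions from the loss.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

class PointwiseRegressor(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(2, d_model)                      # [x(t), y(t)] -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                      # one estimate per timestep

    def forward(self, padded, padding_mask):
        h = self.encoder(self.proj(padded), src_key_padding_mask=padding_mask)
        return self.head(h).squeeze(-1)                        # (batch, max_len)

trajs = [torch.randn(30, 2), torch.randn(55, 2)]               # trajectories of different lengths
padded = pad_sequence(trajs, batch_first=True)                 # (2, 55, 2)
lengths = torch.tensor([t.shape[0] for t in trajs])
mask = torch.arange(padded.shape[1])[None, :] >= lengths[:, None]   # True = padding

model = PointwiseRegressor()
preds = model(padded, mask)                                    # (2, 55)
```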
