
I have a list of sentences and a list of their ideal embeddings as 25-dimensional vectors. I am trying to use a neural network to generate new encodings, but I am struggling: the model runs fine, yet its output makes no sense, and it doesn't even accurately reproduce the training data!

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


# Tokenization (sentence_list is my list of raw sentences)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentence_list)
sequences = tokenizer.texts_to_sequences(sentence_list)

# Pad all sequences to the same length
max_sequence_length = max(len(seq) for seq in sequences)
padded_sequences = pad_sequences(sequences, maxlen=max_sequence_length)


# Assuming your vectors are 25-dimensional
input_dim = 25

# Define encoder
input_vec = Input(shape=(max_sequence_length,))
encoded = Dense(25, activation='tanh')(input_vec)   # 25-dimensional encoding
encoder = Model(input_vec, encoded)

# Define decoder
decoded = Dense(input_dim, activation='sigmoid')(encoded)
autoencoder = Model(input_vec, decoded)

# Compile model
autoencoder.compile(optimizer=Adam(), loss='mse')

# Train the model (combined_vectors_clean holds the 25-dimensional target embeddings)
autoencoder.fit(padded_sequences, combined_vectors_clean,
                epochs=10,
                batch_size=32,
                shuffle=True,
                validation_split=0.2)

As far as I can tell, there's nothing wrong with my input and my labels, so what am I missing?

1 Answer


I think you should try a different activation function on the output layer. Embedding vectors may contain negative values, but your model's sigmoid output is constrained to (0, 1), so it can never reproduce them.
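
For example, here is a minimal sketch of that change, reusing the names from your snippet (a linear output can produce any real value; tanh would also work if your target embeddings happen to lie in [-1, 1]):

# Linear activation so the decoder can output negative values
# (and values above 1), matching the range of the target embeddings.
decoded = Dense(input_dim, activation='linear')(encoded)
autoencoder = Model(input_vec, decoded)
autoencoder.compile(optimizer=Adam(), loss='mse')

With a linear output and MSE loss, the network is free to match targets anywhere on the real line.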
