Questions tagged [word-embedding]

For questions about word embedding, a language modelling technique in natural language processing. Questions can concern particular methods, such as Word2Vec, GloVe, or FastText, or word embeddings and their use in machine learning libraries in general.

word-embedding
-1 votes
0 answers
13 views

How can I use Word Embeddings for Sentiment Analysis?

I have a project where I've created a classifier, but I've learned that word embeddings are a better approach. From my search, I found that CBOW and Skip-gram are the methods to use with Word2Vec. I ...
LoukasPap's user avatar
  • 1,350
0 votes
0 answers
18 views

Transformer models for contextual word embedding in large datasets

I'm interested in using contextual word embeddings generated by a transformer-based model to explore the similarity of certain words in a large dataset. Most transformer models only allow up to 512 ...
C_B's user avatar
  • 13
-1 votes
0 answers
16 views

How to Modify and Replace Embeddings in a Large Language Model (LLM)? [closed]

I am a beginner in large language models (LLMs) and I am working on a project. I have a question regarding embeddings in an LLM. How can I modify the embeddings of an LLM? Are they stored in a ...
Steven Thorn's user avatar
0 votes
0 answers
22 views

Is updating points in Qdrant vectordb without re-embedding the data safe?

I'm building a RAG chatbot using Langchain, using the data I've stored in a Qdrant vector database. I wanted to change the metadata of a few documents in my Qdrant vector database. For this, I stored ...
Akshitha Rao's user avatar
0 votes
1 answer
37 views

How to get multimodal embeddings from CLIP model?

I'm hoping to use CLIP to get a single embedding for rows of multimodal (image and text) data. Say I have the following model: from PIL import Image import torch from transformers import CLIPProcessor,...
T_d's user avatar
  • 13
-1 votes
1 answer
22 views

Recreating Text Embeddings From An Example Dataset

I have a list of sentences, and a list of their ideal embeddings as 25-dimensional vectors. I am trying to use a neural network to generate new encodings, but I am struggling. While the model runs ...
slastine's user avatar
1 vote
0 answers
71 views

Is there a way to use CodeBERT to embed source code without natural language in input?

On CodeBERT's GitHub, they provide an example of using an NL-PL pair with the pretrained base model to create an embedding. I am looking to create an embedding using just source code which does not have ...
Armand Mousavi's user avatar
0 votes
0 answers
22 views

Small corpus, want to find associations. Word2Vec?

I'm a psychologist, and I'm diving into the field of AI. I could really use some help for a project. This semester, I discovered Word2Vec and was mesmerized by its capability to find associations. So, ...
Vinicius Fantini Marques Roja's user avatar
0 votes
0 answers
58 views

Retrieve Metadata from the Chroma DB Vector Store

I want to build an LLM application using Langchain, Ollama, RAG and Streamlit. My problem is: in the Streamlit application, after uploading the PDF, it takes so much time to generate and deliver the answer....
Urvesh's user avatar
  • 358
0 votes
1 answer
17 views

Calculation of document word vector in python. Sum or average word2vec?

I have some questions about generating a dissimilarity matrix of a bunch of text documents using word vectors. Here I tokenise the text, remove OOV and then sum the word vectors of each word to use as ...
D. Zammit's user avatar
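
A minimal sketch for the sum-versus-average question above, assuming toy 3-dimensional vectors in place of a trained word2vec model (the `vectors` dictionary and the `doc_vector` and `cosine_dissimilarity` helpers are hypothetical names made up for illustration). It shows that cosine dissimilarity is unchanged by the choice, because averaging only rescales the summed vector, while Euclidean distance is not:

```python
import numpy as np

# Hypothetical toy vectors; real ones would come from a trained word2vec model.
vectors = {
    "cat": np.array([1.0, 0.0, 2.0]),
    "dog": np.array([0.5, 1.0, 1.5]),
    "car": np.array([2.0, 3.0, 0.0]),
}

def doc_vector(tokens, how="mean"):
    """Combine the vectors of in-vocabulary tokens by sum or mean (OOV tokens are dropped)."""
    rows = np.array([vectors[t] for t in tokens if t in vectors])
    if rows.size == 0:
        raise ValueError("no in-vocabulary tokens")
    return rows.sum(axis=0) if how == "sum" else rows.mean(axis=0)

def cosine_dissimilarity(a, b):
    """1 - cosine similarity; invariant to positive rescaling of a or b."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_a = ["cat", "dog", "unknown"]  # "unknown" is out of vocabulary
doc_b = ["car"]

cos_sum = cosine_dissimilarity(doc_vector(doc_a, "sum"), doc_vector(doc_b, "sum"))
cos_mean = cosine_dissimilarity(doc_vector(doc_a, "mean"), doc_vector(doc_b, "mean"))
euc_sum = float(np.linalg.norm(doc_vector(doc_a, "sum") - doc_vector(doc_b, "sum")))
euc_mean = float(np.linalg.norm(doc_vector(doc_a, "mean") - doc_vector(doc_b, "mean")))

print(cos_sum, cos_mean)  # the two cosine values agree up to floating-point error
print(euc_sum, euc_mean)  # the two Euclidean values differ
```

Under these assumptions, the choice between summing and averaging only matters for length-sensitive measures such as Euclidean distance, where summing makes longer documents look farther apart.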
1 vote
3 answers
89 views

Getting GloVe embeddings using gensim, triu not found in scipy.linalg

I am trying to build a sentiment analysing model, using the GloVe word embeddings... I found multiple sources on how to import the embeddings into python, this one seemed to be the simplest... Trying ...
Mel7's user avatar
  • 19
0 votes
0 answers
82 views

Vector Embedding using Spark for compute

I have some large parquet files of data in Iceberg (which I have stored using Spark). My objective now is to pull these down using Spark, convert them into a spark dataframe, perform vector embedding ...
Mimis Chlympatsos's user avatar
0 votes
0 answers
57 views

Question about encode_multi_process method of the SentenceTransformer

How can I leverage the encode_multi_process method of the SentenceTransformer class to encode a large list of sentences using multiple GPUs? I tried using the encode_multi_process method of the ...
Alexis López's user avatar
0 votes
0 answers
19 views

Skip-Gram Model description in word2vec explanation article

In his article "word2vec Parameter Learning Explained", Xin Rong says (page 7): Each output is computed using the same hidden->output matrix: Looking into the word2vec source code I don’t see any “...
Damir Tenishev's user avatar
1 vote
1 answer
211 views

Cannot access embeddings endpoint on vLLM hosting llama3-8b-instruct

I'm using vLLM to run llama3-8b-instruct on a machine. I can access the chat endpoint, but when I access the embedding endpoint using the following code, I get NotFoundError: Error code: 404 - {'detail': '...
Derrick Zhang's user avatar
