Questions tagged [text-mining]
Text Mining is a process of deriving high-quality information from unstructured (textual) information.
text-mining
2,599
questions
0
votes
1
answer
33
views
Extract Keywords from Text Vector -- one set of keywords for each element
Please consider the reprex at the end of the post.
It works along the lines of
https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html
It extracts a set ...
0
votes
0
answers
23
views
Errors attaching metadata to corpus
I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...
0
votes
1
answer
69
views
Unordered txt file contents: How to design in proper dictionary
I have a txt file whose contents are unordered, as in the sample below. I must select the first row because it holds the exact time of the train run.
My txt file has several summaries (summary 1, 2 and so on), hence the keys are the same ...
0
votes
0
answers
31
views
pdftools – How to skip errors?
I have an R script that converts all pdf files to text, but the "pdftools" package runs into various errors and stops the process. I would like to include in the code that if it finds an ...
1
vote
1
answer
36
views
Extracting Text via Web Scraping: Loop with several optional start/ end strings
I would like to webscrape the text of several press statements.
The problem I'm currently having is defining several strings where the scraping of the text should start/end. For example the ...
0
votes
1
answer
49
views
Export txt files from a corpus after preprocessing
I am struggling to export files from my corpus after preprocessing. I currently have 26 documents in my corpus, but I want to export them as txt files as they have been preprocessed so I can combine ...
1
vote
1
answer
33
views
I cannot get past data(stop_words) to analyze text in text mining
It's my first attempt at text mining and I have run into a wall. This is what I have done thus far:
library(tm)
library(tidytext)
library(dplyr)
library(ggplot2)
text1 <- c("Dear land of ...
0
votes
0
answers
34
views
Preventing Automatic Fine-Tuning during Inference Loop in Python
I'm working on a Python project that involves processing documents through a language model within a for loop.
Basically, I have some questions and I want to ask these questions to an LLM that will ...
0
votes
0
answers
18
views
NER features in ML Text Mining
I'm working on identifying fraudulent reviews, and for this I'm using some feature engineering such as 'NER'.
My question is, how can I fit NER into my ML algorithm? Can I vectorize it using TF-IDF?
...
0
votes
0
answers
35
views
I can't use unnest tokens properly when importing from excel
I'm a brand new R programmer and I'm doing an unguided assignment trying to start text mining / sentiment analysis. I'm supposed to get text from an Excel file (looks like this)
I do some filtering ...
0
votes
0
answers
24
views
Disambiguate a gene symbol from an English word
Dears,
I use pubmed.mineR: Text Mining of PubMed Abstracts, to extract gene symbols from PubMed Abstracts (texts).
There are some gene symbols like:
can
(https://www.uniprot.org/uniprotkb/P61517/entry)...
0
votes
0
answers
29
views
Python code to list all the tables created and tables used to create it from sql script
Hi, I just wanted to know whether the output can be arranged so that created tables and the tables used to create them are grouped together
e.g.
'select * from A inner join B on A.id=B.id
...
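One possible starting point for this kind of grouping, as a minimal sketch only: it assumes the script consists of semicolon-separated statements of the form `create table <name> as select ...`, which the truncated excerpt does not confirm. The function name `table_lineage`, the two regular expressions, and the sample script are all hypothetical and used purely for illustration. A regex pass like this ignores comments, string literals and CTE names, so a real SQL parser would be more robust.

```python
import re

# Assumed statement shape: CREATE TABLE <name> AS SELECT ... ;
CREATE_RE = re.compile(
    r"\bcreate\s+(?:or\s+replace\s+)?table\s+(?:if\s+not\s+exists\s+)?([\w.]+)",
    re.IGNORECASE,
)
# A table name is taken to be the identifier right after FROM or JOIN.
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql_script):
    """Map each created table to the sorted list of tables it reads from."""
    lineage = {}
    for statement in sql_script.split(";"):
        created = CREATE_RE.search(statement)
        if not created:
            continue  # not a CREATE TABLE statement
        target = created.group(1)
        sources = lineage.setdefault(target, set())
        # Only scan the text after the CREATE clause for source tables.
        body = statement[created.end():]
        sources.update(SOURCE_RE.findall(body))
    return {target: sorted(sources) for target, sources in lineage.items()}


# Hypothetical sample script, modelled on the join in the question.
script = """
create table C as
select * from A inner join B on A.id = B.id;
create table D as
select * from C left join A on C.id = A.id;
"""

print(table_lineage(script))
# {'C': ['A', 'B'], 'D': ['A', 'C']}
```

Each key is a created table and each value lists the tables named after `from` or `join` in the same statement. Subqueries in parentheses are skipped at the outer level, but the tables inside them are still picked up.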
0
votes
0
answers
25
views
R package syuzhet does not work in Hungarian
I would like to get sentiments of Hungarian songs. I use syuzhet 1.07. It works fine with default settings or with some languages, but not with Hungarian. Is this a bug, or should I load other ...
-1
votes
1
answer
20
views
Error while creating the TDM - "No applicable method for 'meta' applied to an object of class "character""
While creating a TermDocumentMatrix with the tm package, I am getting an error. I have used the following code.
int_vc <- VCorpus(int_vc)
int_vc <- tm_map(int_vc, tolower)
int_vc <- tm_map(int_vc, ...
0
votes
0
answers
97
views
LDA Topic Modeling Producing Identical/Empty Topics
I am running topic modeling on two large text documents (around 500-750 KB) and am asking for ten topics. I keep getting a repeat of two topics. Could this be an issue of the small number of documents? Or ...