Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).
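
As a rough illustration of the definition above, here is a minimal Python sketch of exact matching that reports every position where the needle occurs in the haystack. The helper name `find_all` is hypothetical and chosen only for this example. It relies on the built-in `str.find` and does not cover the fuzzy matching that several of the questions below ask about.

```python
def find_all(needle: str, haystack: str) -> list[int]:
    """Return the start index of every occurrence, including overlapping ones."""
    if not needle:
        # Assumption for this sketch: an empty pattern yields no matches.
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are found too.
        start = haystack.find(needle, start + 1)
    return positions


print(find_all("ana", "bananas"))  # [1, 3]
```

Advancing by one character rather than by the pattern length is a deliberate choice here: it keeps overlapping occurrences, which may or may not be what a given question wants.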

string-matching
0 votes
1 answer
19 views

Renaming dataframe column in Python with a string value in another dataframe by matching column/index names

Major edit: Apparently it is difficult to understand my question, so I'll do my best to concretize it. I have two dataframes, "df1" and "df2". These are quite large, larger than in ...
Calle Flygare
0 votes
1 answer
81 views

Finding the longest Dictionary.Key match in a phrase

I have a SortedDictionary<string, string>, ordered by key length descending, of the form: red fox - address1 weasel - address2 foxes - address3 fox - address3 etc. and a list of phrases e.g. "...
alexb • 33
0 votes
2 answers
50 views

Is there a way to obtain a list separated by comma as the output of str_extract_all instead of the default output in R?

I have searched high and low and nobody seems to have asked that exact question, so I'm at a loss. I have a data frame with a couple of columns. One of these columns contains various sentences that don't ...
Laue28 • 1
0 votes
1 answer
52 views

Identifying Correct String Order in Pandas

I have a dataframe as the following, showing the relationship of different entities in each row. Child Parent Ult_Parent Full_Family A032 A001 A039 A001, A032, A039, A040, A041, A043, A043, A045, ...
L H • 27
0 votes
0 answers
36 views

Fuzzy Match 2 Large Pandas Dataframes

I have 2 pandas dataframes that both contain company names. I want to left join df1(~10k rows) with df2(~1.6m rows) on company names using a fuzzy match. My current function takes too long to run, so ...
L H • 27
2 votes
6 answers
115 views

Matching the start of a sequence in R

I have a series of strings in a vector and need to remove the matching starting pattern from each string. However, I don't know the pattern or how long it is. stringa <- c("apple_tart", "...
Katie Helm
1 vote
1 answer
49 views

Given a String count the possible Permutations that satisfy a condition. How to Optimize from O(N*N!)

Hi, I recently came across an interesting question and had a hard time trying to optimize it beyond O(N*N!). Here is the question: Given a string, return the number of possible combinations that satisfy ...
Zi Ming • 13
1 vote
2 answers
144 views

How can I find all exact occurrences of a string, or close matches of it, in a longer string in Python?

Goal: I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python. I'd also like to know the location of these occurrences in the longer string. To define ...
Franck Dernoncourt
1 vote
1 answer
31 views

Why doesn't fuzzywuzzy's process.extractBests give a 100% score when the tested string 100% contains the query string?

I'm testing fuzzywuzzy's process.extractBests() as follows: from fuzzywuzzy import process # Define the query string query = "Apple" # Define the list of choices choices = ["Apple"...
Franck Dernoncourt
0 votes
0 answers
72 views

How to efficiently compute similarity scores for prefixes of a string with another string in C?

I'm working on a problem involving string matching where I need to compute the similarity scores for each prefix of a string C against another string S. The similarity score for a prefix P of C and S ...
NatsumiStar
0 votes
0 answers
40 views

Spotfire's "~=" not matching wildcard characters

Using Spotfire Analyst 14.0.3, I'm in the Data Canvas adding a filter via the "Add transformation" feature. When I use the filter expression ... [customdata_name]~='Binary Pump : 1 : ...
RightmireM • 2,451
0 votes
1 answer
88 views

How to do fuzzy merge with 2 large pandas dataframes?

I have 2 pandas dataframes that both contain company names. I want to merge these 2 dataframes on company names using a fuzzy match. But the problem is 1 dataframe contains 5m rows and the other 1 ...
L H • 27
1 vote
0 answers
114 views

How to find best matching anchor texts from paragraph and list of titles?

I have a paragraph: In today's world, keeping your personal information safe online is more important than ever. With cyber-attacks on the rise, having a strong cybersecurity strategy is essential. ...
Manoj Kamble
2 votes
3 answers
142 views

How to Compare Hierarchy in 2 Pandas DataFrames? (New Sample Data Updated)

I have 2 dataframes that captured the hierarchy of the same dataset. Df1 is more complete compared to Df2, so I want to use Df1 as the standard to analyze if the hierarchy in Df2 is correct. However, ...
L H • 27
-1 votes
1 answer
62 views

Can I combine contains and startswith in order to match two columns from one dataframe to another's master column?

Master dataframe filled with a specific match's players and statistics. 34 columns and variable number of rows. Column "Player" has full names Player Goals Assists Dominic Calvert-Lewin 1 ...
filipakous