All Questions
36,044
questions
0
votes
0
answers
11
views
PySpark Regex Lookbehind Beginning Of String [duplicate]
My string in column "Key" is:
"+One+Two+Three-Four"
I want to extract all words following the "+" sign:
df.select(regexp_extract_all("Key", F.lit(r"(?<=...
1
vote
1
answer
21
views
Is there any situation where re.search could not be used instead of re.match? [duplicate]
The documentation seems clear, but it raises the question: what is the purpose of re.match? Couldn't re.search with the caret (^) be used instead, as long as the MULTILINE flag is not enabled? Is re.match ...
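For reference, a minimal sketch of the one case the excerpt singles out, assuming only the standard `re` module. The sample text is made up for illustration. With `re.MULTILINE`, `^` in `re.search` can match after a newline, while `re.match` stays anchored at position 0:

```python
import re

text = "first\nsecond"  # hypothetical sample input

# Without MULTILINE, ^ only matches at the very start of the string,
# so search("^second") and match("second") both fail.
no_flag_search = re.search(r"^second", text)
no_flag_match = re.match(r"second", text)

# With MULTILINE, ^ also matches right after each newline, so search succeeds,
# while match is still tied to position 0 and fails.
multi_search = re.search(r"^second", text, re.MULTILINE)
multi_match = re.match(r"second", text, re.MULTILINE)

print(no_flag_search, no_flag_match, multi_search, multi_match)
```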
0
votes
0
answers
15
views
Regex Unicode categories \p{L} combined with a Python formatted string [duplicate]
I would like to write a regular expression in Python that is a formatted string and includes a Unicode category.
The regex would look like this:
import regex as re
mystring = "abc"
m = re....
1
vote
1
answer
51
views
Removing string between two specified strings in Python 3 [duplicate]
I am working on an NLP project that requires me to remove computer code from a piece of text. The code is encased between the tags <pre><code> and </code></pre>. Now I could do ...
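One possible approach, sketched with the standard `re` module: a non-greedy match between the two tag pairs, with `re.DOTALL` so the match can span lines. The sample text is hypothetical, and this assumes the tags are not nested and appear exactly as written in the question.

```python
import re

# Hypothetical sample text; the tag pair is taken from the question.
text = "Use a loop: <pre><code>for i in range(3):\n    print(i)</code></pre> and then run it."

# .*? is non-greedy, so each block ends at the nearest closing tag;
# re.DOTALL lets . cross newlines inside the code block.
code_block = re.compile(r"<pre><code>.*?</code></pre>", re.DOTALL)
cleaned = code_block.sub("", text)
print(cleaned)
```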
-4
votes
0
answers
48
views
Why does using the regex group feature in Python give different outputs? [duplicate]
import re
string1 = "aaabaa"
zusuchen = "aa"
#1
m_start = re.finditer(fr'(?=({zusuchen}))', string1)
results = [(match.start(1), match.end(1)-1) for match in m_start]
for z in ...
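The excerpt is cut off, so the second variant being compared is unknown. As a sketch of one common source of differing results, the snippet below contrasts a plain pattern with the lookahead-plus-group form shown above, reusing the same `string1` and `zusuchen`. The `re.escape` call is an addition that is not in the original.

```python
import re

string1 = "aaabaa"
zusuchen = "aa"

# Plain pattern: each match consumes its characters, so matches cannot overlap.
plain = [(m.start(), m.end() - 1) for m in re.finditer(re.escape(zusuchen), string1)]

# Lookahead with a capture group: the match itself is empty, so the scan
# advances one character at a time and group 1 can overlap earlier hits.
overlapping = [
    (m.start(1), m.end(1) - 1)
    for m in re.finditer(rf"(?=({re.escape(zusuchen)}))", string1)
]
print(plain, overlapping)
```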
-1
votes
0
answers
21
views
Splicing Relevant Text from a Screenshot Using pytesseract and OCR for a Scheduling Script
Hi, I'm currently making a script that takes screenshots of a university class schedule and automatically syncs them to either Google Calendar or Outlook Calendar.
from PIL import Image
import ...
1
vote
1
answer
46
views
How to extract the volume from a string using a regular expression?
I need to extract the volume with regular expression from strings like
"Candy BAR 350G" (volume = 350G),
"Gin Barrister 0.9ml" (volume = 0.9ml),
"BAXTER DRY Gin 40% 0.5 ml"...
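A minimal sketch of one way to approach this with the standard `re` module. The unit list (`ml`, `g`) is an assumption based only on the examples visible above, and the helper name `extract_volume` is hypothetical. Requiring the unit directly after the number (with optional whitespace) is what keeps `40%` from matching.

```python
import re

# Assumed unit list (ml, g) based only on the examples shown; extend as needed.
VOLUME = re.compile(r"\d+(?:\.\d+)?\s*(?:ml|g)\b", re.IGNORECASE)

def extract_volume(name):
    """Return the first number-plus-unit found in name, or None."""
    m = VOLUME.search(name)
    return m.group(0) if m else None

samples = ["Candy BAR 350G", "Gin Barrister 0.9ml", "BAXTER DRY Gin 40% 0.5 ml"]
volumes = [extract_volume(s) for s in samples]
print(volumes)
```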
1
vote
1
answer
40
views
How to replace text using a regex in a Pandas DataFrame [duplicate]
I have the following dataset:
meste = pd.DataFrame({'a':['06/33','40/2','05/22']})
a
0 06/33
1 40/2
2 05/22
And I want to remove the leading 0s in the text (06/33 to 6/33 for example). I ...
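As a rough illustration of the pattern involved, here is a sketch using the standard `re` module on a plain list holding the same values. It assumes only zeros at the very start of each value should go. In pandas, the same pattern could be passed to `Series.str.replace` with `regex=True`.

```python
import re

values = ["06/33", "40/2", "05/22"]  # same values as column "a" above

# ^0+ matches zeros at the start; the lookahead (?=\d) keeps one digit
# so that a value such as "0/5" would not lose its only digit.
stripped = [re.sub(r"^0+(?=\d)", "", v) for v in values]
print(stripped)
```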
1
vote
1
answer
40
views
Regex pattern for PySpark: match start and middle of a text and extract the middle
I have text in a PySpark column called TEXT that looks like the following:
The sky is red. I have 2 apples and I am fine.
----------------------------------------------
The sky is back. I have 8 apples or I am ...
3
votes
1
answer
60
views
Parsing formulas efficiently using regex and Polars
I am trying to parse a series of mathematical formulas and need to extract variable names efficiently using Polars in Python.
Regex support in Polars seems to be limited, particularly with look-around ...
1
vote
2
answers
60
views
How to extract or capture the value from stdout_lines of an Ansible playbook?
I am looking for help to extract or capture the free MB value from the stdout_lines of an Ansible playbook execution and use that value as a criterion to proceed further in the playbook.
My task output ...
2
votes
1
answer
52
views
How do I fix this regex so that it matches hyphenated words where the final segment ends in a consonant other than the letter m?
I want to match all cases where a hyphenated string (which could be made up of one or multiple hyphenated segments) ends in a consonant that is not the letter m.
In other words, it needs to match ...
-4
votes
0
answers
35
views
Using regex for Account Number Extraction [closed]
Using regex, how can I read the accounts from the table below so that four IDs can be extracted from the first row (300501798101, 359073848101, 359073848102 and 300501798101), whereas from the ...
2
votes
1
answer
79
views
Using callable_iterator (re.finditer) causes Python to freeze
I have a function that is called for every line of a text.
def tokenize_line(line: str, cmd = ''):
    matches = re.finditer(Patterns.SUPPORTED_TOKENS, line)
    tokens_found, not_found, start_idx = []...
0
votes
0
answers
33
views
Regex: Find all matches between varying length sets of identical special characters [duplicate]
I have texts similar to this:
<FILE_NAME>
���������������
</FILE_NAME>
<SHEET_NAMES>
['������']
</SHEET_NAMES>
<RAW_STRINGS>
[������������]
Where any length of the ...