Suppose I have strings with the following structure. I have more than 100k of them.

'xxxxxx0AxxZZxxBBxxxxx1AxxxxxBB'            --Group1 is 1   
'xxxxxxxxx0AxxxxZZxxxxx1AxxxxxBBxxxx'           --Group1 is 1
'xxxxxx1AxxxxxBBxxxx0AxxxxZxxBBxxxxZZxxBBxxxxxxxx'  --Group1 is 0
'xxx0AxxxBBxxxZZxxxxBBxxxxxx1AxxxxZxxxxxBB'     --Group1 is 0

The task is: find the digit before the character "A" where the following holds: 1) the string "BB" exists somewhere after the "A"; 2) between that "A" and that "BB", none of "A", "BB" or "ZZ" occurs.

I have made numerous attempts using negative/positive lookahead assertions, e.g. (\d)A(?!.*ZZ).*BB. My problem is that (?!.*ZZ) evaluates PAST the first occurrence of 'BB' after the matched 'A'. This is evident in e.g. string four above: the .*ZZ inside the lookahead finds a 'ZZ' and so rejects the '0A', even though that 'ZZ' comes AFTER the first occurrence of 'BB'.
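To make the overreach concrete, here is a small sketch that runs the pattern above against string four. The regex flavor is not stated in this question, so Python's `re` module is an assumption, and `naive_digit` is a hypothetical helper name used only for illustration.

```python
import re

# The pattern from the question. The lookahead (?!.*ZZ) is unbounded:
# .* inside it can run to the end of the string, past the first 'BB'.
NAIVE = re.compile(r"(\d)A(?!.*ZZ).*BB")

def naive_digit(s):
    """Return group 1 of the first match, or None if nothing matches."""
    m = NAIVE.search(s)
    return m.group(1) if m else None

fourth = "xxx0AxxxBBxxxZZxxxxBBxxxxxx1AxxxxZxxxxxBB"

# '0A' is rejected because a 'ZZ' exists somewhere later in the string,
# so the search moves on and matches at '1A' instead.
print(naive_digit(fourth))  # prints 1, although 0 is wanted
```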

I do not know how to "bound" the lookahead so that it stops at the first 'BB'.

My best guess is (\d)A(?!.*ZZ).*BB. However, (?!.*ZZ) searches past the first occurrence of 'BB' and thus yields unexpected results.
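One possible way to bound the check, sketched here under the assumption of Python's `re` module (the flavor is not stated in this question), is to move the negative lookahead inside a repeated group so that it is tested at every character between the 'A' and the 'BB'. The names `TEMPERED`, `first_digit` and `all_digits` are hypothetical and only serve the illustration.

```python
import re

# After "<digit>A", consume one character at a time, but only while the
# text at the current position does not start with 'A', 'BB' or 'ZZ'.
# The run must then be followed directly by 'BB'.
TEMPERED = re.compile(r"(\d)A(?:(?!A|BB|ZZ).)*BB")

def first_digit(s):
    """Return the digit of the first qualifying 'A', or None."""
    m = TEMPERED.search(s)
    return m.group(1) if m else None

def all_digits(s):
    """Return the digits of all non-overlapping qualifying matches."""
    return [m.group(1) for m in TEMPERED.finditer(s)]

samples = [
    "xxxxxx0AxxZZxxBBxxxxx1AxxxxxBB",
    "xxxxxxxxx0AxxxxZZxxxxx1AxxxxxBBxxxx",
    "xxxxxx1AxxxxxBBxxxx0AxxxxZxxBBxxxxZZxxBBxxxxxxxx",
    "xxx0AxxxBBxxxZZxxxxBBxxxxxx1AxxxxZxxxxxBB",
]

for s in samples:
    print(first_digit(s), all_digits(s))
```

With this sketch, string four yields 0 as the first match, because the 'ZZ' after the first 'BB' is never inspected. Read literally, the rule is satisfied by more than one 'A' in strings three and four (a single 'Z' is not 'ZZ'), which is why `all_digits` is included alongside `first_digit`.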
