Welcome to MLink Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Regex Negative Lookbehind ignored

Here is my regex:

(?<!PAYROLL)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)

Here is my text:

INCOMING WIRE TRUST GS INVESTMENT 
VANGUARD PAYROLL
PAYROLL FIDELITY
ACH CREDIT FIDELITY INVESTM-FIDELITY
ACH CREDIT FIDELITY INVESTM-FIDELITY
ACH DEBIT FIDELITY 
ACH DEBIT FIDELITY 
ACH CREDIT FIDELITY INVESTM-FIDELITY

When I run this on http://regexr.com (using the PCRE regex engine), it matches "PAYROLL FIDELITY", even though I added the negative lookbehind (?<!PAYROLL) to prevent exactly that.

Any help appreciated.


1 Answer

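The (?<!PAYROLL) negative lookbehind matches a location that is not immediately preceded by the PAYROLL character sequence. In the PAYROLL FIDELITY string, FIDELITY is not immediately preceded by PAYROLL; it is immediately preceded by PAYROLL plus a space. The seven characters right before FIDELITY are "AYROLL " (ending in the space), not "PAYROLL", so the lookbehind succeeds and the match goes ahead.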
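A minimal sketch that reproduces this on the problem line. The question is tagged python, so this uses Python's built-in re module as an assumption; its fixed-width lookbehind behaves the same as PCRE's in this case:

```python
import re

text = "PAYROLL FIDELITY"

# (?<!PAYROLL) inspects only the 7 characters right before FIDELITY.
# Here they are "AYROLL " (ending in a space), not "PAYROLL", so the lookbehind succeeds.
naive = re.search(r"(?<!PAYROLL)FIDELITY", text)

# Including the separating whitespace makes the lookbehind inspect "PAYROLL " instead.
fixed = re.search(r"(?<!PAYROLL\s)FIDELITY", text)

print(naive)  # <re.Match object; span=(8, 16), match='FIDELITY'>
print(fixed)  # None
```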

You can solve the current problem in several ways. If you are sure there is always exactly one whitespace character between words (say, the string is tokenized), add \s after PAYROLL: (?<!PAYROLL\s).
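Applied to all of the question's sample lines, the one-whitespace fix can be checked with a short sketch. It assumes Python's standard re module, drops the trailing spaces from the sample, and relies on exactly one whitespace character between PAYROLL and FIDELITY, as the paragraph above requires:

```python
import re

# Sample lines from the question (trailing spaces dropped).
lines = [
    "INCOMING WIRE TRUST GS INVESTMENT",
    "VANGUARD PAYROLL",
    "PAYROLL FIDELITY",
    "ACH CREDIT FIDELITY INVESTM-FIDELITY",
    "ACH CREDIT FIDELITY INVESTM-FIDELITY",
    "ACH DEBIT FIDELITY",
    "ACH DEBIT FIDELITY",
    "ACH CREDIT FIDELITY INVESTM-FIDELITY",
]

# The question's pattern with \s added to the lookbehind.
# The lookbehind is still fixed width (8 characters), so re accepts it.
pattern = re.compile(
    r"(?<!PAYROLL\s)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)"
)

# Keep every line that contains at least one match.
matched = [line for line in lines if pattern.search(line)]
for line in matched:
    print(line)
```

Only VANGUARD PAYROLL (which has neither FIDELITY nor INVEST) and PAYROLL FIDELITY are left out. A line with two spaces between PAYROLL and FIDELITY would still match, which is the limitation the next paragraph addresses.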

If there can be one or more whitespace characters, the (?<!PAYROLL\s+) pattern won't work in PCRE, because PCRE lookbehind patterns must be of fixed width. Instead, you can match (some) exceptions and skip them with the (*SKIP)(*FAIL) PCRE verbs:

PAYROLL\s+FIDELITY(*SKIP)(*F)|(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)

See the regex demo. You can even replace PAYROLL\s+FIDELITY(*SKIP)(*F) with PAYROLL.*?FIDELITY(*SKIP)(*F) or PAYROLL[\s\S]+?FIDELITY(*SKIP)(*F) to skip any chunk of text from PAYROLL up to the leftmost FIDELITY (the [\s\S] version also crosses line breaks, which . does not by default).

PAYROLL\s+FIDELITY(*SKIP)(*F) works like this: it matches PAYROLL, one or more whitespace characters, and FIDELITY, and then (*F) forces the match to fail. Backtracking into (*SKIP) discards the attempt, and the engine resumes searching for the next match at the position where (*SKIP) was reached, i.e. right after that FIDELITY, so nothing inside the skipped text can be matched.
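Python's standard re module has the same fixed-width restriction on lookbehind and does not support the (*SKIP)(*F) verbs (the third-party regex package documents support for them, but this sketch sticks to the standard library). A common substitute, swapped in here rather than taken from the answer, is to match the exception in one branch of an alternation and capture the wanted text in a group, keeping only matches where that group is set. The multi-space input line and the wanted_matches helper are hypothetical:

```python
import re

# Like PCRE, the built-in re module rejects a variable-width lookbehind.
try:
    re.compile(r"(?<!PAYROLL\s+)FIDELITY")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Substitute for (*SKIP)(*F): the first branch matches (and so consumes) the
# exception text; the second branch captures the wanted text in group 1.
pattern = re.compile(
    r"PAYROLL\s+FIDELITY"                                      # exception: consumed, then dropped
    r"|(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)"  # wanted text: group 1
)


def wanted_matches(text):
    """Hypothetical helper: return group-1 hits only.

    A match produced by the exception branch leaves group 1 as None,
    so it is filtered out, and finditer resumes after the consumed text.
    """
    return [m.group(1) for m in pattern.finditer(text) if m.group(1)]


print(variable_width_ok)                                    # False
print(wanted_matches("PAYROLL   FIDELITY"))                 # []
print(wanted_matches("ACH DEBIT FIDELITY"))                 # ['FIDELITY']
print(wanted_matches("ACH CREDIT FIDELITY INVESTM-FIDELITY"))  # ['FIDELITY', 'INVEST', 'FIDELITY']
```

The effect mirrors (*SKIP)(*F): the exception text is consumed so nothing inside it is reported, and scanning continues after it. The difference is that the filtering happens in Python code rather than inside the regex engine.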


...