Python Regex: Greedy Pattern Returning Multiple Empty Matches
This pattern is meant simply to grab everything in a string up until the first potential sentence boundary in the data: [^\.?!\r\n]* Output: >>> pattern = re.compile(r'([
Solution 1:
The * quantifier allows the pattern to match a substring of length zero. In your original version of the code (without the ^ anchor in front), the additional matches are:
- the zero-length string between the end of "hard" and the first !
- the zero-length string between the first and second !
- the zero-length string between the second and third !
- the zero-length string between the third ! and the end of the text
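The matches listed above can be reproduced with a short sketch. The sample text is an assumption (the original input is not shown in full here), and find_spans is a hypothetical helper name; the pattern is the one from the question, without any capturing group.

```python
import re

# Pattern from the question: zero or more characters that are not
# sentence-ending punctuation or line breaks.
PATTERN = re.compile(r'[^\.?!\r\n]*')

# Hypothetical sample text; the original input is not shown in full.
SAMPLE = "this is hard!!!"


def find_spans(pattern, text):
    """Return (start, end, matched_text) for every match found in text."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for start, end, matched in find_spans(PATTERN, SAMPLE):
        print(start, end, repr(matched))
```

On this sample, the first match is 'this is hard' (positions 0 to 12), followed by four zero-length matches at positions 12, 13, 14 and 15, which correspond to the four items in the list.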
Adding the ^ anchor to the front ensures that only a single substring can match the pattern: without the re.MULTILINE flag, ^ matches only at the beginning of the input text, and that position occurs exactly once.
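As a rough comparison, the sketch below runs the unanchored and anchored patterns over the same text. The sample text and the names UNANCHORED, ANCHORED and first_chunk are assumptions made for illustration, not taken from the original code.

```python
import re

# Pattern from the question, with and without the ^ anchor.
UNANCHORED = re.compile(r'[^\.?!\r\n]*')
ANCHORED = re.compile(r'^[^\.?!\r\n]*')

# Hypothetical sample text; the original input is not shown in full.
SAMPLE = "this is hard!!!"


def first_chunk(text):
    """Return the text up to the first sentence boundary or line break."""
    # match() only tries at the start of the string, so it behaves like
    # the anchored pattern even though the pattern itself has no ^.
    return UNANCHORED.match(text).group()


if __name__ == "__main__":
    print(UNANCHORED.findall(SAMPLE))  # one real match plus empty strings
    print(ANCHORED.findall(SAMPLE))    # a single match
    print(first_chunk(SAMPLE))
```

If only the first chunk is needed, calling match() or search() instead of findall() also avoids the extra empty strings, because both return at most one match.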