Python Regex: Greedy Pattern Returning Multiple Empty Matches

This pattern is meant simply to grab everything in a string up until the first potential sentence boundary in the data: [^\.?!\r\n]* Output: >>> pattern = re.compile(r'([

Solution 1:

The * quantifier allows the pattern to match a substring of length zero. In your original version of the code (without the ^ anchor in front), the additional matches are:

  • the zero-length string between the end of hard and the first !
  • the zero-length string between the first and second !
  • the zero-length string between the second and third !
  • the zero-length string between the third ! and the end of the text
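To see where each of those empty matches sits, it can help to print the span of every match. The following is a minimal sketch; the sample text is an assumption (the original input is not shown in full), chosen only so that it ends in "hard!!!" as the list above implies.

```python
import re

# Hypothetical sample text: the question's real input is not shown in full.
text = "This is hard!!!"

# The unanchored pattern: any run (possibly empty) of non-boundary characters.
pattern = re.compile(r"[^.?!\r\n]*")

# Record (start, end, matched text) for every match the scanner finds.
matches = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]

for start, end, value in matches:
    print(start, end, repr(value))
# 0 12 'This is hard'
# 12 12 ''
# 13 13 ''
# 14 14 ''
# 15 15 ''
```

Each empty match has start equal to end: the scanner cannot consume a ! with this character class, so it reports a zero-length match at that position and moves on by one character.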

You can experiment with this further if you like.

Adding that ^ anchor to the front now ensures that only a single substring can match the pattern, since the beginning of the input text occurs exactly once.
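As a rough illustration of the anchored behaviour, the sketch below reuses a hypothetical sample text (an assumption, since the original input is not shown in full). It also shows two alternatives that are not part of the original answer: re.match, which only ever matches at the start of the string, and the + quantifier, which cannot match a zero-length string.

```python
import re

# Hypothetical sample text: the question's real input is not shown in full.
text = "This is hard!!!"

# Anchored pattern: ^ matches only at the start of the text
# (re.MULTILINE is not set), so at most one match is possible.
anchored_matches = re.findall(r"^[^.?!\r\n]*", text)
print(anchored_matches)  # ['This is hard']

# Alternative (not from the original answer): re.match always starts
# at position 0, so no explicit ^ is needed.
first = re.match(r"[^.?!\r\n]*", text).group()
print(first)  # This is hard

# Alternative (not from the original answer): + requires at least one
# character, so zero-length matches cannot occur.
non_empty = re.findall(r"[^.?!\r\n]+", text)
print(non_empty)  # ['This is hard']
```

Which variant fits best depends on the goal: the anchored form and re.match return only the leading fragment, while the + form would return every non-empty fragment if the text contained several.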

