Python Regular Expression To Split Paragraphs
Solution 1:
Unfortunately there's no nice way to write "space but not a newline".
I think the best you can do is add some space with the x modifier and try to factor out the ugliness a bit, but that's questionable:
(?x) (?: [ \t\r\f\v]*? \n ){2} [ \t\r\f\v]*?
You could also try creating a subrule just for the character class and interpolating it three times.
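For what it's worth, here is roughly how the sub-rule idea might look in Python. This is only a sketch: the names BLANK and PARA_SEP are made up for illustration, and the pattern is the one above written without the x modifier, with the character class interpolated through an f-string.

```python
import re

# Sub-rule for "whitespace but not a newline" (hypothetical name).
BLANK = r"[ \t\r\f\v]"

# The pattern from above, built by interpolating the sub-rule.
# {{2}} is how a literal {2} is written inside an f-string.
PARA_SEP = re.compile(rf"(?:{BLANK}*?\n){{2}}{BLANK}*?")

text = "p1\n\t\np2\t\n\tstill p2"
paragraphs = PARA_SEP.split(text)
print(paragraphs)  # ['p1', 'p2\t\n\tstill p2']
```

Note that the trailing lazy quantifier matches the empty string when used this way, so any indentation at the start of the next paragraph is left in place.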
Solution 2:
Are you trying to deduce the structure of a document in plain text? Are you doing what docutils does?
You might be able to simply use the Docutils parser rather than roll your own.
Solution 3:
Not a regexp but really elegant:
    from itertools import groupby

    def paragraph(lines):
        # Group consecutive lines by whether they are whitespace-only;
        # the whitespace-only groups are the paragraph separators.
        for group_separator, line_iteration in groupby(lines.splitlines(True), key=str.isspace):
            if not group_separator:
                yield ''.join(line_iteration)

    for p in paragraph('p1\n\t\np2\t\n\tstill p2\t \n \n\tp3'):
        print(repr(p))

which prints:

    'p1\n'
    'p2\t\n\tstill p2\t \n'
    '\tp3'
It's up to you to strip the output as you need it of course.
Inspired by the famous "Python Cookbook" ;-)
Solution 4:
Almost the same, but using non-greedy quantifiers and taking advantage of the whitespace sequence.
\s*?\n\s*?\n\s*?
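A quick way to try this pattern is with re.split. This is a sketch, not a tested drop-in: the function name split_paragraphs is just an assumption for the example.

```python
import re

# The pattern from this answer: two newlines with optional
# whitespace around them, matched non-greedily.
PARA_SEP = re.compile(r"\s*?\n\s*?\n\s*?")

def split_paragraphs(text):
    """Split text wherever the separator pattern matches (hypothetical helper)."""
    return PARA_SEP.split(text)

result = split_paragraphs("p1\n\t\np2\t\n\tstill p2\t \n \n\tp3")
print(result)  # ['p1', 'p2\t\n\tstill p2', '\tp3']
```

Because the search starts at the leftmost possible position, the leading \s*? swallows trailing whitespace before the blank line. The final \s*? matches nothing, so the tab in front of p3 survives.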
Solution 5:
FYI: I just wrote 2 solutions for this type of problem in another thread. First using regular expressions as requested here, and second using a state machine approach which streams through the input one line at a time:
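The solutions from that other thread are not shown here. Purely as an illustration of what a line-at-a-time state machine could look like (my own minimal sketch, not the code referred to above; the name iter_paragraphs is hypothetical):

```python
def iter_paragraphs(lines):
    """Yield paragraphs from any iterable of lines, reading one line at a time."""
    buffer = []  # lines of the paragraph currently being collected
    for line in lines:
        if line.strip():
            # State: inside a paragraph, keep collecting.
            buffer.append(line)
        elif buffer:
            # A whitespace-only line ends the current paragraph.
            yield "".join(buffer)
            buffer = []
    if buffer:
        # Flush the last paragraph at end of input.
        yield "".join(buffer)

sample = "p1\n\t\np2\t\n\tstill p2\t \n \n\tp3"
paras = list(iter_paragraphs(sample.splitlines(True)))
print(paras)  # ['p1\n', 'p2\t\n\tstill p2\t \n', '\tp3']
```

Since it only needs an iterable of lines, the same function should also work on an open file object, so the whole input never has to be held in memory.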