Extract Salaries From A List Of Strings
I'm trying to extract salaries from a list of strings. I'm using re.findall(), but it returns many empty strings along with the salaries, and this is causing me errors.
Solution 1:
When a pattern contains capturing groups, re.findall returns the contents of those groups. Your pattern uses a group in which almost everything is optional, so the group can match an empty string, and those empty matches end up in the result.
In your pattern you use [0-9]*, which matches a digit zero or more times. If there is no limit on the number of leading digits, you might use [0-9]+ instead, so that at least one digit is required.
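To illustrate the difference, here is a minimal sketch. The pattern from the question is not shown here, so ([0-9]*) below is only a hypothetical stand-in for a group that can match nothing.

```python
import re

sal = '41 000€ à 63 000€ / an'

# Hypothetical pattern, not the one from the question:
# the group may match zero digits, so empty strings are returned too.
optional_digits = re.findall(r'([0-9]*)', sal)
print(optional_digits)  # contains '' entries alongside the digit runs

# Requiring at least one digit removes the empty matches.
required_digits = re.findall(r'([0-9]+)', sal)
print(required_digits)  # ['41', '000', '63', '000']
```

Note that [0-9]+ alone still splits '41 000' into two pieces, which is why the pattern also needs the optional space-and-digits part.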
You might use this pattern with a capturing group:
(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)
Explanation
(?<!\S)
    Assert that what is on the left is not a non-whitespace character
(
    Open the capture group
[0-9]+(?: [0-9]{1,3})?
    Match 1+ digits followed by an optional part that matches a space and 1-3 digits
)
    Close the capture group
€
    Match literally
(?!\S)
    Assert that what is on the right is not a non-whitespace character
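If it helps, the same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the name SALARY_RE is made up for illustration, and the literal space is written as [ ] because verbose mode ignores unescaped whitespace.

```python
import re

# Same pattern as above, spread out with re.VERBOSE.
SALARY_RE = re.compile(r"""
    (?<!\S)               # left side: start of string or whitespace
    (                     # open the capture group
      [0-9]+              #   one or more digits
      (?:[ ][0-9]{1,3})?  #   optionally a space and 1 to 3 digits
    )                     # close the capture group
    €                     # literal euro sign
    (?!\S)                # right side: end of string or whitespace
""", re.VERBOSE)

verbose_result = SALARY_RE.findall('41 000€ à 63 000€ / an')
print(verbose_result)  # ['41 000', '63 000']
```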
Your code might look like:
import re
sal = '41 000€ à 63 000€ / an'  # sample string that produced the errors
regex = r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)'  # raw string, so \S reaches the regex engine unchanged
print(re.findall(regex, sal))  # ['41 000', '63 000']
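Since the question is about a list of strings, you could wrap this in a small helper and convert the matches to integers. The following is a sketch under assumptions: the function name extract_salaries and the sample list offers are invented for illustration and do not come from the question.

```python
import re

SALARY_RE = re.compile(r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)')

def extract_salaries(text):
    """Return every salary found in text as an int, e.g. '41 000€' -> 41000."""
    # Drop the thousands-separator space before converting to int.
    return [int(match.replace(' ', '')) for match in SALARY_RE.findall(text)]

# Hypothetical sample data
offers = [
    '41 000€ à 63 000€ / an',
    '2500€ / mois',
    'Salaire non précisé',
]

salaries = [extract_salaries(offer) for offer in offers]
print(salaries)  # [[41000, 63000], [2500], []]
```

Strings without a salary produce an empty list rather than empty strings, so they are easy to filter out afterwards.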