Pandas Read_csv Not Obeying A Regex Sep
Data:
from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 0
Solution 1:
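Let's look at this SO Post. Use the regular expression r',(?=\S)' explained there: it matches a comma only when the next character is not whitespace, so a comma followed by a space inside a text field is not treated as a separator.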
from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''
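# A regex separator is only supported by the python engine, so request it explicitly.
df = pd.read_csv(StringIO(s), sep=r',(?=\S)', engine='python')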
Output:
ID Level QID Text \
375280046 S D3M Which is your favorite? D5M0 option 1
S D3M How often? (at home, at work, other) D3M0 Work
M A78 Do you prefer a, b, or c? A78C a
376918925 M A78 Which ONE (select only one) A78E Milk
ResponseID responseText date_key last
375280046 S 2012-08-08 00 0 0 ynot
S 2010-03-31 00 0 0 okkk
M 2010-03-31 00 0 0 abc
376918925 M 2004-02-02 00 0 0 launch Wed.,
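To see what the lookahead in this pattern does on its own, here is a minimal sketch that applies it with re.split to one of the sample rows, outside of pandas. It only illustrates the regex; it does not reproduce how read_csv tokenizes input internally.

```python
import re

# One data row from the sample above; the Text field contains ", " sequences.
line = "375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk"

# Split only on commas whose next character is non-whitespace.
fields = re.split(r',(?=\S)', line)

print(len(fields))  # number of fields found in the row
print(fields[3])    # the Text field, with its internal commas intact
```

The commas inside the question text are each followed by a space, so the positive lookahead rejects them and the row still yields eight fields.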
Solution 2:
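read_csv appears to strip trailing whitespace from each line before it applies the separator regex. The final comma in the last row then sits at the very end of the string, where a lookahead that only checks for whitespace no longer rejects it. This can be worked around by extending the negative lookahead so that a comma at the end of the string is also not treated as a separator: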
pd.read_csv(StringIO(s), sep=r',(?!\s|\Z)', engine='python')
Out[347]:
ID Level QID Text ResponseID \
0 375280046 S D3M Which is your favorite? D5M0
1 375280046 S D3M How often? (at home, at work, other) D3M0
2 375280046 M A78 Do you prefer a, b, or c? A78C
3 376918925 M A78 Which ONE (select only one) A78E
responseText date_key last
0 option 1 2012-08-08 00:00:00 ynot
1 Work 2010-03-31 00:00:00 okkk
2 a 2010-03-31 00:00:00 abc
3 Milk 2004-02-02 00:00:00 launch Wed.,
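The effect of the stripping can be sketched with re.split alone. This example assumes that the pattern which failed was a plain negative lookahead for whitespace, r',(?!\s)', and it uses str.strip() as a stand-in for what read_csv appears to do; neither detail is confirmed by the text above.

```python
import re

# Last data row of the sample, including the trailing space after the final comma.
last_line = "376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., "

# Assumption: mimic read_csv by stripping whitespace before splitting.
stripped = last_line.strip()

# Assumed original pattern: the trailing comma is now followed by nothing, so it splits.
naive = re.split(r',(?!\s)', stripped)

# Pattern from this answer: a comma at the end of the string is also left alone.
fixed = re.split(r',(?!\s|\Z)', stripped)

print(len(naive), repr(naive[-1]))  # one extra, empty trailing field
print(len(fixed), repr(fixed[-1]))  # trailing comma stays inside the last field
```

With the extra empty field the last row has more values than the header has columns, which would explain the misaligned frame; the \Z alternative keeps every row at eight fields.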