
rpy2 pandas2ri.ri2py() Is Converting NA Values to Integers

I'm using rpy2 version 2.8.4 in conjunction with R 3.3.0 and Python 2.7.10 to create an R dataframe:

import rpy2.robjects as ro
from rpy2.robjects import r
from rpy2.robjects import

Solution 1:

That code is what converts a column of factors in an R data.frame into a column of a pandas DataFrame. Nothing in it handles NAs in a specific way, so the problem must arise upstream of the conversion. If you look at your column "Col3", you'll see that NA is already listed as a level of the factor.

>>> print(df.rx2("Col3"))
[1] 1  2  3  NA NA
Levels: 1 2 3 NA

This is even upstream of the creation of the R data.frame:

>>> lst = [1, 2, 3, ro.NA_Integer, ro.NA_Integer]
>>> print(ro.vectors.FactorVector(lst))
[1] 1  2  3  NA NA
Levels: 1 2 3 NA

What is happening is that the constructor for FactorVector in rpy2 uses a different default for the parameter exclude than R's factor() function does (I think this was done so that the integer codes work as indices into the vector of levels by default).

R's default behaviour can be restored with:

>>> v = ro.vectors.FactorVector(lst, exclude=ro.StrVector(["NA"]))
>>> print(v)
[1] 1    2    3    <NA> <NA>
Levels: 1 2 3
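
To see why the exclude setting changes what the factor's integer codes look like, here is a toy model in plain Python. It is only a sketch and not rpy2's or R's implementation: make_factor, the NA stand-in and the exclude_na flag are hypothetical names made up for this illustration. When NA is kept as a level it gets an ordinary integer code, which is presumably what later shows up as an integer after conversion.

```python
NA = None  # hypothetical stand-in for R's NA in this toy model


def make_factor(values, exclude_na):
    """Toy model of building a factor: returns (codes, levels), codes are 1-based."""
    # Levels are the sorted distinct non-missing values.
    levels = sorted({v for v in values if v is not NA})
    # If NA is not excluded, it becomes one more ordinary level (placed last).
    if not exclude_na and any(v is NA for v in values):
        levels.append(NA)
    # Each value is replaced by the 1-based position of its level;
    # values without a level stay missing.
    codes = [levels.index(v) + 1 if v in levels else NA for v in values]
    return codes, levels


values = [1, 2, 3, NA, NA]

kept = make_factor(values, exclude_na=False)
dropped = make_factor(values, exclude_na=True)

print(kept)     # ([1, 2, 3, 4, 4], [1, 2, 3, None])
print(dropped)  # ([1, 2, 3, None, None], [1, 2, 3])
```

In the first case the missing entries are indistinguishable from a real fourth level; in the second they stay missing, which matches the output with Levels: 1 2 3 above.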

The underlying issue is that there is no common guideline for representing missing values (in the sense of something like an IEEE standard). R uses an arbitrary extreme value to mark a missing integer, whereas Python has no built-in notion of missing values.
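
As a small illustration of the sentinel approach, the sketch below assumes that R marks a missing integer with the smallest 32-bit signed integer, and maps that value to Python's None. The names R_NA_INTEGER and ints_with_missing are hypothetical helpers for this example, not part of rpy2.

```python
# Assumption: R stores NA_integer_ as the smallest 32-bit signed integer.
R_NA_INTEGER = -2**31


def ints_with_missing(raw):
    """Replace the integer sentinel with None, leaving other values untouched."""
    return [None if x == R_NA_INTEGER else x for x in raw]


raw = [1, 2, 3, R_NA_INTEGER, R_NA_INTEGER]
cleaned = ints_with_missing(raw)
print(cleaned)  # [1, 2, 3, None, None]
```

Without such a mapping, the sentinel is just another integer on the Python side, so any code that receives the raw values has to know the convention.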
