rpy2 pandas2ri.ri2py() is converting NA values to integers
Solution 1:
That code converts a column of factors in an R data.frame to a column in a pandas DataFrame. Nothing in it handles NAs in a specific way, so this must happen upstream of the conversion. If you look at your column "Col3", you'll see that NA is already listed as a level in the factor:
>>> print(df.rx2("Col3"))
[1] 1 2 3 NA NA
Levels: 1 2 3 NA
This is even upstream of the creation of the R data.frame:
>>> lst = [1, 2, 3, ro.NA_Integer, ro.NA_Integer]
>>> print(ro.vectors.FactorVector(lst))
[1] 1 2 3 NA NA
Levels: 1 2 3 NA
What is happening is that the constructor for FactorVector in rpy2 uses a different default for the parameter exclude than R's factor() function does (I think this was done so that the integer codes work as indices into the vector of levels by default).
R's default behaviour can be restored with:
>>> v = ro.vectors.FactorVector(lst, exclude=ro.StrVector(["NA"]))
>>> print(v)
[1] 1 2 3 <NA> <NA>
Levels: 1 2 3
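To picture why the exclude setting matters, here is a minimal pure-Python sketch of how a factor maps values to integer codes. It is not rpy2's or R's actual implementation: the function make_factor, the use of None as a stand-in for NA, and the first-appearance ordering of levels (R sorts them) are all assumptions made for illustration.

```python
# Illustrative sketch only; make_factor is a hypothetical helper,
# not part of rpy2 or R.
NA = None  # stand-in for R's NA

def make_factor(values, exclude=(NA,)):
    """Return (levels, codes) using 1-based codes, as R factors do."""
    levels = []
    for v in values:
        # A value becomes a level unless it is excluded or already seen
        if v not in exclude and v not in levels:
            levels.append(v)
    # Values that are not a level keep a missing code
    codes = [levels.index(v) + 1 if v in levels else NA for v in values]
    return levels, codes

data = [1, 2, 3, NA, NA]

# NA excluded from the levels: the codes for NA stay missing
levels_excl, codes_excl = make_factor(data)

# Nothing excluded: NA becomes a fourth level with an ordinary integer code
levels_all, codes_all = make_factor(data, exclude=())
```

With nothing excluded, the two missing entries get the code 4, which is indistinguishable from any other integer once the codes are handed to pandas. That would be consistent with the NA values showing up as integers after conversion.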
The underlying issue is that there is no agreed standard for representing missing values (nothing comparable to an IEEE standard). R uses an arbitrary extreme value, while Python has no notion of missing values.
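As a rough illustration of the sentinel approach, the sketch below assumes that R's integer NA is stored as the smallest 32-bit signed integer and maps it to None by hand. The names R_NA_INTEGER and r_ints_to_python are hypothetical and are not part of rpy2.

```python
# Assumption: R's integer NA is the smallest 32-bit signed integer.
R_NA_INTEGER = -2**31

def r_ints_to_python(raw):
    """Replace the sentinel with None and leave other values unchanged."""
    return [None if x == R_NA_INTEGER else x for x in raw]

# Raw integer storage as it might arrive from R, sentinel included
raw = [1, 2, 3, R_NA_INTEGER, R_NA_INTEGER]
converted = r_ints_to_python(raw)
```

Without a step like this, the sentinel is just another integer on the Python side, which is why any conversion layer has to decide explicitly how missing values should be represented.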