Why Xgboost.cv And Sklearn.cross_val_score Give Different Results?

I'm trying to make a classifier on a data set. I first used XGBoost:

import xgboost as xgb
import pandas as pd
import numpy as np

train = pd.read_csv('train_users_processed_onehot

Solution 1:

This question is a bit old, but I ran into the problem today and figured out why the results given by xgboost.cv and sklearn.model_selection.cross_val_score are quite different.

By default, cross_val_score uses KFold or StratifiedKFold with shuffle=False, so the folds are taken from the data in order rather than drawn at random, whereas xgboost.cv shuffles the data before splitting by default.
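A quick way to confirm the default (a minimal sketch, assuming a scikit-learn version where shuffle defaults to False for both splitters):

```python
from sklearn.model_selection import KFold, StratifiedKFold

# Both splitters default to shuffle=False, so folds are taken
# in the order the rows appear in the data.
print(KFold().shuffle)            # False
print(StratifiedKFold().shuffle)  # False
```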

So if you do this, then you should get the same results:

from sklearn.model_selection import StratifiedKFold, cross_val_score

cross_val_score(estimator, X=train_features, y=train_labels, scoring="neg_log_loss",
    cv=StratifiedKFold(shuffle=True, random_state=23333))

Keep the random_state in StratifiedKFold and the seed in xgboost.cv the same to get exactly reproducible results.
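Pinning random_state is what makes the shuffled folds reproducible, so the two APIs can be evaluated on identical splits. A minimal sketch (the toy arrays below are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data for illustration: 20 rows with balanced binary labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

def fold_indices(random_state):
    # Shuffled stratified folds; random_state pins the shuffle order.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    return [test.tolist() for _, test in skf.split(X, y)]

# Same random_state -> identical folds on every run.
print(fold_indices(23333) == fold_indices(23333))  # True
```

On the xgboost side, xgboost.cv exposes a seed parameter for the same purpose, and also accepts a folds argument if you want to hand both libraries the exact same splitter.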
