predict function doesn't return the correct predictions with num_actors >1 #231

Open
faaany opened this issue Aug 24, 2022 · 6 comments

faaany commented Aug 24, 2022

Hi, when I train XGBoost with the code snippet below, the predict function returns different results depending on the number of actors I set. In my case, I only get the correct predictions when the number of actors passed to predict is 1.

    import ray
    from xgboost_ray import RayDMatrix, RayFileType, RayParams, predict, train

    # train_path, valid_path, name, feature_list, numlabel, xgb_parms,
    # model_save_path and oof are defined elsewhere in the script.
    ray.init()
    cpus_per_actor = 15
    num_actors = 10
    ray_params = RayParams(
        num_actors=num_actors,
        cpus_per_actor=cpus_per_actor,
        elastic_training=True,
        max_failed_actors=1,
        max_actor_restarts=1)

    dtrain = RayDMatrix(
        train_path,
        label=name,
        columns=feature_list[numlabel],
        filetype=RayFileType.PARQUET)
    dvalid = RayDMatrix(
        valid_path,
        label=name,
        columns=feature_list[numlabel],
        filetype=RayFileType.PARQUET)

    print("Training.....")
    model = train(
        xgb_parms,
        dtrain,
        evals=[(dtrain, 'train'), (dvalid, 'valid')],
        num_boost_round=250,
        early_stopping_rounds=25,
        verbose_eval=25,
        ray_params=ray_params)

    model.save_model(f"{model_save_path}/xgboost_{name}_stage1.model")

    print('Predicting...')
    dvalid = RayDMatrix(
        valid_path,
        label=name,
        columns=feature_list[numlabel],
        filetype=RayFileType.PARQUET)

    # Prediction uses the same number of actors, but only 1 CPU per actor.
    oof[:, numlabel] = predict(
        model,
        dvalid,
        ray_params=RayParams(num_actors=num_actors, cpus_per_actor=1))

The returned predictions for num_actors=1:

[0.00197015 0.00656855 0.00210109 ... 0.00132486 0.00912175 0.03348438]

The returned predictions for num_actors=10:

[0.00253869 0.02829305 0.0060115 ... 0.00152305 0.01026866 0.03538961]

Is this a bug or am I setting the number of actors wrong? Thanks for your review!
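One way to narrow down a mismatch like the one above is to check whether the two runs produce the same values in a different row order or genuinely different values. The helper below is a minimal sketch of that check; the function name, its return strings, and the tolerances are illustrative assumptions and are not part of xgboost_ray.

```python
import math


def compare_predictions(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Classify how two prediction sequences differ (hypothetical helper)."""
    if len(a) != len(b):
        return "different length"

    def close(x, y):
        return math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)

    # Same values at the same positions.
    if all(close(x, y) for x, y in zip(a, b)):
        return "identical"
    # Same values once both sequences are sorted: only the order differs.
    if all(close(x, y) for x, y in zip(sorted(a), sorted(b))):
        return "same values, different order"
    return "different values"


print(compare_predictions([0.1, 0.2, 0.3], [0.3, 0.1, 0.2]))
```

Passing the num_actors=1 and num_actors=10 outputs to such a helper would show whether the rows were merely reordered, which is only one possible explanation for the discrepancy.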

Member

Yard1 commented Aug 26, 2022

I tried running the following code locally:

from sklearn import datasets
from sklearn.model_selection import train_test_split

import numpy as np

from xgboost_ray import RayDMatrix, RayParams
from xgboost import XGBClassifier

from xgboost_ray.main import predict


# Load dataset
data, labels = datasets.load_breast_cancer(return_X_y=True)
# Split into train and test set
train_x, test_x, train_y, test_y = train_test_split(
    data, labels, test_size=0.25)

xgb = XGBClassifier()
xgb.fit(train_x, train_y)
pred = xgb.predict_proba(test_x)[:, 1]
print(pred)

pred_1 = predict(xgb.get_booster(), RayDMatrix(test_x), ray_params=RayParams(num_actors=1))
print(pred_1)

pred_8 = predict(xgb.get_booster(), RayDMatrix(test_x), ray_params=RayParams(num_actors=8))
print(pred_8)

assert np.allclose(pred, pred_1)
assert np.allclose(pred, pred_8)

I got the same results in all three cases. I will try again in a distributed setting, and with the HIGGS dataset.

Member

Yard1 commented Aug 26, 2022

I can reproduce this

Yard1 self-assigned this Aug 26, 2022
Member

Yard1 commented Aug 26, 2022

As a workaround, you can either use the new Ray AIR API or set sharding=RayShardingMode.BATCH on the RayDMatrix used for prediction.
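The second workaround could be wrapped roughly as follows. This is a sketch, not a confirmed fix: the helper name `predict_with_batch_sharding` is hypothetical, and extra RayDMatrix arguments from the original post (label, columns, filetype) are left out for brevity. It relies only on the `sharding` argument of RayDMatrix and the `predict` function already used in this thread.

```python
def predict_with_batch_sharding(booster, features, num_actors=8):
    """Run distributed prediction with batch sharding (hypothetical helper).

    `booster` is a trained xgboost.Booster; `features` is the data passed
    to RayDMatrix, for example a NumPy array or a pandas DataFrame.
    """
    # Imported lazily so the helper can be defined without Ray installed.
    from xgboost_ray import RayDMatrix, RayParams, RayShardingMode, predict

    # BATCH sharding is the workaround suggested in this thread.
    dmatrix = RayDMatrix(features, sharding=RayShardingMode.BATCH)
    return predict(
        booster, dmatrix, ray_params=RayParams(num_actors=num_actors))
```

With the names from the local reproduction script above, a call would look like `predict_with_batch_sharding(xgb.get_booster(), test_x, num_actors=8)`.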

Author

faaany commented Aug 29, 2022

It works after adding sharding=RayShardingMode.BATCH to the prediction RayDMatrix. Closing this issue.
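For intuition about what a sharding mode controls, here is a simplified, pure-Python sketch of how rows might be assigned to actors under batch and interleaved schemes. It is an illustration under assumptions, not xgboost_ray's actual implementation, and the thread does not state the root cause of the bug; the function names are made up for this example.

```python
def batch_shards(n_rows, num_actors):
    """Contiguous chunks: actor 0 gets the first rows, actor 1 the next."""
    base, extra = divmod(n_rows, num_actors)
    shards, start = [], 0
    for rank in range(num_actors):
        # The first `extra` actors each take one additional row.
        size = base + (1 if rank < extra else 0)
        shards.append(list(range(start, start + size)))
        start += size
    return shards


def interleaved_shards(n_rows, num_actors):
    """Round-robin: row i goes to actor i % num_actors."""
    return [list(range(rank, n_rows, num_actors))
            for rank in range(num_actors)]


def concat(shards):
    """Join per-actor row indices in actor order."""
    return [i for shard in shards for i in shard]


print(concat(batch_shards(7, 3)))        # [0, 1, 2, 3, 4, 5, 6]
print(concat(interleaved_shards(7, 3)))  # [0, 3, 6, 1, 4, 2, 5]
```

In this toy model, concatenating per-actor results preserves the original row order only for batch sharding; an interleaved scheme needs an extra step to restore the order.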

faaany closed this as completed Aug 29, 2022
Author

faaany commented Aug 29, 2022

Thanks!

Member

Yard1 commented Aug 29, 2022

Let's keep this open as this is still a bug :)

Yard1 reopened this Aug 29, 2022