
[Discussion] XGBoost Predict is not threadsafe #5339

Closed
kylejn27 opened this issue Feb 21, 2020 · 16 comments

@kylejn27
Contributor

kylejn27 commented Feb 21, 2020

I would like to start a formal discussion on potential fixes for the thread-safety issue in the xgboost predict method. It is noted in a few places that the predict method is not thread safe: in this issue, in the Python API docs, and in the C++ API docs.

A group of my colleagues at Capital One (@ahmadia, @gforsyth, @mmccarty, @stephenpardy) has looked into a few possible changes for this problem, and we would like to kick off a broader discussion on how best to improve the user experience.

The best approach would be to dive into the C++ layer and make the underlying methods thread safe; if anybody wishes to take this on, that would be the ideal change. However, it would be an intrusive set of code changes, and a less intrusive but similarly effective change can be implemented at the Python layer.

We propose the following:

  • Add a lock to the existing ctypes API call so that the predict method cannot be called in a multithreaded manner unintentionally. This prevents users from shooting themselves in the foot.
  • Build a new, thread-safe predict method or object. This method or object will use threading.local memory to store a copy of the booster object. The primary booster object should be automatically copied into thread-local memory upon fork. (A rough sketch of both ideas is shown after this list.)
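For illustration, here is a minimal Python-layer sketch of both ideas. The helper names (locked_predict, thread_local_predict) and the copy-on-first-use behavior are hypothetical, not part of any existing xgboost API.

import copy
import threading

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_informative=10)
model = XGBClassifier(n_estimators=50).fit(X, y)

# Idea 1: serialize access to the underlying predict call with a lock so that
# concurrent callers cannot race on shared internal state.
_predict_lock = threading.Lock()

def locked_predict(data):
    with _predict_lock:
        return model.predict_proba(data)[:, 1]

# Idea 2: give each thread its own copy of the booster via threading.local,
# copying the fitted model the first time that thread predicts.
_tls = threading.local()

def thread_local_predict(data):
    if not hasattr(_tls, "model"):
        _tls.model = copy.deepcopy(model)
    return _tls.model.predict_proba(data)[:, 1]

The first option trades parallel throughput for safety; the second keeps throughput at the cost of one booster copy per thread.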
@kylejn27
Contributor Author

Just as a heads up, I'll be unresponsive for about a week but the others that I mentioned will be able to respond. I figured I'd kick off this discussion now though.

@trivialfis
Member

trivialfis commented Feb 21, 2020

Personally, I prefer changing the code in C++ to make it thread safe. To provide some background on the subject: the problem is in our prediction cache. If we can make this cache thread local, then the rest should be easy.

@trivialfis
Member

trivialfis commented Feb 21, 2020

For discussion purposes, would you like to share some context about your usage of thread-safe prediction?

For example, if you are using a thread-management framework to handle incoming prediction queries, then there's no need for a cache: once the prediction is done, the data is no longer useful to XGBoost. Actually, in such cases it should be easy to make prediction thread safe, as every new input dataset gets a dedicated cache.

But if you are doing incremental prediction (for example, first predicting with the first 4 trees and then obtaining the prediction from 8 trees to observe the difference), then the cache is still useful.
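For concreteness, a small sketch of that incremental pattern, using the ntree_limit argument that Booster.predict accepted at the time of this discussion (newer releases expose iteration_range instead):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_informative=10)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=8)

# Predict with only the first 4 trees, then with all 8, and compare.
pred_4 = booster.predict(dtrain, ntree_limit=4)
pred_8 = booster.predict(dtrain, ntree_limit=8)
print(np.abs(pred_8 - pred_4).mean())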

@RAMitchell
Member

We will absolutely support contributions to fix this problem. Let us know what you propose in the C++ layer and we can help work out a solution.

@trivialfis
Member

trivialfis commented Feb 22, 2020

@RAMitchell Actually, I spent 10 minutes or so and hacked together a thread-safe CPU prediction by removing the cache. GPU prediction should not be more difficult.

Here are the things that are not thread safe:

  • Configuration. (Solvable by using a double-checked lock guard; a generic Python illustration of that pattern follows this list.)
  • Prediction cache registering, specifically the hash map. (Requires discussion of the interface.)
  • Temporary storage for the feature vector. (Use a small vector so we can make it a local variable.)
  • Temporary storage for device tree nodes. (Use a local variable; needs a performance check.)
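As a generic illustration of the double-checked locking idea mentioned for configuration (sketched in Python for readability; the actual fix lives in the C++ layer, and the names here are placeholders):

import threading

_config_lock = threading.Lock()
_configured = False

def _do_expensive_configuration():
    # Placeholder for the one-time configuration step.
    pass

def ensure_configured():
    global _configured
    if not _configured:              # first, unlocked check (cheap fast path)
        with _config_lock:
            if not _configured:      # second check under the lock
                _do_expensive_configuration()
                _configured = True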

@sandys

sandys commented Feb 23, 2020

I have an alternate suggestion: have you used the high-performance onnxruntime? It is thread-safe by design on both CPU and GPU (microsoft/onnxruntime#114), and serving support exists for Python/C++/C#/Java, etc.: https://github.com/microsoft/onnxruntime/tree/master/samples#python

You can convert an xgboost model to ONNX format officially; see https://github.com/onnx/onnxmltools/blob/master/tests/xgboost/test_xgboost_converters.py
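A rough sketch of that workflow, assuming onnxmltools and onnxruntime are installed (the tensor name "input" below is an arbitrary choice, not something either library mandates):

import numpy as np
import onnxruntime as rt
from onnxmltools import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_informative=10)
model = XGBClassifier(n_estimators=50).fit(X, y)

# Convert the fitted model to ONNX.
onnx_model = convert_xgboost(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)

# onnxruntime inference sessions are designed to be callable from multiple threads.
session = rt.InferenceSession(onnx_model.SerializeToString())
preds = session.run(None, {"input": X.astype(np.float32)})[0]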

@trivialfis
Member

Sure. ONNX is a great format. PRs are welcome. :-)

@trivialfis
Member

trivialfis commented Feb 24, 2020

I will wait until the DMatrix refactoring is done. I have a POC implementation for in-place prediction, which is thread safe.
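For reference, the in-place prediction API that eventually shipped is exposed as Booster.inplace_predict in later xgboost releases; a minimal sketch, assuming a release that includes it:

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_informative=10)
booster = xgb.train({"max_depth": 3}, xgb.DMatrix(X, label=y), num_boost_round=10)

# inplace_predict skips DMatrix construction for the prediction data and,
# per this thread, is safe to call concurrently.
preds = booster.inplace_predict(X)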

@mmccarty

@trivialfis Thank you for the quick replies. We would be happy to contribute to a solution once a clear path forward is determined. Here is a minimal example that reproduces the problem and will hopefully provide some context.

import copy
import concurrent.futures

import numpy as np
from sklearn.datasets import make_classification

from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_informative=10)

# get baseline model and predictions
threaded_model = XGBClassifier(n_estimators=250, max_depth=5, n_jobs=2, nthread=2, seed=0)
threaded_model.fit(X, y)

original_predictions = threaded_model.predict_proba(X)[:, 1]

def predict(model, data):
    return model.predict_proba(data)[:, 1]

# with threaded model - no copy
n_threads = 10

with concurrent.futures.ThreadPoolExecutor() as executor:
    threads = {
        executor.submit(predict, threaded_model, X): i_thread for i_thread in range(n_threads)
    }

    total_differences = 0

    for future in concurrent.futures.as_completed(threads):
        i_thread = threads[future]
        result = future.result()
        diffs = np.sum(result != original_predictions)
        total_differences += diffs
        print(f"Thread {i_thread} has {diffs} differences")   

    assert total_differences == 0, f"{total_differences} differences across {n_threads} threads"

@kylejn27
Contributor Author

kylejn27 commented Mar 5, 2020

While #5389 is in progress, ONNX is something that we're interested in. Is there any reason not to use it, or anything we should look out for (known problems, etc.)?

@trivialfis
Member

No. It's definitely an interesting option. I tested the regression model myself, but that's all I know right now. Maybe in the long term we can have a native exporter for it so that more testing can be done. But feel free to evaluate both options.

@sandys

sandys commented Mar 6, 2020 via email

@trivialfis
Member

Yup. That's great news. I need in-place prediction to help accelerate grid searching in dask-ml (and any other similar libraries). Let me know if there's anything I can help with on the ONNX model converter.

@trivialfis
Member

Merged #5389

@kylejn27
Contributor Author

@trivialfis thanks! I can confirm the issue is fixed on our end by your PR.

@trivialfis
Member

Happy to help here. :-)
