Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bst_ulong instead of size_t for CSR conversion #8369

Merged
merged 1 commit into from Oct 26, 2022

Conversation

yzhliu
Copy link
Member

@yzhliu yzhliu commented Oct 20, 2022

Some platform, e.g., WebAssembly, has strict typing checking. The mismatch will cause crash.

And they should be size_t to be consistent with https://github.com/dmlc/xgboost/blob/master/src/c_api/c_api.cc#L392 and https://github.com/dmlc/xgboost/blob/master/src/data/proxy_dmatrix.cc#L19

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix! But can we keep the use of bst_ulong? It's defined as uint64_t, which is more strict than size_t. There's c_bst_ulong in the _typing.py module.

@yzhliu
Copy link
Member Author

yzhliu commented Oct 20, 2022

@trivialfis I'm ok with changing python side to c_bst_ulong. But do we really want to do that? cuz all the downstreams are size_t, examples like https://github.com/dmlc/xgboost/blob/master/src/data/adapter.h#L352 and ArrayInterface.Shape() used by it. IMO using size_t would be more general that works for old 32bit machines and future 128bit machine?

not strong opinion though.

@trivialfis
Copy link
Member

I think we will have improved consistency (which is currently lacking for XGBoost) with other C API functions if we use uint64_t. Using uint64_t here to record the shape of the dataset doesn't have practical implications for different memory address sizes. We can optionally define bst_ulong as uint128_t if the future Skynet decides that having a single memory address space to index exabytes of memory is necessary. ;-)

A side note, c++ ptrdiff_t is usually defined as int64_t on 64-bit machines, as a result, that's the practical limit of memory indexing size.

@yzhliu yzhliu changed the title Use size_t instead of bst_ulong for CSR conversion in c_api Use bst_ulong instead of size_t for CSR conversion Oct 26, 2022
Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix!

@trivialfis trivialfis merged commit 5699f60 into dmlc:master Oct 26, 2022
@trivialfis trivialfis added this to In progress in 1.7 Roadmap via automation Oct 26, 2022
trivialfis pushed a commit to trivialfis/xgboost that referenced this pull request Oct 26, 2022
trivialfis added a commit that referenced this pull request Oct 26, 2022
Co-authored-by: Yizhi Liu <liuyizhi@apache.org>
@trivialfis trivialfis moved this from In progress to Done in 1.7 Roadmap Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants