Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A stab at removing unsafes #109

Merged
merged 6 commits into from Jun 5, 2020
Merged

Conversation

troublescooter
Copy link
Collaborator

This pr is a work in progress to eliminate unsafes as far as possible.

@troublescooter
Copy link
Collaborator Author

Any remarks @boxtown ?

@boxtown
Copy link
Collaborator

boxtown commented May 13, 2020

Would it be possible to do some benchmarking on the updated functions? When I originally wrote this I was purposefully trying avoiding to avoid runtime bounds checking which I don't believe leads to UB (I'm assuming this is the reason for removing unsafes?) although at this point it may be the case that the Rust compiler is smart enough to compile away the bounds checks (I haven't really kept up with compiler improvements)

Side note, are the formatting changes from running cargo fmt? There's probably a potential QoL improvement opportunity here to just have the CI run formatting

@troublescooter
Copy link
Collaborator Author

In my opinion the best selling point for removing unsafes is not necessarily to remove UB, but to easily convince others that there's no UB. I couldn't find anything that would appear to be UB, but if somebody has to choose a library and there's one with lots of unsafe blocks the choice is either to be negligent of them or to manually check them, and code review doesn't scale very well.

Do you have a suggestion what an acceptable trade-off between performance and safety would be? My opinion is that a general purpose library should have no unsafe blocks, but I refrained from rewriting blocks that appeared to be deliberate and where I wasn't confident could be written safely without performance regressions. I've not yet benchmarked, but I'd be happy to do so if you give an idea what an acceptable benchmark would look like.

@troublescooter
Copy link
Collaborator Author

Btw: In order to do my refactor I changed the function ranks such that it now has no side effects. The tests did not capture any mutation of self, is this a mistake or deliberate? If so, can I safely change &mut self in the trait to &self?

Side note, are the formatting changes from running cargo fmt? There's probably a potential QoL improvement opportunity here to just have the CI run formatting

Formatting is one click away for me, I usually make it a habit to keep my diffs clean.

@vks
Copy link
Contributor

vks commented May 14, 2020

I think the use of unsafe should be justified by benchmarks, not the other way around. Looking at the changes, it does not look like they affect performance critical code. I think for some of the changes here the bound checks can be optimized away, but not for all of them.

Benchmarking would be nice to have (for example, employing criterion), but it seems as if it is currently not done for statrs. Personally, I think it makes sense to merge these changes and worry about benchmarking later.

Note that this PR also contains some unrelated changes that are raising the minimally supported Rust version. I think this is reasonable, and the MSRV is not documented for statrs anyway.

@boxtown
Copy link
Collaborator

boxtown commented May 21, 2020

Dis-regard the formatting comment I missed the part where you were updating the constructors to use the short-hand.

Btw: In order to do my refactor I changed the function ranks such that it now has no side effects. The tests did not capture any mutation of self, is this a mistake or deliberate? If so, can I safely change &mut self in the trait to &self?

The function used to do an in-place sort but it looks like it's no longer necessary from the changes in this PR. I'll try to do some benchmarking over the weekend to though so let's keep it in for now to avoid flip-flopping if there's an egregious drop in performance (probably unlikely) and we'll create a task to remove the mut from the signature.

Give me some time to do a little more thorough review of the PR (assuming you're mostly finished) otherwise should be good to merge

@troublescooter
Copy link
Collaborator Author

troublescooter commented May 21, 2020

My comment might not have been clear, removing mut was of course not necessary, so I put it back in for now in my personal tree. Probably best addressed with a different pr, as somebody might rely on the undocumented behaviour.

Edit: I'd forgotten what I'd done a couple of days back and responded off the cuff. This parragraph is non-sensical. Side-effects don't exist anymore, but mut is still in.

In order to alleviate your concerns I've begun benchmarking with criterion, so if you give me a day or two I'll see if I can get everything together.

Copy link
Collaborator Author

@troublescooter troublescooter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On nightly there's partition_at_index that could be a candidate for most of the code in select_inplace. I haven't yet tested, as to do this cleanly would probably involve newtyping [f64] to guarantee no NaNs in the slice, which might be a good idea regardless.

@troublescooter
Copy link
Collaborator Author

I was not able to determine whether my changes had any performance impact, because I ran into what looks like undefined behaviour.

Benchmarking order statistic/ranks: RankTieBreaker::First: Warming up for 3.0000 sthread 'main' panicked at 'index out of bounds: the len is 100 but the index
is 18446744073709551590', /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/libcore/slice/mod.rs:515:35
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:77
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1028
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1412
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:188
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:205
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:464
  11: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:373
  12: rust_begin_unwind
             at src/libstd/panicking.rs:302
  13: core::panicking::panic_fmt
             at src/libcore/panicking.rs:139
  14: core::panicking::panic_bounds_check
             at src/libcore/panicking.rs:96
  15: statrs::statistics::slice_statistics::quick_sort_all
[...]

Based on the size of the index being dereferenced, the only reasonable explanation seems to be that quick_sort_all is reading somewhere where it shouldn't.

@troublescooter
Copy link
Collaborator Author

There's one unsafe remaining in the computation of the factorial, which I want to refactor in the future. For now I'm finished.

@troublescooter
Copy link
Collaborator Author

At line :569 in src/statistics/slice_statistics.rs an overflowing subtraction occurs, after which an out of bounds read at the next Hoare loop iteration is performed. Later the overflow is caught by primary.swap(a,b) which panics.

@boxtown
Is there anything missing to have this merged?

Copy link
Collaborator

@boxtown boxtown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, great work thanks for doing this! Just left one comment, but it's not critical. We can squash and merge if you disagree

src/statistics/slice_statistics.rs Outdated Show resolved Hide resolved
@troublescooter
Copy link
Collaborator Author

Agree and fixed.

@boxtown boxtown merged commit 83aa6b5 into statrs-dev:master Jun 5, 2020
@boxtown
Copy link
Collaborator

boxtown commented Jun 5, 2020

Thanks again for doing this work, I'll try to update the changelog and cut a release some time this weekend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants