Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Power10 trait, with first method is_power_of_ten #15

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mohrezaei
Copy link

@mohrezaei mohrezaei commented Nov 15, 2018

Here is the first method in the new trait. Just sending it across for a quick review before I start adding more.

Do you prefer

  • one pull request, one commit, all methods
  • one pull request, one commit per method
  • one pull request per method?

@mohrezaei
Copy link
Author

mohrezaei commented Nov 15, 2018

I think I need help with fixing the u128 case. I can't seem to hide the static array from the compiler with a cfg attribute.

I don't understand why it works with 1.20, which doesn't have u128, but not older...

@mohrezaei
Copy link
Author

Looks like I managed to hide u128 😏 If there is a better way, let me know.

@mohrezaei
Copy link
Author

FYI, this is making things a bit difficult.

@cuviper
Copy link
Member

cuviper commented Nov 16, 2018

I prefer one PR with all planned methods, so we can treat it as ready-to-publish once merged, without further breaking changes. Adding more methods on a trait is breaking unless there's a default impl.

I don't care how many commits are in your PR, though I do prefer a clean working progression over seeing useless "fix it" kind of commits.

I don't understand why it works with 1.20, which doesn't have u128, but not older...

I believe the reason is that 1.20 has unstable support for u128, so it at least knows how to parse the big literals, where earlier versions didn't. Hiding it in a macro allowed those earlier versions to discard it without trying to tokenize the contents. An alternative workaround would be to use a const expression of smaller literals, like 10000000000 * 10000000000.

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run cargo fmt -- it's going to expand a lot of your one-liners, but I want consistent formatting.

src/power10.rs Outdated
if v >= 100 { return v == 100; }
if v >= 10 { return v == 10; }
if v >= 1 { return v == 1; }
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these last two lines could be collapsed to just v == 1.
(but LLVM might figure that out anyway...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yub.

src/power10.rs Outdated
fn is_pow10_u64(v: u64) -> bool {
let mut hash: u32 = v as u32;
hash ^= hash >> 3;
hash = hash.rotate_right(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since rust-lang/rust#56009 has attributed your performance regression to rotate_right, I wonder why do you even need that here? That 1 MSB won't contribute to the actual hash index, so can this be a plain shr instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after further analyzing the issue, the better code is:

    let mut hash: u32 = v as u32;
    hash ^= hash >> 3;
    hash = (hash >> 24) ^ (hash >> 1);

src/power10.rs Outdated
#[inline]
pub fn is_pow10_u128(v: $T) -> bool {
let mut hash: u32 = v as u32 | (((v as u64) >> 32) as u32);
hash = hash.wrapping_mul(1249991743).rotate_right(25);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, I think those 25 MSBs won't matter, so shr should be fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yub.

@cuviper
Copy link
Member

cuviper commented Nov 16, 2018

Hmm, actually it looks like I haven't done a crate-wide fmt here before, so that would be a lot of noise in this PR. I need to get to that later...

src/power10.rs Outdated
x.is_power_of_ten()
}

// implementation note: the is_power_of_ten algorithm is based on a perfect hash setup with very simple hash functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you use any particular tool to create these perfect hash functions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some code that does that. Nothing clever. It just picks random values and produces a hash until it finds something.

@mohrezaei
Copy link
Author

So I need a little help/sounding board here. After looking more closely at the performance, I have two different 64-bit implementations (perfect_hash, lz) that behave like so:

RUSTFLAGS=""
perfect_hash: 9000 ns/iter.
lz: 7300 ns/iter.

RUSTFLAGS="-C target-cpu=native"
perfect_hash: 4100 ns/iter
lz: 5000 ns/iter.

I think perfect_hash vectorizes better. I can't make up my mind about which one is the right one for a general purpose library ☹️ help?

@cuviper
Copy link
Member

cuviper commented Nov 16, 2018

Those numbers look close enough that I'd lean toward whichever will be easier to maintain.

The vectorization of your microbenchmark may or may not represent how well it can be optimized in real use, where you're more likely to have other computations involved too. So try not to extrapolate too much from these results.

@mohrezaei
Copy link
Author

I agree that vectorization is not an important consideration. When I turn off vectorization, lz benefits from native, but perfect_hash does not. I was also worried about 32bit platforms, but a quick check shows that lz with target-cpu=native is better, but worse without native at least on i686.

On balance, I think I'll go with lz for u64. Gotta run the numbers for u32.

Thanks.

@mohrezaei
Copy link
Author

mohrezaei commented Nov 19, 2018

This should be a good basis for a full review. I did a cargo fmt just on the new files. The arrays became quite long, but I left them as fmt wanted. I'll squash again after review edits.

A few things:

  • There is a bit of a naming issue right now: Ten vs 10. Power10, floor_log10 vs. next_power_of_ten, is_power_of_ten
  • I modeled the wrapping/checked/next_power_of_ten the same way as std, but floor_log10 just returns 0 for 0 (no alternate methods).
  • All methods take &self. That makes a lot of sense for a trait that may be used for things other than primitives. std uses self for similar methods, which I feel is problematic for a Power2 trait down the road. I don't intend to implement Power2.

@mohrezaei
Copy link
Author

@cuviper ping, just in case this got lost in the holiday shuffle 😄

@cuviper
Copy link
Member

cuviper commented Dec 1, 2018

Hi! This is still in my inbox to review, but I'll at least thrown opinions at your design questions. :)

  • There is a bit of a naming issue right now: Ten vs 10. Power10, floor_log10 vs. next_power_of_ten, is_power_of_ten

I think the method names are OK. The floats give a precedent for log10, and integer power_of_two gives a precedent for _ten, so we're on par with std inconsistency here. I could accept either way on the trait name, Power10 or PowerTen.

  • I modeled the wrapping/checked/next_power_of_ten the same way as std, but floor_log10 just returns 0 for 0 (no alternate methods).

I would expect 0 to be a panic case, because log(0) in any base is undefined.

You can also consider that log(1) is exactly 0, and log is strictly increasing, so floor_log(1) should be the least input which returns 0 -- leaving us with no answer for floor_log(0).

Aside on naming - I'm inclined to just call this log10 instead of floor_log10. I feel the flooring aspect is implicit when we're talking about unsigned integers, just like division.

  • All methods take &self. That makes a lot of sense for a trait that may be used for things other than primitives. std uses self for similar methods, which I feel is problematic for a Power2 trait down the road. I don't intend to implement Power2.

&self is appropriate, and I don't think it's a problem for Power2. Everything we would implement through std methods are Copy types, so they can just dereference to get self by value.

@mohrezaei
Copy link
Author

Renamed floor_log10 to log10 and added checked/unchecked variants (wrapping didn't make sense?).
Also made sure tests pass in release mode.

BTW, interesting curiosity: 0f64.log10().floor() == 0 (would've made a good excuse for the method name and zero behavior 😅 )

@cuviper
Copy link
Member

cuviper commented Dec 7, 2018

BTW, interesting curiosity: 0f64.log10().floor() == 0

I get -inf: playground

And even that only makes from the positive side, lim{x->0+} log(x)

@mohrezaei
Copy link
Author

Right, sorry, this got me confused:
0f64.log10().floor() as u32 == 0
playground

@strake
Copy link
Contributor

strake commented Dec 12, 2018

It seems we ought to rather have a general Power trait with a method is_power_of(self, other) -> bool which checks whether self is a power of other.

@cuviper
Copy link
Member

cuviper commented Dec 12, 2018

I suggested generic powers in #14 too, but I think it's still OK to specialize common bases.

src/power10.rs Outdated
fn is_power_of_ten(&self) -> bool;

/// Returns the base 10 logarithm value, truncated down.
/// Returns zero for zero in release mode, panics in debug mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather make zero an unconditional panic -- it's a domain error, just like divide-by-zero.

fn log10(&self) -> u32; //note: u32 return type to allow BigInt types to implement. 10^(2^32) is 4 billion digits

/// Returns the base 10 logarithm value, truncated down.
/// Panics for zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "checked" method usually implies returning an Option, and again checked_div is a good example.

src/power10.rs Outdated
fn checked_log10(&self) -> u32;

/// Returns the base 10 logarithm value, truncated down.
/// Returns zero for zero. (This matches f64::log10()::floor() for zero.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this one. It only matches f64 given that you got -inf as u32 == 0, but I think this is actually undefined behavior -- rust-lang/rust#10184

@mohrezaei
Copy link
Author

The last commit should address the latest review comments. Let me know if there is more to do (I'll squash the commits once the review is done).

Oh, I should clarify one thing: the f64 stuff was mostly a joke. It's so hard to get that through in writing (sigh... 😓)

@cuviper
Copy link
Member

cuviper commented Dec 14, 2018

The benchmarks are failing on nightly:

failures:

---- benchl10_u32_up_to_10000 stdout ----
thread 'benchl10_u32_up_to_10000' panicked at 'undefined value for log of zero', src/power10.rs:763:1
note: Run with `RUST_BACKTRACE=1` for a backtrace.

---- benchl10_u64_up_to_10000 stdout ----
thread 'benchl10_u64_up_to_10000' panicked at 'undefined value for log of zero', src/power10.rs:769:1

@mohrezaei
Copy link
Author

It was the zero panic. Pushed a fix. BTW, the always panic behavior has a 50% cost in performance for u32 (from 0.8 ns per call to 1.2 ns per call).

@cuviper
Copy link
Member

cuviper commented Dec 15, 2018

BTW, the always panic behavior has a 50% cost in performance for u32

Well, microbenchmarks are hard to extrapolate into real use cases, mixed with other computations. For instance, if the optimizer has enough range information from context to know the input can't be zero, then the check will probably be removed altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants