Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize into_component for float_to_uint, u8 to f32/f64 #184

Merged
merged 1 commit into from Apr 26, 2020

Conversation

okaneco
Copy link
Contributor

@okaneco okaneco commented Apr 26, 2020

Optimize component conversion of f32/f64 to uint.
Add IntoComponent implementation for u8 to f32/f64.

The rounding mode for components from floating point types to unsigned integers now rounds half to nearest even.
Converting a NaN to uint now returns uint::MAX, previously it returned 0.
Uint to float conversion behavior is not changed.

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

I recently picked up Hacker's Delight, a book with some programming arithmetic tricks and algorithm implementations. On page 377, there's a table with conversions between IEEE floats/doubles and uint32/64s. I implemented the conversions for double to uint64, round to nearest even; float to uint32, round to nearest even; uint32 to float; and uint32 to double. There was a slight speedup for u8 to f32, and a considerable speedup for f32 to u8.

Rgb family/rgb_u8 to linsrgb_f32
                        time:   [8.0802 us 8.1011 us 8.1223 us]
                        thrpt:  [17.237 Melem/s 17.282 Melem/s 17.326 Melem/s]
                 change:
                        time:   [-6.0041% -5.6794% -5.3362%] (p = 0.00 < 0.05)
                        thrpt:  [+5.6370% +6.0214% +6.3876%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Rgb family/linsrgb_f32 to rgb_u8
                        time:   [9.0137 us 9.0414 us 9.0701 us]
                        thrpt:  [15.435 Melem/s 15.484 Melem/s 15.532 Melem/s]
                 change:
                        time:   [-42.937% -42.707% -42.467%] (p = 0.00 < 0.05)
                        thrpt:  [+73.812% +74.542% +75.246%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

The formulas don't work for the entire range of floats/doubles, but I believe they cover everything needed by palette. The code sacrifices clarity for performance/magic but I've added comments to the relevant sections.

@Ogeon
Copy link
Owner

Ogeon commented Apr 26, 2020

Interesting. Luckily we are working with unsigned values within the ranges, so no loss in that sense. What would the result be if a float value happens to be out of range?

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

In the current implementation, it would panic on a value less than some small, negative fractional number and return 0 for everything above the limit. Instead, we can use wrapping_sub or overflowing_sub which don't seem to have any substantial impact on performance. I originally wrote it with overflowing_sub which passes back a bool on whether overflow occurred, I don't remember why I changed it to a regular subtraction.

Here's a playground link with the regular and wrapping subtraction, and their results.

12845056 - Regular sub: 0
340282350000000000000000000000000000000 - Regular sub: 0
-340282350000000000000000000000000000000 - Regular sub: 0
NaN - Regular sub: 0
inf - Regular sub: 0
-inf - Regular sub: 0
-0.001 - Wrapping sub: 255
12845056 - Wrapping sub: 0
340282350000000000000000000000000000000 - Wrapping sub: 0
-340282350000000000000000000000000000000 - Wrapping sub: 0
NaN - Wrapping sub: 0
inf - Wrapping sub: 0
-inf - Wrapping sub: 0

I know components can be unclamped so this may be an issue.

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

For example, here's a program straight from the docs.

use palette::convert::FromColorUnclamped;
use palette::{Lch, Srgb};

fn main() {
    let rgb = Srgb::from_color_unclamped(Lch::new(50.0, 100.0, -175.0));
    dbg!(rgb);
    dbg!(rgb.into_format::<u8>());
}

which implicitly converts an unclamped to a clamped value:

[src\main.rs:6] rgb = Rgb {
    red: -3.2585966353532334,
    green: 0.5920742313913836,
    blue: 0.5165314331478271,
    standard: PhantomData,
}
[src\main.rs:7] rgb.into_format::<u8>() = Rgb {
    red: 0,
    green: 151,
    blue: 132,
    standard: PhantomData,
}

Using wrapping subtraction, of course we don't get the same results:

[src\main.rs:6] rgb = Rgb {
    red: -3.2585966353532334,
    green: 0.5920742313913836,
    blue: 0.5165314331478271,
    standard: PhantomData,
}
[src\main.rs:7] rgb.into_format::<u8>() = Rgb {
    red: 130,
    green: 151,
    blue: 132,
    standard: PhantomData,
}

Should converting from an unclamped value to another component format require an explicit clamping to a valid color?

@Ogeon
Copy link
Owner

Ogeon commented Apr 26, 2020

Hmm, I would like to avoid making it too surprising. I guess it's generally expected to clamp, or rather saturate the bits, when converting to unsigned ints. Float to float can still preserve the value as good as possible. I would think it's the more useful of the behaviors, as it's much more predictable and also follows the analogy of light intensity. Otherwise I'm also afraid people would still have to clamp or check themselves to be sure.

I made a modified version that combines clamping to <= 255 and saturating_sub: playground link. I have no idea what the performance impact is, but it seem to have the desired behavior. Including for the extra values I added.

fn into_component_saturate(n: f32) -> u8 {
    let max = u8::MAX as f32;
    let scaled = (n * max).min(max);
    let f = scaled + f32::from_bits(C23);
    f.to_bits().saturating_sub(C23) as u8
}

Removing the clamping to max makes it behave more like with wrapping.

I'm afraid we may have to sacrifice a bit of performance to get the behavior right, but it may still be better than the current state or the alternatives.

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

Yes, saturating is what I would expect. I wasn't aware of that function.

I figured I might have to add the clamping back but there was some design involved to make the right choice, thanks for that. Only a slight regression from the unclamped version I first pushed.

Rgb family/linsrgb_f32 to rgb_u8
                        time:   [9.4328 us 9.4483 us 9.4638 us]
                        thrpt:  [14.793 Melem/s 14.817 Melem/s 14.842 Melem/s]
                 change:
                        time:   [+3.5184% +3.9171% +4.2800%] (p = 0.00 < 0.05)
                        thrpt:  [-4.1043% -3.7695% -3.3988%]
                        Performance has regressed.

Compared to master, still a big improvement.

Rgb family/linsrgb_f32 to rgb_u8
                        time:   [9.5660 us 9.5841 us 9.6032 us]
                        thrpt:  [14.579 Melem/s 14.608 Melem/s 14.635 Melem/s]
                 change:
                        time:   [-39.454% -39.247% -39.048%] (p = 0.00 < 0.05)
                        thrpt:  [+64.064% +64.600% +65.164%]
                        Performance has improved.

And we get proper clamping

[src\main.rs:6] rgb = Rgb {
    red: -3.2585966353532334,
    green: 0.5920742313913836,
    blue: 0.5165314331478271,
    standard: PhantomData,
}
[src\main.rs:7] rgb.into_format::<u8>() = Rgb {
    red: 0,
    green: 151,
    blue: 132,
    standard: PhantomData,
}

palette/src/component.rs Outdated Show resolved Hide resolved
@Ogeon
Copy link
Owner

Ogeon commented Apr 26, 2020

Yes, it's still a great improvement! Well done! Would you mind adding a few tests that compares to what round gives? Just to be on the safe side, as usual. I'm especially thinking cases like below 0, above 1, 0.5, and 0.99 (or something close to 1 that should become the max value, to make sure #117 doesn't come back), but the special ones like NaN and +/-Inf can also be added for good measure.

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

Yes, I think it's a good idea to add more tests. As I was writing this I realized there wasn't much of a safety net for the special cases.

The new behavior doesn't have the same rounding due to the rounding mode of towards nearest even instead of just towards nearest. Here's a playground link of the tests I came up with.

The uint to float behavior looks good. Float to uint has small differences

Float to Uint conversion:

-800 - New: 0, Old: 0
-0.3 - New: 0, Old: 0
0 - New: 0, Old: 0
0.005 - New: 1, Old: 1
0.024983 - New: 6, Old: 6
0.15 - New: 38, Old: 38
0.3 - New: 76, Old: 77 /* difference */ 
0.5 - New: 128, Old: 128
0.6 - New: 153, Old: 153
0.7 - New: 178, Old: 179 /* difference */ 
0.8 - New: 204, Old: 204
0.8444 - New: 215, Old: 215
0.955 - New: 244, Old: 244
0.999 - New: 255, Old: 255
1 - New: 255, Old: 255
1.4 - New: 255, Old: 255
12845056 - New: 255, Old: 255
340282350000000000000000000000000000000 - New: 255, Old: 255
-340282350000000000000000000000000000000 - New: 0, Old: 0
NaN - New: 255, Old: 0 /* difference */ 
inf - New: 255, Old: 255
-inf - New: 0, Old: 0

The differences are 0.3 and 0.7. 0.3 * 255.0 is 76.5 and 0.7 * 255.0 is 178.5. The current rounding seems to always round up for tiebreakers. So any time the intermediate float is something ending with 0.5, the rounding might be different.

0.1 - New: 26, Old: 26
0.3 - New: 76, Old: 77
0.5 - New: 128, Old: 128
0.7 - New: 178, Old: 179
0.9 - New: 230, Old: 230

NaN is also a difference with the new method resulting in 255 and the old resulting in 0.

@Ogeon
Copy link
Owner

Ogeon commented Apr 26, 2020

NaN doesn't make sense in either case, so I don't know if it matters. The rounding behavior is probably good enough too. May be surprising in a few cases, compared to school math, but I would bet it disappears in the noise when doing anything more advanced. Some seem to even recommend that behavior for floats.

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

I thought the same thing about NaN. As I've learned more about this, I've come to find out that is the default rounding mode for IEEE floating points. A benefit is that it eliminates bias towards/away from 0 even though it distributes numbers towards even more than odd. So as you said, it may be surprising sometimes compared to school math but it's normal for the purpose of rounding IEEE floats.

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

I added comments to notate the special cases as we've discussed here. I'm not sure how helpful they are, but I also added tests for the current expected results of rounding from f32/f64 to u8 (if not helpful now then for the future).

@Ogeon
Copy link
Owner

Ogeon commented Apr 26, 2020

It's always good to make notes of decisions and reasoning for future adventurers. 🙂 Even if it's just us coming back to it.

I think this looks good! I don't have anything to add, so feel free to prepare for merge if you don't have anything more. 👍

@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

I think I should delete this section in the Packed docs about how 0.5f32 gets rounded to 0x80. That's still true but it's not true how it's done anymore. Before, that seemed necessary since rounding wasn't explained but now I'm not sure what to put here so I think it's better left out.

/// // The second assert is essentially how library components are converted
/// let float = 0.5f32 * 255.0;
/// assert_eq!(0x7F, float as u32);
/// assert_eq!(0x80, float.round() as u32);

Implement speedup for component conversion of f32/f64 to uint.
Add `IntoComponent` implementation for u8 to f32/f64.
Add tests for uint_to_float/double to check vs old conversion method
Add tests for float/double to uint for expected results

The rounding mode for float to uint is now round half to nearest even.
Some results may be different than expected compared to before.
Converting a NaN to uint now returns uint::MAX, previously it returned
0.
Uint to float conversion behavior is effectively unchanged.
@okaneco
Copy link
Contributor Author

okaneco commented Apr 26, 2020

This is ready to go if you're okay with the changes 👍

@Ogeon
Copy link
Owner

Ogeon commented Apr 26, 2020

Yes, let's do this. Thank you!

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 26, 2020

Build succeeded:

@bors bors bot merged commit 71b1434 into Ogeon:master Apr 26, 2020
@okaneco okaneco deleted the uint branch April 26, 2020 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants