Optimize into_component for float_to_uint, u8 to f32/f64 #184

okaneco · 2020-04-26T03:48:39Z

Optimize component conversion of f32/f64 to uint.
Add IntoComponent implementation for u8 to f32/f64.

The rounding mode for components from floating point types to unsigned integers now rounds half to nearest even.
Converting a NaN to uint now returns uint::MAX, previously it returned 0.
Uint to float conversion behavior is not changed.

okaneco · 2020-04-26T03:50:47Z

I recently picked up Hacker's Delight, a book with some programming arithmetic tricks and algorithm implementations. On page 377, there's a table with conversions between IEEE floats/doubles and uint32/64s. I implemented the conversions for double to uint64, round to nearest even; float to uint32, round to nearest even; uint32 to float; and uint32 to double. There was a slight speedup for u8 to f32, and a considerable speedup for f32 to u8.

Rgb family/rgb_u8 to linsrgb_f32
                        time:   [8.0802 us 8.1011 us 8.1223 us]
                        thrpt:  [17.237 Melem/s 17.282 Melem/s 17.326 Melem/s]
                 change:
                        time:   [-6.0041% -5.6794% -5.3362%] (p = 0.00 < 0.05)
                        thrpt:  [+5.6370% +6.0214% +6.3876%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Rgb family/linsrgb_f32 to rgb_u8
                        time:   [9.0137 us 9.0414 us 9.0701 us]
                        thrpt:  [15.435 Melem/s 15.484 Melem/s 15.532 Melem/s]
                 change:
                        time:   [-42.937% -42.707% -42.467%] (p = 0.00 < 0.05)
                        thrpt:  [+73.812% +74.542% +75.246%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

The formulas don't work for the entire range of floats/doubles, but I believe they cover everything needed by palette. The code sacrifices clarity for performance/magic but I've added comments to the relevant sections.

Ogeon · 2020-04-26T08:57:30Z

Interesting. Luckily we are working with unsigned values within the ranges, so no loss in that sense. What would the result be if a float value happens to be out of range?

okaneco · 2020-04-26T14:12:08Z

In the current implementation, it would panic on a value less than some small, negative fractional number and return 0 for everything above the limit. Instead, we can use wrapping_sub or overflowing_sub which don't seem to have any substantial impact on performance. I originally wrote it with overflowing_sub which passes back a bool on whether overflow occurred, I don't remember why I changed it to a regular subtraction.

Here's a playground link with the regular and wrapping subtraction, and their results.

12845056 - Regular sub: 0
340282350000000000000000000000000000000 - Regular sub: 0
-340282350000000000000000000000000000000 - Regular sub: 0
NaN - Regular sub: 0
inf - Regular sub: 0
-inf - Regular sub: 0
-0.001 - Wrapping sub: 255
12845056 - Wrapping sub: 0
340282350000000000000000000000000000000 - Wrapping sub: 0
-340282350000000000000000000000000000000 - Wrapping sub: 0
NaN - Wrapping sub: 0
inf - Wrapping sub: 0
-inf - Wrapping sub: 0

I know components can be unclamped so this may be an issue.

okaneco · 2020-04-26T14:32:08Z

For example, here's a program straight from the docs.

use palette::convert::FromColorUnclamped;
use palette::{Lch, Srgb};

fn main() {
    let rgb = Srgb::from_color_unclamped(Lch::new(50.0, 100.0, -175.0));
    dbg!(rgb);
    dbg!(rgb.into_format::<u8>());
}

which implicitly converts an unclamped to a clamped value:

[src\main.rs:6] rgb = Rgb {
    red: -3.2585966353532334,
    green: 0.5920742313913836,
    blue: 0.5165314331478271,
    standard: PhantomData,
}
[src\main.rs:7] rgb.into_format::<u8>() = Rgb {
    red: 0,
    green: 151,
    blue: 132,
    standard: PhantomData,
}

Using wrapping subtraction, of course we don't get the same results:

[src\main.rs:6] rgb = Rgb {
    red: -3.2585966353532334,
    green: 0.5920742313913836,
    blue: 0.5165314331478271,
    standard: PhantomData,
}
[src\main.rs:7] rgb.into_format::<u8>() = Rgb {
    red: 130,
    green: 151,
    blue: 132,
    standard: PhantomData,
}

Should converting from an unclamped value to another component format require an explicit clamping to a valid color?

Ogeon · 2020-04-26T14:50:52Z

Hmm, I would like to avoid making it too surprising. I guess it's generally expected to clamp, or rather saturate the bits, when converting to unsigned ints. Float to float can still preserve the value as good as possible. I would think it's the more useful of the behaviors, as it's much more predictable and also follows the analogy of light intensity. Otherwise I'm also afraid people would still have to clamp or check themselves to be sure.

I made a modified version that combines clamping to <= 255 and saturating_sub: playground link. I have no idea what the performance impact is, but it seem to have the desired behavior. Including for the extra values I added.

fn into_component_saturate(n: f32) -> u8 {
    let max = u8::MAX as f32;
    let scaled = (n * max).min(max);
    let f = scaled + f32::from_bits(C23);
    f.to_bits().saturating_sub(C23) as u8
}

Removing the clamping to max makes it behave more like with wrapping.

I'm afraid we may have to sacrifice a bit of performance to get the behavior right, but it may still be better than the current state or the alternatives.

okaneco · 2020-04-26T15:03:17Z

Yes, saturating is what I would expect. I wasn't aware of that function.

I figured I might have to add the clamping back but there was some design involved to make the right choice, thanks for that. Only a slight regression from the unclamped version I first pushed.

Rgb family/linsrgb_f32 to rgb_u8
                        time:   [9.4328 us 9.4483 us 9.4638 us]
                        thrpt:  [14.793 Melem/s 14.817 Melem/s 14.842 Melem/s]
                 change:
                        time:   [+3.5184% +3.9171% +4.2800%] (p = 0.00 < 0.05)
                        thrpt:  [-4.1043% -3.7695% -3.3988%]
                        Performance has regressed.

Compared to master, still a big improvement.

Rgb family/linsrgb_f32 to rgb_u8
                        time:   [9.5660 us 9.5841 us 9.6032 us]
                        thrpt:  [14.579 Melem/s 14.608 Melem/s 14.635 Melem/s]
                 change:
                        time:   [-39.454% -39.247% -39.048%] (p = 0.00 < 0.05)
                        thrpt:  [+64.064% +64.600% +65.164%]
                        Performance has improved.

And we get proper clamping

[src\main.rs:6] rgb = Rgb {
    red: -3.2585966353532334,
    green: 0.5920742313913836,
    blue: 0.5165314331478271,
    standard: PhantomData,
}
[src\main.rs:7] rgb.into_format::<u8>() = Rgb {
    red: 0,
    green: 151,
    blue: 132,
    standard: PhantomData,
}

palette/src/component.rs

Ogeon · 2020-04-26T15:34:25Z

Yes, it's still a great improvement! Well done! Would you mind adding a few tests that compares to what round gives? Just to be on the safe side, as usual. I'm especially thinking cases like below 0, above 1, 0.5, and 0.99 (or something close to 1 that should become the max value, to make sure #117 doesn't come back), but the special ones like NaN and +/-Inf can also be added for good measure.

okaneco · 2020-04-26T16:43:30Z

Yes, I think it's a good idea to add more tests. As I was writing this I realized there wasn't much of a safety net for the special cases.

The new behavior doesn't have the same rounding due to the rounding mode of towards nearest even instead of just towards nearest. Here's a playground link of the tests I came up with.

The uint to float behavior looks good. Float to uint has small differences

Float to Uint conversion:

-800 - New: 0, Old: 0
-0.3 - New: 0, Old: 0
0 - New: 0, Old: 0
0.005 - New: 1, Old: 1
0.024983 - New: 6, Old: 6
0.15 - New: 38, Old: 38
0.3 - New: 76, Old: 77 /* difference */ 
0.5 - New: 128, Old: 128
0.6 - New: 153, Old: 153
0.7 - New: 178, Old: 179 /* difference */ 
0.8 - New: 204, Old: 204
0.8444 - New: 215, Old: 215
0.955 - New: 244, Old: 244
0.999 - New: 255, Old: 255
1 - New: 255, Old: 255
1.4 - New: 255, Old: 255
12845056 - New: 255, Old: 255
340282350000000000000000000000000000000 - New: 255, Old: 255
-340282350000000000000000000000000000000 - New: 0, Old: 0
NaN - New: 255, Old: 0 /* difference */ 
inf - New: 255, Old: 255
-inf - New: 0, Old: 0

The differences are 0.3 and 0.7. 0.3 * 255.0 is 76.5 and 0.7 * 255.0 is 178.5. The current rounding seems to always round up for tiebreakers. So any time the intermediate float is something ending with 0.5, the rounding might be different.

0.1 - New: 26, Old: 26
0.3 - New: 76, Old: 77
0.5 - New: 128, Old: 128
0.7 - New: 178, Old: 179
0.9 - New: 230, Old: 230

NaN is also a difference with the new method resulting in 255 and the old resulting in 0.

Ogeon · 2020-04-26T17:08:57Z

NaN doesn't make sense in either case, so I don't know if it matters. The rounding behavior is probably good enough too. May be surprising in a few cases, compared to school math, but I would bet it disappears in the noise when doing anything more advanced. Some seem to even recommend that behavior for floats.

okaneco · 2020-04-26T17:19:22Z

I thought the same thing about NaN. As I've learned more about this, I've come to find out that is the default rounding mode for IEEE floating points. A benefit is that it eliminates bias towards/away from 0 even though it distributes numbers towards even more than odd. So as you said, it may be surprising sometimes compared to school math but it's normal for the purpose of rounding IEEE floats.

okaneco · 2020-04-26T18:11:35Z

I added comments to notate the special cases as we've discussed here. I'm not sure how helpful they are, but I also added tests for the current expected results of rounding from f32/f64 to u8 (if not helpful now then for the future).

Ogeon · 2020-04-26T18:23:47Z

It's always good to make notes of decisions and reasoning for future adventurers. 🙂 Even if it's just us coming back to it.

I think this looks good! I don't have anything to add, so feel free to prepare for merge if you don't have anything more. 👍

okaneco · 2020-04-26T18:34:09Z

I think I should delete this section in the Packed docs about how 0.5f32 gets rounded to 0x80. That's still true but it's not true how it's done anymore. Before, that seemed necessary since rounding wasn't explained but now I'm not sure what to put here so I think it's better left out.

/// // The second assert is essentially how library components are converted
/// let float = 0.5f32 * 255.0;
/// assert_eq!(0x7F, float as u32);
/// assert_eq!(0x80, float.round() as u32);

Implement speedup for component conversion of f32/f64 to uint. Add `IntoComponent` implementation for u8 to f32/f64. Add tests for uint_to_float/double to check vs old conversion method Add tests for float/double to uint for expected results The rounding mode for float to uint is now round half to nearest even. Some results may be different than expected compared to before. Converting a NaN to uint now returns uint::MAX, previously it returned 0. Uint to float conversion behavior is effectively unchanged.

okaneco · 2020-04-26T18:42:24Z

This is ready to go if you're okay with the changes 👍

Ogeon · 2020-04-26T18:47:27Z

Yes, let's do this. Thank you!

bors r+

bors · 2020-04-26T19:09:27Z

Build succeeded:

continuous-integration/travis-ci/push

Ogeon reviewed Apr 26, 2020

View reviewed changes

palette/src/component.rs Outdated Show resolved Hide resolved

okaneco force-pushed the uint branch from 4360bd5 to 5402b90 Compare April 26, 2020 18:27

okaneco force-pushed the uint branch from 5402b90 to 69d3399 Compare April 26, 2020 18:37

bors bot merged commit 71b1434 into Ogeon:master Apr 26, 2020

okaneco deleted the uint branch April 26, 2020 19:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize into_component for float_to_uint, u8 to f32/f64 #184

Optimize into_component for float_to_uint, u8 to f32/f64 #184

okaneco commented Apr 26, 2020 •

edited

okaneco commented Apr 26, 2020 •

edited

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020

okaneco commented Apr 26, 2020

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020 •

edited

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020

okaneco commented Apr 26, 2020 •

edited

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020 •

edited

okaneco commented Apr 26, 2020

Ogeon commented Apr 26, 2020

bors bot commented Apr 26, 2020

Optimize into_component for float_to_uint, u8 to f32/f64 #184

Optimize into_component for float_to_uint, u8 to f32/f64 #184

Conversation

okaneco commented Apr 26, 2020 • edited

okaneco commented Apr 26, 2020 • edited

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020

okaneco commented Apr 26, 2020

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020 • edited

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020

okaneco commented Apr 26, 2020 • edited

Ogeon commented Apr 26, 2020

okaneco commented Apr 26, 2020 • edited

okaneco commented Apr 26, 2020

Ogeon commented Apr 26, 2020

bors bot commented Apr 26, 2020

okaneco commented Apr 26, 2020 •

edited

okaneco commented Apr 26, 2020 •

edited

okaneco commented Apr 26, 2020 •

edited

okaneco commented Apr 26, 2020 •

edited

okaneco commented Apr 26, 2020 •

edited