Optimize ord implementation and signed zero canonicalization #144

orlp · 2023-10-10T21:01:30Z

These micro-optimizations significantly reduce the number of instructions that comparisons take, and often make them branchless as well. Similarly, we use a trick to canonicalize signed zero to positive zero in a single instruction, without branches, for faster hashing.
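For readers who want to see the signed zero trick in isolation, here is a minimal sketch. It assumes the trick is adding positive zero, which relies on IEEE 754 round-to-nearest arithmetic where `-0.0 + 0.0` evaluates to `+0.0`. The names `canonicalize_signed_zero` and `hash_key` are illustrative and not necessarily what the crate uses, and NaN canonicalization is deliberately left out.

```rust
/// Illustrative helper (assumed name): maps -0.0 to +0.0.
/// Every other non-NaN value is returned unchanged, and NaN stays NaN.
fn canonicalize_signed_zero(x: f64) -> f64 {
    // Under IEEE 754 round-to-nearest, (-0.0) + (+0.0) is +0.0,
    // while adding +0.0 to any other value leaves it as it was.
    // This is a single floating-point add with no branch.
    x + 0.0
}

/// Illustrative hashing key (assumed name): both zeros yield the same bits.
/// NaN payloads are not canonicalized in this sketch.
fn hash_key(x: f64) -> u64 {
    canonicalize_signed_zero(x).to_bits()
}
```

With this, `hash_key(-0.0)` and `hash_key(0.0)` are equal, so two values that compare equal also feed the same bits to a hasher.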

orlp · 2023-10-10T21:10:01Z

For example, a <= b went from this:

```asm
example::old_leq:
        vucomiss        xmm1, xmm0
        jae     .LBB0_1
        mov     al, 1
        vucomiss        xmm0, xmm1
        jae     .LBB0_5
        mov     al, -1
        vucomiss        xmm0, xmm0
        jp      .LBB0_4
.LBB0_5:
        inc     al
        cmp     al, 2
        setb    al
        ret
.LBB0_1:
        xor     eax, eax
        vucomiss        xmm0, xmm1
        sbb     eax, eax
        inc     al
        cmp     al, 2
        setb    al
        ret
.LBB0_4:
        vucomiss        xmm1, xmm1
        setnp   al
        inc     al
        cmp     al, 2
        setb    al
        ret
```

to this:

```asm
example::new_leq:
        vcmpleps        xmm0, xmm0, xmm1
        vxorps  xmm2, xmm2, xmm2
        vcmpunordps     xmm1, xmm1, xmm2
        vorps   xmm0, xmm1, xmm0
        vmovd   eax, xmm0
        and     al, 1
        ret
```
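Reading the new assembly back into Rust, it computes `(a <= b) | isnan(b)`: one ordered compare, one unordered (NaN) check on `b`, and a bitwise OR. The sketch below is a reconstruction for illustration, not a copy of the crate's source. It assumes the ordering where all NaNs are equal to each other and greater than every other value, and the function names are made up. `branchy_leq` stands in for the old shape of the code, going through a full three-way comparison first.

```rust
use std::cmp::Ordering;

/// Illustrative reference order: NaN == NaN, and NaN is greater than
/// every non-NaN value. Uses branches to resolve the NaN cases.
fn reference_cmp(a: f32, b: f32) -> Ordering {
    match a.partial_cmp(&b) {
        Some(ord) => ord,
        // partial_cmp returns None only when at least one side is NaN.
        None => match (a.is_nan(), b.is_nan()) {
            (true, true) => Ordering::Equal,
            (true, false) => Ordering::Greater,
            _ => Ordering::Less,
        },
    }
}

/// `a <= b` derived from the three-way comparison (the branchy route).
fn branchy_leq(a: f32, b: f32) -> bool {
    reference_cmp(a, b) != Ordering::Greater
}

/// `a <= b` computed directly. If b is NaN the result is always true,
/// because nothing is greater than NaN. Otherwise the plain float
/// comparison is already correct: it is false when a is NaN, which is
/// right since NaN is greater than any non-NaN b.
fn branchless_leq(a: f32, b: f32) -> bool {
    // Non-short-circuiting `|` so the compiler is free to avoid a branch.
    (a <= b) | b.is_nan()
}
```

Both functions agree on every combination of infinite, finite, zero, and NaN inputs, so the direct form is a drop-in replacement under the assumed ordering.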

mbrubeck

Thank you!

orlp · 2023-10-10T21:57:40Z

@mbrubeck To also give some concrete numbers, on my Apple M1 machine sorting a shuffled Vec of 1 million OrderedFloat<f64>s went from 110ms to 84ms, a 1.3x speedup. I'd expect the difference on x86-64 to be even greater.
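A rough way to set up a similar measurement with only the standard library is sketched below. This is not the benchmark behind the numbers above: it sorts plain `f64` values with a hand-written NaN-last comparator instead of `OrderedFloat<f64>`, and it fills the vector from a small xorshift generator rather than shuffling. All function names are illustrative, and timings will vary by machine and build flags.

```rust
use std::cmp::Ordering;
use std::time::{Duration, Instant};

/// `a >= b` under the assumed order where NaN is the greatest value.
fn nan_last_ge(a: f64, b: f64) -> bool {
    a.is_nan() | (a >= b)
}

/// Three-way comparison built from `nan_last_ge`.
fn nan_last_cmp(a: &f64, b: &f64) -> Ordering {
    if !nan_last_ge(*a, *b) {
        Ordering::Less
    } else if !nan_last_ge(*b, *a) {
        Ordering::Greater
    } else {
        Ordering::Equal
    }
}

/// Deterministic pseudo-random values in [0, 1) from a xorshift64 generator.
/// The seed must be nonzero.
fn pseudo_random_floats(n: usize, mut state: u64) -> Vec<f64> {
    (0..n)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            // Keep the top 53 bits and scale them into [0, 1).
            (state >> 11) as f64 / (1u64 << 53) as f64
        })
        .collect()
}

/// Sorts `n` pseudo-random values and returns them with the elapsed time.
fn time_sort(n: usize) -> (Vec<f64>, Duration) {
    let mut values = pseudo_random_floats(n, 0x9E37_79B9_7F4A_7C15);
    let start = Instant::now();
    values.sort_by(nan_last_cmp);
    (values, start.elapsed())
}
```

Calling `time_sort(1_000_000)` in a release build and printing the returned `Duration` gives a single sample. For numbers worth comparing, repeat the run several times and use a proper benchmarking harness.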

orlp added 4 commits October 10, 2023 22:40

Optimize ord implementation

46c9670

Add exhaustive infinite/finite/zero/nan combination test

deff8d0

Optimize signed zero canonicalization

06884ae

Silence clippy, fmt, and add missing inline directives

71770b3

mbrubeck approved these changes Oct 10, 2023


mbrubeck merged commit 4e29b08 into reem:master Oct 10, 2023
2 checks passed

timvisee mentioned this pull request Oct 16, 2023

Update ordered float to 4.1.1 qdrant/qdrant#2823

Merged

3 tasks
