Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Reduce how much code is instantiated for comparisons #2365

Closed
wants to merge 5 commits into from

Conversation

Marwes
Copy link
Contributor

@Marwes Marwes commented Aug 8, 2022

Rationale for this change

The arrow crate generates a lot of code due the the dynamic comparison operators which takes a long time to compile and potentially bloats the final binary. By extracting code from generic functions and otherwise reducing how much code gets instantiated we get a faster and slimmer compilation.

Doesn't really fix #1858 as the amount of code is still huge, but it does still help.

(Output of cargo llvm-line -p arrow)

Before

  Lines           Copies        Function name
  -----           ------        -------------
  3923704 (100%)  78533 (100%)  (TOTAL)
   294241 (7.5%)   1163 (1.5%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
   127937 (3.3%)   2355 (3.0%)  core::option::Option<T>::map
   121895 (3.1%)   1750 (2.2%)  core::iter::traits::iterator::Iterator::fold
   107517 (2.7%)   1908 (2.4%)  core::iter::adapters::map::map_fold::{{closure}}
    93556 (2.4%)   1908 (2.4%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    92112 (2.3%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict
    85470 (2.2%)   1235 (1.6%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
    84899 (2.2%)   1163 (1.5%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
    78480 (2.0%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict::{{closure}}
    61814 (1.6%)    314 (0.4%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    61759 (1.6%)    301 (0.4%)  <arrow::array::array_primitive::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    61748 (1.6%)   1723 (2.2%)  core::iter::traits::iterator::Iterator::for_each
    59900 (1.5%)   1235 (1.6%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold
    57896 (1.5%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    54089 (1.4%)    358 (0.5%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
    50624 (1.3%)   3198 (4.1%)  core::iter::adapters::map::Map<I,F>::new
    48259 (1.2%)    320 (0.4%)  arrow::buffer::mutable::MutableBuffer::extend_from_iter
    45450 (1.2%)    150 (0.2%)  arrow::compute::kernels::comparison::compare_op
    43368 (1.1%)    312 (0.4%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
    40331 (1.0%)   1717 (2.2%)  core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
    40109 (1.0%)    145 (0.2%)  arrow::util::trusted_len::trusted_len_unzip
    39509 (1.0%)    312 (0.4%)  <arrow::buffer::immutable::Buffer as core::iter::traits::collect::FromIterator<T>>::from_iter
    36585 (0.9%)   3198 (4.1%)  core::iter::traits::iterator::Iterator::map

After

  Lines           Copies        Function name
  -----           ------        -------------
  3348856 (100%)  66644 (100%)  (TOTAL)
   126209 (3.8%)   2323 (3.5%)  core::option::Option<T>::map
   117648 (3.5%)    912 (1.4%)  arrow::compute::kernels::comparison::cmp_dict_bit_equal
    82631 (2.5%)   1174 (1.8%)  core::iter::traits::iterator::Iterator::fold
    76312 (2.3%)    587 (0.9%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    76221 (2.3%)   1332 (2.0%)  core::iter::adapters::map::map_fold::{{closure}}
    64660 (1.9%)   1332 (2.0%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    61814 (1.8%)    314 (0.5%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    61759 (1.8%)    301 (0.5%)  <arrow::array::array_primitive::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    57896 (1.7%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    51593 (1.5%)    342 (0.5%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
    48259 (1.4%)    320 (0.5%)  arrow::buffer::mutable::MutableBuffer::extend_from_iter
    46302 (1.4%)    659 (1.0%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
    45450 (1.4%)    150 (0.2%)  arrow::compute::kernels::comparison::compare_op
    42851 (1.3%)    587 (0.9%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
    41072 (1.2%)   2606 (3.9%)  core::iter::adapters::map::Map<I,F>::new
    41048 (1.2%)    296 (0.4%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
    41012 (1.2%)   1147 (1.7%)  core::iter::traits::iterator::Iterator::for_each
    40109 (1.2%)    145 (0.2%)  arrow::util::trusted_len::trusted_len_unzip
    39509 (1.2%)    312 (0.5%)  <arrow::buffer::immutable::Buffer as core::iter::traits::collect::FromIterator<T>>::from_iter
    32252 (1.0%)    659 (1.0%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold
    31680 (0.9%)    240 (0.4%)  arrow::compute::kernels::arithmetic::math_op_dict
    30955 (0.9%)   1842 (2.8%)  core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
    30890 (0.9%)    152 (0.2%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    30132 (0.9%)     81 (0.1%)  arrow::compute::kernels::take::take_primitive
    29737 (0.9%)   2606 (3.9%)  core::iter::traits::iterator::Iterator::map
    29531 (0.9%)    241 (0.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    28824 (0.9%)     62 (0.1%)  core::slice::sort::shift_tail

What changes are included in this PR?

Refactorings to reduce how much code is instantiated. The PR is left in draft as some of the changes to compare native types instead of "arrow types" is a bit hacky.

Markus Westerlind added 5 commits August 8, 2022 15:48
…igned numbers

These are just bit-equality so we can cut out a lot of generated code by re-using these
…(-2.5%)

The main issue is still that the comparison operators force this to instantiate to much (587). But it helps a bit.
@Marwes Marwes changed the title Compiletimes full refactor: Reduce how much code is instantiated for comparisons Aug 8, 2022
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 8, 2022
@@ -201,6 +213,45 @@ impl<'a, T: ArrowPrimitiveType> ArrayAccessor for &'a PrimitiveArray<T> {
}
}

pub(crate) struct NativeArrayView<'a, T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we perhaps find some way to eliminate this part, in a trade-off between more unsafe and getting LLVM to do more work, I would always take the latter...

@tustvold
Copy link
Contributor

tustvold commented Aug 8, 2022

Do you perhaps have some compile time metrics for this change?

@Marwes
Copy link
Contributor Author

Marwes commented Aug 9, 2022

Do you perhaps have some compile time metrics for this change?

time cargo build -p arrow (Probably very imprecise, but seems clear enough)

Before

________________________________________________________
Executed in   17,63 secs   fish           external
   usr time   59,63 secs  381,00 micros   59,63 secs
   sys time    3,78 secs  225,00 micros    3,78 secs

After

________________________________________________________
Executed in   14,01 secs   fish           external
   usr time   43,51 secs  540,00 micros   43,51 secs
   sys time    2,51 secs  270,00 micros    2,51 secs


}

// Trait to help reduce the number of function instantiations by merging types which compare the
// same (u8/i8 are compared the same way)
Copy link
Contributor

@tustvold tustvold Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct, as u8 is an unsigned comparison whereas i8 is signed? You can apply this style optimization for things like TimestampNanosecondArray, i.e. logical types, but I'm not sure you can discard the sign-ness of the type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True I were to hasty and only though about ==/!=. It needs to stay for ordered comparisons

@tustvold
Copy link
Contributor

I'm closing this PR as it has bit-rotted substantially, and there has been significant effort to address this under #2594. Please feel to reopen if/when updated

@tustvold tustvold closed this Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times
2 participants