Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a Schwartzian transform with custom sorting #6342

Merged
merged 7 commits into from Sep 2, 2017

Conversation

parkr
Copy link
Member

@parkr parkr commented Sep 1, 2017

Hey!

I was reading through the Enumerable#sort_by documentation and came across an interesting passage about how it gets its speed-up:

A more efficient technique is to cache the sort keys (modification times in this case)
before the sort. Perl users often call this approach a Schwartzian transform, after
Randal Schwartz. We construct a temporary array, where each element is an array
containing our sort key along with the filename. We sort this array, and then extract
the filename from the result.

This is exactly what sort_by does internally.

Whoa! A hidden gem: a recommendation about getting the most out of #sort by reducing object allocations. I'm in!

I tested it on the _docs collection in our jekyllrb.com site and it is quite impressive:

Yeah, ok, correctness all checks out for property "redirect_from"
Yeah, ok, correctness all checks out for property "title"
Warming up --------------------------------------
sort_by_property_directly with sparse property
                        11.000  i/100ms
schwartzian_transform with sparse property
                        76.000  i/100ms
Calculating -------------------------------------
sort_by_property_directly with sparse property
                        114.388  (± 8.7%) i/s -      1.144k in  10.087658s
schwartzian_transform with sparse property
                        824.113  (± 5.2%) i/s -      8.284k in  10.083133s

Comparison:
schwartzian_transform with sparse property:      824.1 i/s
sort_by_property_directly with sparse property:      114.4 i/s - 7.20x  slower

Warming up --------------------------------------
sort_by_property_directly with non-sparse property
                       932.000  i/100ms
schwartzian_transform with non-sparse property
                         1.772k i/100ms
Calculating -------------------------------------
sort_by_property_directly with non-sparse property
                          9.885k (± 6.3%) i/s -     98.792k in  10.040320s
schwartzian_transform with non-sparse property
                         17.436k (± 7.5%) i/s -    173.656k in  10.023797s

Comparison:
schwartzian_transform with non-sparse property:    17435.8 i/s
sort_by_property_directly with non-sparse property:     9885.2 i/s - 1.76x  slower

Based on these numbers, I switched Jekyll::Filters#sort_input to use this method.

🎉 🐎

@parkr parkr requested review from a team September 1, 2017 15:28
@ashmaroli
Copy link
Member

nice implementation Parker 👍 I did come across this transformation while reading up for another PR, but never bothered to bench an implementation..

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

@parkr
Copy link
Member Author

parkr commented Sep 1, 2017

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

Updated. I had to ensure I was copying the docs array for each new try (otherwise everything would be faster after the first iteration which sorted in place). Now showing much higher gains: 7x faster and 1.75x faster.

@ashmaroli
Copy link
Member

Now showing much higher gains: 7x faster and 1.75x faster

WOW!! Très cool!!!

@ashmaroli
Copy link
Member

@parkr IMO, this change should ideally be made into a public Utility method which can then be adapted for your suggestion to #5904

# lib/jekyll/utils.rb

def schwartzian_transform(input, property, order)
  [...]
end

@parkr
Copy link
Member Author

parkr commented Sep 1, 2017

this change should ideally be made into a public Utility method

I don't think we can do that, because each implementation will pull the property from input through different means each time, e.g. using input.data[property] or input[property] or item_property(input, property) in the filters.

apple_property <=> orange_property
input.map { |item| [item_property(item, property), item] }
.sort! do |apple_info, orange_info|
apple_property = apple_info.first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the benchmark results, Array#[] was faster than Array#first correct? Should we switch to that or did I interpret the benchmark results incorrectly?

@mattr-
Copy link
Member

mattr- commented Sep 2, 2017

What happens if we switch to Enumerable#sort_by directly? What's the reason that I'm missing that keeps us from doing:

input.sort_by { |item| item_property(item, property) }

@Crunch09
Copy link
Member

Crunch09 commented Sep 2, 2017

What happens if we switch to Enumerable#sort_by directly?

@mattr- : @parkr added a comment about this in the benchmark script, sort_by doesn't like nils

["foo", nil, "bar"].sort_by{|el| el}
# => ArgumentError: comparison of String with nil failed

Copy link
Member

@Crunch09 Crunch09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Really nice change!

@mattr-
Copy link
Member

mattr- commented Sep 2, 2017

@parkr added a comment about this in the benchmark script

Always good to know that I can't read very well. 🤣 Thanks @Crunch09!

@mattr-
Copy link
Member

mattr- commented Sep 2, 2017

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit 6ce912e into master Sep 2, 2017
@jekyllbot jekyllbot deleted the schwartzian-transform branch September 2, 2017 14:38
jekyllbot added a commit that referenced this pull request Sep 2, 2017
@jekyll jekyll locked and limited conversation to collaborators Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants