Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort JSON-LD data by key #458

Merged
merged 3 commits into from
Feb 4, 2022
Merged

Sort JSON-LD data by key #458

merged 3 commits into from
Feb 4, 2022

Conversation

parkr
Copy link
Member

@parkr parkr commented Jan 31, 2022

Deterministic output is desired to improve diffability between builds.
If the output randomly changes order each time it's built, then every
page on a given site will change with every Jekyll build. Tools like
rsync and git which can diff files will always show a change even when
the content hasn't truly changed.

Fixes #362. Fixes #436.

@parkr parkr requested a review from a team January 31, 2022 04:33
Copy link
Member

@ashmaroli ashmaroli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend optimizing as much as possible to reduce impact on performance.

lib/jekyll-seo-tag/drop.rb Outdated Show resolved Hide resolved
spec/jekyll_seo_tag_integration_spec.rb Outdated Show resolved Hide resolved
@parkr
Copy link
Member Author

parkr commented Feb 1, 2022

@ashmaroli I figured we could make this a first-class thing and just change to_json to return a sorted object instead of adding a second method.

@parkr
Copy link
Member Author

parkr commented Feb 1, 2022

Ok I decided to measure the .sort_by(&:first).to_h I'm adding. It's not great:

Warming up --------------------------------------
            original    55.477k i/100ms
        with sort_by    30.221k i/100ms
Calculating -------------------------------------
            original    539.839k (± 8.8%) i/s -      2.718M in   5.078878s
        with sort_by    322.651k (± 2.8%) i/s -      1.632M in   5.062437s

I went up the chain and am taking advantage of the fact that Hashes preserve insertion order. So if you insert "@context" before "@type", then iterating over the Hash (and serializing it to JSON) will give you "@context" first, and "@type" second consistently. I overwrote the original Drop#to_h to sort the keys.

@parkr parkr force-pushed the jsonld-deterministic branch 2 times, most recently from eb943df to a3caeb4 Compare February 1, 2022 05:02
@ashmaroli
Copy link
Member

taking advantage of the fact that Hashes preserve insertion order.. overwrote the original Drop#to_h to sort the keys.

This is a wonderful idea! Good job! 👏

I think this is sufficient for an approval now, but definitely could be optimized further..

Essentially, the json_ld.to_json call currently consists of numerous iterations:

def to_json(state)
  keys.sort.each_with_object({}) do |(key, _), result|
    result[key] = self[key]
  end.compact.to_json(state)
end

👉 .sort.each_with_object.compact

I'll think about this till tomorrow. and shall register my approval then.
Will merge after Matt provides his inputs too.

Thank you very much @parkr

@parkr parkr requested a review from mattr- February 1, 2022 07:22
Deterministic output is desired to improve diffability between builds.
If the output randomly changes order each time it's built, then every
page on a given site will change with every Jekyll build. Tools like
rsync and git which can diff files will always show a change even when
the content hasn't truly changed.
@parkr
Copy link
Member Author

parkr commented Feb 1, 2022

@ashmaroli Great point! Further optimized to add an if <value> to the each_with_object. I think we need to keep keys.sort but a good optimization is to get rid of compact if not necessary!

@ashmaroli
Copy link
Member

ashmaroli commented Feb 1, 2022

👍

However, this PR may have exposed a bug in Jekyll Core.. Specifically:

keys.sort.each_with_object({}) do |(key, _), result|

From Jekyll Core codebase Drop#keys is a flat array:

      def keys
        (content_methods |
          mutations.keys |
          fallback_data.keys).flatten
      end

Why is the .each_with_object block destructuring the array into pairs?

keys.sort.each_with_object({}) do |(key, _), result|

instead of:

keys.sort.each_with_object({}) do |key, result|

UPDATE: Apparently, it's just an implementation bias.
array.each_with_object({}) do |(key, _), result| is functionally equivalent to array.each_with_object({}) do |key, result|

@parkr
Copy link
Member Author

parkr commented Feb 1, 2022

Yeah I'm not sure why we did that. I'm going to leave it as-is to match Jekyll Core.

Copy link
Member

@ashmaroli ashmaroli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final edit and this is good to go.

lib/jekyll-seo-tag/json_ld_drop.rb Outdated Show resolved Hide resolved
Co-authored-by: Ashwin Maroli <ashmaroli@users.noreply.github.com>
@ashmaroli
Copy link
Member

pinging @mattr- for 👀

@parkr
Copy link
Member Author

parkr commented Feb 3, 2022

@ashmaroli let's give Matt until the weekend. If he's not free to take a look by then, then I think your and my approval counts as 2 Jekyll maintainers' approval so we can merge and cut a release.

@ashmaroli
Copy link
Member

Thanks @parkr
@jekyllbot: merge +minor

@ashmaroli
Copy link
Member

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit 54d37a8 into master Feb 4, 2022
@jekyllbot jekyllbot deleted the jsonld-deterministic branch February 4, 2022 16:17
jekyllbot added a commit that referenced this pull request Feb 4, 2022
@parkr
Copy link
Member Author

parkr commented Feb 4, 2022

Thanks @mattr-! Thanks @ashmaroli!

@jekyll jekyll locked and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make JSON-LD output deterministic Consistent order of keys for JSON-LD
4 participants