Reduce allocations from hack in document_fragment #2087

Merged

Conversation

ashmaroli
Contributor

What problem is this PR intended to solve?

Reduce object allocations from a hack in the constructor of Nokogiri::HTML::DocumentFragment.

Nokogiri 1.11 requires at least Ruby 2.4.0. Therefore, we can safely use Regexp#match? instead of String#=~.
The advantages are:

  • Regexp#match? doesn't allocate MatchData objects.
  • Folding the optional leading whitespace into the regex avoids String#strip, which allocates a copy of the given string.
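
As a minimal sketch of the first point, the two helper methods below (hypothetical names, not part of Nokogiri) run each form of the check and then return `$~`, the last-match variable that is local to each method call. The sample input is also made up for illustration.

```ruby
# Hypothetical helpers for illustration only; they are not Nokogiri code.

# Old form: String#strip builds a new String, and =~ stores a MatchData in $~.
def last_match_after_operator(tags)
  tags.strip =~ /^<body/i
  $~
end

# New form: Regexp#match? returns true/false and does not set $~.
def last_match_after_predicate(tags)
  /^\s*?<body/i.match?(tags)
  $~
end

tags = "  <body><p>hi</p></body>" # assumed sample input

p last_match_after_operator(tags).class  # MatchData
p last_match_after_predicate(tags)       # nil
```

Because `$~` starts out as nil in each method call, a nil result in the second helper indicates that `match?` did not create a MatchData for that call.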

Have you included adequate test coverage?

This change should be covered by existing tests.

Does this change affect the C or the Java implementations?

No, it doesn't.

@codeclimate

codeclimate bot commented Sep 28, 2020

Code Climate has analyzed commit 6f81566 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (80% is the threshold).

This pull request will bring the total coverage in the repository to 94.3% (0.0% change).

View more on Code Climate.

@flavorjones
Member

@ashmaroli Hi! Thanks for submitting this.

Do you have any metrics around the reduced number of allocations? I'm curious how you performed your analysis.

@flavorjones
Member

I'll need to look into why the jruby gem tests failed -- I don't think it's related to this PR.

@@ -33,7 +33,7 @@ def initialize document, tags = nil, ctx = nil
self.errors = document.errors - preexisting_errors
else
# This is a horrible hack, but I don't care
- if tags.strip =~ /^<body/i
+ if /^\s*?<body/i.match?(tags)
Member

Can you tell me why you've inserted \s*? at the front of this? That's a functional change that I wouldn't expect based on your description; and there aren't any tests provided demonstrating why this is introduced.

Contributor Author

The original code, tags.strip =~ /^<body/i, strips the whitespace around the string tags and tests whether the result starts with <body.
Instead of allocating a stripped copy of tags and testing that, I thought it would be better for the test itself to check whether <body appears at the start of the string, preceded by any amount of optional whitespace.

In other words, to do away with tags.strip...
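
To make the reviewer's "functional change" question concrete, here is a small sketch comparing the two checks on a few made-up inputs. The method names `old_check` and `new_check` are hypothetical wrappers around the two expressions from the diff. One detail worth keeping in mind: in Ruby, `^` anchors at the start of any line, not only at the start of the string, so the two forms agree on typical fragments but can differ on multi-line input where `<body` is indented on a later line.

```ruby
# Hypothetical wrappers around the two expressions in the diff.
def old_check(tags)
  !(tags.strip =~ /^<body/i).nil?
end

def new_check(tags)
  /^\s*?<body/i.match?(tags)
end

# Assumed sample inputs, chosen for illustration.
samples = [
  "<body>x</body>",          # plain body fragment
  "  \n <BODY class='a'>",   # leading whitespace, upper case
  "<div>x</div>",            # not a body fragment
  "<p>x</p>\n  <body>"       # <body indented on a later line
]

samples.each do |tags|
  puts format("%-26s old=%-5s new=%s", tags.inspect, old_check(tags), new_check(tags))
end
```

In this sketch the first three inputs give the same answer under both checks. The last one does not: `strip` only trims the ends of the string, so the old pattern sees spaces before `<body` on the second line and fails, while the new pattern absorbs those spaces with `\s*?` and succeeds.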

@ashmaroli
Contributor Author

ashmaroli commented Sep 28, 2020

> I'm curious how you performed your analysis

I used the memory_profiler gem to profile Nokogiri parsing and serializing an HTML source, run via GitHub Actions on my fork.
I didn't open a pull request with that setup because I had checked the *.gemspec into version control for easy access to the latest nokogiri gem.
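
The profiling setup itself is not shown in this thread. As a rough stand-in that needs no extra gems, the sketch below counts allocations with `GC.stat(:total_allocated_objects)` rather than memory_profiler, which is a swapped-in technique and not the author's method. It assumes CRuby; the helper name `allocations`, the iteration count, and the sample input are all hypothetical.

```ruby
# Count objects allocated while running the block `iterations` times.
# Assumes CRuby, where GC.stat exposes :total_allocated_objects.
def allocations(iterations = 1_000)
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  GC.stat(:total_allocated_objects) - before
end

tags = "  <body><p>hi</p></body>" # assumed sample input

old_count = allocations { tags.strip =~ /^<body/i }      # String copy + MatchData
new_count = allocations { /^\s*?<body/i.match?(tags) }   # no String copy or MatchData

puts "strip + =~ : #{old_count} objects"
puts "match?     : #{new_count} objects"
```

Exact counts vary by Ruby version, so the sketch only supports a relative comparison between the two forms, not absolute figures for Nokogiri as a whole.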

@ashmaroli
Contributor Author

> I'll need to look into why the jruby gem tests failed

The cause is rubocop-ast-0.7.0 requiring strscan (a native-extension gem).
v0.7.1 has since shipped without that requirement, so the JRuby pipeline should pass if it is restarted.

Base automatically changed from master to main January 17, 2021 21:53
This was referenced Mar 18, 2021