Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the performance using the hints by rubocop-performance and fasterer #33

Open
MarcoCostantini opened this issue May 12, 2020 · 2 comments

Comments

@MarcoCostantini
Copy link

MarcoCostantini commented May 12, 2020

rubocop-performance (see https://github.com/rubocop-hq/rubocop-performance ) says:

Offenses:

/usr/lib/ruby/2.7.0/rexml/attlistdecl.rb:45:7: C: Performance/InefficientHashSearch: Use #key? instead of #keys.include?.
      @pairs.keys.include? key
      ^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/doctype.rb:201:7: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
      quoted_string =~ /^[\'\"].*[\'\"]$/ ?
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/element.rb:255:43: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
        prefix = "xmlns:#{prefix}" unless prefix =~ /^xmlns:/
                                          ^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/entity.rb:44:12: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
        if stream[2] =~ /SYSTEM|PUBLIC/
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/node.rb:56:9: C: Performance/RedundantBlockCall: Use yield instead of block.call.
        block.call(node)
        ^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/node.rb:65:24: C: Performance/RedundantBlockCall: Use yield instead of block.call.
        return node if block.call(node)
                       ^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/parseexception.rb:32:67: C: Performance/StringReplacement: Use tr instead of gsub.
        err << @source.buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')
                                                                  ^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/parsers/baseparser.rb:476:25: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
        return false if /\AUTF-16\z/i =~ xml_declaration_encoding
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/parsers/pullparser.rb:73:29: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
            event[2] unless event[2] =~ /PUBLIC|SYSTEM/
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/parsers/treeparser.rb:81:54: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
              entities[ event[1] ] = event[2] unless event[2] =~ /PUBLIC|SYSTEM/
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/parsers/xpathparser.rb:304:14: C: Performance/RegexpMatch: Use match? instead of !~ when MatchData is not used.
          if path !~ /^\s*\)/
             ^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/parsers/xpathparser.rb:539:41: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
        rest = LocationPath(rest, n) if rest =~ /\A[\/\.\@\[\w*]/
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
/usr/lib/ruby/2.7.0/rexml/quickpath.rb:108:49: C: Performance/RegexpMatch: Use match? instead of =~ when MatchData is not used.
      matches = filter( elements.dup, rest ) if axe_name =~ /-or-self$/u
                                                ^^^^^^^^^^^^^^^^^^^^^^^^

49 files inspected, 13 offenses detected


fasterer (see https://github.com/DamirSvrtan/fasterer) says:

validation/relaxng.rb:416 Calling argumentless methods within blocks is slower than using symbol to proc.
validation/relaxng.rb:523 Calling argumentless methods within blocks is slower than using symbol to proc.

parsers/xpathparser.rb:602 For loop is slower than using each.

parsers/baseparser.rb:381 For loop is slower than using each.

formatters/default.rb:70 Calling argumentless methods within blocks is slower than using symbol to proc.

xpath_parser.rb:105 For loop is slower than using each.
xpath_parser.rb:120 For loop is slower than using each.
xpath_parser.rb:458 Enumerable#sort is slower than Enumerable#sort_by.
xpath_parser.rb:590 Using each_with_index is slower than while loop.

quickpath.rb:88 Calling argumentless methods within blocks is slower than using symbol to proc.
quickpath.rb:133 Calling argumentless methods within blocks is slower than using symbol to proc.
quickpath.rb:135 Calling argumentless methods within blocks is slower than using symbol to proc.
quickpath.rb:138 Calling argumentless methods within blocks is slower than using symbol to proc.

node.rb:54 Calling blocks with call is slower than yielding.
node.rb:63 Calling blocks with call is slower than yielding.

functions.rb:42 Calling argumentless methods within blocks is slower than using symbol to proc.

element.rb:1068 Hash#fetch with second argument is slower than Hash#fetch with block.
element.rb:1075 Hash#fetch with second argument is slower than Hash#fetch with block.
element.rb:1128 Hash#fetch with second argument is slower than Hash#fetch with block.
element.rb:1212 Hash#fetch with second argument is slower than Hash#fetch with block.
element.rb:1246 Calling argumentless methods within blocks is slower than using symbol to proc.

doctype.rb:283 Use attr_reader for reading ivars.

49 files inspected, 22 offenses detected

@anew-bhav
Copy link

@nobu can I work on this issue.

The process I believe should be

  1. Setup project on local.
  2. Run tests to ensure the environment is ok.
  3. Run rubocop-performance .
  4. The above step will output offenses as shown in the screenshots above.
  5. solve each offense, by making the suggested changes
  6. run test again to check if the changes do not break anything.
  7. if the test fail for a particular change, ignore making a change and goto next step
  8. Goto 3

Ideally the above algo should also work for the offenses pointed out by fasterer.

Let me know if the process I suggested is OK. Please share any necessary feedback.

@kou
Copy link
Member

kou commented May 31, 2021

We don't merge a PR that says "improve performance" without practical benchmarks. For example, parsing a small XML, parsing a large XML, using a simple XPath and so on.

If you want to improve REXML performance, you need to create a benchmark for a target situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants