Optimize for strings without multibyte characters #724

Open: 1 commit to merge into base branch master
@sriedel (Contributor) commented Jun 16, 2019

Optimizes finding the character offset for strings that include no multibyte characters.

Note: I'm no expert in string encodings, but my naive assumption is that if a string contains as many bytes as characters, every character occupies exactly one byte, so the requested character offset must equal the supplied byte offset. This should hold for the majority of documentation written in English and encoded as UTF-8.
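
To make the assumption above concrete, here is a minimal sketch of the idea. It is not the actual diff from this PR: `char_offset` and its parameters are hypothetical names chosen for illustration, and the slow path shown is just one plausible way to count characters up to a byte offset.

```ruby
# Hypothetical helper for illustration only; not RDoc's actual code.
# Converts a byte offset within +input+ into a character offset.
def char_offset(input, byte_offset)
  # Fast path: if the byte count equals the character count, every
  # character occupies exactly one byte, so both offsets coincide.
  return byte_offset if input.bytesize == input.length

  # Slow path: count the characters contained in the first
  # byte_offset bytes of the string.
  input.byteslice(0, byte_offset).length
end

ascii = "plain English text" # single-byte characters only
multi = "naïve café"         # "ï" and "é" take two bytes each in UTF-8

char_offset(ascii, 6) # takes the fast path
char_offset(multi, 4) # takes the slow path; the first 4 bytes are "naï"
```

The fast path avoids slicing the string on every call, which is where the saving would come from if the method is called many times on the same large input.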

Motivation: generating ri documentation for the gem crack-0.4.3 took 156.3 seconds on my 6th-generation i7, according to the rdoc output. Profiling the process with rubyspy showed that most of the time was being spent in RDoc::Markup::Parser#char_pos.

The output of rdoc with the original char_pos method:

~/.rvm/gems/ruby-2.6.3/gems/crack-0.4.3 $ time rdoc --ri
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
Parsing sources...
100% [22/22]  test/xml_test.rb

Generating RI format into /home/sr/.rdoc...

  Files:      22

  Classes:     7 ( 6 undocumented)
  Modules:     2 ( 2 undocumented)
  Constants:   3 ( 2 undocumented)
  Attributes:  1 ( 1 undocumented)
  Methods:    11 (11 undocumented)

  Total:      24 (22 undocumented)
    8.33% documented

  Elapsed: 156.3s

 
real	2m36.989s
user	2m35.967s
sys	0m0.217s

With this change, building the ri documentation for the above-mentioned gem takes ~2.4 seconds:

~/.rvm/gems/ruby-2.6.3/gems/crack-0.4.3 $ time rdoc --ri 
Parsing sources...
100% [22/22]  test/xml_test.rb

Generating RI format into /home/sr/.rdoc...

  Files:      22

  Classes:     7 ( 6 undocumented)
  Modules:     2 ( 2 undocumented)
  Constants:   3 ( 2 undocumented)
  Attributes:  1 ( 1 undocumented)
  Methods:    11 (11 undocumented)

  Total:      24 (22 undocumented)
    8.33% documented

  Elapsed: 2.3s


real	0m2.798s
user	0m2.661s
sys	0m0.130s

@aycabta (Member) left a comment

I tried this patch with crack-0.4.3 but couldn't reproduce the performance improvement. You should delete the output directory before re-running, because RDoc uses a cache when generating new files.
