Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zenspider/warnings double loads and fixes #1962

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

zenspider
Copy link
Contributor

Fixes #1961

This PR fixes >300 warnings that are emitted while loading the code and/or running specs with warnings on.

Please consider reviewing this by commit.

Also consider turning off whitespace changes (for one commit on a rake task file). Tack on ?ws=1.

This is 12 commits, some should be squashed but I separated them because I didn't trust all my changes and this lets it be reviewable by the language owner.

Note: This completely removes use of (but not the definition of) Rouge.load_file and friends. They were not properly preventing double loads and eventually I decided it would be cleaner and easier if everything was just required via ruby to be more idiomatic and quell ~200 redefined method warnings. This does remove all of the "self-mutating" methods and eagerly loads the keyword files. It does not noticeably change load time at all. That might have been relevant when we all had slow spinning hard drives, but not anymore.

(no good way to set $VERBOSE from the outside)
Drops `rake check:specs VERBOSE=1` from 331 lines to 144.
load will load, always. Don't use it unless you really really mean it.
Most of these should just be removed instead of underscored... but I
want eyeballs on this first. Some aren't being used the way they think
they're being used.
Turns out it was one local variable used in a lot of regexps. This
updates the link to the reference and fixes it per reference.
…quire

Bundler shouldn't require everything. Slows down the world.
@zenspider
Copy link
Contributor Author

One warning is left for a regexp in julia that I am digging into but might best me:

lib/rouge/lexers/julia.rb:255: warning: character class has duplicated range: /[\p{L}\p{Nl}\p{S}_][\p{Word}\p{S}\p{Po}!]*/

I've figured out that it is it is the last half, and related to the ! being in Po, but when I reduce the regexp the warning goes away but when I put the fix back in the original it is still there... I'm a bit stumped.

@jneen
Copy link
Member

jneen commented May 17, 2023

Ya know, I've been against it in the past, but I wouldn't mind transitioning to require_relative. The current system comes from a time when ruby's require was painfully slow (and in my defense, require_relative wasn't standard in commonly used ruby versions), but I imagine that's been more or less fixed in the last 10 years.

@jneen
Copy link
Member

jneen commented May 17, 2023

That being said, I would not trust ruby's -w as some kind of northern star for "good code", especially when it comes to regular expressions. There are a large number of very odd design decisions in there that weren't very thought out IMO.

lib/rouge/lexers/xojo.rb Show resolved Hide resolved
Kernel::load File.join(Lexers::BASE_DIR, 'viml/keywords.rb')
self.keywords
end
require_relative "viml/keywords"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. The reason we did it this way was that these giant keyword files have a massive memory footprint, and we don't want Rouge to load them unless the lexer is actually in use. I think it'll still work if we use require_relative dynamically maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running the above benchmark code with /usr/bin/time -lph:

master:

            61194240  maximum resident set size
                3888  page reclaims
                   4  page faults
                  17  involuntary context switches
          6141564240  instructions retired
          1349472140  cycles elapsed
            55493952  peak memory footprint

mine:

            59604992  maximum resident set size
                3791  page reclaims
                   4  page faults
                  10  involuntary context switches
          6451778027  instructions retired
          1398372284  cycles elapsed
            51660096  peak memory footprint

(removed lines reporting 0)

I reran each 3 times and the numbers are all consistent...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib/rouge/lexers/syzlang.rb Show resolved Hide resolved
@@ -115,7 +115,7 @@ def self.keywords_type
rule %r/"/, Name::Variable, :double_string
rule %r/`/, Name::Variable, :backtick

rule %r/\w[\w\d]*/ do |m|
rule %r/\w+/ do |m|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a mistake and they meant either \p{L}\w+ or [a-zA-Z]\w+, would have to check the language spec for that.

lib/rouge/lexers/scala.rb Show resolved Hide resolved
@@ -73,7 +73,7 @@ def initialize(opts={})
end

state :export do
rule %r/[\w[\$]{1,2}{}()-]/, Name::Variable
rule %r/[\w\${}()-]/, Name::Variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really this PR's fault, but we usually ask for + at the end of single-char-class regexes like this for perf reasons. This one's old enough it might have been my fault 😅

Copy link
Contributor Author

@zenspider zenspider May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine adding the + ... but:

do you see how the {1,2} is also inside of a class, so it isn't a length specifier at all?... you might want to change the regexp to match more correctly. I haven't studied the make grammar in a really long time, but I'm pretty sure the above (either version) is incorrect.

Kernel::load File.join(Lexers::BASE_DIR, "llvm/keywords.rb")
types
end
require_relative "llvm/keywords"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to before, we'd like to avoid loading these large keywords files for users who aren't highlighting LLVM. Though now I look at this it appears to have been a copy paste job and it will absolutely load that file multiple times 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code is wrong. It shouldn't use load. If that alone gets fixed, great... but seriously, The strategies you used 10 years ago to speed things up aren't the strategies you need today. Case in point:

$-w = nil

$: << "lib"
require "rouge"
keywords = Dir["lib/rouge/lexers/*/keywords.rb"]

iters = 30

t0 = Time.now
iters.times do
  keywords.each { |f| Kernel.load f }
end
t1 = Time.now

puts "Average time to load all keywords.rb: %8.5fs" % [(t1 - t0) / iters]

# Average time to load all keywords.rb:  0.00812s

It costs nearly nothing to load those keyword files... granted, every machine is different, but still. I suggest you measure and reevaluate whether self modifying code is worth it. I had thousands of lines of warnings when I was first evaluating whether to switch to rouge or not. Literally thousands. Those warnings have a cost too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to make rouge faster, don't load everything. Load (roughly) nothing. Push it towards a lazy loading system. But if you're going to load everything, let ruby do it and let it do it properly (and only once)

@zenspider
Copy link
Contributor Author

That being said, I would not trust ruby's -w as some kind of northern star for "good code", especially when it comes to regular expressions. There are a large number of very odd design decisions in there that weren't very thought out IMO.

Nobody is claiming this... Lack of output from -w doesn't mean "good code", but output from -w is evidence of "bad code" (for varying levels of "bad").

@zenspider
Copy link
Contributor Author

Where does this stand?

@zenspider
Copy link
Contributor Author

2 more weeks after months... Do what you want with this PR. I'm divesting myself of it.

@jneen
Copy link
Member

jneen commented Jul 25, 2023

Okay!

@tancnle
Copy link
Collaborator

tancnle commented Sep 6, 2023

Thank you for your hard work on this @zenspider 🙇🏼 ❤️

Sorry I am just catching up to this. I think the require_relative replacement makes sense to me. From the discussion threads above, I don't see there is any strong argument against it 🤔 (please correct me if I have misread). In that case, I think we should roll in the require_relative changes. The remaining can be broken down to smaller MRs so we can branch out further discussions. I can take point on splitting this MR. What do you think @zenspider @jneen?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Double loads, regex warnings, and cleanup, oh my!
3 participants