Zenspider/warnings double loads and fixes #1962

zenspider · 2023-05-11T18:48:28Z

Fixes #1961

This PR fixes >300 warnings that are emitted while loading the code and/or running specs with warnings on.

Please consider reviewing this by commit.

Also consider hiding whitespace changes (this matters for one commit on a rake task file) by appending ?w=1 to the diff URL.

This is 12 commits. Some should be squashed, but I kept them separate because I didn't trust all of my changes, and this lets each one be reviewed by the relevant language owner.

Note: This completely removes the use of (but not the definition of) Rouge.load_file and friends. They were not properly preventing double loads, and eventually I decided it would be cleaner and easier to require everything via plain Ruby, which is more idiomatic and quells ~200 redefined-method warnings. This removes all of the "self-mutating" methods and eagerly loads the keyword files. It does not noticeably change load time. Lazy loading might have mattered when we all had slow spinning hard drives, but not anymore.
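
To illustrate the double-load problem described above, here is a minimal sketch (not Rouge code; the file name and the `$demo_load_count` counter are made up for the demo). It shows that `load` re-executes a file on every call, while `require` runs it only once.

```ruby
require "tmpdir"

$demo_load_count = 0

load_runs, require_runs = Dir.mktmpdir do |dir|
  # Hypothetical stand-in for a keywords file; it only bumps a counter.
  path = File.join(dir, "demo_keywords.rb")
  File.write(path, "$demo_load_count += 1\n")

  # Kernel#load executes the file every time it is called.
  load path
  load path
  via_load = $demo_load_count

  # Kernel#require records the path in $LOADED_FEATURES and skips repeats.
  $demo_load_count = 0
  require path
  require path
  via_require = $demo_load_count

  [via_load, via_require]
end

puts "load ran the file #{load_runs} times, require ran it #{require_runs} time"
```

With method definitions in the file instead of a counter, the second `load` is what produces the "method redefined" warnings under `-w`.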

zenspider · 2023-05-11T18:49:25Z

One warning is left for a regexp in julia that I am digging into but might best me:

lib/rouge/lexers/julia.rb:255: warning: character class has duplicated range: /[\p{L}\p{Nl}\p{S}_][\p{Word}\p{S}\p{Po}!]*/

I've figured out that it is the last half, and that it is related to the ! being in Po. But when I reduce the regexp the warning goes away, and when I put the fix back into the original it is still there... I'm a bit stumped.
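
A small sketch of the overlap mentioned above, assuming only standard Ruby regexp behaviour: `!` already belongs to the Unicode category Po, so listing it again next to `\p{Po}` adds nothing to the class. The reduced classes and sample strings are illustrative, and this does not explain why the warning persists in the full julia regexp.

```ruby
# "!" (U+0021) has the Unicode general category Po (Other_Punctuation).
bang_is_po = "!".match?(/\A\p{Po}\z/)

# Reduced versions of the second character class, with and without the literal "!".
with_bang    = /\A[\p{Word}\p{S}\p{Po}!]*\z/
without_bang = /\A[\p{Word}\p{S}\p{Po}]*\z/

# Both classes should accept and reject exactly the same strings.
samples = ["foo!", "a+b", "naive_name", "has space", "(paren)"]
same_results = samples.all? { |s| with_bang.match?(s) == without_bang.match?(s) }

puts "! is in Po: #{bang_is_po}, classes agree: #{same_results}"
```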

jneen · 2023-05-17T14:20:27Z

Ya know, I've been against it in the past, but I wouldn't mind transitioning to require_relative. The current system comes from a time when ruby's require was painfully slow (and in my defense, require_relative wasn't standard in commonly used ruby versions), but I imagine that's been more or less fixed in the last 10 years.

jneen · 2023-05-17T14:22:17Z

That being said, I would not trust ruby's -w as some kind of northern star for "good code", especially when it comes to regular expressions. There are a large number of very odd design decisions in there that weren't very thought out IMO.

lib/rouge/lexers/xojo.rb

jneen · 2023-05-17T14:27:36Z

lib/rouge/lexers/viml.rb

-        Kernel::load File.join(Lexers::BASE_DIR, 'viml/keywords.rb')
-        self.keywords
-      end
+      require_relative "viml/keywords"


Hm. The reason we did it this way was that these giant keyword files have a massive memory footprint, and we don't want Rouge to load them unless the lexer is actually in use. I think it'll still work if we use require_relative dynamically maybe?
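
One way the "dynamic require" idea above could look, as a hedged sketch rather than Rouge's actual code: the require happens inside the method, so the file is read only on first use, and repeated calls are no-ops. `DemoLexer`, `DemoVimLKeywords` and the temp file are hypothetical; in the lexer itself the call would presumably be `require_relative "viml/keywords"`, which resolves relative to the file containing the call.

```ruby
require "tmpdir"
require "fileutils"

dir = Dir.mktmpdir
path = File.join(dir, "demo_viml_keywords.rb")

# Hypothetical keywords file; the real ones are far larger.
File.write(path, <<~'RUBY')
  $demo_keyword_file_runs = $demo_keyword_file_runs.to_i + 1
  module DemoVimLKeywords
    LIST = %w[let function endfunction].freeze
  end
RUBY

class DemoLexer
  class << self
    attr_accessor :keywords_path

    # Requires the keywords file on first use only; later calls are cheap
    # because require skips a path that is already loaded.
    def keywords
      require keywords_path
      DemoVimLKeywords::LIST
    end
  end
end

DemoLexer.keywords_path = path
runs_before = $demo_keyword_file_runs.to_i
first  = DemoLexer.keywords
second = DemoLexer.keywords
runs_after = $demo_keyword_file_runs

FileUtils.remove_entry(dir)
puts "file executed #{runs_after} time(s) for two calls"
```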

running the above benchmark code with /usr/bin/time -lph:

master:

    61194240  maximum resident set size
        3888  page reclaims
           4  page faults
          17  involuntary context switches
  6141564240  instructions retired
  1349472140  cycles elapsed
    55493952  peak memory footprint

mine:

    59604992  maximum resident set size
        3791  page reclaims
           4  page faults
          10  involuntary context switches
  6451778027  instructions retired
  1398372284  cycles elapsed
    51660096  peak memory footprint

(removed lines reporting 0)

I reran each 3 times and the numbers are all consistent...

cc @ashmaroli

lib/rouge/lexers/syzlang.rb

jneen · 2023-05-17T14:30:15Z

lib/rouge/lexers/sql.rb

@@ -115,7 +115,7 @@ def self.keywords_type
        rule %r/"/, Name::Variable, :double_string
        rule %r/`/, Name::Variable, :backtick

-        rule %r/\w[\w\d]*/ do |m|
+        rule %r/\w+/ do |m|


This is probably a mistake and they meant either \p{L}\w+ or [a-zA-Z]\w+, would have to check the language spec for that.
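
A quick check of the point above, as a sketch: `\d` is a subset of `\w`, so the old and new rules accept the same strings, including identifiers that start with a digit. The `strict` pattern is an assumption for illustration (it uses `\w*` so single-letter names still match), not something taken from the SQL spec.

```ruby
old_rule = /\A\w[\w\d]*\z/   # the form that triggers the duplicated-range warning
new_rule = /\A\w+\z/
# Assumption: a stricter variant that requires a leading ASCII letter.
strict   = /\A[a-zA-Z]\w*\z/

samples = ["select", "t1", "1abc", "_tmp", "9", "a-b", ""]

# The old and new rules agree on every sample.
equivalent = samples.all? { |s| old_rule.match?(s) == new_rule.match?(s) }

# Both accept a digit-first word; the stricter variant does not.
digit_first_ok = new_rule.match?("1abc")
strict_rejects = !strict.match?("1abc")

puts "equivalent: #{equivalent}, digit-first accepted: #{digit_first_ok}, strict rejects it: #{strict_rejects}"
```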

lib/rouge/lexers/scala.rb

jneen · 2023-05-17T14:35:58Z

lib/rouge/lexers/make.rb

@@ -73,7 +73,7 @@ def initialize(opts={})
      end

      state :export do
-        rule %r/[\w[\$]{1,2}{}()-]/, Name::Variable
+        rule %r/[\w\${}()-]/, Name::Variable


Not really this PR's fault, but we usually ask for + at the end of single-char-class regexes like this for perf reasons. This one's old enough it might have been my fault 😅

I'm fine adding the + ... but:

Do you see how the {1,2} is also inside the character class, so it isn't a length specifier at all? You might want to change the regexp to match more correctly. I haven't studied the make grammar in a really long time, but I'm pretty sure the above (either version) is incorrect.
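
To make the character-class point concrete, here is a small sketch comparing the two classes from the diff one character at a time. Inside a class, `{1,2}` is just the literal characters `{`, `1`, `,`, `2`, `}`, so the only character the old class accepts and the new one rejects is the comma. The sample characters and the `$(CC)` input are illustrative.

```ruby
old_class = /[\w[\$]{1,2}{}()-]/
new_class = /[\w\${}()-]/

chars = ["$", "{", "}", "(", ")", "-", "1", "2", ",", "a"]

# Characters matched by the old class but not by the new one.
old_only = chars.select { |c| old_class.match?(c) && !new_class.match?(c) }

# With a trailing +, a whole run is consumed in one match instead of char by char.
with_plus = /[\w\${}()-]+/
run = "$(CC) rest"[with_plus]

puts "only in old class: #{old_only.inspect}, run matched: #{run}"
```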

jneen · 2023-05-17T14:37:17Z

lib/rouge/lexers/llvm.rb

-        Kernel::load File.join(Lexers::BASE_DIR, "llvm/keywords.rb")
-        types
-      end
+      require_relative "llvm/keywords"


Similar to before, we'd like to avoid loading these large keywords files for users who aren't highlighting LLVM. Though now I look at this it appears to have been a copy paste job and it will absolutely load that file multiple times 🤦

The old code is wrong. It shouldn't use load. If that alone gets fixed, great... but seriously, the strategies you used 10 years ago to speed things up aren't the strategies you need today. Case in point:

    $-w = nil
    $: << "lib"

    require "rouge"

    keywords = Dir["lib/rouge/lexers/*/keywords.rb"]
    iters = 30

    t0 = Time.now
    iters.times do
      keywords.each { |f| Kernel.load f }
    end
    t1 = Time.now

    puts "Average time to load all keywords.rb: %8.5fs" % [(t1 - t0) / iters]
    # Average time to load all keywords.rb: 0.00812s

It costs nearly nothing to load those keyword files... granted, every machine is different, but still. I suggest you measure and reevaluate whether self modifying code is worth it. I had thousands of lines of warnings when I was first evaluating whether to switch to rouge or not. Literally thousands. Those warnings have a cost too.

If you really want to make rouge faster, don't load everything. Load (roughly) nothing. Push it towards a lazy loading system. But if you're going to load everything, let ruby do it and let it do it properly (and only once).
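
As one possible reading of "let ruby do it" for lazy loading, Ruby's built-in `autoload` defers a require until a constant is first referenced, and it goes through `require`, so the file is read at most once. This is a sketch only; `DemoLLVMKeywords` and the temp file are hypothetical and not part of Rouge.

```ruby
require "tmpdir"
require "fileutils"

dir = Dir.mktmpdir
path = File.join(dir, "demo_llvm_keywords.rb")

# Hypothetical keywords file standing in for a large generated one.
File.write(path, <<~'RUBY')
  module DemoLLVMKeywords
    TYPES = %w[void half float double].freeze
  end
RUBY

# Register the constant; the file is not read yet.
Object.autoload(:DemoLLVMKeywords, path)
pending_before = Object.autoload?(:DemoLLVMKeywords)  # non-nil while still unloaded

# The first reference triggers a single require of the file.
types = DemoLLVMKeywords::TYPES
pending_after = Object.autoload?(:DemoLLVMKeywords)   # nil once loaded

FileUtils.remove_entry(dir)
puts "loaded #{types.size} types on first reference"
```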

zenspider · 2023-05-29T09:19:11Z

> That being said, I would not trust ruby's -w as some kind of northern star for "good code", especially when it comes to regular expressions. There are a large number of very odd design decisions in there that weren't very thought out IMO.

Nobody is claiming this... Lack of output from -w doesn't mean "good code", but output from -w is evidence of "bad code" (for varying levels of "bad").

zenspider · 2023-07-08T23:10:24Z

Where does this stand?

zenspider · 2023-07-25T05:48:47Z

2 more weeks after months... Do what you want with this PR. I'm divesting myself of it.

jneen · 2023-07-25T06:08:49Z

Okay!

tancnle · 2023-09-06T23:55:33Z

Thank you for your hard work on this @zenspider 🙇🏼 ❤️

Sorry, I am just catching up on this. The require_relative replacement makes sense to me. From the discussion threads above, I don't see any strong argument against it 🤔 (please correct me if I have misread). In that case, I think we should roll in the require_relative changes. The remaining changes can be broken down into smaller MRs so we can branch out further discussions. I can take point on splitting this MR. What do you think @zenspider @jneen?

zenspider added 12 commits May 7, 2023 15:46

Allow warnings in test if ENV["VERBOSE"]

eec7600

(no good way to set $VERBOSE from the outside)

Use #to_s to check for loaded lexers to avoid double loads.

35f8a97

Drops `rake check:specs VERBOSE=1` from 331 lines to 144.

Fix double load warnings by using require_relative

edf5574

load will load, always. Don't use it unless you really really mean it.

Remove duplicate range regexp warnings

ea56f89

Fix regexp in make... I'm not comfortable with this change and want i…

1455291

…t checked

Fix warnings about private_class_method usage

df4fb6c

Unused variable warnings

a186bc6

Most of these should just be removed instead of underscored... but I want eyeballs on this first. Some aren't being used the way they think they're being used.

Fixed a LOT of warnings from scala.

8d8b623

Turns out it was one local variable used in a lot of regexps. This updates the link to the reference and fixes it per reference.

Strip trailing whitespace from changelog.rake

3648a1d

Move require git inside of Rogue::Tasks::Git#initialize to make it lazy

d8146a6

git gem has warnings, this quells them

Move several gems in Gemfile into development group to avoid eager re…

8e6150a

…quire Bundler shouldn't require everything. Slows down the world.

Removed unnecessary require of minitest/spec

8a67ee1
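
For the first commit in the list above ("Allow warnings in test if ENV["VERBOSE"]"), a minimal sketch of what such a switch could look like. The helper name `enable_warnings_from_env` is hypothetical and the real change may differ; the only standard pieces used are `ENV` and `$VERBOSE`.

```ruby
# Hypothetical helper: turn Ruby warnings on when VERBOSE is set in the environment.
# Accepts any hash-like object so it can be exercised without touching the real ENV.
def enable_warnings_from_env(env = ENV)
  $VERBOSE = true if env["VERBOSE"]
  $VERBOSE
end

previous = $VERBOSE

$VERBOSE = false
with_flag = enable_warnings_from_env({ "VERBOSE" => "1" })

$VERBOSE = false
without_flag = enable_warnings_from_env({})

# Restore whatever the interpreter was started with.
$VERBOSE = previous
puts "with VERBOSE: #{with_flag}, without: #{without_flag}"
```

In a spec helper this would be called once, before the code under test is required, so that load-time warnings are reported as well.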

jneen requested changes May 17, 2023

This was referenced Jan 7, 2024

Development related cleanup #2018

Merged

Turn on warnings via VERBOSE env #2019

Merged

This was referenced Mar 17, 2024

Fix duplicate range regexp warnings #2030

Merged

Fix private class unused variable warnings #2031

Merged
