Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_enum(:scan, ...) does not set Regexp.last_match #1484

Closed
MaxLap opened this issue Nov 28, 2018 · 10 comments
Closed

to_enum(:scan, ...) does not set Regexp.last_match #1484

MaxLap opened this issue Nov 28, 2018 · 10 comments
Assignees
Labels
Milestone

Comments

@MaxLap
Copy link

MaxLap commented Nov 28, 2018

A weird effect of using to_enum with scan is that it will not set the Regexp.last_match. Using scan alone does it just fine.

In MRI and JRuby:

"wawa".to_enum(:scan, /./).map {|o| o }
 #=> ["w", "a", "w", "a"]
"wawa".to_enum(:scan, /./).map {|o| Regexp.last_match }
 #=> [#<MatchData "w">, #<MatchData "a">, #<MatchData "w">, #<MatchData "a">]

In truffleruby:

"wawa".to_enum(:scan, /./).map {|o| o }
 #=> ["w", "a", "w", "a"]
"wawa".to_enum(:scan, /./).map {|o| Regexp.last_match }
 #=> [nil, nil, nil, nil] 
Regexp.last_match
 #=> nil
# Showing that Regexp.last_match wasn't changed at all
"hello".scan('l') 
 #=> ["l", "l"]
"wawa".to_enum(:scan, /./).map {|o| o }
 #=> ["w", "a", "w", "a"]
"wawa".to_enum(:scan, /./).map {|o| Regexp.last_match }
 #=> [#<MatchData "l">, #<MatchData "l">, #<MatchData "l">, #<MatchData "l">] # We still have "l"
Regexp.last_match
 #=> #<MatchData "l"> # We still have "l"
@eregon
Copy link
Member

eregon commented Nov 28, 2018

Looks like an easy fix:

Truffle::RegexpOperations.set_last_match(match, Truffle.invoke_primitive(:caller_binding))
should set $~ in block.binding and not the caller binding.

MaxLap added a commit to deep-cover/deep-cover that referenced this issue Nov 30, 2018
MaxLap added a commit to deep-cover/deep-cover that referenced this issue Nov 30, 2018
MaxLap added a commit to deep-cover/deep-cover that referenced this issue Dec 2, 2018
MaxLap added a commit to deep-cover/deep-cover that referenced this issue Dec 2, 2018
MaxLap added a commit to deep-cover/deep-cover that referenced this issue Dec 6, 2018
MaxLap added a commit to deep-cover/deep-cover that referenced this issue Dec 6, 2018
MaxLap added a commit to deep-cover/deep-cover that referenced this issue Mar 20, 2019
@deepj
Copy link

deepj commented Apr 18, 2019

@eregon can you make a priority with this issue if the fix is easy, please?

@eregon eregon self-assigned this May 19, 2019
@eregon
Copy link
Member

eregon commented May 19, 2019

@deepj I'll take care of this, I forgot about it.
Please mention @chrisseaton for priority issues.

@deepj
Copy link

deepj commented May 20, 2019

@chrisseaton Please, is this possible to make a priority, please?

@chrisseaton
Copy link
Collaborator

Will do.

@eregon
Copy link
Member

eregon commented May 20, 2019

@MaxLap FWIW, a simpler and better workaround than deep-cover/deep-cover@f3da81b (e.g., that doesn't work if the matcher matches an empty String) is:

# Instead of:
source.to_enum(:scan, matcher).map { Regexp.last_match }
# This works around:
match_datas = []
source.scan(matcher) { match_datas << Regexp.last_match }
match_datas

Additionally that workaround is also faster on CRuby: rubocop/rubocop#8602

The bug here is due to to_enum having a few blocks of indirection, and so when we set $~ we end up setting it on an internal block of Enumerable#map and not the user-supplied block. I'm looking at how to fix this.

@eregon
Copy link
Member

eregon commented Jul 17, 2019

Removing this from priority because it seems really hard to fix unfortunately, and there is an easy workaround (see my message just above).

@eregon
Copy link
Member

eregon commented Aug 31, 2020

One potential idea to fix this is to mark some internal methods/blocks as ignored for $~, and keep searching further.
That could work for cases where $~ is set on the caller method.
Here it is set on a block though, so maybe we'd need to mark the intermediate block as ignored for $~ and remember the original block by storing it maybe as a field of that intermediate block.

@aardvark179
Copy link
Contributor

Now we've refactored how special variables are handled we can actually implement this pretty efficiently. Essentially we can do something like

def map(&block)
  sv = Primitive.proc_special_variables(block)
  each { |x|
    Primitive.regexp_last_match_set(sv, $~) if $~
    Primitive.io_last_line_set(sv, $_) if $_
    ret = yield(x)
    $~ = nil
    $_ = nil
    ret
  }
end

This isn't quite perfect, we should ideally have a marker value so we can correctly detect when $~ and $_ have been specifically changed to nil. There are a couple of places where we need to change this, but it's a fairly small change overall.

@eregon eregon added this to the 21.0.0 milestone Nov 3, 2020
@aardvark179
Copy link
Contributor

aardvark179 commented Nov 3, 2020

This was resolved in 94d8eb0 using a variation of the technique I suggested above, so I will close this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants