New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement $replace
modifier
#3897
base: master
Are you sure you want to change the base?
Conversation
$replace
modifier
Note: |
|
const [, rawRegexp, replacement, modifiers] = splitUnescaped(optionValue, '/'); | ||
const regexp = new RegExp(rawRegexp, modifiers); | ||
|
||
return [regexp, replacement]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the filter doesn't need to deal with multiple html filtering filter modifiers, we may build regex and replacement at the parse time instead of runtime. (Then throw an error if multiple html filtering modifiers were found)
@seia-soto Could you share more information on the
Thanks, I think it will help review the implementation. |
@@ -2444,4 +2444,12 @@ describe('scriptlets arguments parsing', () => { | |||
); | |||
} | |||
}); | |||
|
|||
it('parses replace modifier', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add more tests since anything related to HTML filtering is critical to get right. If possible, let's add one test case for each existing filter in the lists used so far (I assume there are not so many). Let's particularly try to find corner cases of the behavior of this new $replace
option.
We also need more end-to-end tests on the whole HTML filtering.
modifierOptionValue: | ||
(optionalParts & 32) === 32 | ||
? getBit(mask, NETWORK_FILTER_MASK.isCSP) | ||
? buffer.getNetworkCSP() | ||
: getBit(mask, NETWORK_FILTER_MASK.isRedirect) | ||
? buffer.getNetworkRedirect() | ||
: buffer.getUTF8() | ||
: undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might make sense to replace both getNetworkCSP
and getNetworkRedirect
with a unified getModifierOptionValue
(including computing the codebooks of all these values instead of having separate one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of that, but in a view of backward compatibility, I think we can leave the method as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I think integrating the codebook won't be a problem here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seia-soto if that helps, we could split this PR into two parts:
- introduce
modifierOptionValue
- implement
$replace
on top of it
return nextIndex; | ||
} | ||
|
||
function splitUnescaped(text: string, character: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure we add enough test cases to cover this (and above) functions.
return null; | ||
} | ||
|
||
public isHtmlFilteringRule(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: I wonder if it could make sense to also use the NetworkFilter abstraction to represent the ^script-text html filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking a similar system to that. However, I don't know if it's possible to make a proper regexp filtering system with streaming data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation already supports regexps from what I can tell.
For example this filter (already supported):
##^script:has-text(/innerHTML.*appendChild/)
Would be equivalent to (if my understanding is correct):
$replace=/innerHTML.*appendChild//
So the current mechanism needs to be extended but it seems doable without changing to a different framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation does buffer the all of the script tag until they're finished as I know. This means we know the start and end of the part. However, $replace
is performed in full content which means we might need to know full text.
@remusao In aspect of adblocker library, the importance of |
All uBO filters
|
Just a note: better filter selection should be done from |
Current matching logic in Ghostery 10 for Firefox: will have to changed from:
to:
|
2a03145
to
da5f5b9
Compare
fixes #3886 built top on #3887
https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier
Why
rawOptions
?$replace
is not the only option we should deal with network filters' html filtering capability. Most of network filtering options involve pattern definition in its option. This prevents writing additional fields in future.