feat: implement $replace modifier #3897

Draft: wants to merge 17 commits into master


seia-soto
Member

Fixes #3886. Built on top of #3887.

https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier


Why rawOptions?

$replace is not the only option that has to deal with the HTML filtering capability of network filters. Most of these network filtering options carry a pattern definition inside the option value, so keeping the raw option values avoids having to add a dedicated field for each such option in the future.
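A minimal sketch of the idea, assuming the filter's option list has already been split into tokens (splitting is out of scope here, and commas inside regexps make it non-trivial): each option's raw value is kept in a map keyed by option name, so a new pattern-carrying modifier needs no dedicated field. The `RawOptions` type and `toRawOptions` helper are hypothetical, not the PR's code.

```typescript
// Hypothetical shape: raw option values keyed by option name, so that options
// carrying patterns (replace, csp, redirect, ...) need no dedicated field each.
type RawOptions = Map<string, string | null>;

// `tokens` is assumed to be the already-split option list of one filter,
// e.g. script, 1p and replace=/video\.maxPop/0/
function toRawOptions(tokens: string[]): RawOptions {
  const options: RawOptions = new Map();
  for (const token of tokens) {
    const eq = token.indexOf('=');
    if (eq === -1) {
      // Flag-style option such as `script` or `1p`: no value.
      options.set(token, null);
    } else {
      // Keep everything after the first `=` verbatim; the value may itself
      // contain `=` (e.g. inside a regexp) and is only interpreted later.
      options.set(token.slice(0, eq), token.slice(eq + 1));
    }
  }
  return options;
}

const raw = toRawOptions(['script', '1p', 'replace=/video\\.maxPop/0/']);
console.log(raw.get('replace')); // /video\.maxPop/0/
console.log(raw.get('1p')); // null
```

Interpreting the value (compiling a regexp, decoding a CSP, and so on) is then left to whichever modifier needs it.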

seia-soto changed the title from "fix: properly find the filter options index" to "feat: implement $replace modifier" on Apr 9, 2024
@seia-soto
Member Author

Note: only the option was set in the test.

@seia-soto
Member Author

seia-soto commented Apr 9, 2024

  • Parse the $replace modifier
  • Validate the components of the $replace modifier
  • Integrate with the HTML filtering API (StreamingHtmlFilter)

const [, rawRegexp, replacement, modifiers] = splitUnescaped(optionValue, '/');
const regexp = new RegExp(rawRegexp, modifiers);

return [regexp, replacement];
Member Author

seia-soto commented Apr 9, 2024


If a filter never needs to combine multiple HTML filtering modifiers, we could build the regexp and replacement at parse time instead of at runtime (and throw an error if multiple HTML filtering modifiers are found).
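One way this could look, sketched under assumptions: the option value has the `/regexp/replacement/flags` shape used by the uBO filters quoted in this thread, `replace` is the only HTML filtering modifier considered, and options arrive as name/value pairs (duplicates allowed). `compileReplace` and `compileHtmlModifier` are illustrative names, not the PR's API. Compiling once means an invalid pattern is rejected when the list is loaded rather than on every response.

```typescript
interface CompiledReplace {
  regexp: RegExp;
  replacement: string;
}

// Assumption: `replace` is the only HTML filtering modifier for now.
const HTML_FILTERING_MODIFIERS = new Set(['replace']);

// `/regexp/replacement/flags`, where a backslash escapes any character (including `/`).
const REPLACE_VALUE = /^\/((?:\\.|[^\\/])*)\/((?:\\.|[^\\/])*)\/([a-z]*)$/;

function compileReplace(value: string): CompiledReplace {
  const match = REPLACE_VALUE.exec(value);
  if (match === null) {
    throw new Error(`Invalid $replace value: ${value}`);
  }
  const [, source, rawReplacement, flags] = match;
  return {
    // `new RegExp` throws a SyntaxError for a bad pattern or an unknown flag,
    // so broken filters fail at parse time.
    regexp: new RegExp(source, flags),
    // Assumption: `\/` in the replacement stands for a literal `/`.
    replacement: rawReplacement.replace(/\\\//g, '/'),
  };
}

// Compile the single HTML filtering modifier of a filter, or return null if none.
function compileHtmlModifier(options: Array<[string, string | null]>): CompiledReplace | null {
  const found = options.filter(([name]) => HTML_FILTERING_MODIFIERS.has(name));
  if (found.length > 1) {
    throw new Error('Multiple HTML filtering modifiers are not supported');
  }
  if (found.length === 0) {
    return null;
  }
  const [, value] = found[0];
  if (value === null) {
    throw new Error('$replace requires a value');
  }
  return compileReplace(value);
}

const compiled = compileHtmlModifier([['script', null], ['replace', '/video\\.maxPop/0/']]);
if (compiled !== null) {
  console.log('x > video.maxPop'.replace(compiled.regexp, compiled.replacement)); // x > 0
}
```

If later modifiers need to be combined, the `found.length > 1` check is the single place that would have to change.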

chrmod added the "PR: New Feature 🚀" label (Increment minor version when merged) on Apr 15, 2024
@remusao
Collaborator

remusao commented Apr 22, 2024

@seia-soto Could you share more information on the $replace option? In particular:

  1. How many filters rely on this option currently?
  2. Can you list some (or all) of them as examples explaining the behavior?

Thanks, I think this will help with reviewing the implementation.

@@ -2444,4 +2444,12 @@ describe('scriptlets arguments parsing', () => {
);
}
});

it('parses replace modifier', () => {
Collaborator


It would be nice to add more tests, since anything related to HTML filtering is critical to get right. If possible, let's add one test case for each existing filter in the lists used so far (I assume there are not many). In particular, let's try to find corner cases in the behavior of this new $replace option.

We also need more end-to-end tests covering the whole HTML filtering pipeline.

Comment on lines 943 to 950
modifierOptionValue:
  (optionalParts & 32) === 32
    ? getBit(mask, NETWORK_FILTER_MASK.isCSP)
      ? buffer.getNetworkCSP()
      : getBit(mask, NETWORK_FILTER_MASK.isRedirect)
        ? buffer.getNetworkRedirect()
        : buffer.getUTF8()
    : undefined,
Collaborator


I think it might make sense to replace both getNetworkCSP and getNetworkRedirect with a unified getModifierOptionValue (including computing one shared codebook for all of these values instead of separate ones).

Member Author


I was thinking of that, but for backward compatibility, I think we can leave the method as-is.

Member Author


However, I think integrating the codebook won't be a problem here.

Member


@seia-soto if that helps, we could split this PR into two parts:

  1. introduce modifierOptionValue
  2. implement $replace on top of it

return nextIndex;
}

function splitUnescaped(text: string, character: string) {
Collaborator


Let's make sure we add enough test cases to cover this function (and the one above).
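The body of `splitUnescaped` is not shown in this excerpt. The following is a hypothetical implementation (an assumption, not the PR's code) together with the kind of corner cases such tests could cover: escaped separators, escaped backslashes, a trailing backslash, and the empty string. Escape sequences are kept verbatim so that the regexp part can still be passed to `new RegExp`, and `character` is assumed to be a single character.

```typescript
// Hypothetical implementation: split `text` on `character`, except where the
// character is escaped with a backslash. Escapes are kept verbatim.
function splitUnescaped(text: string, character: string): string[] {
  const parts: string[] = [];
  let current = '';
  for (let i = 0; i < text.length; i += 1) {
    const c = text[i];
    if (c === '\\' && i + 1 < text.length) {
      // Copy the backslash and the escaped character as-is.
      current += c + text[i + 1];
      i += 1;
    } else if (c === character) {
      parts.push(current);
      current = '';
    } else {
      current += c;
    }
  }
  parts.push(current);
  return parts;
}

const cases: Array<[string, string[]]> = [
  ['/video\\.maxPop/0/', ['', 'video\\.maxPop', '0', '']],
  ['/a\\/b/c/g', ['', 'a\\/b', 'c', 'g']], // escaped separator stays inside the part
  ['/a\\\\/b/', ['', 'a\\\\', 'b', '']], // an escaped backslash does not escape `/`
  ['a\\', ['a\\']], // a trailing backslash is kept
  ['', ['']],
];

for (const [input, expected] of cases) {
  const actual = splitUnescaped(input, '/');
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`${JSON.stringify(input)} -> ${JSON.stringify(actual)}`);
  }
}
console.log(`${cases.length} cases passed`); // 5 cases passed
```

With this behavior, a value such as `/video\.maxPop/0/` splits into four parts, which fits the `const [, rawRegexp, replacement, modifiers]` destructuring in the earlier excerpt.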

return null;
}

public isHtmlFilteringRule(): boolean {
Collaborator


Thought: I wonder whether it would make sense to also use the NetworkFilter abstraction to represent the ^script:has-text HTML filters?

Member Author


I was also thinking of a similar system. However, I don't know whether it's possible to build a proper regexp filtering system on streaming data.

Collaborator


The current implementation already supports regexps from what I can tell.

For example, this filter (already supported):

##^script:has-text(/innerHTML.*appendChild/)

Would be equivalent to (if my understanding is correct):

$replace=/innerHTML.*appendChild//

So the current mechanism needs to be extended, but that seems doable without switching to a different framework.

Member Author


As far as I know, the current implementation buffers each script tag until it is complete, so we know where the relevant part starts and ends. However, $replace operates on the full content, which means we may need the full text.
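To illustrate why the whole body may be needed, here is a minimal sketch, assuming the response arrives as a sequence of byte chunks (in a Firefox WebExtension these would come from something like `webRequest.filterResponseData`). It buffers and decodes the chunks and only runs the replacements when the stream ends. `BufferedReplacer` is a hypothetical name, not part of the adblocker library.

```typescript
// Hypothetical sketch: buffer a streamed body, then apply $replace rules once
// the stream ends, because a match may straddle chunk boundaries.
class BufferedReplacer {
  private readonly decoder = new TextDecoder(); // UTF-8
  private readonly rules: Array<[RegExp, string]>;
  private text = '';

  constructor(rules: Array<[RegExp, string]>) {
    this.rules = rules;
  }

  write(chunk: Uint8Array): void {
    // `stream: true` holds back an incomplete multi-byte sequence for the next chunk.
    this.text += this.decoder.decode(chunk, { stream: true });
  }

  end(): string {
    this.text += this.decoder.decode(); // flush whatever is still buffered
    let result = this.text;
    for (const [regexp, replacement] of this.rules) {
      result = result.replace(regexp, replacement);
    }
    return result;
  }
}

const encoder = new TextEncoder();
const chunks = ['if (x > video.max', 'Pop) showAd();'].map((s) => encoder.encode(s));
const rule: [RegExp, string] = [/video\.maxPop/, '0'];

// Replacing chunk by chunk misses the match that spans both chunks...
const perChunk = chunks
  .map((chunk) => new TextDecoder().decode(chunk).replace(rule[0], rule[1]))
  .join('');
console.log(perChunk); // if (x > video.maxPop) showAd();

// ...whereas buffering the whole body finds it.
const replacer = new BufferedReplacer([rule]);
for (const chunk of chunks) replacer.write(chunk);
console.log(replacer.end()); // if (x > 0) showAd();
```

The cost of this approach is that nothing can be forwarded until the response has ended, which adds latency and memory use compared with buffering only individual script tags.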

@seia-soto
Member Author

> @seia-soto Could you share more information on the $replace option? In particular:
>
>   1. How many filters rely on this option currently?
>   2. Can you list some (or all) of them as examples explaining the behavior?
>
> Thanks, I think it will help review the implementation.

@remusao From the adblocker library's perspective, $replace is much less important than implementing full HTML filtering capability. I'd like to build an integrated system for HTML filtering, because most of these filters are handled with regexps. However, from the Ghostery product's perspective, this is a high priority, since these filters play an important role in YouTube ad blocking.

@chrmod
Member

chrmod commented Apr 29, 2024

All uBO filters using replace that are currently in use:

filters/filters-2024.txt
30:||alliptvlinks.com/tktk-content/plugins/$script,1p,replace=/\bconst now.+?, 100/clearInterval(timer);resolve();}, 100/gms
579:/theme/002/js/application.js?2.0|$script,1p,replace=/video\.maxPop/0/

filters/unbreak.txt
4802:||s3media.247sports.com/Scripts/Bundle/*/videoPlayer.js^$script,1p,replace=/;if\(!\([a-z]+\|\|\(null===[^{]+/;if(false)/

filters/filters-2023.txt
2931:||dehlinks.ir/link_download.php?Mozojadid_Id=$doc,replace=/content="15;/content="0;/
3017:||rekidai-info.github.io/_app/immutable/components/pages/index/_page.svelte-$script,replace=/try\{.*?catch.*?push\(\)\}catch\{//
3018:||rekidai-info.github.io/_app/immutable/components/pages/index/_page.svelte-$script,replace=/throw new Error\("Error Loading Rekidai Data."\)\}throw new Error\("Ad block detected."\)//
5289:||veev.to/assets/videoplayer/*.js$script,replace=/\bhttps:\/\/pagead2\.googlesyndication\.com\/pagead\/js\/adsbygoogle\.js/https:\/\/veev.to\/assets\/videoplayer\/17c088d.js/

filters/filters-2022.txt
3502:||theappstore.org/script.js?v=$script,1p,replace=/result\.length \> 10000/result.length < 10000/g
3606:/loader.min.js$xhr,script,domain=loawa.com|ygosu.com|sportalkorea.com|enetnews.co.kr|edaily.co.kr|economist.co.kr|etoday.co.kr|hankyung.com|isplus.com|hometownstation.com|inven.co.kr|honkailab.com|warcraftrumbledeck.com|genshinlab.com|thestockmarketwatch.com|thephoblographer.com|issuya.com|dogdrip.net|worldhistory.org|bamgosu.site,replace=/\)\{var [a-z]{1,2},[a-z]{1,2},[a-z]{1,2},[a-z]{1,2}\=[a-z]{2};return [a-z]\(\)/){return;/g
3607:/loader.min.js$xhr,script,domain=loawa.com|ygosu.com|sportalkorea.com|enetnews.co.kr|edaily.co.kr|economist.co.kr|etoday.co.kr|hankyung.com|isplus.com|hometownstation.com|inven.co.kr|honkailab.com|warcraftrumbledeck.com|genshinlab.com|thestockmarketwatch.com|thephoblographer.com|issuya.com|dogdrip.net|worldhistory.org|bamgosu.site,replace=/\)\{var [a-z]{1,2},[a-z]{1,2},[a-z]{1,2};.*?return [a-z]\(\)/){return; return c()/g
3608:/loader.min.js$xhr,script,domain=loawa.com|ygosu.com|sportalkorea.com|enetnews.co.kr|edaily.co.kr|economist.co.kr|etoday.co.kr|hankyung.com|isplus.com|hometownstation.com|inven.co.kr|honkailab.com|warcraftrumbledeck.com|genshinlab.com|thestockmarketwatch.com|thephoblographer.com|issuya.com|dogdrip.net|worldhistory.org,replace=/\.mark\(\(function [a-z0-9]{1,2}\([a-z0-9]{1,2},[a-z0-9]{1,2}\){var.*\]\]\)\}\)\)\),/.mark((function neutralized(a,b){var none = false;}))),/g
4298:||bitcotasks.com/assets/js/mainjs.php$script,1p,replace=/entry.duration > 0/entry.duration < 10/

filters/quick-fixes.txt
129:||d3lj2s469wtjp0.cloudfront.net/build/js/public/$script,3p,replace=/\{try\{.*?clip-path.*?catch\(/{try{}catch(/,domain=puzzle-loop.com|puzzle-words.com|puzzle-chess.com|puzzle-thermometers.com|puzzle-norinori.com|puzzle-minesweeper.com|puzzle-slant.com|puzzle-lits.com|puzzle-galaxies.com|puzzle-tents.com|puzzle-battleships.com|puzzle-pipes.com|puzzle-hitori.com|puzzle-heyawake.com|puzzle-shingoki.com|puzzle-masyu.com|puzzle-stitches.com|puzzle-aquarium.com|puzzle-tapa.com|puzzle-star-battle.com|puzzle-kakurasu.com|puzzle-skyscrapers.com|puzzle-futoshiki.com|puzzle-shakashaka.com|puzzle-kakuro.com|puzzle-jigsaw-sudoku.com|puzzle-killer-sudoku.com|puzzle-binairo.com|puzzle-nonograms.com|puzzle-sudoku.com|puzzle-light-up.com|puzzle-bridges.com|puzzle-shikaku.com|puzzle-nurikabe.com|puzzle-dominosa.com
139:||statics.1mv.xyz/statics/*.js|$script,3p,replace=/;return _0x[a-z0-9]+\['[_a-z]+'\]\['s'\]/;return false/
140:||statics.1mv.xyz/statics/*.js|$script,3p,replace=/;if\(null!==\(_0x[a-z0-9]+=this\['[_a-z]+'\]\)[^)]+\)return;/;if(true)return;/
153:||in-jpn.com^$script,replace=/var w_status[\s\S\n]+?doSakigake\(\);[\s\S\n]+?\}//,badfilter
154:||in-jpn.com^$script,replace=/var w_\w+[\s\S\n]+?doSakigake\(\);[\s\S\n]+?\}//

filters/annoyances-others.txt
396:||www.facebook.com/api/graphql/$xhr,replace=/\{"brs_content_label":[^,]+,"category":"ENGAGEMENT[^\n]+"cursor":"[^"]+"\}/{}/g
7177:||solarmovie.vip/js/$script,1p,replace=/\(\{checkers\:.*?\]\}\)/({checkers:[]})/g
7484:||tver.jp/_next/static/chunks/$replace=/e\?(e\(\):\(n\.play\(\))/!1?\$1/,script

filters/filters.txt
25:||www.youtube.com/playlist?list=$xhr,1p,replace=/"adPlacements.*?([A-Z]"\}|"\}{2\,4})\}\]\,//
26:||www.youtube.com/playlist?list=$xhr,1p,replace=/"adSlots.*?\}\]\}\}\]\,//
27:||www.youtube.com/watch?v=$xhr,1p,replace=/"adPlacements.*?([A-Z]"\}|"\}{2\,4})\}\]\,//
28:||www.youtube.com/watch?v=$xhr,1p,replace=/"adSlots.*?\}\]\}\}\]\,//
29:||www.youtube.com/youtubei/v1/player?$xhr,1p,replace=/"adPlacements.*?([A-Z]"\}|"\}{2\,4})\}\]\,//
30:||www.youtube.com/youtubei/v1/player?$xhr,1p,replace=/"adSlots.*?\}\]\}\}\]\,//
489:||www.facebook.com/api/graphql/$xhr,replace=/\{"brs_content_label":[^,]+,"category":"SPONSORED"[^\n]+"cursor":"[^"]+"\}/{}/
490:||www.facebook.com/api/graphql/$xhr,replace=/\{"node":\{"role":"SEARCH_ADS"[^\n]+?cursor":[^}]+\}/{}/g
491:||www.facebook.com/api/graphql/$xhr,replace=/\{"node":\{"__typename":"MarketplaceFeedAdStory"[^\n]+?"cursor":(?:null|"\{[^\n]+?\}"|[^\n]+?MarketplaceSearchFeedStoriesEdge")\}/{}/g
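Several of the filters above can be turned into table-driven test cases, as suggested earlier in the review. This sketch assumes that applying a compiled `$replace` boils down to `String.prototype.replace`, so flags and `$1`-style backreferences follow JavaScript semantics; the tver.jp rule's `\$1` is assumed to mean the backreference `$1`. `applyReplace` and the sample inputs are made up for illustration; only the regexps, replacements, and flags come from the listed filters.

```typescript
// Hypothetical helper: assumes applying $replace is String.prototype.replace.
function applyReplace(text: string, regexp: RegExp, replacement: string): string {
  return text.replace(regexp, replacement);
}

interface ReplaceCase {
  filter: string; // which listed filter the case comes from
  regexp: RegExp;
  replacement: string;
  input: string; // made-up sample input
  expected: string;
}

const cases: ReplaceCase[] = [
  {
    // No `g` flag: only the first occurrence is replaced.
    filter: 'filters-2024.txt:579',
    regexp: new RegExp('video\\.maxPop'),
    replacement: '0',
    input: 'a=video.maxPop;b=video.maxPop',
    expected: 'a=0;b=video.maxPop',
  },
  {
    filter: 'filters-2022.txt:3502',
    regexp: new RegExp('result\\.length \\> 10000', 'g'),
    replacement: 'result.length < 10000',
    input: 'if(result.length > 10000){}',
    expected: 'if(result.length < 10000){}',
  },
  {
    // `$1` re-inserts the captured group.
    filter: 'annoyances-others.txt:7484',
    regexp: new RegExp('e\\?(e\\(\\):\\(n\\.play\\(\\))'),
    replacement: '!1?$1',
    input: 'x=e?e():(n.play())',
    expected: 'x=!1?e():(n.play())',
  },
  {
    // Empty replacement deletes the match.
    filter: 'filters.txt:26',
    regexp: new RegExp('"adSlots.*?\\}\\]\\}\\}\\]\\,'),
    replacement: '',
    input: 'keep"adSlots":[1}]}}],rest',
    expected: 'keeprest',
  },
  {
    // The `s` flag lets `.` cross the newline.
    filter: 'filters-2024.txt:30',
    regexp: new RegExp('\\bconst now.+?, 100', 'gms'),
    replacement: 'clearInterval(timer);resolve();}, 100',
    input: 'const now = Date.now();\nwait();}, 100)',
    expected: 'clearInterval(timer);resolve();}, 100)',
  },
];

for (const c of cases) {
  const actual = applyReplace(c.input, c.regexp, c.replacement);
  if (actual !== c.expected) {
    throw new Error(`${c.filter}: expected ${c.expected}, got ${actual}`);
  }
}
console.log(`${cases.length} cases passed`); // 5 cases passed
```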

@seia-soto
Member Author

Just a note: better filter selection should be done in performHTMLFiltering.

@chrmod
Member

chrmod commented Apr 29, 2024

Current matching logic in Ghostery 10 for Firefox:
https://github.com/ghostery/ghostery-extension/blob/d2542406174fb59ff939095b6d6d925bea79a3b9/extension-manifest-v3/src/background/adblocker.js#L356

will have to be changed from:

  1. for main frames - apply html cosmetic filters
  2. for all other - match network filter

to:

  1. match all network filters and html cosmetic filters
  2. for main frames - apply html cosmetic filters and html network filters
  3. for all other - block if block network filter matched, filter html if any html filter matched
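The proposed flow can be sketched as a small decision function. Everything here is hypothetical: `MatchResult`, `Action`, and `decide` are not Ghostery's or the adblocker library's API, and matching itself (step 1) is assumed to have produced `MatchResult` already. Following the plan as written, main frames are never blocked in this sketch, and for other requests a blocking match takes precedence over HTML filtering.

```typescript
// Hypothetical, simplified matching result for a single request (step 1).
interface MatchResult {
  blocked: boolean; // a blocking network filter matched
  htmlCosmeticFilters: string[]; // e.g. `##^script:has-text(...)`
  htmlNetworkFilters: string[]; // e.g. `$replace=...`
}

type Action =
  | { kind: 'block' }
  | { kind: 'filter-html'; filters: string[] }
  | { kind: 'allow' };

function decide(isMainFrame: boolean, match: MatchResult): Action {
  const htmlFilters = [...match.htmlCosmeticFilters, ...match.htmlNetworkFilters];
  if (isMainFrame) {
    // Step 2: main frames get both kinds of HTML filters applied.
    return htmlFilters.length > 0 ? { kind: 'filter-html', filters: htmlFilters } : { kind: 'allow' };
  }
  // Step 3: other requests are blocked if a blocking filter matched...
  if (match.blocked) {
    return { kind: 'block' };
  }
  // ...and otherwise filtered if any HTML filter matched.
  return htmlFilters.length > 0 ? { kind: 'filter-html', filters: htmlFilters } : { kind: 'allow' };
}

const xhr = decide(false, {
  blocked: false,
  htmlCosmeticFilters: [],
  htmlNetworkFilters: ['youtube adSlots $replace rule'],
});
console.log(xhr.kind); // filter-html
```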

Labels
PR: New Feature 🚀 Increment minor version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support $replace
3 participants