Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING CHANGE] hostname replace plugin rewrite #3463

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Bnyro
Copy link
Member

@Bnyro Bnyro commented May 5, 2024

What does this PR do?

How to test this PR locally?

  • uncomment the hostnames configuration and enable the hostnames plugin
  • search for google, youtube, forgejo (to see codeberg up in the results), ...

Related issues

@return42
Copy link
Member

Some thoughts of mine:

  • Do we need backward compatibility for a while? .. reading hostname_replace from settings.yml and prompt a deprecation message?

  • When we are implementing a replacement for the old hostname_replace plugin, we should also implement a solution where the old fails:

@Bnyro
Copy link
Member Author

Bnyro commented May 11, 2024

* Do we need backward compatibility for a while? .. reading `hostname_replace` from settings.yml and prompt a deprecation message?

Yes, we should probably continue supporting the old hostname_replace format for now, I've actually had that in my initial draft of this but then removed it because I refactored some code.

* When we are implementing a replacement for the old `hostname_replace` plugin, we should also implement a solution where the old fails:
  
  * [Links in the infobox are not modified by hostname_replace #1348](https://github.com/searxng/searxng/issues/1348)

If you know how, that can surely be part of this PR, but I think that could also be made later in an other PR.

@Austin-Olacsi
Copy link
Contributor

Austin-Olacsi commented May 12, 2024

Are you interested in adding this to your PR?: #3051

@return42
Copy link
Member

Yes, we should probably continue supporting the old hostname_replace format for now, I've actually had that in my initial draft of this but then removed it because I refactored some code.

Can you implement backward compatibility in this PR?

If you know how, that can surely be part of this PR, but I think that could also be made later in an other PR.

After taking a closer look at the problem with the info box ("With the current architecture, the plugin must parse the infobox results."), I also think that this is not part of this PR.

@Bnyro
Copy link
Member Author

Bnyro commented May 15, 2024

Can you implement backward compatibility in this PR?

Yes, done now and tested 👍

@Bnyro
Copy link
Member Author

Bnyro commented May 15, 2024

Diff to test the fallback to the old hostname_replace plugin:

diff --git a/searx/settings.yml b/searx/settings.yml
index 57b3dba86..12654cd18 100644
--- a/searx/settings.yml
+++ b/searx/settings.yml
@@ -212,7 +212,8 @@ outgoing:
 
 # Comment or un-comment plugin to activate / deactivate by default.
 #
-# enabled_plugins:
+enabled_plugins:
+  - 'Hostnames plugin'
 #   # these plugins are enabled if nothing is configured ..
 #   - 'Hash plugin'
 #   - 'Self Information'
@@ -252,6 +253,15 @@ outgoing:
 # '(.*\.)?youtube\.com$': 'invidious.example.com'
 # '(.*\.)?youtu\.be$': 'invidious.example.com'
 #
+hostname_replace:
+  '(.*\.)?youtube\.com$': 'invidious.example.com'
+  '(.*\.)?youtu\.be$': 'invidious.example.com'
+  '(.*\.)?youtube-noocookie\.com$': 'yotter.example.com'
+  '(.*\.)?reddit\.com$': 'teddit.example.com'
+  '(.*\.)?redd\.it$': 'teddit.example.com'
+  '(www\.)?twitter\.com$': 'nitter.example.com'
+  # to remove matching host names from result list, set value to false
+  '(.*\.)?google\.com$': false
 
 checker:
   # disable checker when in debug mode

@Bnyro
Copy link
Member Author

Bnyro commented May 15, 2024

Are you interested in adding this to your PR?: #3051

I think that this looks like a nice change as well, but I think that we should focus on the basic changes this PR tries to implement for now and look at such other improvements later.

@Bnyro Bnyro changed the title [feat] hostname replace plugin: possibility to prioritize certain websites [BREAKING CHANGE] hostname replace plugin: possibility to prioritize certain websites May 15, 2024
@Bnyro Bnyro changed the title [BREAKING CHANGE] hostname replace plugin: possibility to prioritize certain websites [BREAKING CHANGE] hostname replace plugin rewrite May 15, 2024
searx/results.py Outdated
@@ -355,7 +355,7 @@ def close(self):

for result in self._merged_results:
score = result_score(result)
result['score'] = score
result['score'] = result.get('score', 0) + score
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The score is calculated using the weight factor of the engine, I'm not sure if an constant value subtracted (or added) from (to) the score is the best choice.

The factor weight has an exponential effect, e.g. if several engines have this result item ...

searxng/searx/results.py

Lines 133 to 142 in 75e4b65

def result_score(result):
weight = 1.0
for result_engine in result['engines']:
if hasattr(engines[result_engine], 'weight'):
weight *= float(engines[result_engine].weight)
occurrences = len(result['positions'])
return sum((occurrences * weight) / position for position in result['positions'])

The deduction of a constant from the score only has a linear effect ... and with increasing weight the effect of this constant on the result becomes less and less.

But since the calculation is encapsulated behind low_priority and high_priority we can also think about the calculation and weighting in another PR / for me this solution would be acceptable for now.

Copy link
Member

@return42 return42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I rebased an placed a WIP commit to improve the scoring (as I mentioned in #3463 (review)) .. can you please have a look at the WIP commit .. if you think its OK please squash accepted modifications into your commits.


Diff to test the fallback to the old hostname_replace plugin:

Thanks for the patch but I think its not really downward compatible since the old name in the existing setups is 'Hostname replace' .. I don't have a clue right now how we can solve this issue .. ideas?

 #
-# enabled_plugins:
+enabled_plugins:
+  - 'Hostnames plugin'

@return42
Copy link
Member

I don't know why this wasn't noticed before, but the pylint has found errors in one of your commits ... maybe it's because the rebase to the master now has a new pylint version in it.

searx/plugins/hostnames.py Outdated Show resolved Hide resolved
@Bnyro Bnyro force-pushed the hostnames branch 3 times, most recently from 1297d59 to 8449a4a Compare May 27, 2024 17:54
@Bnyro
Copy link
Member Author

Bnyro commented May 27, 2024

FYI I rebased an placed a WIP commit to improve the scoring (as I mentioned in #3463 (review)) .. can you please have a look at the WIP commit .. if you think its OK please squash accepted modifications into your commits.

Done, thanks 👍

Thanks for the patch but I think its not really downward compatible since the old name in the existing setups is 'Hostname replace' .. I don't have a clue right now how we can solve this issue .. ideas?

Nope, not really. The easiest solution would probably be to keep the old name for the plugin - but it's doing more than just basic hostname replaces now, so I don't think it's a good idea. I'll think about if there's another way.

I don't know why this wasn't noticed before, but the pylint has found errors in one of your commits ... maybe it's because the rebase to the master now has a new pylint version in it.

Done 👍 (all checks should pass now hopefully)

@return42
Copy link
Member

return42 commented May 28, 2024

I'll think about if there's another way.

I pushed a WIP commit .. not very sexy but I think it is sufficient for a transitional period.

Can you please check my WIP / if OK squash it into your commits .. additional to this PR we need an announcement ..


diff --git a/searx/settings.yml b/searx/settings.yml
index d5d311956..80b5aba22 100644
--- a/searx/settings.yml
+++ b/searx/settings.yml
@@ -212,14 +212,14 @@ outgoing:
 
 # Comment or un-comment plugin to activate / deactivate by default.
 #
-# enabled_plugins:
+enabled_plugins:
 #   # these plugins are enabled if nothing is configured ..
 #   - 'Hash plugin'
 #   - 'Self Information'
 #   - 'Tracker URL remover'
 #   - 'Ahmia blacklist'  # activation depends on outgoing.using_tor_proxy
 #   # these plugins are disabled if nothing is configured ..
-#   - 'Hostnames plugin'  # see 'hostnames' configuration below
+  - 'Hostname replace'  # see 'hostnames' configuration below
 #   - 'Calculator plugin'
 #   - 'Open Access DOI rewrite'
 #   - 'Tor check plugin'
@@ -228,6 +228,17 @@ outgoing:
 #   # preferences if they want.
 #   - 'Autodetect search language'
 
+hostname_replace:
+  '(.*\.)?youtube\.com$': 'invidious.example.com'
+  '(.*\.)?youtu\.be$': 'invidious.example.com'
+  '(.*\.)?youtube-noocookie\.com$': 'yotter.example.com'
+  '(.*\.)?reddit\.com$': 'teddit.example.com'
+  '(.*\.)?redd\.it$': 'teddit.example.com'
+  '(www\.)?twitter\.com$': 'nitter.example.com'
+  # to remove matching host names from result list, set value to false
+  'spam\.example\.com': false
+
+
 # Configuration of the "Hostnames plugin":
 #
 # hostnames:

@return42
Copy link
Member

return42 commented May 28, 2024

FYI: I added 4489388 to your commit series .. you can build the documentation in this commit by make docs.live and then visit http://0.0.0.0:8000/src/searx.plugins.hostnames.html

@Bnyro
Copy link
Member Author

Bnyro commented Jun 3, 2024

FYI: I added 4489388 to your commit series .. you can build the documentation in this commit by make docs.live and then visit http://0.0.0.0:8000/src/searx.plugins.hostnames.html

Great, thank you!

The documentation on this change looks very clear and simple to understand to me, so it should be easy for instance maintainers to follow it 👍

Though I wondered if we should maybe also add that link https://docs.searxng.org/src/searx.plugins.hostnames.html to the deprecated_msg to make it easier for instance maintainers to find out where to start?

Bnyro and others added 4 commits June 3, 2024 12:32
…sites

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
…ugin

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants