-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BREAKING CHANGE] hostname replace plugin rewrite #3463
base: master
Are you sure you want to change the base?
Conversation
2174601
to
3d0b224
Compare
Some thoughts of mine:
|
Yes, we should probably continue supporting the old
If you know how, that can surely be part of this PR, but I think that could also be made later in an other PR. |
Are you interested in adding this to your PR?: #3051 |
Can you implement backward compatibility in this PR?
After taking a closer look at the problem with the info box ("With the current architecture, the plugin must parse the infobox results."), I also think that this is not part of this PR. |
Yes, done now and tested 👍 |
Diff to test the fallback to the old hostname_replace plugin: diff --git a/searx/settings.yml b/searx/settings.yml
index 57b3dba86..12654cd18 100644
--- a/searx/settings.yml
+++ b/searx/settings.yml
@@ -212,7 +212,8 @@ outgoing:
# Comment or un-comment plugin to activate / deactivate by default.
#
-# enabled_plugins:
+enabled_plugins:
+ - 'Hostnames plugin'
# # these plugins are enabled if nothing is configured ..
# - 'Hash plugin'
# - 'Self Information'
@@ -252,6 +253,15 @@ outgoing:
# '(.*\.)?youtube\.com$': 'invidious.example.com'
# '(.*\.)?youtu\.be$': 'invidious.example.com'
#
+hostname_replace:
+ '(.*\.)?youtube\.com$': 'invidious.example.com'
+ '(.*\.)?youtu\.be$': 'invidious.example.com'
+ '(.*\.)?youtube-noocookie\.com$': 'yotter.example.com'
+ '(.*\.)?reddit\.com$': 'teddit.example.com'
+ '(.*\.)?redd\.it$': 'teddit.example.com'
+ '(www\.)?twitter\.com$': 'nitter.example.com'
+ # to remove matching host names from result list, set value to false
+ '(.*\.)?google\.com$': false
checker:
# disable checker when in debug mode |
I think that this looks like a nice change as well, but I think that we should focus on the basic changes this PR tries to implement for now and look at such other improvements later. |
searx/results.py
Outdated
@@ -355,7 +355,7 @@ def close(self): | |||
|
|||
for result in self._merged_results: | |||
score = result_score(result) | |||
result['score'] = score | |||
result['score'] = result.get('score', 0) + score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The score is calculated using the weight
factor of the engine, I'm not sure if an constant value subtracted (or added) from (to) the score is the best choice.
The factor weight
has an exponential effect, e.g. if several engines have this result item ...
Lines 133 to 142 in 75e4b65
def result_score(result): | |
weight = 1.0 | |
for result_engine in result['engines']: | |
if hasattr(engines[result_engine], 'weight'): | |
weight *= float(engines[result_engine].weight) | |
occurrences = len(result['positions']) | |
return sum((occurrences * weight) / position for position in result['positions']) |
The deduction of a constant from the score only has a linear effect ... and with increasing weight
the effect of this constant on the result becomes less and less.
But since the calculation is encapsulated behind low_priority
and high_priority
we can also think about the calculation and weighting in another PR / for me this solution would be acceptable for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I rebased an placed a WIP commit to improve the scoring (as I mentioned in #3463 (review)) .. can you please have a look at the WIP commit .. if you think its OK please squash accepted modifications into your commits.
Diff to test the fallback to the old hostname_replace plugin:
Thanks for the patch but I think its not really downward compatible since the old name in the existing setups is 'Hostname replace'
.. I don't have a clue right now how we can solve this issue .. ideas?
# -# enabled_plugins: +enabled_plugins: + - 'Hostnames plugin'
I don't know why this wasn't noticed before, but the pylint has found errors in one of your commits ... maybe it's because the rebase to the master now has a new pylint version in it. |
1297d59
to
8449a4a
Compare
Done, thanks 👍
Nope, not really. The easiest solution would probably be to keep the old name for the plugin - but it's doing more than just basic hostname replaces now, so I don't think it's a good idea. I'll think about if there's another way.
Done 👍 (all checks should pass now hopefully) |
I pushed a WIP commit .. not very sexy but I think it is sufficient for a transitional period. Can you please check my WIP / if OK squash it into your commits .. additional to this PR we need an announcement .. diff --git a/searx/settings.yml b/searx/settings.yml
index d5d311956..80b5aba22 100644
--- a/searx/settings.yml
+++ b/searx/settings.yml
@@ -212,14 +212,14 @@ outgoing:
# Comment or un-comment plugin to activate / deactivate by default.
#
-# enabled_plugins:
+enabled_plugins:
# # these plugins are enabled if nothing is configured ..
# - 'Hash plugin'
# - 'Self Information'
# - 'Tracker URL remover'
# - 'Ahmia blacklist' # activation depends on outgoing.using_tor_proxy
# # these plugins are disabled if nothing is configured ..
-# - 'Hostnames plugin' # see 'hostnames' configuration below
+ - 'Hostname replace' # see 'hostnames' configuration below
# - 'Calculator plugin'
# - 'Open Access DOI rewrite'
# - 'Tor check plugin'
@@ -228,6 +228,17 @@ outgoing:
# # preferences if they want.
# - 'Autodetect search language'
+hostname_replace:
+ '(.*\.)?youtube\.com$': 'invidious.example.com'
+ '(.*\.)?youtu\.be$': 'invidious.example.com'
+ '(.*\.)?youtube-noocookie\.com$': 'yotter.example.com'
+ '(.*\.)?reddit\.com$': 'teddit.example.com'
+ '(.*\.)?redd\.it$': 'teddit.example.com'
+ '(www\.)?twitter\.com$': 'nitter.example.com'
+ # to remove matching host names from result list, set value to false
+ 'spam\.example\.com': false
+
+
# Configuration of the "Hostnames plugin":
#
# hostnames: |
FYI: I added 4489388 to your commit series .. you can build the documentation in this commit by |
Great, thank you! The documentation on this change looks very clear and simple to understand to me, so it should be easy for instance maintainers to follow it 👍 Though I wondered if we should maybe also add that link |
…sites Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
…ugin Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
What does this PR do?
hostnames
plugin with additional features:high_priority
hostnames: these websites will get a higher score/rankinglow_priority
hostnames: these websites will get a lower score/rankingHow to test this PR locally?
hostnames
configuration and enable the hostnames plugingoogle
,youtube
,forgejo
(to see codeberg up in the results), ...Related issues