fix bing returning same results, page numbering, minor refactor #3416
base: master
Bing page numbering doesn't increase by 10 each time. The first page returns 10 results, and all pages thereafter return 14 results. This means we need to update the page numbering to account for this. Next, the 'sc' parameter, whatever it means, needs to be present in order to not return the same results. Finally, the code to check the page had some duplicate checks, so I refactored the code in this section, which is low-risk.
It's Bing time again 😢
- The changes don't work for me unfortunately
- Page `start`s are `1`, `11`, `21`, `31`, ... for me, not as you changed it here
- The `sc` parameter is `11-3` for me
- I tried changing it in SearXNG but pagination still doesn't work oddly
It pretty much seems like Bing behaves very differently depending on your location (mine is Germany) or other circumstances, which makes it pretty hard to get things to work properly...
originally posted in #3358 (comment)
Anyway, I'm very happy to see that the community is addressing the issue ... the way the bing engine is currently running, any improvement (no matter how small) is welcome 👍
Hmmmmm, there is definitely some strange non-deterministic behavior occurring then. I'll try setting my VPN to Germany to see what's working. As for the `sc` parameter, it seemed to vary in value, so I set it to a default, which solved the issue for me.
@Bnyro Good news 🥳 I was able to reproduce the issue by tunneling requests through Germany via a VPN, so I can investigate at least Germany soon. But now that I'm seeing issues depending on region, I'm hoping I can make a patch that doesn't cause actual regressions. I'll try to test various regions and see what sticks. My current plan of action is to...
For anyone else's info, here are the current HTTP parameters I'm investigating:
{
  "GET": {
    "scheme": "https",
    "host": "www.bing.com",
    "filename": "/search",
    "query": {
      "q": "{query}",
      "sp": "-1",
      "lq": "0",
      "pq": "",
      "sc": "0-0",
      "qs": "n",
      "sk": "",
      "cvid": "{random string 32 chars long}",
      "ghsh": "0",
      "ghacc": "0",
      "ghpl": "",
      "FPIG": "{random string 32 chars long}",
      "first": "{page offset}",
      "FORM": "{weird bing specific page offset}"
    }
  }
}
Seems like this Stack Overflow answer may have some insight.
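For anyone who wants to replay these requests outside of SearXNG, here is a minimal sketch that assembles the query string from the parameters listed above. The function names `build_bing_params` and `build_bing_url` are hypothetical and not part of the engine. Using `uuid4().hex` for the two 32-character random strings is an assumption (the exact character set Bing expects is not known here), and the `FORM` value is left to the caller because its encoding is still unclear.

```python
import uuid
from urllib.parse import urlencode


def build_bing_params(query: str, first: int, form: str, sc: str = "0-0") -> dict:
    """Return the query parameters under investigation as a dict.

    `form` must be supplied by the caller: how Bing derives it is not
    understood yet, so this sketch does not try to compute it.
    """
    return {
        "q": query,
        "sp": "-1",
        "lq": "0",
        "pq": "",
        "sc": sc,
        "qs": "n",
        "sk": "",
        # Assumption: any random 32-character string is accepted here.
        "cvid": uuid.uuid4().hex.upper(),
        "ghsh": "0",
        "ghacc": "0",
        "ghpl": "",
        "FPIG": uuid.uuid4().hex.upper(),
        "first": str(first),
        "FORM": form,
    }


def build_bing_url(query: str, first: int, form: str, sc: str = "0-0") -> str:
    """Encode the parameters into a full search URL."""
    params = build_bing_params(query, first, form, sc)
    return "https://www.bing.com/search?" + urlencode(params)
```

Calling `build_bing_url("searxng", first=11, form="PLACEHOLDER")` yields a URL for the second page that can be pasted into a browser or a VPN-tunneled client for comparison; `"PLACEHOLDER"` stands in for whatever `FORM` value is being tested.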
Lastly, while we aren't using the Bing API, there may be some information we can cross-reference. Here is a link to the REST Documentation. Also, the page numbering is now back to 1, 11, 21, ..., so who knows what's happening there. I'm going to revert those changes and assume Bing was having a moment. I don't want to cause regressions either, so less is more in that respect. However, the page numbering may depend on some strange combination of these other parameters, so I'll dig into this as well. Going to put this in draft - anyone is welcome to add their input or testing, but I'll need to do more extensive testing by region.
Given Bing seems to be causing region-specific issues now, we should probably add documentation (whether in code or somewhere else) with major regions to check when making changes to this engine. But let me validate this first with extensive testing.
@glanham-jr Thank you for your research, your #3416 (comment) is very interesting 👍
I may have time next weekend, but to answer your question ...
Let's document this in comments and doc-strings in the engine. The doc-strings are used in the online documentation: https://docs.searxng.org/dev/engines/online/bing.html
Tried testing this again last night. Germany is still having this issue despite supplying all HTTP parameters. From what I saw, it appears to be an issue with the headers or the cookies. I copied the exact same URL for page 2 that searxng emitted (which returned page 1 results) and tried it in a regular browser, where I got actual page 2 results. It seems there are slightly different behaviors depending on HTTP parameters, cookies, and headers. I didn't take exact notes, but I can try documenting the deltas of what I am seeing between the US and Germany.
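One way to document those deltas is to capture the headers (or cookies) of a working and a failing request and diff them. Below is a small sketch under that assumption. `diff_mappings` is a hypothetical helper, and the sample values are illustrative placeholders, not captured data.

```python
def diff_mappings(a: dict, b: dict, ignore_case: bool = True) -> dict:
    """Compare two header/cookie mappings and report the differences.

    Header names are case-insensitive, so keys are lowercased by default.
    """
    if ignore_case:
        a = {k.lower(): v for k, v in a.items()}
        b = {k.lower(): v for k, v in b.items()}
    return {
        "only_in_a": {k: a[k] for k in a.keys() - b.keys()},
        "only_in_b": {k: b[k] for k in b.keys() - a.keys()},
        "changed": {
            k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]
        },
    }


# Illustrative placeholder values only, not real captures.
us_headers = {"Accept-Language": "en-US", "User-Agent": "example-agent"}
de_headers = {
    "accept-language": "de-DE",
    "User-Agent": "example-agent",
    "Cookie": "placeholder",
}

delta = diff_mappings(us_headers, de_headers)
```

The same helper works for the query parameters, which makes it easy to paste a per-region delta into an issue.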
I'm also thinking this PR partially fixes the problem for the US region, so maybe we should acknowledge that this PR is a partial patch and dedicate a new issue specifically to Germany.
What does this PR do?
The 'sc' parameter, whatever it means, needs to be present in order to not return the same results.
Bing page numbering doesn't increase by 10 each time. The first page returns 10 results, and all pages thereafter return 14 results. This means we need to update the page numbering to account for this. This also seems to be the case when running on searxng.
Finally, the code to check the page had some duplicate checks, so I refactored the code in this section which seems low-risk. I can undo this if we want a dedicated PR for this.
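To make the page numbering concrete, here is a minimal sketch of how the `first` offset could be computed when the first page and later pages return a different number of results. `first_offset` is a hypothetical helper, not the engine's actual code. The page sizes are parameters because the observed values differ between reports (10 then 14 in this description, a constant step of 10 in the review from Germany).

```python
def first_offset(pageno: int, first_page_size: int = 10, later_page_size: int = 10) -> int:
    """Return the 1-based index of the first result on page `pageno`.

    Page 1 always starts at 1. Page 2 starts right after the first page,
    and every later page advances by `later_page_size`.
    """
    if pageno < 1:
        raise ValueError("pageno must be >= 1")
    if pageno == 1:
        return 1
    return 1 + first_page_size + (pageno - 2) * later_page_size


# Constant step of 10: offsets 1, 11, 21, 31
uniform = [first_offset(p) for p in range(1, 5)]

# 10 results on page 1, 14 on every later page: offsets 1, 11, 25, 39
mixed = [first_offset(p, 10, 14) for p in range(1, 5)]
```

Keeping both sizes configurable means the offsets can be switched back if Bing's behavior changes again, without touching the formula.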
Why is this change important?
Fixes #3402
How to test this PR locally?
!bing {...}
Author's checklist
n/a
Related issues
Closes #3402
Local Testing
Page 1
Page 2
Page 3