Opinions removed from the database were not removed from search engines #3967
Comments
Oof! Why do these come up first in the search results? Any idea? I don't remember why we removed content around Jan. 15th, but maybe Bill does, or maybe we can check our Slack/GitHub/email logs around then? Is it possible that a
Well, that query only filters by the default status
Yeah, it could have been anytime from January 15th until now. I reviewed the code to look for methods that remove clusters from the DB, but I didn't find anything. I'm wondering if that could have been done directly at the DB level?
I just confirmed that using a queryset like: It does trigger signals correctly. Just like doing, that also triggers signals:
It's...possible, but extremely unlikely. I almost never delete with SQL, because it freaks me out. Too much power and not enough language support. It sounds like we won't know the cause. Is there a way to fix this? I guess we'll have to check all of the millions of items in the index to see if they're in the DB?
Yeah, that's the way to fix it. We can pull IDs from the index in batches of ~1000 items or so to avoid making too many requests. Then, in Django, we filter those IDs, also in batches, check which ones were not found in the DB, and remove them from the index.
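To make the batch check concrete, here is a minimal sketch of the comparison step only. It is not code from the CourtListener repo: `batched`, `find_orphaned_ids`, and the `existing_in_db` callable are hypothetical names, and the Django and Elasticsearch calls are mentioned only in comments as one possible way to wire it up.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set


def batched(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` IDs."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def find_orphaned_ids(
    indexed_ids: Iterable[int],
    existing_in_db: Callable[[List[int]], Set[int]],
    batch_size: int = 1000,
) -> List[int]:
    """Return the IDs that are in the index but no longer in the DB.

    `existing_in_db` is a hypothetical callable: given a batch of IDs, it
    returns the subset that still exists. In Django it could be something like
    set(OpinionCluster.objects.filter(pk__in=chunk).values_list("pk", flat=True)),
    which costs one query per batch.
    """
    orphans: List[int] = []
    for chunk in batched(indexed_ids, batch_size):
        found = existing_in_db(chunk)
        orphans.extend(pk for pk in chunk if pk not in found)
    return orphans


if __name__ == "__main__":
    # Toy stand-in for the DB: clusters 3, 5 and 6 were deleted.
    db_ids = {1, 2, 4}
    stale = find_orphaned_ids(
        range(1, 7), lambda chunk: db_ids.intersection(chunk), batch_size=2
    )
    print(stale)
```

The `indexed_ids` iterable would presumably come from a scroll over the opinion index (for example `elasticsearch.helpers.scan`), and the returned IDs could then be removed from the index with a bulk delete. Both of those steps are assumptions and are left out of the sketch.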
Bleh. That sounds unpleasant, but we'd better do it. Let's set this as down the road though, because I want to get to alerts as soon as possible and this isn't particularly harmful to users.
After completing #3897 and doing a check in the ES Opinions Search, I discovered something unusual: some OpinionClusters appear in the results, but clicking on one of them triggers a 404 error. I confirmed that these have been removed from the database.
Steps to reproduce:
Here are some examples:
cluster_ids, they are indexed as well.
McGuire v. Third Avenue Railroad (N.Y. App. Div. 1896)
https://www.courtlistener.com/?q=cluster_id%3A5348161&type=o&order_by=score desc&stat_Precedential=on
People v. Jordan (N.Y. App. Div. 2010)
https://www.courtlistener.com/?q=cluster_id%3A5947072&type=o&order_by=score desc&stat_Precedential=on
Goldin v. Kelly (N.Y. App. Div. 2010)
https://www.courtlistener.com/?q=cluster_id%3A5947192&type=o&order_by=score desc&stat_Precedential=on
Harrison v. Bezio (N.Y. App. Div. 2010)
https://www.courtlistener.com/?q=cluster_id%3A5948279&type=o&order_by=score desc&stat_Precedential=on
In re Clor (N.Y. App. Div. 2012)
https://www.courtlistener.com/?q=cluster_id%3A6012898&type=o&order_by=score desc&stat_Precedential=on
People v. Johns (N.Y. App. Div. 2012)
https://www.courtlistener.com/?q=cluster_id%3A6013129&type=o&order_by=score desc&stat_Precedential=on
People v. McCrae (N.Y. App. Div. 2011)
https://www.courtlistener.com/?q=cluster_id%3A5970993&type=o&order_by=score desc&stat_Precedential=on
People v. Russ (N.Y. App. Div. 2012)
https://www.courtlistener.com/?q=cluster_id%3A5990821&type=o&order_by=score desc&stat_Precedential=on
People v. Badman (N.Y. App. Div. 2012)
https://www.courtlistener.com/?q=cluster_id%3A5991871&type=o&order_by=score desc&stat_Precedential=on
In re Foley (N.Y. App. Div. 1998)
https://www.courtlistener.com/?q=cluster_id%3A6161808&type=o&order_by=score desc&stat_Precedential=on
People v. Jones (N.Y. App. Div. 1998)
https://www.courtlistener.com/?q=cluster_id%3A6163163&type=o&order_by=score desc&stat_Precedential=on
People v. Healey (N.Y. App. Div. 2000)
https://www.courtlistener.com/?q=cluster_id%3A6181359&type=o&order_by=score desc&stat_Precedential=on
In re Merante (N.Y. App. Div. 2015)
https://www.courtlistener.com/?q=cluster_id%3A6184542&type=o&order_by=score desc&stat_Precedential=on
@mlissner or @flooie Would you know what the process was for removing these clusters from the database? This way, we can identify all the IDs to remove from the Opinion Index and also consider the deletion method used so it can trigger an automatic deletion next time.