Documents belong to multiple cases; multiple cases belong to one docket (the doppelganger bug) #2185

johnhawkinson · 2017-11-16T18:09:34Z

Overview

This is a long-standing issue but lately it comes up more and more for me.

• In CMECF, there is a many-to-one mapping between docket numbers and documents. A single document can belong to multiple docket numbers, as when an order is filed in two related cases.

• In CMECF, there is a one-to-many mapping between docket numbers and internal caseids (de_caseid): a single docket number can correspond to several caseids. This is extremely common in criminal cases, where the caseids are generally contiguous. It happens when there are multiple defendants who each get a subcase, but also when there is a single defendant: there is a main case and a single subcase.

This throws a wrench in RECAP because different people will get to the same docket number via different caseid paths. Depending on what one searches for in PACER's iquery.pl and whether you choose All Defendants or single defendant or a combination thereof, you may get different (or multiple) caseids.

For instance, take 1:14-cr-10363-RGS USA v. Cadden et al in ecf.mad:

Or in XML form, query https://ecf.mad.uscourts.gov/cgi-bin/possible_case_numbers.pl?1410363 (free) to get:

<request number="1410363">
  <case number="1:14-cr-10363" id="166116" title="1:14-cr-10363-RGS USA v. Cadden et al" defendant="0" sortable="1:2014-cr-10363-RGS"/>
  <case number="1:14-cr-10363-1" id="166117" title="1:14-cr-10363-RGS-1 Barry J. Cadden (closed 06/27/2017)" defendant="1" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-2" id="166118" title="1:14-cr-10363-RGS-2 Glenn A. Chin" defendant="2" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-3" id="166119" title="1:14-cr-10363-RGS-3 Gene Svirskiy" defendant="3" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-4" id="166120" title="1:14-cr-10363-RGS-4 Christopher M. Leary" defendant="4" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-5" id="166121" title="1:14-cr-10363-RGS-5 Joseph M. Evanosky" defendant="5" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-6" id="166122" title="1:14-cr-10363-RGS-6 Scott M. Connolly" defendant="6" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-7" id="166123" title="1:14-cr-10363-RGS-7 Sharon P. Carter" defendant="7" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-8" id="166124" title="1:14-cr-10363-RGS-8 Alla V. Stepanets" defendant="8" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-9" id="166125" title="1:14-cr-10363-RGS-9 Gregory A. Conigliaro" defendant="9" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-10" id="166126" title="1:14-cr-10363-RGS-10 Robert A. Ronzio" defendant="10" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-11" id="166127" title="1:14-cr-10363-RGS-11 Kathy S. Chin (closed 10/04/2016)" defendant="11" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-12" id="166128" title="1:14-cr-10363-RGS-12 Michelle L. Thomas (closed 10/04/2016)" defendant="12" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-13" id="166129" title="1:14-cr-10363-RGS-13 Carla R. Conigliaro (closed 11/10/2016)" defendant="13" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-14" id="166130" title="1:14-cr-10363-RGS-14 Douglas A. Conigliaro (closed 11/10/2016)" defendant="14" sortable="1:2014-cr-10363"/>
  <case number="1:14-cv-10363" id="157735" title="1:14-cv-10363-DPW Spencer v. Fresenius Medical Care Holdings, Inc. et al" sortable="1:2014-cv-10363-DPW"/>
</request>

All caseids from 166116-166130 refer to the same docket number. Many (most?) documents in the case belong to multiple (all?) subcases.
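The grouping described above can be sketched in a few lines of Python. This is only an illustration, assuming the XML has the shape shown in the response above (abbreviated here to four of the sixteen rows); the function name `group_caseids` is hypothetical and is not part of RECAP or CL.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the possible_case_numbers.pl response quoted above.
SAMPLE = """<request number="1410363">
  <case number="1:14-cr-10363" id="166116" title="1:14-cr-10363-RGS USA v. Cadden et al" defendant="0" sortable="1:2014-cr-10363-RGS"/>
  <case number="1:14-cr-10363-1" id="166117" title="1:14-cr-10363-RGS-1 Barry J. Cadden (closed 06/27/2017)" defendant="1" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-2" id="166118" title="1:14-cr-10363-RGS-2 Glenn A. Chin" defendant="2" sortable="1:2014-cr-10363"/>
  <case number="1:14-cv-10363" id="157735" title="1:14-cv-10363-DPW Spencer v. Fresenius Medical Care Holdings, Inc. et al" sortable="1:2014-cv-10363-DPW"/>
</request>"""


def group_caseids(xml_text):
    """Map each base docket number to a list of (defendant, caseid) pairs."""
    groups = defaultdict(list)
    for case in ET.fromstring(xml_text).iter("case"):
        number = case.get("number")
        defendant = case.get("defendant")  # attribute is absent on civil cases
        # Strip the trailing "-N" only when the defendant attribute confirms it.
        if defendant not in (None, "0") and number.endswith("-" + defendant):
            number = number[: -len(defendant) - 1]
        who = int(defendant) if defendant is not None else None
        groups[number].append((who, int(case.get("id"))))
    return dict(groups)


groups = group_caseids(SAMPLE)
for docket_number, members in groups.items():
    print(docket_number, members)
```

On the sample, the three criminal rows collapse under `1:14-cr-10363`, while the unrelated civil case with the same sequence number stays separate.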

But RECAP and CL treat them like different dockets with identical docket numbers, and don't show the subcase suffix number either.

For instance the main case is
https://www.courtlistener.com/docket/4275782/united-states-v-cadden/
which has docket entries through DE514 (Jan. 2016), and was last updated 2 months ago.

But the -1 case is https://www.courtlistener.com/docket/5135835/united-states-v-cadden/
which has entries through DE1260 (Oct. 24, 2017), and was last updated 12 days ago.

And the -2 case is https://www.courtlistener.com/docket/6145187/united-states-v-cadden/ which has entries through DE1281, but was also last updated 12 days ago.

Although the -2 case is more recent, it doesn't actually have the PDF for DE1260.

So this is like a huge mess.
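To make the subcase suffix concrete, here is a rough sketch of splitting a title-style docket number into its base number, judge initials, and defendant suffix. The regular expression covers only the shapes that appear in this thread; the zero-padding of the sequence number to five digits is an assumption made for normalization, and `parse_docket_number` is a hypothetical helper, not an existing CL function.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: office:yy-type-seq[-JUDGE...][-defendant]
DOCKET_RE = re.compile(
    r"^(?P<office>\d+):(?P<year>\d{2})-(?P<type>[a-z]{2,4})-(?P<seq>\d+)"
    r"(?P<judges>(?:-[A-Z]{2,4})*)"
    r"(?:-(?P<defendant>\d+))?$"
)


class DocketNumber(NamedTuple):
    base: str                 # e.g. "1:14-cr-10363"
    judges: tuple             # e.g. ("RGS",)
    defendant: Optional[int]  # None when there is no subcase suffix


def parse_docket_number(text):
    """Split a docket number string into base, judge initials, and suffix."""
    m = DOCKET_RE.match(text.strip())
    if m is None:
        raise ValueError("unrecognized docket number: %r" % text)
    # Assumption: pad the sequence number so "182" and "00182" compare equal.
    base = "{}:{}-{}-{:05d}".format(
        m["office"], m["year"], m["type"], int(m["seq"])
    )
    judges = tuple(j for j in m["judges"].split("-") if j)
    defendant = int(m["defendant"]) if m["defendant"] is not None else None
    return DocketNumber(base, judges, defendant)


print(parse_docket_number("1:14-cr-10363-RGS-2"))
print(parse_docket_number("1:14-cr-10363-RGS"))
```

With the suffix parsed out, a docket page could show "-2" next to the shared base number instead of presenting three indistinguishable dockets.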

Single-defendant criminal cases, too

The problem even occurs for single defendant criminal cases, although the path to pain is less obvious.
Let's take our friend George Papadopoulos, in ecf.dcd. He's the sole defendant and it looks like there's only one case:

https://ecf.dcd.uscourts.gov/cgi-bin/possible_case_numbers.pl?17182

<request number="17182">
  <case number="1:17-cr-182" id="189898" title="1:17-cr-00182-RDM USA v. PAPADOPOULOS" sortable="1:2017-cr-00182"/>
  <case number="1:17-cv-182" id="184128" title="1:17-cv-00182-APM MITCHELL v. YELLEN" sortable="1:2017-cv-00182-APM"/>
</request>

So it looks like it's just 189898.
But, surprise:

https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189897 1:17-cr-00182-RDM USA v. PAPADOPOULOS
https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189898 1:17-cr-00182-RDM-1 - PAPADOPOULOS, GEORGE

The '898 is easily found in the PACER UI, but unfortunately we can't ignore the '897, because it appears in other places. For instance, the email NEF sent to parties and "interested party" ECF users yesterday:

Notice of Electronic Filing 
The following transaction was entered  on 11/15/2017 2:42 PM EDT and filed
on 11/9/2017 

Case Name: USA v. PAPADOPOULOS                                                  

Case Number: 1:17-cr-00182-RDM
https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189897

Of course, this problem is more likely to affect people who use NEFs, which means lawyers and journalists, and not too many members of the general public. But those are important RECAP constituencies.

Upshot

CL needs to track the docket number and caseid for each document independently, recognizing there can be more than one of each. For sanity's sake, CL docket pages should make the caseid visible somewhere (IA docket pages had it in the URL), even if it's small and at the bottom. Makes debugging your brain much simpler.

CL should acknowledge the concept of subdockets. I'm not sure all of what this entails. This is a nice-to-have, but not critical. If all the searches for 1:14-cr-10363 returned an amalgamation of the main docket and 14 subdockets, that would not be so bad.

Maybe

Perhaps the RECAP extension should query docket number against possible_case_numbers.pl, and report to the server associated caseids. I think this is a bad idea, because it means the extension is no longer passive, it can be identified (and blocked) by the courts, and it is using a nonpublic API. Furthermore, it would not return the second caseid in the case of a single-defendant case.

Perhaps the RECAP extension should query adjacent caseids against DktRpt.pl until it runs into a different docket number on either side. Again, for the same reasons as above, I think that's bad. Also it could be many queries. I ran into a 60-defendant case last night.

Perhaps the CL server should do these queries, maybe on a one-time basis.

Mitigation

It should be straightforward to identify, in the CL database, where there are multiple caseids for a given docket number, and then take some action to combine them. This is separate from, but related to, what the server and extension should do about this going forward.
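As a rough sketch of that identification step, assuming docket rows can be exported as `(court_id, docket_number, pacer_case_id)` tuples (a hypothetical layout, not the actual CL schema), grouping by court and docket number surfaces the candidates:

```python
from collections import defaultdict


def find_doppelgangers(rows):
    """Return {(court_id, docket_number): [caseids]} where more than one caseid exists.

    rows: iterable of (court_id, docket_number, pacer_case_id) tuples
    (hypothetical export format).
    """
    by_number = defaultdict(set)
    for court_id, docket_number, pacer_case_id in rows:
        by_number[(court_id, docket_number)].add(pacer_case_id)
    return {
        key: sorted(caseids)
        for key, caseids in by_number.items()
        if len(caseids) > 1
    }


# Caseids taken from the examples earlier in this issue.
rows = [
    ("mad", "1:14-cr-10363", 166116),
    ("mad", "1:14-cr-10363", 166117),
    ("mad", "1:14-cr-10363", 166118),
    ("mad", "1:14-cv-10363", 157735),
    ("dcd", "1:17-cr-00182", 189897),
    ("dcd", "1:17-cr-00182", 189898),
]
dupes = find_doppelgangers(rows)
for key, caseids in dupes.items():
    print(key, caseids)
```

Keying on the court as well as the docket number matters, because the same docket number string can legitimately exist in different districts.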

Discuss!


johnhawkinson · 2017-11-16T18:18:39Z

footnote: it's also possible to run "combined docket reports" for multiple cases.
I have not tried, but I assume these do the wrong thing in RECAP (I cannot imagine them possibly doing the right thing, given the lack of subdoc support).

For instance, in Cadden, checking subcases -1 and -2 from the iquery.pl form leads you to a docket page like this: https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?166118;166117
which, if run, gives you either both docket reports consecutively or, if you check the "Combined docket report" checkbox, a single report that merges the two, sorted by the specified sort order.

Even more frightfully from the RECAP perspective, this combined report feature is not limited to subcases. You can enter arbitrary unrelated caseids separated by semicolons in the URL parameter string for DktRpt.pl and other queries.

Although probably nobody does this, so it's not a big worry. But the data model should accommodate it. And I think it screws up receipts (that is, they are not reliable indicators of the case to which the document belongs, if indeed the document belongs to a single case in the combined report).
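For a data model to accommodate this, the first step is recognizing that a DktRpt.pl URL can carry more than one caseid. A minimal sketch, assuming the query string is just caseids separated by semicolons as in the URL above; `caseids_from_dktrpt_url` is a hypothetical name:

```python
from urllib.parse import urlsplit


def caseids_from_dktrpt_url(url):
    """Return every caseid found in the query string of a DktRpt.pl URL."""
    query = urlsplit(url).query
    # Ignore anything that is not purely numeric rather than guessing at it.
    return [int(part) for part in query.split(";") if part.isdigit()]


combined = caseids_from_dktrpt_url(
    "https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?166118;166117"
)
single = caseids_from_dktrpt_url(
    "https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189897"
)
print(combined, single)
```

A caller that gets back more than one caseid knows it is looking at a combined report and should not attribute the page to a single case.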

mlissner · 2017-11-16T20:14:39Z

Looks like at least three issues here. The first issue here is that documents can belong to multiple cases. I've split that off into its own ticket: #765

johnhawkinson · 2017-11-19T01:50:18Z

This is a better description of the "doppelganger cases" issue described in freelawproject/recap#36 and freelawproject/recap#146 [Editor's note: both now closed as dups].

johnhawkinson · 2017-11-19T05:01:44Z

For the record, subcases need not have consecutive caseids. See US v. Murgio in SDNY:

<request number="15cr769">
  <case number="1:15-cr-769" id="449632" title="1:15-cr-00769-AJN USA v. Murgio et al" defendant="0" sortable="1:2015-cr-00769-AJN"/>
  <case number="1:15-cr-769-1" id="449633" title="1:15-cr-00769-AJN-1 Anthony R. Murgio (closed 10/25/2017)" defendant="1" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-2" id="450676" title="1:15-cr-00769-AJN-2 Yuri Lebedev (closed 11/01/2017)" defendant="2" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-3" id="454366" title="1:15-cr-00769-AJN-3 Trevon Gross (closed 11/16/2017)" defendant="3" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-4" id="456495" title="1:15-cr-00769-AJN-4 Michael J. Murgio (closed 01/30/2017)" defendant="4" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-5" id="464041" title="1:15-cr-00769-AJN-5 Jose M Freundt" defendant="5" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-6" id="467688" title="1:15-cr-00769-AJN-6 Ricardo Hill" defendant="6" sortable="1:2015-cr-00769"/>
</request>

mlissner · 2017-12-29T23:10:47Z

Grr, automated commit message thing. This is not fixed.

mlissner · 2019-02-18T23:20:32Z

I confess I'm still not sure how to proceed here. When we have multiple pacer case IDs, are those IDs just a different view into the same docket or are they actually different dockets altogether? In some form, we need to link all these dockets together under one umbrella, like PACER does, but I don't understand what PACER is accomplishing with these well enough to understand how to do it in our UI.

danieldjewell · 2019-04-29T20:20:56Z

@mlissner I see the dilemma and yes, I think this is what I was running into with freelawproject/recap#267 ...

I've been thinking about it and, ultimately, I think @johnhawkinson hit on probably the best solution -- CL needs an additional layer that knows about the sub-dockets as that seems to be at the core. As noted, it seems that documents can belong to multiple sub-dockets simultaneously but those sub-dockets might also have their own unique items.

Right now, as described, CL treats these individual sub-dockets as separate cases in the database (e.g. they get their own CL docket ID number because the pacer docket ID is different). This results in the behavior we're seeing here. Instead, if there's a "related dockets" table, this data could be broken out and then separated.

Consider 2 situations:

Simple Docket

Use existing systems, no sub/related cases. Things function as normal.

Complex Docket

Sub-Dockets or related cases - Need to maintain information that allows for correlation of related dockets.

High Level Overview

New DB table that indicates the relationships between the dockets and, if needed, storage of any "master" information. Extend existing docket table to include a column indicating the "master" docket entry. (If null or 0, no master docket - or something similar). Keep maintaining separate CL entries for each sub-docket (because that matches how PACER works) but if there are multiple related entries, display a "master" page indicating all related dockets.

Correlating the cases could be accomplished in multiple ways - some more accurate/reliable than others:

Data from PACER itself as pointed out by @johnhawkinson
Offline/Back-end correlation between documents (e.g. if a document appears in one docket and the exact same document appears in another... SHA1 hashes of the documents are already available in the DB, could query on this)
Parsing the case number more ? (also, perhaps, tracking changes to the case number? this one is complicated)
The RECAP extension demonstrates an interesting method -- when viewing a docket report on PACER, the extension queries the RECAP API for document availability (so that it can display that nice "R" icon next to the document links to indicate that the document exists in RECAP). From my research, this query utilizes only 2 parameters: the court abbreviation and the PACER document ID. So, in that sense, that should make a unique key: if a (court,pacer_doc_id) pair is found in multiple docket reports, it seems (??) that you could make the inference/conclusion that in whichever docket reports that is found, the cases are related?
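The last idea can be sketched as follows, assuming rows of `(court_id, pacer_doc_id, docket_id)` can be pulled from the database (a hypothetical layout). The docket IDs are the three CL dockets linked earlier in this issue; the document IDs are made up for illustration. Note that a shared document only shows that two dockets share a filing, which does not by itself establish a subdocket relationship.

```python
from collections import defaultdict
from itertools import combinations


def dockets_sharing_documents(rows):
    """Count, for each pair of dockets, how many (court, pacer_doc_id) keys they share.

    rows: iterable of (court_id, pacer_doc_id, docket_id) tuples
    (hypothetical export format).
    """
    dockets_by_doc = defaultdict(set)
    for court_id, pacer_doc_id, docket_id in rows:
        dockets_by_doc[(court_id, pacer_doc_id)].add(docket_id)

    shared = defaultdict(int)
    for docket_ids in dockets_by_doc.values():
        # Every pair of dockets containing this document gets one more vote.
        for pair in combinations(sorted(docket_ids), 2):
            shared[pair] += 1
    return dict(shared)


# Document IDs "doc-a" .. "doc-c" are invented placeholders.
rows = [
    ("mad", "doc-a", 4275782),
    ("mad", "doc-a", 5135835),
    ("mad", "doc-b", 4275782),
    ("mad", "doc-b", 5135835),
    ("mad", "doc-b", 6145187),
    ("mad", "doc-c", 6145187),
]
shared = dockets_sharing_documents(rows)
for pair, count in sorted(shared.items()):
    print(pair, count)
```

A high shared count between two dockets with the same base docket number would be a stronger signal than a single shared document.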

UI Ideas

I'm not 100% sure how best to implement searching -- but for related dockets, display 1 entry in the search results leading to the master docket page that then shows the sub-related dockets. (Similar to how the PACER query looks that @johnhawkinson posted). On each sub-docket page, at a minimum, display a link back to the parent master docket - preferably a sidebar listing of related dockets.

Closing Thoughts / Caveats

I have not spent nearly enough time to fully understand the existing database structure and the inner workings (gotta pay the bills, we all know how that one goes) - there might be something I'm missing here but it seems that the only way to fix this is to give CL/RECAP the ability to know that dockets could be related to each other. I think that probably adding an additional DB layer to store that relationship information would enable this to be resolved once and for all.

Consider this code:

courtlistener/cl/recap/tasks.py, lines 1355 to 1360 in 19b215c:

    d, docket_count = find_docket_object(pq.court_id, pq.pacer_case_id,
                                         data['docket_number'])
    if docket_count > 1:
        logger.info("Found %s dockets during lookup. Choosing oldest." %
                    docket_count)
        d = d.earliest('date_created')

If I'm reading this correctly, (a) the current method for dealing with duplicate dockets is to update the oldest and (b) the system differentiates between CL dockets on the basis of the PACER case ID/docket number... which can be different for related cases.

johnhawkinson · 2019-04-29T20:54:45Z

I think @johnhawkinson hit on probably the best solution

I thought that was undisputed :).

Correlating the cases could be accomplished in multiple ways - some more accurate/reliable than others:

Well, normally speaking all the case numbers in this situation are consecutive, so that's a huge win.

Offline/Back-end correlation between documents (e.g. if a document appears in one docket and the exact same document appears in another... SHA1 hashes of the documents are already available in the DB, could query on this)

Well. Please don't use the term "related" for multiple subdockets of the same master criminal docket. We use the term "related" to refer to a different kind of relationship between cases, like where I file a civil action a year after you did while yours is still pending and they address common issues of law but joinder may not be appropriate, so I mark my case as related to yours and they are typically assigned to the same judge for reasons of judicial economy (varies district-to-district). Or similarly in an MDL context. This usage is important because:

Court staff have the ability to file a CMECF document in multiple cases, and those cases will all refer to the same docket number. The cases need not have a subcase relationship. It is typically the case that this happens in related cases, though, using the "related" meaning that I have explained above.

Parsing the case number more ? (also, perhaps, tracking changes to the case number? this one is complicated)

I'm not entirely sure what you mean by this.

The RECAP extension demonstrates an interesting method -- when viewing a docket report on PACER, the extension queries the RECAP API for document availability (so that it can display that nice "R" icon next to the document links to indicate that the document exists in RECAP). From my research, this query utilizes only 2 parameters: the court abbreviation and the PACER document ID. So, in that sense, that should make a unique key: if a (court,pacer_doc_id) pair is found in multiple docket reports, it seems (??) that you could make the inference/conclusion that in whichever docket reports that is found, the cases are related?

See above. They are likely to be related (but possibly not; say a judge gets sick and the chief judge dockets a stay/postponement order in all of his active cases with calendar dates in the next week), but not necessarily with a subdocket relationship.

bishwashere · 2021-01-27T20:29:35Z

The data model at https://www.courtlistener.com/api/rest-info/ can have this change to begin with:

RECAPDocument table cannot have "docket_entry". More than one case (and therefore more than one docket) can refer to the same document. This is common not only in criminal cases but in any kind of case. Therefore, the DocketEntry table must keep the reference to the document instead.
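A toy illustration of that inversion, using plain dataclasses rather than the real CL models: the class and field names below are hypothetical, the document ID is a placeholder, and the assumption that DE1260 appears under both subcases is made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    court_id: str
    pacer_doc_id: str  # placeholder value below, not a real PACER ID


@dataclass
class DocketEntry:
    pacer_case_id: int
    entry_number: int
    # The entry points at its documents, so one document can sit under many entries.
    documents: list = field(default_factory=list)


def cases_for(document, entries):
    """Return the sorted caseids of every entry that references this document."""
    return sorted(
        e.pacer_case_id
        for e in entries
        if any(d is document for d in e.documents)
    )


order = Document("mad", "hypothetical-doc-id")
entries = [
    DocketEntry(166117, 1260, [order]),  # subcase -1
    DocketEntry(166118, 1260, [order]),  # subcase -2 (assumed for illustration)
]
print(cases_for(order, entries))
```

Because the document is stored once and referenced twice, uploading the PDF via either subcase would make it available from both.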

danieldjewell · 2021-10-06T20:01:09Z

I'm not sure if there's a separate issue on this but: The problem of (what appears to be) a single docket in PACER turning into multiple dockets/cases on RECAP is still a major issue. See: https://www.courtlistener.com/?type=r&q=&type=r&order_by=score%20desc&docket_number=2%3A18-cr-00422&court=azd

I need to do more digging but it appears that all 8 of these RECAP dockets will lead to the same PACER docket report (when using the "View on PACER" blue header button).

More interestingly/concerningly, documents are being uploaded and associated, but not always with the same RECAP docket. Further, the RECAP extension appears to be able to find the document availability in RECAP without an issue... (when viewing the "do you want to buy this document" page in PACER)

I remember there being a discussion about how a (supposedly single) PACER docket could somehow turn into multiple RECAP dockets. Regardless, this is becoming a bigger and bigger issue.

I need to look a bit more at the 8 different RECAP dockets in the search link above but it does appear that there are documents that are associated with only one of the RECAP dockets. (In other words, there are unique documents in each RECAP docket.)

From a data accuracy/integrity standpoint, this is kinda messy. Perhaps solving the creation of multiple dockets in RECAP is unnecessary - perhaps the solution is to make the links work in every RECAP docket? (assuming there's something in the database that would associate the multiple RECAP dockets)

johnhawkinson · 2021-10-06T20:40:44Z

I believe this is the proper issue, @danieldjewell. The case you cite, USA v. Lacey, is expected to have 8 RECAP dockets, since there are 7 criminal subcases plus the master case:

2:18-cr-00422-SMB USA v. Lacey et al -
2:18-cr-00422-SMB-1 Michael Lacey
2:18-cr-00422-SMB-2 James Larkin
2:18-cr-00422-SMB-3 Scott Spear
2:18-cr-00422-SMB-4 John Brunst
2:18-cr-00422-SMB-5 Dan Hyer
2:18-cr-00422-SMB-6 Andrew Padilla
2:18-cr-00422-SMB-7 Joye Vaught

I do think it's a correct observation that the CourtListener docket report should stop searching by case number and document ID and merely search by document ID, and that would remove some of the pain, at least where the docket report had been run.

But this problem calls out for more serious attention than it has gotten, since basically "RECAP is unusable for criminal cases" is where it shakes out, and that just sucks.

hughbe · 2022-07-28T11:35:55Z

I raised the issue #2181, cited above. I’m wondering about how this problem can be fixed. Would it be possible to merge identical dockets? Or for example to make a request to do so?

GammaGames · 2023-08-24T16:37:49Z

FYI this issue was referenced on the Law SE site: Why are there two case numbers for United States v. Trump?

johnhawkinson mentioned this issue Nov 16, 2017

RECAP docket number search doesn't allow suffixes #764

Closed

mlissner mentioned this issue Nov 16, 2017

PACER documents can belong to multiple cases #765

Closed

johnhawkinson mentioned this issue Nov 17, 2017

CL rejects RECAP uploads for multi-case dockets #767

Closed

johnhawkinson mentioned this issue Nov 22, 2017

PACER parser doesn't get past certain docket entry #762

Closed

mlissner mentioned this issue Dec 12, 2017

Doppelganger cases with no available documents (was: "ECF docket number parsed wrong in Swartz") freelawproject/recap#36

Closed

mlissner closed this as completed in 3a6b69b Dec 29, 2017

mlissner reopened this Dec 29, 2017

johnhawkinson mentioned this issue Jan 3, 2018

Manafort/Gates doppelgangers lead to bad PDF links from CL #785

Open

This was referenced Feb 18, 2019

Multi-defendant criminal case confusion / split destinations freelawproject/recap#41

Closed

Docket Data Missing freelawproject/recap#267

Closed

johnhawkinson mentioned this issue Jan 20, 2022

A Large Docket Not Working freelawproject/recap#303

Closed

mlissner transferred this issue from freelawproject/recap Jul 20, 2022

mlissner mentioned this issue Jul 20, 2022

Multiple docket listings for the same case #2181

Closed

mlissner changed the title ~~Documents belong to multiple cases; multiple cases belong to one docket.~~ Documents belong to multiple cases; multiple cases belong to one docket (the doppelganger bug) Apr 24, 2023

mlissner mentioned this issue Apr 24, 2023

Documents in RECAP not showing in Docket list #2668

Closed

eleenest mentioned this issue Feb 1, 2024

CL document links incorrectly point to the wrong document (same docket number, but on a "related" case) #3715

Open

mlissner pinned this issue Feb 6, 2024

v-anne mentioned this issue Mar 30, 2024

Issues with consolidated dockets at circuit court level #3931

Open
