
Documents belong to multiple cases; multiple cases belong to one docket (the doppelganger bug) #2185

Open
johnhawkinson opened this issue Nov 16, 2017 · 13 comments

Comments

@johnhawkinson
Contributor

Overview

This is a long-standing issue but lately it comes up more and more for me.

• In CMECF, many docket numbers can map to one document: a single document can belong to multiple docket numbers, as when an order is filed in two related cases.

• In CMECF, many internal caseids (de_caseid) can map to one docket number. This is extremely common in criminal cases, where the caseids are generally contiguous. It happens when there are multiple defendants who each get a sub-case, but it also happens when there is a single defendant: there is a main case and a single subcase.

This throws a wrench in RECAP because different people will get to the same docket number via different caseid paths. Depending on what you search for in PACER's iquery.pl and whether you choose All Defendants, a single defendant, or a combination thereof, you may get different (or multiple) caseids.

For instance, take 1:14-cr-10363-RGS USA v. Cadden et al in ecf.mad:
[Screenshot, 2017-11-16: the list of matching cases for 1:14-cr-10363]

Or in XML form, query https://ecf.mad.uscourts.gov/cgi-bin/possible_case_numbers.pl?1410363 (free) to get:

<request number="1410363">
  <case number="1:14-cr-10363" id="166116" title="1:14-cr-10363-RGS USA v. Cadden et al" defendant="0" sortable="1:2014-cr-10363-RGS"/>
  <case number="1:14-cr-10363-1" id="166117" title="1:14-cr-10363-RGS-1 Barry J. Cadden (closed 06/27/2017)" defendant="1" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-2" id="166118" title="1:14-cr-10363-RGS-2 Glenn A. Chin" defendant="2" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-3" id="166119" title="1:14-cr-10363-RGS-3 Gene Svirskiy" defendant="3" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-4" id="166120" title="1:14-cr-10363-RGS-4 Christopher M. Leary" defendant="4" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-5" id="166121" title="1:14-cr-10363-RGS-5 Joseph M. Evanosky" defendant="5" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-6" id="166122" title="1:14-cr-10363-RGS-6 Scott M. Connolly" defendant="6" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-7" id="166123" title="1:14-cr-10363-RGS-7 Sharon P. Carter" defendant="7" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-8" id="166124" title="1:14-cr-10363-RGS-8 Alla V. Stepanets" defendant="8" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-9" id="166125" title="1:14-cr-10363-RGS-9 Gregory A. Conigliaro" defendant="9" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-10" id="166126" title="1:14-cr-10363-RGS-10 Robert A. Ronzio" defendant="10" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-11" id="166127" title="1:14-cr-10363-RGS-11 Kathy S. Chin (closed 10/04/2016)" defendant="11" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-12" id="166128" title="1:14-cr-10363-RGS-12 Michelle L. Thomas (closed 10/04/2016)" defendant="12" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-13" id="166129" title="1:14-cr-10363-RGS-13 Carla R. Conigliaro (closed 11/10/2016)" defendant="13" sortable="1:2014-cr-10363"/>
  <case number="1:14-cr-10363-14" id="166130" title="1:14-cr-10363-RGS-14 Douglas A. Conigliaro (closed 11/10/2016)" defendant="14" sortable="1:2014-cr-10363"/>
  <case number="1:14-cv-10363" id="157735" title="1:14-cv-10363-DPW Spencer v. Fresenius Medical Care Holdings, Inc. et al" sortable="1:2014-cv-10363-DPW"/>
</request>

All caseids from 166116-166130 refer to the same docket number. Many (most?) documents in the case belong to multiple (all?) subcases.
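Here is a rough sketch of what that grouping looks like mechanically, in Python with only the standard library. The hypothetical `group_caseids` helper folds the possible_case_numbers.pl response into one entry per base docket number. The response is abbreviated here to four entries and three attributes. The helper assumes the `-N` suffix on `number=` always matches the `defendant=` attribute, and it treats entries with no `defendant=` attribute (the civil case) as having no subcases.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy (fewer entries and attributes) of the
# possible_case_numbers.pl response quoted above.
SAMPLE = """<request number="1410363">
  <case number="1:14-cr-10363" id="166116" defendant="0"/>
  <case number="1:14-cr-10363-1" id="166117" defendant="1"/>
  <case number="1:14-cr-10363-2" id="166118" defendant="2"/>
  <case number="1:14-cv-10363" id="157735"/>
</request>"""


def group_caseids(xml_text: str) -> dict[str, dict[int, str]]:
    """Map each base docket number to {defendant_number: caseid}.

    Assumes the "-N" suffix in number= matches defendant=; entries with
    no defendant= attribute (civil cases) are stored under key 0.
    """
    groups: dict[str, dict[int, str]] = defaultdict(dict)
    for case in ET.fromstring(xml_text).iter("case"):
        defendant = int(case.get("defendant", "0"))
        number = case.get("number")
        suffix = f"-{defendant}"
        # Strip the defendant suffix so subcases share the master's key.
        if defendant and number.endswith(suffix):
            number = number[: -len(suffix)]
        groups[number][defendant] = case.get("id")
    return dict(groups)


print(group_caseids(SAMPLE)["1:14-cr-10363"])
# {0: '166116', 1: '166117', 2: '166118'}
```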

But RECAP and CL treat them like different dockets with identical docket numbers, and don't show the subcase suffix number either.

For instance the main case is
https://www.courtlistener.com/docket/4275782/united-states-v-cadden/
which has docket entries through DE514 (Jan. 2016) and was last updated 2 months ago.

But the -1 case is https://www.courtlistener.com/docket/5135835/united-states-v-cadden/
which has entries through DE1260 (Oct. 24, 2017) and was last updated 12 days ago.

And the -2 case, https://www.courtlistener.com/docket/6145187/united-states-v-cadden/, has entries through DE1281 and was also last updated 12 days ago.

Although the -2 case is more recent, it doesn't actually have the PDF for DE1260.

So this is like a huge mess.

Single-defendant criminal cases, too

The problem even occurs for single defendant criminal cases, although the path to pain is less obvious.
Let's take our friend George Papadopoulos, in ecf.dcd. He's the sole defendant and it looks like there's only one case:

https://ecf.dcd.uscourts.gov/cgi-bin/possible_case_numbers.pl?17182

<request number="17182">
  <case number="1:17-cr-182" id="189898" title="1:17-cr-00182-RDM USA v. PAPADOPOULOS" sortable="1:2017-cr-00182"/>
  <case number="1:17-cv-182" id="184128" title="1:17-cv-00182-APM MITCHELL v. YELLEN" sortable="1:2017-cv-00182-APM"/>
</request>

So it looks like it's just 189898.
But, surprise:

https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189897 1:17-cr-00182-RDM USA v. PAPADOPOULOS
https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189898 1:17-cr-00182-RDM-1 - PAPADOPOULOS, GEORGE

The '898 is easily found in the PACER UI, but unfortunately we can't ignore the '897, because it appears in other places. For instance, the email NEF sent to parties and "interested party" ECF users yesterday:

Notice of Electronic Filing 
The following transaction was entered  on 11/15/2017 2:42 PM EDT and filed
on 11/9/2017 

Case Name: USA v. PAPADOPOULOS                                                  

Case Number: 1:17-cr-00182-RDM
https://ecf.dcd.uscourts.gov/cgi-bin/DktRpt.pl?189897

Of course, this problem is more likely to affect people who use NEFs, which is lawyers and journalists, and not too many members of the general public. But those are important RECAP constituencies.
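The two Papadopoulos caseids show slightly different strings (`1:17-cr-182`, `1:17-cr-00182-RDM`, `1:17-cr-00182-RDM-1`) for what is one docket. A minimal normalizer could reduce them to a common base plus a defendant number. It assumes the `office:yy-type-serial[-JUDGE][-defendant]` shape seen in these examples and a five-digit zero-padded serial. The regex and `parse_docket_number` are hypothetical and have not been checked against other courts' formats.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for district-court numbers such as
# "1:17-cr-00182-RDM-1": office, year, case type, serial,
# optional judge initials, optional defendant suffix.
DOCKET_RE = re.compile(
    r"^(?P<office>\d+):(?P<year>\d{2})-(?P<type>[a-z]{2})-(?P<serial>\d+)"
    r"(?:-(?P<judge>[A-Z]+))?(?:-(?P<defendant>\d+))?$"
)


class DocketNumber(NamedTuple):
    base: str                 # e.g. "1:17-cr-00182" (serial padded to 5 digits)
    defendant: Optional[int]  # None when there is no "-N" suffix (master case)


def parse_docket_number(raw: str) -> DocketNumber:
    m = DOCKET_RE.match(raw.strip())
    if m is None:
        raise ValueError(f"unrecognized docket number: {raw!r}")
    # Judge initials are dropped; they don't identify the docket.
    base = f"{m['office']}:{m['year']}-{m['type']}-{int(m['serial']):05d}"
    defendant = int(m["defendant"]) if m["defendant"] else None
    return DocketNumber(base, defendant)


print(parse_docket_number("1:17-cr-182"))
print(parse_docket_number("1:17-cr-00182-RDM-1"))
```

Both calls yield the same `base`. Only the second carries `defendant=1`. Note that the base number alone still doesn't say which caseid (189897 or 189898) a page came from, so the caseid would have to be stored alongside it.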

Upshot

CL needs to track the docket number and caseid for each document independently, recognizing there can be more than one of each. For sanity's sake, CL docket pages should make the caseid visible somewhere (IA docket pages had it in the URL), even if it's small and at the bottom. Makes debugging your brain much simpler.

CL should acknowledge the concept of subdockets. I'm not sure everything this entails. This is a nice-to-have, but not critical. If all the searches for 1:14-cr-10363 returned an amalgamation of the main docket and 14 subdockets, that would not be so bad.
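Here is one way to picture "track the docket number and caseid for each document independently." The dataclasses below are a hypothetical in-memory sketch, not CourtListener's models. A docket keeps every caseid it is known under. A document keeps every (court, caseid) pair it has been seen in. Recording a second sighting therefore adds to the record instead of replacing it. The document id is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Docket:
    court: str          # e.g. "mad"
    docket_number: str  # base number shared by the master case and subcases
    # Every PACER caseid known to display this docket, keyed by
    # defendant number (0 = master case). Hypothetical layout.
    caseids: dict[int, int] = field(default_factory=dict)


@dataclass
class Document:
    court: str
    pacer_doc_id: str
    # All (court, caseid) pairs this document has been seen under.
    seen_in: set[tuple[str, int]] = field(default_factory=set)


def record_sighting(doc: Document, docket: Docket, defendant: int, caseid: int) -> None:
    """Note that `doc` appeared under `caseid`, keeping earlier sightings."""
    docket.caseids[defendant] = caseid
    doc.seen_in.add((docket.court, caseid))


cadden = Docket("mad", "1:14-cr-10363")
order = Document("mad", "example-doc-id")  # placeholder, not a real doc id
record_sighting(order, cadden, 1, 166117)
record_sighting(order, cadden, 2, 166118)
print(sorted(order.seen_in))  # [('mad', 166117), ('mad', 166118)]
```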

Maybe

Perhaps the RECAP extension should query docket number against possible_case_numbers.pl, and report to the server associated caseids. I think this is a bad idea, because it means the extension is no longer passive, it can be identified (and blocked) by the courts, and it is using a nonpublic API. Furthermore, it would not return the second caseid in the case of a single-defendant case.

Perhaps the RECAP extension should query adjacent caseids against DktRpt.pl until it runs into a different docket number on either side. Again, for the same reasons as above, I think that's bad. Also it could be many queries. I ran into a 60-defendant case last night.

Perhaps the CL server should do these queries, maybe on a one-time basis.
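To give a sense of what the adjacent-caseid probe would involve, here is a sketch with the network call abstracted away. `fetch` is a hypothetical stand-in for a DktRpt.pl request. It is assumed to return the base docket number for a caseid, or `None`. The walk assumes subcase caseids are contiguous, which is not guaranteed. The per-direction `limit` is there because each step would be a real query.

```python
from typing import Callable, Optional

# fetch(caseid) -> base docket number (suffix stripped), or None if the
# caseid doesn't exist. Hypothetical: in practice each call would be a
# DktRpt.pl request.
FetchFn = Callable[[int], Optional[str]]


def probe_adjacent(seed: int, base_number: str, fetch: FetchFn, limit: int = 100) -> list[int]:
    """Walk outward from `seed` while neighbouring caseids share `base_number`.

    Stops at the first mismatch in each direction, or after `limit`
    steps per direction. Assumes subcase caseids are contiguous.
    """
    found = [seed]
    for step in (-1, 1):
        caseid = seed + step
        for _ in range(limit):
            if fetch(caseid) != base_number:
                break
            found.append(caseid)
            caseid += step
    return sorted(found)


# Fake lookup table for illustration; no PACER requests are made.
FAKE = {caseid: "1:14-cr-10363" for caseid in range(166116, 166131)}
print(probe_adjacent(166120, "1:14-cr-10363", FAKE.get))
# all fifteen caseids, 166116 through 166130
```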

Mitigation

It should be straightforward to identify, in the CL database, where there are multiple caseids for a given docket number, and then take some action to combine them. This is separate from, but related to, what the server and extension should do about this going forward.
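Finding the existing split dockets could be a single GROUP BY. The sketch below uses an in-memory SQLite table loaded with the caseids quoted in this issue. The table and column names (`docket`, `court`, `docket_number`, `pacer_case_id`) are assumptions for illustration, not CL's actual schema. The query lists each (court, docket_number) stored under more than one caseid. Combining them afterwards is the harder, separate step.

```python
import sqlite3

# Toy table standing in for CL's docket table; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docket (id INTEGER PRIMARY KEY, court TEXT,"
    " docket_number TEXT, pacer_case_id INTEGER)"
)
conn.executemany(
    "INSERT INTO docket (court, docket_number, pacer_case_id) VALUES (?, ?, ?)",
    [
        ("mad", "1:14-cr-10363", 166116),
        ("mad", "1:14-cr-10363", 166117),
        ("mad", "1:14-cr-10363", 166118),
        ("dcd", "1:17-cr-00182", 189897),
        ("dcd", "1:17-cr-00182", 189898),
        ("mad", "1:14-cv-10363", 157735),
    ],
)

# One row per (court, docket_number) that is split across several caseids.
# GROUP_CONCAT order is not guaranteed by SQLite.
DUPLICATES_SQL = """
    SELECT court, docket_number, COUNT(DISTINCT pacer_case_id) AS n,
           GROUP_CONCAT(pacer_case_id) AS caseids
    FROM docket
    GROUP BY court, docket_number
    HAVING COUNT(DISTINCT pacer_case_id) > 1
    ORDER BY court, docket_number
"""
rows = conn.execute(DUPLICATES_SQL).fetchall()
for court, number, n, caseids in rows:
    print(court, number, n, caseids)
```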

Discuss!

@johnhawkinson
Contributor Author

footnote: it's also possible to run "combined docket reports" for multiple cases.
I have not tried, but I assume these do the wrong thing in RECAP (I cannot imagine them possibly doing the right thing, given the lack of subdoc support).

For instance, in Cadden, checking subcases -1 and -2 from the iquery.pl form leads you to a docket page like this: https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?166118;166117
which, if run, gives you either both docket reports consecutively or, if you check the "Combined docket report" checkbox, the two reports merged together, sorted by the specified sort order.

Even more frightfully from the RECAP perspective, this combined report feature is not limited to subcases. You can enter arbitrary unrelated caseids separated by semicolons in the URL parameter string for DktRpt.pl and other queries.

Probably nobody does this, so it's not a big worry. But the data model should accommodate it. And I think it screws up receipts (that is, receipts are not reliable indicators of the case to which the document belongs, if indeed the document belongs to a single case in the combined report).
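If the data model is to accommodate combined reports, the uploader or server at least needs to notice them. Here is a minimal sketch that assumes, as in the Cadden URL above, that the DktRpt.pl query string consists only of semicolon-separated caseids. `caseids_from_dktrpt_url` is hypothetical. Plain `str.split` is used rather than `urllib.parse.parse_qs` because this query string is not in key=value form.

```python
from urllib.parse import urlsplit


def caseids_from_dktrpt_url(url: str) -> list[int]:
    """Return every caseid in a (possibly combined) DktRpt.pl URL.

    Assumes the query string is nothing but semicolon-separated caseids;
    URLs with extra parameters would need more careful parsing.
    """
    query = urlsplit(url).query
    return [int(part) for part in query.split(";") if part.strip()]


combined = caseids_from_dktrpt_url(
    "https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?166118;166117"
)
print(combined)  # [166118, 166117]
```

A result with more than one caseid marks a page whose documents can't safely be attributed to a single case.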

@mlissner
Member

mlissner commented Nov 16, 2017

Looks like at least three issues here. The first issue here is that documents can belong to multiple cases. I've split that off into its own ticket: #765

@johnhawkinson
Contributor Author

johnhawkinson commented Nov 19, 2017

This is a better description of the "doppelganger cases" issue described in freelawproject/recap#36 and freelawproject/recap#146 [Editor's note: both now closed as dups].

@johnhawkinson
Contributor Author

For the record, subcases need not have consecutive caseids. See US v. Murgio in SDNY:

<request number="15cr769">
  <case number="1:15-cr-769" id="449632" title="1:15-cr-00769-AJN USA v. Murgio et al" defendant="0" sortable="1:2015-cr-00769-AJN"/>
  <case number="1:15-cr-769-1" id="449633" title="1:15-cr-00769-AJN-1 Anthony R. Murgio (closed 10/25/2017)" defendant="1" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-2" id="450676" title="1:15-cr-00769-AJN-2 Yuri Lebedev (closed 11/01/2017)" defendant="2" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-3" id="454366" title="1:15-cr-00769-AJN-3 Trevon Gross (closed 11/16/2017)" defendant="3" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-4" id="456495" title="1:15-cr-00769-AJN-4 Michael J. Murgio (closed 01/30/2017)" defendant="4" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-5" id="464041" title="1:15-cr-00769-AJN-5 Jose M Freundt" defendant="5" sortable="1:2015-cr-00769"/>
  <case number="1:15-cr-769-6" id="467688" title="1:15-cr-00769-AJN-6 Ricardo Hill" defendant="6" sortable="1:2015-cr-00769"/>
</request>

@mlissner
Member

Grr, automated commit message thing. This is not fixed.

@mlissner
Member

I confess I'm still not sure how to proceed here. When we have multiple pacer case IDs, are those IDs just a different view into the same docket or are they actually different dockets altogether? In some form, we need to link all these dockets together under one umbrella, like PACER does, but I don't understand what PACER is accomplishing with these well enough to understand how to do it in our UI.

@danieldjewell

danieldjewell commented Apr 29, 2019

@mlissner I see the dilemma and yes, I think this is what I was running into with freelawproject/recap#267 ...

I've been thinking about it and, ultimately, I think @johnhawkinson hit on probably the best solution -- CL needs an additional layer that knows about the sub-dockets as that seems to be at the core. As noted, it seems that documents can belong to multiple sub-dockets simultaneously but those sub-dockets might also have their own unique items.

Right now, as described, CL treats these individual sub-dockets as separate cases in the database (e.g. they get their own CL docket ID number because the pacer docket ID is different). This results in the behavior we're seeing here. Instead, if there's a "related dockets" table, this data could be broken out and then separated.

Consider 2 situations:

Simple Docket

Use existing systems, no sub/related cases. Things function as normal.

Complex Docket

Sub-Dockets or related cases - Need to maintain information that allows for correlation of related dockets.

High Level Overview

New DB table that indicates the relationships between the dockets and, if needed, storage of any "master" information. Extend existing docket table to include a column indicating the "master" docket entry. (If null or 0, no master docket - or something similar). Keep maintaining separate CL entries for each sub-docket (because that matches how PACER works) but if there are multiple related entries, display a "master" page indicating all related dockets.
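The "master" column could look roughly like this. The schema is a hypothetical SQLite sketch, not CourtListener's. Each sub-docket keeps its own row, as in PACER, and `master_docket_id` points at the umbrella row, with NULL meaning no master. `subdockets()` returns what a master page would list. The rows reuse the Cadden caseids quoted earlier in this issue.

```python
import sqlite3

# Hypothetical schema: one row per PACER caseid, plus a nullable
# self-reference to the umbrella ("master") docket.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docket (
        id INTEGER PRIMARY KEY,
        court TEXT NOT NULL,
        docket_number TEXT NOT NULL,
        pacer_case_id INTEGER NOT NULL,
        master_docket_id INTEGER REFERENCES docket(id)
    );
    INSERT INTO docket VALUES (1, 'mad', '1:14-cr-10363', 166116, NULL);
    INSERT INTO docket VALUES (2, 'mad', '1:14-cr-10363-1', 166117, 1);
    INSERT INTO docket VALUES (3, 'mad', '1:14-cr-10363-2', 166118, 1);
    INSERT INTO docket VALUES (4, 'mad', '1:14-cv-10363', 157735, NULL);
""")


def subdockets(conn: sqlite3.Connection, master_id: int) -> list[tuple[str, int]]:
    """Rows a master page would list: (docket_number, pacer_case_id)."""
    return conn.execute(
        "SELECT docket_number, pacer_case_id FROM docket"
        " WHERE master_docket_id = ? ORDER BY pacer_case_id",
        (master_id,),
    ).fetchall()


print(subdockets(conn, 1))  # [('1:14-cr-10363-1', 166117), ('1:14-cr-10363-2', 166118)]
```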

Correlating the cases could be accomplished in multiple ways - some more accurate/reliable than others:

  • Data from PACER itself as pointed out by @johnhawkinson
  • Offline/Back-end correlation between documents (e.g. if a document appears in one docket and the exact same document appears in another... SHA1 hashes of the documents are already available in the DB, could query on this)
  • Parsing the case number more? (also, perhaps, tracking changes to the case number? this one is complicated)
  • The RECAP extension demonstrates an interesting method -- when viewing a docket report on PACER, the extension queries the RECAP API for document availability (so that it can display that nice "R" icon next to the document links to indicate that the document exists in RECAP). From my research, this query utilizes only 2 parameters: the court abbreviation and the PACER document ID. So, in that sense, that should make a unique key: if a (court,pacer_doc_id) pair is found in multiple docket reports, it seems (??) that you could make the inference/conclusion that in whichever docket reports that is found, the cases are related?
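The (court, pacer_doc_id) idea amounts to finding connected components: any two dockets that share a document end up in the same cluster. Here is a sketch using a small union-find, with made-up docket ids and document ids. As with any of these heuristics, a cluster only indicates that the dockets are connected somehow, not that they are subcases of one master.

```python
from collections import defaultdict


def cluster_by_shared_docs(sightings: list[tuple[int, str, str]]) -> list[set[int]]:
    """Group docket ids that share at least one (court, pacer_doc_id).

    `sightings` holds (docket_id, court, pacer_doc_id) triples.
    """
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen: dict[tuple[str, str], int] = {}
    for docket_id, court, doc_id in sightings:
        find(docket_id)  # register every docket, even ones sharing nothing
        key = (court, doc_id)
        if key in first_seen:
            # Union: this docket joins the cluster of the first docket
            # that had the same document.
            parent[find(docket_id)] = find(first_seen[key])
        else:
            first_seen[key] = docket_id

    clusters: dict[int, set[int]] = defaultdict(set)
    for docket_id in parent:
        clusters[find(docket_id)].add(docket_id)
    return sorted(clusters.values(), key=min)


# Made-up data: dockets 1-3 are chained by shared documents; docket 5 has
# the same doc id as docket 1 but in a different court, so it stays apart.
sightings = [
    (1, "mad", "doc-a"), (2, "mad", "doc-a"),
    (2, "mad", "doc-b"), (3, "mad", "doc-b"),
    (4, "mad", "doc-c"), (5, "dcd", "doc-a"),
]
print(cluster_by_shared_docs(sightings))  # [{1, 2, 3}, {4}, {5}]
```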

UI Ideas

I'm not 100% sure how best to implement searching -- but for related dockets, display 1 entry in the search results leading to the master docket page that then shows the sub-related dockets. (Similar to how the PACER query looks that @johnhawkinson posted). On each sub-docket page, at a minimum, display a link back to the parent master docket - preferably a sidebar listing of related dockets.

Closing Thoughts / Caveats

I have not spent nearly enough time to fully understand the existing database structure and the inner workings (gotta pay the bills, we all know how that one goes) - there might be something I'm missing here but it seems that the only way to fix this is to give CL/RECAP the ability to know that dockets could be related to each other. I think that probably adding an additional DB layer to store that relationship information would enable this to be resolved once and for all.

Consider this code:

d, docket_count = find_docket_object(pq.court_id, pq.pacer_case_id,
                                     data['docket_number'])
if docket_count > 1:
    logger.info("Found %s dockets during lookup. Choosing oldest." %
                docket_count)
    d = d.earliest('date_created')

If I'm reading this correctly, (a) the current method for dealing with duplicate dockets is to update the oldest and (b) the system differentiates between CL dockets on the basis of the PACER case ID/docket number... which can be different for related cases.

@johnhawkinson
Contributor Author

johnhawkinson commented Apr 29, 2019

I think @johnhawkinson hit on probably the best solution

I thought that was undisputed :).

Correlating the cases could be accomplished in multiple ways - some more accurate/reliable than others:

Well, normally speaking all the case numbers in this situation are consecutive, so that's a huge win.

Offline/Back-end correlation between documents (e.g. if a document appears in one docket and the exact same document appears in another... SHA1 hashes of the documents are already available in the DB, could query on this)

Well. Please don't use the term "related" for multiple subdockets of the same master criminal docket. We use the term "related" to refer to a different kind of relationship between cases, like where I file a civil action a year after you did while yours is still pending and they address common issues of law but joinder may not be appropriate, so I mark my case as related to yours and they are typically assigned to the same judge for reasons of judicial economy (varies district-to-district). Or similarly in an MDL context. This usage is important because:

Court staff have the ability to file a CMECF document in multiple cases, and those cases will all refer to the same docket number. The cases need not have a subcase relationship. It is typically the case that this happens in related cases, though, using the "related" meaning that I have explained above.

Parsing the case number more? (also, perhaps, tracking changes to the case number? this one is complicated)

I'm not entirely sure what you mean by this.

The RECAP extension demonstrates an interesting method -- when viewing a docket report on PACER, the extension queries the RECAP API for document availability (so that it can display that nice "R" icon next to the document links to indicate that the document exists in RECAP). From my research, this query utilizes only 2 parameters: the court abbreviation and the PACER document ID. So, in that sense, that should make a unique key: if a (court,pacer_doc_id) pair is found in multiple docket reports, it seems (??) that you could make the inference/conclusion that in whichever docket reports that is found, the cases are related?

See above. They are likely to be related (but possibly not; say a judge gets sick and the chief judge dockets a stay/postponement order in all of his active cases with calendar dates in the next week), but not necessarily with a subdocket relationship.

@bishwashere

The data model at https://www.courtlistener.com/api/rest-info/ could start with this change:

The RECAPDocument table should not have a "docket_entry" field. More than one case (and therefore more than one docket) can refer to the same document. This is common not only in criminal cases but in every kind of case. Therefore, the DocketEntry table should hold the reference to the document instead.

@danieldjewell

I'm not sure if there's a separate issue on this but: The problem of (what appears to be) a single docket in PACER turning into multiple dockets/cases on RECAP is still a major issue. See: https://www.courtlistener.com/?type=r&q=&type=r&order_by=score%20desc&docket_number=2%3A18-cr-00422&court=azd

I need to do more digging but it appears that all 8 of these RECAP dockets will lead to the same PACER docket report (when using the "View on PACER" blue header button).

More interestingly/concerningly, documents are being uploaded and associated, but not always with the same RECAP docket. Further, the RECAP extension appears to be able to find the document availability in RECAP without an issue... (when viewing the "do you want to buy this document" page in PACER)

I remember there being a discussion about how a (supposedly single) PACER docket could somehow turn into multiple RECAP dockets. Regardless, this is becoming a bigger and bigger issue.

I need to look a bit more at the 8 different RECAP dockets in the search link above but it does appear that there are documents that are associated with only one of the RECAP dockets. (In other words, there are unique documents in each RECAP docket.)

From a data accuracy/integrity standpoint, this is kinda messy. Perhaps solving the creation of multiple dockets in RECAP is unnecessary - perhaps the solution is to make the links work in every RECAP docket? (assuming there's something in the database that would associate the multiple RECAP dockets)

@johnhawkinson
Contributor Author

I believe this is the proper issue, @danieldjewell. The case you cite, USA v. Lacey, is expected to have 8 RECAP dockets, since there are 7 criminal subcases plus the master case:

2:18-cr-00422-SMB USA v. Lacey et al -
2:18-cr-00422-SMB-1 Michael Lacey
2:18-cr-00422-SMB-2 James Larkin
2:18-cr-00422-SMB-3 Scott Spear
2:18-cr-00422-SMB-4 John Brunst
2:18-cr-00422-SMB-5 Dan Hyer
2:18-cr-00422-SMB-6 Andrew Padilla
2:18-cr-00422-SMB-7 Joye Vaught

I do think it's a correct observation that the CourtListener docket report should stop searching by case number and document ID and merely search by document ID, and that would remove some of the pain, at least where the docket report had been run.

But this problem calls out for more serious attention than it has gotten, since basically "RECAP is unusable for criminal cases" is where it shakes out, and that just sucks.

@hughbe

hughbe commented Jul 28, 2022

I raised the issue #2181, cited above. I’m wondering about how this problem can be fixed. Would it be possible to merge identical dockets? Or for example to make a request to do so?

@mlissner changed the title from "Documents belong to multiple cases; multiple cases belong to one docket." to "Documents belong to multiple cases; multiple cases belong to one docket (the doppelganger bug)" on Apr 24, 2023
@GammaGames

FYI this issue was referenced on the Law SE site: Why are there two case numbers for United States v. Trump?
