Extend the citation lookup page to use the features of the API #3955

mlissner · 2024-04-08T21:53:16Z

Our new citation lookup API is really neat. It takes a blob of text, looks up the citations, and tells you what it finds:

Unfortunately, a member of our board, upon seeing it, had a good idea:

> The API is cool, but we should eat our own dog food and make a user-friendly page where you paste any blob of text into the text box and it spits out everything that looked citation-like and either provides a normalized version (linking to our copy) or says CITATION NOT FOUND.

So this issue is to jot down some notes about it. Here's how the tool currently looks:

A few changes:

The input box should now be a textarea to encourage people to paste more.
At the bottom, it should get a second button that says "Analyze Citations"
The text at the top currently says:

If you have a citation you want to look up, put it in here, and we'll look it up.

That can change to:

Look up a citation by pasting it in or analyze all the citations in a block of text.

If somebody clicks "Look it Up," we look the citation up, just like now.

If somebody clicks "Analyze Citations," we basically just run the text against the API, either internally or via an API call if that seems worth it (I'm doubtful!).
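If the "via an API call" route were ever chosen, it could be as small as the sketch below, which uses only the standard library. It assumes the endpoint is `https://www.courtlistener.com/api/rest/v3/citation-lookup/`, that it takes the pasted blob as a form field named `text`, and that it uses `Authorization: Token …` auth. Check those details against the current API docs. `build_lookup_request` and `lookup_citations` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify the API version against the current docs.
LOOKUP_URL = "https://www.courtlistener.com/api/rest/v3/citation-lookup/"


def build_lookup_request(text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the pasted text."""
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(
        LOOKUP_URL,
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def lookup_citations(text: str, token: str) -> list[dict]:
    """Send the request and decode the JSON list of per-citation results."""
    with urllib.request.urlopen(build_lookup_request(text, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Running it internally would skip the HTTP round trip entirely, which is probably why the internal option looks more attractive.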

When the response comes back, we show a page that sort of mirrors the current API response. It'll show a list of citations along with:

Was it found or was there an error? If there's an error, what was it?
What clusters were matched up to it? (Show them as nice links.)
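The sketch below turns each item in the response into one row on the results page. It uses the statuses from the hand-made example response in this issue (200, 300, 404, 429). It also assumes two things: status 400 marks an invalid reporter, and each cluster object has `case_name` and a site-relative `absolute_url`. `describe_result` and `STATUS_LABELS` are hypothetical names.

```python
# Hypothetical display labels for the per-citation statuses.
STATUS_LABELS = {
    200: "Found",
    300: "Multiple matches",
    400: "Invalid citation",  # assumed status for an unknown reporter
    404: "CITATION NOT FOUND",
    429: "Not processed (over the per-request limit)",
}

BASE_URL = "https://www.courtlistener.com"


def describe_result(item: dict) -> dict:
    """Flatten one API result into the fields a results-page row needs."""
    status = item["status"]
    links = [
        # Assumes each cluster has case_name and a site-relative absolute_url.
        (cluster.get("case_name", "Untitled"), BASE_URL + cluster["absolute_url"])
        for cluster in item.get("clusters", [])
    ]
    return {
        "citation": item["citation"],
        "normalized": ", ".join(item.get("normalized_citations", [])),
        "label": STATUS_LABELS.get(status, f"Unexpected status {status}"),
        # Only show the server's message when the lookup wasn't a clean match.
        "error": "" if status == 200 else item.get("error_message", ""),
        "links": links,
    }
```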

Here's an API response I put together by hand that kind of shows the types of responses we can expect:

[
  {
    "citation": "576 U.S. 644",
    "normalized_citations": [
      "576 U.S. 644"
    ],
    "start_index": 22,
    "end_index": 34,
    "status": 200,
    "error_message": "",
    "clusters": [...one large cluster object here...]
  },
  {
    "citation": "1 U.S. 200",
    "normalized_citations": [
      "1 U.S. 200"
    ],
    "start_index": 0,
    "end_index": 10,
    "status": 404,
    "error_message": "Citation not found: '1 U.S. 200'",
    "clusters": []
  },
    ...250 citations would appear here, then the 251st and subsequent citations would be...
  {
    "citation": "576 U.S. 644",
    "normalized_citations": [
      "576 U.S. 644"
    ],
    "start_index": 10002,
    "end_index": 10013,
    "status": 429,
    "error_message": "Too many citations requested.",
    "clusters": []
  },
  {
    "citation": "576 US 644",
    "normalized_citations": [
      "576 U.S. 644"
    ],
    "start_index": 1,
    "end_index": 11,
    "status": 200,
    "error_message": "",
    "clusters": [...one cluster here...]
  },
  {
    "citation": "1 H. 150",
    "normalized_citations": [
      "1 Handy 150",
      "1 Haw. 150",
      "1 Hill 150"
    ],
    "start_index": 0,
    "end_index": 8,
    "status": 300,
    "clusters": [...two clusters here...]
  }
]
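Each result carries `start_index` and `end_index`, so the page could also highlight citations in the original text. The sketch below checks that slicing the input with those indexes returns the `citation` string. It assumes `end_index` is exclusive, as in Python slicing. Most entries above follow that convention (for example, 0 to 10 for the 10-character "1 U.S. 200"). Under it, the 429 entry's 10002 to 10013 span would be one character short of the 12-character "576 U.S. 644". `find_span_mismatches` is a hypothetical helper.

```python
def find_span_mismatches(text: str, results: list[dict]) -> list[dict]:
    """Return results whose [start_index, end_index) slice isn't the citation."""
    mismatches = []
    for item in results:
        # Assumes end_index is exclusive, like Python slicing.
        found = text[item["start_index"]:item["end_index"]]
        if found != item["citation"]:
            mismatches.append({**item, "found_text": found})
    return mismatches
```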


mlissner · 2024-04-08T21:55:43Z

A couple design starting points...

The list of citations that /c/ returns:

(https://www.courtlistener.com/c/abb-pr/14/)

The authorities page:

(https://www.courtlistener.com/opinion/108713/roe-v-wade/authorities/)

I think example 1 looks a lot better, and that it can probably be improved further, but these are good places to begin.

mlissner · 2024-04-08T21:56:53Z

@cmaczo, if you want to spend a little time coming up with a design for this, I think that'd be great.

mlissner · 2024-04-08T22:20:45Z

A design mockup from said board member:

Pretty much what I had in mind too, but with a second button for doing a look up.

anseljh · 2024-04-08T23:51:43Z

Additional observations and questions:

If a citation is not found, that needs to be really prominent.
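One way to make missing citations hard to overlook is to sort problems to the top of the results page and keep text order within each group. The sketch below does that. The priority order and the use of 400 for an invalid reporter are assumptions, and `problems_first` is a hypothetical name.

```python
# Lower numbers sort first; unknown statuses are treated as most urgent.
PRIORITY = {404: 0, 400: 1, 300: 2, 429: 3, 200: 4}


def problems_first(results: list[dict]) -> list[dict]:
    """Sort results so not-found citations lead; ties keep text order."""
    return sorted(
        results,
        key=lambda item: (PRIORITY.get(item["status"], 0), item["start_index"]),
    )
```

Keeping the results in text order and styling the failures loudly would be an equally valid choice. Sorting just guarantees the failures are the first thing people see.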

How about running it against a PDF or Word doc instead of a blob of pasted text? Maybe that's a premium feature, but people will probably ask for it.

In that vein...how crazy would it be to run this on everything in the RECAP Archive that looks like a motion? It would be further interesting to compare those results pre- and post-LLM era.

mlissner · 2024-04-09T00:17:25Z

Well, I'm not sure we want this to be a big priority that we put a ton of time into, but...

> How about running it against a PDF or Word doc instead of a blob of pasted text? Maybe that's a premium feature, but people will probably ask for it.

Pretty easy if we've got the rest in place, honestly. We already have infra to do text extraction from documents.

> In that vein...how crazy would it be to run this on everything in the RECAP Archive that looks like a motion? It would be further interesting to compare those results pre- and post-LLM era.

We've already found all the citations across all RECAP documents, but I haven't analyzed the various statuses of those citations. It'd take a few weeks if we wanted to do it. What outcome are you hoping for?

cmaczo · 2024-04-10T15:52:21Z

Working on it! Think I will have some questions for you tomorrow.

anseljh · 2024-04-10T22:58:01Z

> In that vein...how crazy would it be to run this on everything in the RECAP Archive that looks like a motion? It would be further interesting to compare those results pre- and post-LLM era.

> We've already found all the citations across all RECAP documents, but I haven't analyzed the various statuses of those citations. It'd take a few weeks if we wanted to do it. What outcome are you hoping for?

I am wondering if there are hallucinated citations that nobody knows about yet.

mlissner · 2024-04-10T23:03:01Z

Ah ha. Yeah, that could be quite interesting. If we did it, we'd probably learn which citations we don't have, which would be useful too, even on an ongoing basis. I'm not sure we have anybody to do this work though, at least for the moment...
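If someone did take this on, the core bookkeeping would be small. The sketch below assumes each document's lookup results are already available as lists shaped like the API response. `count_missing` is a hypothetical name, and the real RECAP pipeline may look quite different.

```python
from collections import Counter


def count_missing(per_document_results: dict[str, list[dict]]) -> Counter:
    """Count how many documents cite each normalized citation we couldn't find."""
    missing = Counter()
    for results in per_document_results.values():
        # Use a set so one document citing the same case twice counts once.
        not_found = {
            normalized
            for item in results
            if item["status"] == 404
            for normalized in item["normalized_citations"]
        }
        missing.update(not_found)
    return missing
```

Citations missing across many filings may point to gaps in our collection. One-off misses are the likelier places to look for hallucinations.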

cmaczo · 2024-04-11T20:08:07Z

@mlissner You can find my first draft wireframe for the search and results page here: https://www.figma.com/file/HlrsN7b5wuAQXKMsij1tw5/Free-Law-Project?type=design&node-id=1011%3A7509&mode=design&t=HOFbpGYg91GLfDuT-1

Let's discuss during our meeting tomorrow. I am sure I have misunderstood something. :)

mlissner · 2024-04-11T21:50:04Z

Nice. A few things:

I'm usually a big fan of pagination for performance reasons, but I don't think we need it here. We can just return whatever people send us.
I think the lookup box should have a button for [Analyze Citations] or [Look up Citation], sort of like Google's [Search] vs. [I'm feeling lucky]. The reason it's important to still be able to look up one citation is that that's what most people who know the tool expect from it. They don't need the analysis page; they just want to go to a certain citation's URL directly (I'm feeling lucky).

I'm surprised you added a [Clear] button. I usually hate such buttons because of the risk that I bump into it and lose everything. You like them?
On the results page, we can provide more metadata about the case. Probably the important missing piece is the court name. Maybe the docket number too?
I think you misunderstood how the 250 lookups per request works. It's not that one citation could match 250 items; it's that you're only allowed 250 citations in the text you upload. After that, we still parse and normalize the citations, but we don't match them to cases in CL.

So if you have 1,000 citations in your block of text, we'll look them all up, but the 251st and onwards won't have case data. I imagine they could be compressed somehow, so they don't all have to appear on the page individually. Something like, "We can only process 250 citations at a time. The remaining 750 citations in your upload were not processed."
We know every reporter at this point (see the huge JSON object in the reporters DB), so if it's an unknown reporter, we can just say it's invalid (as opposed to "not in our system.") We're very, very good at identifying valid citations. Painfully so.
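The distinction here is between a citation whose reporter doesn't exist and a valid-looking citation we just don't have. The sketch below is a toy version of that split. The real code relies on eyecite and the reporters database. Here, `KNOWN_REPORTERS` is a tiny hypothetical stand-in, and the regex only handles simple "volume reporter page" citations. `classify` is a hypothetical name.

```python
import re

# Hypothetical stand-in for the reporters database, which has far more entries.
KNOWN_REPORTERS = {"U.S.", "F.3d", "Haw."}

# Toy pattern for "<volume> <reporter> <page>"; real citations are messier.
CITATION_RE = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)$")


def classify(citation: str, in_our_db: set[str]) -> str:
    """Label a citation as invalid, not in our system, or found."""
    match = CITATION_RE.match(citation.strip())
    if not match or match.group(2) not in KNOWN_REPORTERS:
        return "invalid"  # unknown reporter, so not a real citation
    if citation.strip() not in in_our_db:
        return "not in our system"
    return "found"
```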

I think that's it for this round. Want to make a few revisions and I'll take another look?

Thank you!

cmaczo · 2024-04-12T18:42:37Z

Updated: https://www.figma.com/file/HlrsN7b5wuAQXKMsij1tw5/Free-Law-Project?type=design&node-id=1011%3A7487&mode=design&t=5EvA6bLtxlG2869y-1

> I'm usually a big fan of pagination for performance reasons, but I don't think we need it here. We can just return whatever people send us.

Ok.

> I think the lookup box should have a button for [Analyze Citations] or [Look up Citation], sort of like Google's [Search] vs. [I'm feeling lucky].

I've added the additional button, but I wonder if people will understand the difference between "Analyze citations" and "Look up citation." I'm not sure I would, just from the button titles.

> I'm surprised you added a [Clear] button. I usually hate such buttons because of the risk that I bump into it and lose everything. You like them?

I don't like them, but they're usually considered best practice where someone could be pasting a huge chunk of text into a text area. For many people, it's hard to delete all that text once it's in there, so most of the UI guides say you should provide a clear button.

> On the results page, we can provide more metadata about the case. Probably the important missing piece is the court name. Maybe the docket number too?

Done.

> So if you have 1,000 citations in your block of text, we'll look them all up, but the 251st and onwards won't have case data.

Updated with this new understanding.

I think what confused me was this part of your example API output:

{
   "citation": "576 U.S. 644",
   "normalized_citations": [
     "576 U.S. 644"
   ],
   "start_index": 10002,
   "end_index": 10013,
   "status": 429,
   "error_message": "Too many citations requested.",
   "clusters": []
 },
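The 429 rows in a response like this don't need to be shown one by one. The sketch below collapses them into the summary message suggested earlier in the thread. It assumes over-limit citations come back with status 429, as in the example. `split_over_limit` is a hypothetical helper, and 250 is the per-request limit described in this thread.

```python
LOOKUP_LIMIT = 250  # per-request cap described in this thread


def split_over_limit(results: list[dict]) -> tuple[list[dict], str]:
    """Keep rows that were looked up; collapse the 429s into one message."""
    processed = [item for item in results if item["status"] != 429]
    skipped = len(results) - len(processed)
    message = ""
    if skipped:
        noun, verb = ("citation", "was") if skipped == 1 else ("citations", "were")
        message = (
            f"We can only process {LOOKUP_LIMIT} citations at a time. "
            f"The remaining {skipped} {noun} in your upload {verb} not processed."
        )
    return processed, message
```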

> if it's an unknown reporter, we can just say it's invalid (as opposed to "not in our system.")

Done.

mlissner · 2024-04-12T19:41:57Z

Looks good to me. I made a few more comments in Figma, but I think it should be a solid place to begin.

> I wonder if people will understand the difference between "Analyze citations" and "Look up citation." I'm not sure I would, just from the button titles.

There's actually a better way to do this, I think. If there is one citation in the text, we take the user to it. If there's more than one, we provide the page you mocked up. Simple, and that means we only need one button. Perhaps it should just say, "Analyze"?
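The one-button idea boils down to a small decision, sketched below. It makes one design assumption: only a single, clean match (status 200 with a cluster) redirects. A lone citation that is not found or is ambiguous goes to the analysis page, where the error can be shown. `Outcome` and `choose_page` are hypothetical names, and `absolute_url` is an assumed cluster field.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    kind: str  # "redirect", "analysis", or "no_citations"
    target: str = ""


def choose_page(results: list[dict]) -> Outcome:
    """One clean citation -> jump to it; otherwise show the analysis page."""
    if not results:
        return Outcome("no_citations")
    only = results[0] if len(results) == 1 else None
    if only and only["status"] == 200 and only["clusters"]:
        # Exactly one matched citation: behave like "I'm feeling lucky".
        return Outcome("redirect", only["clusters"][0]["absolute_url"])
    return Outcome("analysis")
```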

anseljh · 2024-04-12T21:37:02Z

It would be weird to be taken to a case if you thought there might be more than one citation in your text. I’d rather see the result set of one first.


mlissner · 2024-04-12T22:31:29Z

If we do that, we break compatibility with people that use this URL as an API and who expect a redirect response when their case is found, a 404 when it's not, etc. But aside from that sort-of-requirement, if there's more than one citation in the text, I think we'd find that. When would we miss that and send you to a response when there was more than one citation?
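Keeping those callers working means the single-lookup path must keep returning the same HTTP behavior. The sketch below maps one lookup result to a status code and headers. The thread only confirms two behaviors: a redirect when the case is found and a 404 when it isn't. The choice of 302 for the redirect and 300 for an ambiguous match are assumptions. `legacy_response` is a hypothetical name, and `absolute_url` is an assumed cluster field.

```python
from http import HTTPStatus


def legacy_response(item: dict) -> tuple[int, dict[str, str]]:
    """Map one lookup result to the status code and headers /c/ callers expect."""
    status = item["status"]
    if status == 200 and item["clusters"]:
        # Found: redirect straight to the case (302 is an assumption).
        return HTTPStatus.FOUND, {"Location": item["clusters"][0]["absolute_url"]}
    if status == 300:
        # Ambiguous: assumed to answer 300 and list the candidates.
        return HTTPStatus.MULTIPLE_CHOICES, {}
    # Anything else (not found, invalid reporter) is treated as a 404 here.
    return HTTPStatus.NOT_FOUND, {}
```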

anseljh mentioned this issue Apr 11, 2024

Trawl RECAP Archive for hallucinated citations #3960

Open
