Extend the citation lookup page to use the features of the API #3955

Open
mlissner opened this issue Apr 8, 2024 · 14 comments

Comments

@mlissner
Member

mlissner commented Apr 8, 2024

Our new citation lookup API is really neat. It takes a blob of text, looks up the citations, and tells you what it finds:

image

Unfortunately, a member of our board, upon seeing it, had a good idea:

The API is cool, but we should eat our own dog food and make a user-friendly page where you paste any blob of text into the text box and it spits out everything that looked citation-like and either provides a normalized version (linking to our copy) or says CITATION NOT FOUND.

So this issue is to jot down some notes about it. Here's how the tool currently looks:

image

A few changes:

  1. This should now be a textarea to encourage people to paste more.

  2. At the bottom, it should get a second button that says "Analyze Citations"

  3. The text at the top currently says:

    If you have a citation you want to look up, put it in here, and we'll look it up.

    That can change to:

    Look up a citation by pasting it in or analyze all the citations in a block of text.

If somebody clicks "Look it Up" we do. Just like now.

If somebody clicks "Analyze Citations" we basically just run it against the API, either internally or via an API call if that seems worth it (I'm doubtful!).

When the response comes back, we show a page that sort of mirrors the current API response. It'll show a list of citations along with:

  1. Was it found or was there an error? If there's an error, what was it?
  2. What clusters were matched up to it? (Show them as nice links.)

Here's an API response I put together by hand that kind of shows the types of responses we can expect:

[
  {
    "citation": "576 U.S. 644",
    "normalized_citations": [
      "576 U.S. 644"
    ],
    "start_index": 22,
    "end_index": 34,
    "status": 200,
    "error_message": "",
    "clusters": [...one large cluster object here...]
  },
  {
    "citation": "1 U.S. 200",
    "normalized_citations": [
      "1 U.S. 200"
    ],
    "start_index": 0,
    "end_index": 10,
    "status": 404,
    "error_message": "Citation not found: '1 U.S. 200'",
    "clusters": []
  },
    ...250 citations would appear here, then the 251st and subsequent citations would be...
  {
    "citation": "576 U.S. 644",
    "normalized_citations": [
      "576 U.S. 644"
    ],
    "start_index": 10002,
    "end_index": 10013,
    "status": 429,
    "error_message": "Too many citations requested.",
    "clusters": []
  },
  {
    "citation": "576 US 644",
    "normalized_citations": [
      "576 U.S. 644"
    ],
    "start_index": 1,
    "end_index": 11,
    "status": 200,
    "error_message": "",
    "clusters": [...one cluster here...]
  },
  {
    "citation": "1 H. 150",
    "normalized_citations": [
      "1 Handy 150",
      "1 Haw. 150",
      "1 Hill 150"
    ],
    "start_index": 0,
    "end_index": 8,
    "status": 300,
    "clusters": [...two clusters here...]
  }
]
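
To make the shape of that response concrete, here is a minimal Python sketch of how a results page might bucket each entry by its `status` field. Only the field names and status codes come from the hand-written example above; the labels and the `summarize` helper are hypothetical and not part of CourtListener.

```python
from collections import defaultdict

# Hypothetical display labels for the status codes seen in the example response.
STATUS_LABELS = {
    200: "Found",
    300: "Multiple possible matches",
    404: "CITATION NOT FOUND",
    429: "Not processed (too many citations)",
}


def summarize(results):
    """Group citation strings by a display label derived from `status`."""
    buckets = defaultdict(list)
    for item in results:
        label = STATUS_LABELS.get(item["status"], "Unknown status")
        buckets[label].append(item["citation"])
    return dict(buckets)


# Trimmed-down entries modeled on the example above (cluster objects omitted).
sample = [
    {"citation": "576 U.S. 644", "status": 200, "error_message": "", "clusters": []},
    {"citation": "1 U.S. 200", "status": 404,
     "error_message": "Citation not found: '1 U.S. 200'", "clusters": []},
    {"citation": "1 H. 150", "status": 300, "clusters": []},
]

summary = summarize(sample)
for label, citations in summary.items():
    print(f"{label}: {', '.join(citations)}")
```

Keying on `status` rather than on `error_message` keeps the page logic independent of the exact wording of the messages.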
@mlissner
Member Author

mlissner commented Apr 8, 2024

A couple design starting points...

  1. The list of citations that /c/ returns:

image

(https://www.courtlistener.com/c/abb-pr/14/)

  2. The authorities page:

image

(https://www.courtlistener.com/opinion/108713/roe-v-wade/authorities/)

I think example 1 looks a lot better, and that it can probably be improved further, but these are good places to begin.

@mlissner
Member Author

mlissner commented Apr 8, 2024

@cmaczo, if you want to spend a little time coming up with a design for this, I think that'd be great.

@mlissner
Member Author

mlissner commented Apr 8, 2024

A design mock up from said board member:

image

Pretty much what I had in mind too, but with a second button for doing a look up.

@anseljh
Member

anseljh commented Apr 8, 2024

Additional observation and questions:

If a citation is not found, that needs to be really prominent.

How about running it against a PDF or Word doc instead of a blob of pasted text? Maybe that's a premium feature, but people will probably ask for it.

In that vein...how crazy would it be to run this on everything in the RECAP Archive that looks like a motion? It would be further interesting to compare those results pre- and post-LLM era.

@mlissner
Member Author

mlissner commented Apr 9, 2024

Well, I'm not sure we want this to be a big priority that we put a ton of time into, but...

> How about running it against a PDF or Word doc instead of a blob of pasted text? Maybe that's a premium feature, but people will probably ask for it.

Pretty easy if we've got the rest in place, honestly. We already have infra to do text extraction from documents.

> In that vein...how crazy would it be to run this on everything in the RECAP Archive that looks like a motion? It would be further interesting to compare those results pre- and post-LLM era.

We've already found all the citations across all RECAP documents, but I haven't analyzed the various statuses of those citations. It'd take a few weeks if we wanted to do it. What outcome are you hoping for?

@cmaczo

cmaczo commented Apr 10, 2024

Working on it! Think I will have some questions for you tomorrow.

@anseljh
Member

anseljh commented Apr 10, 2024

> > In that vein...how crazy would it be to run this on everything in the RECAP Archive that looks like a motion? It would be further interesting to compare those results pre- and post-LLM era.
>
> We've already found all the citations across all RECAP documents, but I haven't analyzed the various statuses of those citations. It'd take a few weeks if we wanted to do it. What outcome are you hoping for?

I am wondering if there are hallucinated citations that nobody knows about yet.

@mlissner
Member Author

Ah ha. Yeah, that could be quite interesting. If we did it, we'd probably learn which citations we don't have, which would be useful too, even on an ongoing basis. I'm not sure we have anybody to do this work though, at least for the moment...

@cmaczo

cmaczo commented Apr 11, 2024

@mlissner You can find my first draft wireframe for the search and results page here: https://www.figma.com/file/HlrsN7b5wuAQXKMsij1tw5/Free-Law-Project?type=design&node-id=1011%3A7509&mode=design&t=HOFbpGYg91GLfDuT-1

Let's discuss during our meeting tomorrow. I am sure I have misunderstood something. :)

@mlissner
Member Author

Nice. A few things:

  1. I'm usually a big fan of pagination for performance reasons, but I don't think we need it here. We can just return whatever people send us.

  2. I think the lookup box should have a button for [Analyze Citations] or [Look up Citation], sort of like google's [Search] vs. [I'm feeling lucky]. The reason it's important to be able to still look up one citation is that that's what most people that know the tool expect from it, and they don't need the analysis page — they just want to go to a certain citation's URL directly (I'm feeling lucky).

    I'm surprised you added a [Clear] button. I usually hate such buttons because of the risk that I bump into it and lose everything. You like them?

  3. On the results page, we can provide more metadata about the case. Probably the important missing piece is the court name. Maybe the docket number too?

  4. I think you misunderstood how the 250 lookups per request works. It's not that one citation could match 250 items, it's that you're only allowed 250 in the text you upload. After that, we parse and normalize the citation, but we don't match it against CL's data.

    So if you have 1,000 citations in your block of text, we'll look them all up, but the 251st and onwards won't have case data. I imagine they could be compressed somehow, so they don't all have to appear on the page individually. Something like, "We can only process 250 citations at a time. The remaining 220 citations in your upload were not processed."

  5. We know every reporter at this point (see the huge JSON object in the reporters DB), so if it's an unknown reporter, we can just say it's invalid (as opposed to "not in our system.") We're very, very good at identifying valid citations. Painfully so.
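
The compression idea in item 4 could be sketched roughly as follows. This is only an illustration in Python: it assumes that citations past the limit come back with status 429, as in the hand-written example response earlier in this thread, and the helper names `split_throttled` and `throttle_notice` are hypothetical.

```python
def split_throttled(results):
    """Separate processed entries from those rejected with status 429."""
    processed = [r for r in results if r["status"] != 429]
    throttled = [r for r in results if r["status"] == 429]
    return processed, throttled


def throttle_notice(throttled_count, limit=250):
    """Build one summary line instead of listing each skipped citation."""
    if throttled_count == 0:
        return ""
    return (
        f"We can only process {limit} citations at a time. "
        f"The remaining {throttled_count} citations in your upload were not processed."
    )


# Fabricated input for illustration: 250 processed entries, then 220 throttled ones.
results = [{"citation": f"{n} U.S. 1", "status": 200} for n in range(1, 251)]
results += [{"citation": f"{n} U.S. 1", "status": 429} for n in range(251, 471)]

processed, throttled = split_throttled(results)
notice = throttle_notice(len(throttled))
print(notice)
```

The results page would then render `processed` entry by entry and show `notice` once at the bottom.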

I think that's it for this round. Want to make a few revisions and I'll take another look?

Thank you!

@cmaczo

cmaczo commented Apr 12, 2024

Updated: https://www.figma.com/file/HlrsN7b5wuAQXKMsij1tw5/Free-Law-Project?type=design&node-id=1011%3A7487&mode=design&t=5EvA6bLtxlG2869y-1

> I'm usually a big fan of pagination for performance reasons, but I don't think we need it here. We can just return whatever people send us.

Ok.

> I think the lookup box should have a button for [Analyze Citations] or [Look up Citation], sort of like google's [Search] vs. [I'm feeling lucky].

I've added the additional button, but I wonder if people will understand the difference between "Analyze citations" and "Look up citation." I'm not sure I would, just from the button titles.

> I'm surprised you added a [Clear] button. I usually hate such buttons because of the risk that I bump into it and lose everything. You like them?

I don't like them, but they're usually considered best practice where someone could be pasting a huge chunk of text into a text area. For many people, it's hard to delete all that text once it's in there, so most of the UI guides say you should provide clear buttons for them.

> On the results page, we can provide more metadata about the case. Probably the important missing piece is the court name. Maybe the docket number too?

Done.

> So if you have 1,000 citations in your block of text, we'll look them all up, but the 251st and onwards won't have case data.

Updated with this new understanding.

I think what confused me was this part of your example API output:

{
   "citation": "576 U.S. 644",
   "normalized_citations": [
     "576 U.S. 644"
   ],
   "start_index": 10002,
   "end_index": 10013,
   "status": 429,
   "error_message": "Too many citations requested.",
   "clusters": []
 },

> if it's an unknown reporter, we can just say it's invalid (as opposed to "not in our system.")

Done.

@mlissner
Member Author

Looks good to me. I made a few more comments in Figma, but I think it should be a solid place to begin.

> I wonder if people will understand the difference between "Analyze citations" and "Look up citation." I'm not sure I would, just from the button titles.

There's actually a better way to do this, I think. If there is one citation in the text, we take the user to it. If there's more than one, we provide the page you mocked up. Simple, and that means we only need one button. Perhaps it should just say, "Analyze"?
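
A rough Python sketch of that single-button rule, under some assumptions: `choose_action` is a hypothetical helper, the `absolute_url` key on a cluster is assumed for illustration (the cluster objects are elided in the example response), and the handling of a lone citation that is not found or is ambiguous is a guess rather than a decision made in this thread.

```python
def choose_action(results):
    """Return ("redirect", url), ("not_found", None), or ("analyze", results)."""
    # Zero or several citations: show the analysis page.
    if len(results) != 1:
        return ("analyze", results)

    only = results[0]
    clusters = only.get("clusters", [])

    # Exactly one citation with exactly one match: go straight to the case.
    if only["status"] == 200 and len(clusters) == 1:
        return ("redirect", clusters[0]["absolute_url"])

    # Exactly one citation that was not found (assumed behavior).
    if only["status"] == 404:
        return ("not_found", None)

    # Anything else (e.g. an ambiguous citation) falls back to the analysis page.
    return ("analyze", results)


# Fabricated entries for illustration; the URL is a placeholder.
one_hit = [{"citation": "576 U.S. 644", "status": 200,
            "clusters": [{"absolute_url": "/opinion/0/example/"}]}]
one_miss = [{"citation": "1 U.S. 200", "status": 404, "clusters": []}]

print(choose_action(one_hit))
print(choose_action(one_miss))
print(choose_action(one_hit + one_miss)[0])
```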

@anseljh
Member

anseljh commented Apr 12, 2024 via email

@mlissner
Member Author

If we do that, we break compatibility with people that use this URL as an API and who expect a redirect response when their case is found, a 404 when it's not, etc. But aside from that sort-of-requirement, if there's more than one citation in the text, I think we'd find that. When would we miss that and send you to a response when there was more than one citation?
