cl_scrape_opinions: Some scraped fields are not used when creating objects #4042

Open
grossir opened this issue May 10, 2024 · 1 comment

grossir commented May 10, 2024

The following attributes returned by Juriscraper scrapers are not used by CourtListener:

These require easy changes (1-2 lines) to be used in the models:

  • lower_court goes to Docket.appeal_from_str
  • disposition goes to OpinionCluster.disposition
  • cause goes to Docket.cause, but I don't know whether that is a PACER-reserved field
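A minimal sketch of what these mappings could look like. The dataclasses below are stand-ins for the Django models and only carry the fields named above. The plural dictionary keys follow the Juriscraper convention used later in this thread; the "causes" key and the `build_objects` helper are assumptions made for illustration, not code from cl_scrape_opinions.py.

```python
from dataclasses import dataclass


# Stand-ins for the Django models; only the fields named in this issue appear.
@dataclass
class Docket:
    appeal_from_str: str = ""
    cause: str = ""


@dataclass
class OpinionCluster:
    disposition: str = ""


def build_objects(item: dict) -> tuple[Docket, OpinionCluster]:
    """Map one scraped item (hypothetical dict shape) onto the stand-in models."""
    docket = Docket(
        # A missing key falls back to an empty string, so scrapers that
        # do not return the attribute behave as they did before.
        appeal_from_str=item.get("lower_courts", ""),
        cause=item.get("causes", ""),  # "causes" is an assumed key name
    )
    cluster = OpinionCluster(disposition=item.get("dispositions", ""))
    return docket, cluster
```

Using `item.get(..., "")` keeps each change to a single line per field and leaves items without the attribute untouched.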

These require more work:

  • lower_court_numbers and lower_court_judges go to OriginatingCourtInformation...

For these, I found no model on CL, nor any direct reference when using string search:

  • docket_attachment_numbers
  • docket_document_numbers
  • adversary_numbers
  • divisions

If they are not used anywhere, we should probably delete them, since they only add noise
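The string search mentioned above could be repeated with a short script like the one below before deleting anything. This is only a sketch: `references_in_text` and `find_references` are hypothetical helpers, and restricting the search to `*.py` files is an assumption.

```python
import re
from pathlib import Path


def references_in_text(text: str, names: list[str]) -> dict[str, list[int]]:
    """Return, per name, the 1-based line numbers where it occurs as a whole word."""
    hits: dict[str, list[int]] = {name: [] for name in names}
    if not names:
        return hits
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits[match.group(1)].append(lineno)
    return hits


def find_references(root: Path, names: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Search every .py file under root and collect (path, line number) pairs."""
    results: dict[str, list[tuple[str, int]]] = {name: [] for name in names}
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, linenos in references_in_text(text, names).items():
            results[name].extend((str(path), n) for n in linenos)
    return results
```

Running `find_references(Path("cl"), ["docket_attachment_numbers", "docket_document_numbers", "adversary_numbers", "divisions"])` from a checkout would list any remaining uses. The whole-word match avoids counting unrelated identifiers such as `subdivisions`.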


Code on courtlistener that uses the scraped attributes to build objects:
https://github.com/freelawproject/courtlistener/blob/c8f712754ff7041235df617e6351accf4b6b3754/cl/scrapers/management/commands/cl_scrape_opinions.py#L78C1-L137C6

Code on juriscraper that defines the attributes is in OpinionSite and OpinionSiteLinear:
https://github.com/grossir/juriscraper/blob/92d27210adebfe7efa3b5ff2777667d3cd0de78f/juriscraper/OpinionSite.py#L18-L43

grossir added a commit to grossir/courtlistener that referenced this issue May 13, 2024
Partially solves freelawproject#4042

- Ingest "lower_courts" into `Docket.appeal_from_str`
- Ingest "dispositions" into `OpinionCluster.disposition`
- Ingest "authors" into `Opinion.author_str`
- Ingest "joined_by" into `Opinion.joined_by`
- Ingest "per_curiam" into `Opinion.per_curiam`
- Ingest "types" into `Opinion.type`

The last 4 fields are not yet supported in Juriscraper, but the proposed changes keep the default behavior
grossir commented May 13, 2024

This is a good opportunity to support some extra fields on both CourtListener and Juriscraper.
I propose adding these:

  • Ingest "authors" into Opinion.author_str
  • Ingest "joined_by" into Opinion.joined_by
  • Ingest "per_curiam" into Opinion.per_curiam
  • Ingest "types" into Opinion.type
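A rough sketch of how the four proposed fields could be ingested without changing behavior for scrapers that do not return them yet. The `Opinion` dataclass is a stand-in for the Django model, `joined_by` is simplified to a string, and `DEFAULT_TYPE` is a placeholder for whatever opinion type the command already assigns; all three are assumptions for illustration.

```python
from dataclasses import dataclass

# Placeholder for the opinion type the scraper command assigns today (assumed value).
DEFAULT_TYPE = "combined"


# Stand-in for the Django model; only the four proposed fields are shown.
@dataclass
class Opinion:
    author_str: str = ""
    joined_by: str = ""  # simplified to a string for this sketch
    per_curiam: bool = False
    type: str = DEFAULT_TYPE


def opinion_from_item(item: dict) -> Opinion:
    """Build an Opinion, falling back to the current defaults for missing keys."""
    return Opinion(
        author_str=item.get("authors", ""),
        joined_by=item.get("joined_by", ""),
        per_curiam=item.get("per_curiam", False),
        type=item.get("types", DEFAULT_TYPE),
    )
```

Because every lookup has a default, an item without these keys produces the same object as before, while a scraper that starts returning them gets the values stored.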

grossir added a commit to grossir/courtlistener that referenced this issue May 15, 2024
Partially solves freelawproject#4042