feature/get-updated-legistar-matter #94
/test-scraper get_king_county_events --from_dt=2022-04-04T00:00:00 --to_dt=2022-04-09T00:00:00
Begin
Just to prove nothing has broken: this PR yields no changes to scraped data. Have run and passed
Before more work is committed, wanted to see if something like So just
https://github.com/CouncilDataProject/cdp-scrapers/actions/runs/2213727492
Okay, so I think I get what is going on, and in general I like it.
A couple of comments / questions:
- I saw a bit of the code that said get_matter vs find_matter. Just to fully understand: get_matter takes an id, whereas find_matter takes the name?
- There is a lot of fancy caching happening that could probably be abstracted away by using lru_cache; see my use of it in cdp-data here.
- We should hook up seattle-staging to this repo so that every time a PR runs on this repo, it runs the pipeline for Seattle. I am thinking of a CI job that wipes the seattle-staging DB and FS, runs the pipeline with this scraper, then comments with link(s) to the stored event. Would make it easier to "trust" this. (not that I don't trust you hahahah)
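A minimal sketch of the `lru_cache` suggestion, assuming the matter lookup can be keyed on the Legistar MatterId alone. `_fetch_matter_json` is a hypothetical stand-in for the real HTTP request so that the example runs without network access; it is not the scraper's actual code.

```python
from functools import lru_cache
from typing import Any, Dict

# Counts how many times the (hypothetical) API is actually hit.
api_calls = {"count": 0}


def _fetch_matter_json(matter_id: int) -> Dict[str, Any]:
    """Hypothetical stand-in for requests.get(...).json() against Legistar."""
    api_calls["count"] += 1
    return {"MatterId": matter_id, "MatterFile": f"FILE-{matter_id}"}


@lru_cache(maxsize=None)
def get_legistar_matter(matter_id: int) -> Dict[str, Any]:
    """Return Matter JSON, querying the API at most once per matter_id."""
    return _fetch_matter_json(matter_id)


first = get_legistar_matter(42)
second = get_legistar_matter(42)  # served from the cache, no second API call
```

One caveat with this design: the cache hands back the same dict object on every hit, so a caller that mutates it (for example, by attaching MatterSponsorInfo) mutates the cached copy too. Returning `copy.deepcopy(...)` or caching an immutable value avoids that. Decorating an instance method with `lru_cache` also keeps `self` alive for as long as the cache holds entries.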
if isinstance(model, Matter):
    return self.get_updated_matter(model)
Because I assume we are going to have a lot of these, it may be better to have a LUT:
MODEL_UPDATER_LUT = {
    Matter: self.get_updated_matter,
    Person: self.get_updated_person,
}
return MODEL_UPDATER_LUT[type(model)](model)
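A self-contained sketch of the LUT dispatch, with minimal stand-in `Matter` and `Person` dataclasses and a hypothetical `UpdaterSketch` class in place of the real scraper. It looks up the updater and then calls it with the model, and raises a clear error for unsupported types instead of a bare `KeyError`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Matter:
    """Minimal stand-in for the ingestion model Matter."""
    name: str


@dataclass
class Person:
    """Minimal stand-in for the ingestion model Person."""
    name: str


class UpdaterSketch:
    """Hypothetical scraper class; real updaters would re-query Legistar."""

    def get_updated_matter(self, matter: Matter) -> Matter:
        return Matter(name=f"{matter.name} (updated)")

    def get_updated_person(self, person: Person) -> Person:
        return Person(name=f"{person.name} (updated)")

    def get_updated_model(self, model: Any) -> Any:
        # Look up the updater by the model's exact type, then call it.
        model_updater_lut: Dict[type, Callable[[Any], Any]] = {
            Matter: self.get_updated_matter,
            Person: self.get_updated_person,
        }
        updater = model_updater_lut.get(type(model))
        if updater is None:
            raise TypeError(f"No updater for {type(model).__name__}")
        return updater(model)
```

Keying on `type(model)` matches exact types only, so a subclass of `Matter` would hit the `TypeError`. If that matters, `functools.singledispatchmethod` (Python 3.8+) is a standard-library alternative that dispatches on the first non-`self` argument and respects subclasses.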
LEGISTAR_MATTER_SPONSORS = "MatterSponsorInfo"
LEGISTAR_SPONSOR_PERSON = "SponsorPersonInfo"
Dropped this?
Yes. Before, we never queried the full Legistar Matter. We just used bits of the Legistar EventItem, like EventItemMatterName, to create the ingestion model Matter.
Now that there is a need to query for a Legistar Matter, I implemented get_legistar_matter() so I can reuse it from both the main get_legistar_events_for_timespan() and this new get_updated_model()...
So, before, in get_legistar_events_for_timespan():
legistar_event_item = {
    ...
    "SponsorPersonInfo": [
        {
            "PersonFullName": ...
        },
        ...
    ],
    ...
}
Now:
legistar_event_item = {
    ...
    "EventItemMatterInfo": {
        "MatterFile": ...,
        "MatterSponsorInfo": [
            {
                "PersonFullName": ...
            },
            ...
        ],
        ...
    },
    ...
}
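To illustrate the shape change, a small helper that reads sponsor names under the new nested layout. The key strings come from the structures above; the helper name `sponsor_names`, the constant names marked hypothetical, and the empty-list fallback are assumptions for illustration.

```python
from typing import Any, Dict, List

LEGISTAR_EV_MATTER_INFO = "EventItemMatterInfo"  # hypothetical constant name
LEGISTAR_MATTER_SPONSORS = "MatterSponsorInfo"
LEGISTAR_PERSON_NAME = "PersonFullName"  # hypothetical constant name


def sponsor_names(legistar_event_item: Dict[str, Any]) -> List[str]:
    """Return sponsor full names nested under the event item's Matter JSON.

    Event items without an associated Matter (or without sponsors) yield [].
    """
    matter = legistar_event_item.get(LEGISTAR_EV_MATTER_INFO) or {}
    sponsors = matter.get(LEGISTAR_MATTER_SPONSORS) or []
    return [
        person[LEGISTAR_PERSON_NAME]
        for person in sponsors
        if person.get(LEGISTAR_PERSON_NAME)
    ]
```

The practical difference from the old layout is one extra level of lookup: sponsors now hang off the Matter JSON rather than the event item itself.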
Another general comment: I see you are using the ingestion model as the "update" model, but since this is the "update API", it may be better to update from the database models rather than the ingestion models.
@@ -99,6 +99,8 @@
LEGISTAR_EV_ITEMS = "EventItems"
LEGISTAR_EV_ATTACHMENTS = "EventItemMatterAttachments"
LEGISTAR_EV_MATTER_ID = "EventItemMatterId"
Just for clarification, what's the difference between LEGISTAR_EV_MATTER_ID and LEGISTAR_MATTER_EXT_ID? I'm not too familiar with Legistar's schema.
Given the same Legistar Matter, the value will be the same.
LEGISTAR_EV_MATTER_ID = "EventItemMatterId" is a field in the Legistar Event Item. So, after querying for a Legistar Event Item, we can then query for the associated Matter using that field.
LEGISTAR_MATTER_EXT_ID = "MatterId" is a field in the Legistar Matter. After obtaining a Legistar Matter, we use that field to set Matter.external_source_id.
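A minimal sketch of the two-step flow just described: follow EventItemMatterId to the Matter, then take MatterId as the external id. The `get_legistar_matter` callable is injected so the example runs offline, and converting the id to `str` for `external_source_id` is an assumption about the ingestion model's field type.

```python
from typing import Any, Callable, Dict, Optional

LEGISTAR_EV_MATTER_ID = "EventItemMatterId"
LEGISTAR_MATTER_EXT_ID = "MatterId"


def matter_external_source_id(
    legistar_event_item: Dict[str, Any],
    get_legistar_matter: Callable[[int], Optional[Dict[str, Any]]],
) -> Optional[str]:
    """Follow EventItemMatterId to the Matter and return its MatterId as a string."""
    matter_id = legistar_event_item.get(LEGISTAR_EV_MATTER_ID)
    if matter_id is None:
        # This event item has no associated Matter.
        return None
    matter = get_legistar_matter(matter_id)
    if matter is None:
        return None
    # Both fields identify the same Legistar Matter, so they should agree.
    if matter[LEGISTAR_MATTER_EXT_ID] != matter_id:
        raise ValueError("MatterId does not match EventItemMatterId")
    return str(matter[LEGISTAR_MATTER_EXT_ID])
```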
matter: Dict[str, Any] = resp.json()

# Person JSON for this matter's sponsors
matter[LEGISTAR_MATTER_SPONSORS] = reduced_list(
I thought this was some built-in Python function at first and was kinda confused lol. I think this list comprehension is a little awkward to read, but I'm fine with it.
That's a good point. I will change this. Readability is important for us.
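One possible more readable shape for the sponsor lookup, written as an explicit loop instead of a comprehension inside `reduced_list(...)`. It assumes each Legistar MatterSponsor record carries the sponsor's person id in `MatterSponsorNameId` and that a `get_person` callable returns Person JSON or `None`; both are assumptions, as is collapsing an empty result to `None`, which is assumed to mirror `reduced_list`.

```python
from typing import Any, Callable, Dict, List, Optional

PersonJson = Dict[str, Any]


def sponsor_person_json(
    sponsors: List[Dict[str, Any]],
    get_person: Callable[[int], Optional[PersonJson]],
) -> Optional[List[PersonJson]]:
    """Look up Person JSON for each sponsor, skipping sponsors that fail to resolve."""
    persons: List[PersonJson] = []
    for sponsor in sponsors:
        person_id = sponsor.get("MatterSponsorNameId")  # assumed field name
        if person_id is None:
            continue
        person = get_person(person_id)
        if person is not None:
            persons.append(person)
    # Collapse an empty result to None (assumed to match reduced_list's behavior).
    return persons or None
```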
I'm personally a little more partial to something like
Correct. And I don't expect the find-by-name code path to be used that much, but I wrote it just in case we don't have the Legistar MatterId for the requested Matter.
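A sketch of how the two code paths could be selected: prefer get-by-id when the MatterId is known, and fall back to find-by-name otherwise. The base URL, the `/matters/{id}` path, the OData `$filter` query, and filtering on `MatterName` are all assumptions based on the public Legistar Web API; the function only builds the URL so it runs offline.

```python
from typing import Optional
from urllib.parse import quote

LEGISTAR_API_BASE = "https://webapi.legistar.com/v1"  # assumed base URL


def matter_query_url(
    client: str,
    matter_id: Optional[int] = None,
    matter_name: Optional[str] = None,
) -> str:
    """Build a get-by-id URL if possible, else a find-by-name URL."""
    if matter_id is not None:
        return f"{LEGISTAR_API_BASE}/{client}/matters/{matter_id}"
    if matter_name is not None:
        # OData string literals escape a single quote by doubling it.
        literal = matter_name.replace("'", "''")
        odata_filter = quote(f"MatterName eq '{literal}'")
        return f"{LEGISTAR_API_BASE}/{client}/matters?$filter={odata_filter}"
    raise ValueError("Either matter_id or matter_name is required")
```

A name search can return zero or several matters while the id path returns at most one, which is one reason to treat find-by-name as the fallback only.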
Thank you!
No no, definitely do not trust me LOL 🤣
Ah, I like this.
I am pretty busy, but that may be a good project for someone who wants to learn DevOps stuff. It also ties in well with CouncilDataProject/cdp-backend#166. Further, I think we could use it for all scrapers: we have the test-scraper command, but we should have a command like test-scraper --full or something?? Similarly, if we can hook it up to the cookiecutter bot, then whatever scraper we generate, we can preview the deployment on staging before we fully deploy it 👀
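A rough sketch of how the existing comment command (as used at the top of this PR) might grow the proposed `--full` mode. The parser function, its `--full` flag, and the returned dict are hypothetical; only the command name and the `--from_dt`/`--to_dt` arguments come from this conversation.

```python
import argparse
import shlex
from datetime import datetime
from typing import Any, Dict


def parse_test_scraper_comment(comment: str) -> Dict[str, Any]:
    """Parse a '/test-scraper <function> --from_dt=... --to_dt=...' PR comment.

    The --full flag is hypothetical: it models the proposed mode that would
    also run the full pipeline against a staging deployment.
    """
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/test-scraper":
        raise ValueError("not a /test-scraper command")
    parser = argparse.ArgumentParser(prog="/test-scraper", add_help=False)
    parser.add_argument("scraper_function")
    parser.add_argument("--from_dt", type=datetime.fromisoformat, required=True)
    parser.add_argument("--to_dt", type=datetime.fromisoformat, required=True)
    parser.add_argument("--full", action="store_true")
    return vars(parser.parse_args(tokens[1:]))
```

Note that `argparse` exits the process on a parse error, so a bot handling arbitrary comments would need to catch `SystemExit`.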
Those functions are auto-generated. If all CDP instances were backed by Legistar, then that proposal would work, but since not all are, we shouldn't even try to guarantee that the API will be consistent across all of them. Hence why we generate those functions. If that doesn't make sense, happy to expand more. I am sort of thinking that makes sense, but I could have talked myself into a circle.
I am finally getting around to checking on other PRs. I don't think I need to do anything on this one, right? No update yet? (no pressure, I am just behind and lost track of stuff)
Link to Relevant Issue
This pull request is a part of #59.
Description of Changes
LegistarScraper.get_updated_model(matter) -> LegistarScraper.get_updated_model(matter), where the Legistar API is queried for the given matter.
Eventually, will provide an API similar to get_{municipality}_events(), e.g.