Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Project Gezond scraper (Dutch Website) #691

Merged
merged 3 commits into from Dec 6, 2022

Conversation

NijeboerFrank
Copy link
Contributor

Thanks for this project! I use it a lot on my Mealie instance.

This PR contains a Dutch recipe site that I use sometimes (Link). Sadly the site did not have a nice way of using the schema nor did the HTML have useful class names, so my implementation might seem a bit hacky.

Feedback is appreciated to improve the implementation!

@jayaddison
Copy link
Collaborator

Thanks @NijeboerFrank for your contribution! This implementation looks really good to me.

One nitpick I had was about the category field (the newline), and it seems like that's also something the unit tests are complaining about at the moment. Hopefully a quick fixup.

Sadly the site did not have a nice way of using the schema nor did the HTML have useful class names

That's ok - this library is here to help with situations like that :)

Feedback is appreciated to improve the implementation!

One more item I noticed: we could replace .get_text(...) calls with .text attribute accesses. Let me scour for any other improvements as well.

def yields(self):
# Match everything in the h2 with 'Dit heb je nodig'
# The text inside the parentheses contains the yield for the ingredients that are listed
return re.search(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing to keep in mind - hopefully not important in this case, but in general - and sorry if I'm explaining things that you understand already, but it's worth being careful to limit what regular expressions can match on, and/or how much input text they are provided as input.

Just something I repeat (no pun intended) at nearly every available opportunity 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advice! I'm not really that familiar with regular expressions, so all tips are appreciated 😄

@jayaddison
Copy link
Collaborator

This looks good to me - thanks again @NijeboerFrank. I'll plan to merge and release this in the nearish future (possibly not today, but should be within the next few days otherwise).

@jayaddison jayaddison merged commit 607ab04 into hhursev:main Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants