Gousto.co.uk scraper broken #376
Comments
* Update test HTML to live site
* Fix title
* Use schema for recipe & update test
Just checking this is still on the list to fix as I'm still having issues with Gousto? Thanks!
@Nelinski thanks for checking - could you confirm the version of
I'm trying this via Mealie which uses this scraper. It looks like I'm getting: Looks like Mealie is using version "13.7.0".
Ok, great! - any chance you could include an example URL or two? (that'd help replicate the error, and then we can track down the reason the
Sure! Appreciate the help on this. List of all the recipes - https://www.gousto.co.uk/cookbook/recipes?page=1
Hmm, weird.. it looks like Gousto's site may no longer have Can anyone else confirm that too? (view source in your preferred browser is probably the easiest way; or by curl'ing or using Python to retrieve the source of one of those URLs)
Looks like they're doing it via JS now rather than directly in the source as I can't see it, but it looks to validate OK here: Edit: When looking at the source via schema.org, relevant snippet below:
Just curious, can't we fetch the "application/ld+json" data directly while scraping?
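For illustration, here is a minimal sketch of what reading "application/ld+json" directly from a page could look like, using only the Python standard library. It is not how recipe-scrapers does it internally; the names JsonLdExtractor, find_recipe and SAMPLE_HTML are hypothetical and exist only for this example. It also assumes the JSON-LD is present in the HTML that was fetched, which, as noted earlier in the thread, may not hold for Gousto if the markup is injected via JS.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.blocks.append("".join(self._buffer))


def find_recipe(html):
    """Return the first JSON-LD object whose @type includes Recipe, else None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing outright
        pending = list(data) if isinstance(data, list) else [data]
        while pending:
            item = pending.pop(0)
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return item
            # Some sites nest their entities under "@graph".
            graph = item.get("@graph")
            if isinstance(graph, list):
                pending.extend(graph)
    return None


# Hypothetical sample page, not Gousto's actual markup.
SAMPLE_HTML = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Recipe", "name": "Example Recipe"}'
    "</script></head><body></body></html>"
)

recipe = find_recipe(SAMPLE_HTML)
print(recipe["name"] if recipe else "No Recipe JSON-LD found in this HTML")
```

If find_recipe returns None for the HTML retrieved from a recipe URL, that would be consistent with the schema data not being in the served source at all.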
@PatrickPierce Looks like you created the original scraper for Gousto, any ideas on this one?
Unfortunately I do not. The original scraper has been redesigned to use schema over parsing the HTML. I can confirm that the issue still occurs with 13.20.0 and that schema validator detects the correct information. There is an issue with the test, but I do not think that will make the parser fail.
Test URL: https://www.gousto.co.uk/cookbook/pork-recipes/creamy-pork-tagliatelle
The problem stems from gousto.co.uk having a "javascript detection" mechanism, which means the HTML is not visible in its entirety when fetched with a simple requests.get() approach. I'll submit an ad-hoc solution this weekend and bump the version.
As of version
Thanks for filing a bug report with us!
If your request is about a website that is not supported, please open a 'new scraper' issue request instead.
To help get the issue fixed, please fill in the information below.
Pre-filing checks
The URL of the recipe(s) that are not being scraped correctly
The version of Python you're using
Python 3.8.5
The operating system of your environment
Ubuntu
The results you expect to see
After running scraper = scrape_me() on the URL, then scraper.title(), I'd expect to see the title of the recipe - Chicken & Stuffing Sarnie With Plum Chutney
The results (including any Python error messages) that you are seeing
Can you write Python and would you like to help fix the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:
recipe-scrapers
team try to fix this