Gousto.co.uk scraper broken #376

frazzyfin · 2021-04-30T11:16:45Z

Thanks for filing a bug report with us!

If your request is about a website that is not supported, please open a 'new scraper' issue request instead.

To help get the issue fixed, please fill in the information below.

Pre-filing checks

I have searched for open issues that report the same problem
I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

https://www.gousto.co.uk/cookbook/chicken-recipes/chicken-stuffing-sarnie-with-plum-chutney

The version of Python you're using

Python 3.8.5

The operating system of your environment

Ubuntu

The results you expect to see

After running scraper = scrape_me() on the URL and then calling scraper.title(), I'd expect to see the title of the recipe: Chicken & Stuffing Sarnie With Plum Chutney

The results (including any Python error messages) that you are seeing

>>> scraper.title()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fraser/dev/recipe-scrapers/recipe_scrapers/plugins/exception_handling.py", line 63, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
  File "/home/fraser/dev/recipe-scrapers/recipe_scrapers/plugins/html_tags_stripper.py", line 74, in decorated_method_wrapper
    decorated_func_result = decorated(self, *args, **kwargs)
  File "/home/fraser/dev/recipe-scrapers/recipe_scrapers/plugins/normalize_string.py", line 33, in decorated_method_wrapper
    return normalize_string(decorated(self, *args, **kwargs))
  File "/home/fraser/dev/recipe-scrapers/recipe_scrapers/plugins/schemaorg_fill.py", line 46, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
  File "/home/fraser/dev/recipe-scrapers/recipe_scrapers/gousto.py", line 11, in title
    return self.soup.find("h1", {"class": "indivrecipe-title"}).get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
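
For context, this kind of AttributeError typically appears when a lookup such as soup.find(...) matches nothing and returns None, and a method is then called on that None. The following is only a minimal sketch of a guard, not the library's actual code: safe_get_text and FakeTag are hypothetical names introduced for illustration, with FakeTag standing in for a parsed HTML element.

```python
def safe_get_text(element, description):
    """Return element.get_text(), or raise a descriptive error if element is None.

    Hypothetical helper: `element` is whatever a lookup returned
    (for example the result of a find() call), which may be None.
    """
    if element is None:
        raise ValueError(
            f"Could not find {description}; the page markup may have changed"
        )
    return element.get_text()


class FakeTag:
    """Hypothetical stand-in for a parsed HTML element, for illustration only."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


# A successful lookup returns an element, so the text comes back as usual.
found = FakeTag("Chicken & Stuffing Sarnie With Plum Chutney")
title = safe_get_text(found, "recipe title heading")

# A failed lookup returns None; the guard reports what was missing
# instead of "'NoneType' object has no attribute 'get_text'".
try:
    safe_get_text(None, "recipe title heading")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

A guard like this does not fix the scraper, but it would make the failure easier to diagnose when a site changes its markup.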

Can you write Python and would you like to help fix the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:

I'd like to try fixing this scraper myself
I'd like guidance to help me develop a fix
I'd prefer if the recipe-scrapers team try to fix this

Nelinski · 2021-12-06T20:46:06Z

Just checking whether this is still on the list to fix, as I'm still having issues with Gousto. Thanks!

jayaddison · 2021-12-07T00:35:31Z

@Nelinski thanks for checking - could you confirm the version of recipe-scrapers you're using, and whether you're seeing the same exception (AttributeError: 'NoneType' object has no attribute 'get_text') or whether there's something else going on too?

Nelinski · 2021-12-07T12:55:28Z

I'm trying this via Mealie which uses this scraper. It looks like I'm getting:
AttributeError: 'NoneType' object has no attribute 'get'

Looks like Mealie is using version "13.7.0".

jayaddison · 2021-12-07T13:24:47Z

Ok, great! - any chance you could include an example URL or two? (that'd help replicate the error, and then we can track down the reason the get is failing)

Nelinski · 2021-12-07T15:51:34Z

Sure! Appreciate the help on this.

List of all the recipes - https://www.gousto.co.uk/cookbook/recipes?page=1
A couple of direct links:

jayaddison · 2021-12-07T22:09:50Z

Hmm, weird.. it looks like Gousto's site may no longer have schema.org JSON in the source; at least that's what I see when browsing one of the recipes myself.

Can anyone else confirm that too? (view source in your preferred browser is probably the easiest way; or by curl'ing or using Python to retrieve the source of one of those URLs)

Nelinski · 2021-12-07T22:22:41Z

It looks like they're now inserting it via JS rather than including it directly in the source, as I can't see it there, but it validates OK here:
https://validator.schema.org/#url=https%3A%2F%2Fwww.gousto.co.uk%2Fcookbook%2Fchicken-recipes%2Fchicken-date-tamarind-curry

Edit: When looking at the source via schema.org, relevant snippet below:

</body>
</html>
<!-- Inserted by https://www.gousto.co.uk/cookbook/static/js/5.d02f4471.chunk.js -->
<script type="application/ld+json">
  {
    "@context": "http://schema.org/",
    "@type": "Recipe",
    "name": "Chicken, Date & Tamarind Curry With Kachumber",

AdityaSoni19031997 · 2022-02-02T04:12:27Z

Just curious, can't we directly fetch the "application/ld+json" script while scraping?
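
As a rough illustration of that idea, here is a minimal sketch that pulls application/ld+json blocks out of an HTML string using only the Python standard library. It is not how recipe-scrapers does it; JsonLdExtractor, find_recipe, and the two sample HTML strings are hypothetical names and data made up for this example. It also shows the catch discussed above: if the script tag is inserted by JavaScript after page load, it will not be present in the HTML returned by a plain HTTP fetch, so there is nothing to extract.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.blocks.append("".join(self._buffer))


def find_recipe(html):
    """Return the first JSON-LD object whose @type is "Recipe", or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing outright
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                return item
    return None


# Sample 1 (made up): the JSON-LD is present in the served HTML.
static_html = """<html><body>
<script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Recipe",
 "name": "Chicken, Date & Tamarind Curry With Kachumber"}
</script>
</body></html>"""

# Sample 2 (made up): the JSON-LD would only be added later by JavaScript.
js_only_html = "<html><body><div id='root'></div></body></html>"

recipe = find_recipe(static_html)
missing = find_recipe(js_only_html)
```

With these samples, recipe holds the parsed Recipe object and missing is None, which mirrors the situation where view-source shows no schema.org data even though a JavaScript-capable validator finds it.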

Nelinski · 2022-03-16T21:13:06Z

@PatrickPierce Looks like you created the original scraper for Gousto, any ideas on this one?

PatrickPierce · 2022-03-17T01:48:05Z

> @PatrickPierce Looks like you created the original scraper for Gousto, any ideas on this one?

Unfortunately I do not. The original scraper has been redesigned to use the schema rather than parsing the HTML. I can confirm that the issue still occurs with 13.20.0 and that the schema validator detects the correct information.

There is an issue with the test, but I do not think that will make the parser fail.

        self.assertEqual(
            "https://test.example.com/", self.harvester_class.canonical_url()
        )

Test URL: https://www.gousto.co.uk/cookbook/pork-recipes/creamy-pork-tagliatelle

hhursev · 2022-03-17T18:13:59Z

The problem stems from gousto.co.uk having a "JavaScript detection" mechanism, which means the HTML is not visible in its entirety when fetched with a simple requests.get() approach. I'll submit an ad-hoc solution this weekend and bump the version.

hhursev · 2022-03-19T14:51:12Z

As of version 13.22.0, gousto.co.uk should be supported again. Let me know in case of any problems, @Nelinski.

frazzyfin added the bug label Apr 30, 2021

hhursev self-assigned this May 1, 2021

arbrennan mentioned this issue Jun 13, 2021

Fix Gousto (issue #376) by using schema #390

Merged

hhursev pushed a commit that referenced this issue Oct 10, 2021

Fix Gousto (issue #376) by using schema (#390)

fde8dce

* Update test HTML to live site * Fix title * Use schema for recipe & update test

hhursev added a commit that referenced this issue Mar 19, 2022

Update Gousto.co.uk scraper. Fixes #376

bf41b20

hhursev closed this as completed in e6fe2ae Mar 19, 2022

cadamswaite mentioned this issue Oct 1, 2023

[SCRAPER] - Importing Gousto recipes is missing image and total time. mealie-recipes/mealie#2547

Closed

3 tasks
