Update Gousto.co.uk scraper. Closes #376 (#511)

Merged 2 commits from gousto-scraper-fix into main on Mar 19, 2022
Conversation

@hhursev (Owner) commented Mar 19, 2022:

No description provided.

@coveralls commented Mar 19, 2022:

Coverage increased (+0.06%) to 95.36% when pulling 7238bbd on gousto-scraper-fix into 2d1175a on main.

@hhursev merged commit e6fe2ae into main on Mar 19, 2022
@hhursev deleted the gousto-scraper-fix branch on March 19, 2022 at 16:19
class GoustoJson(AbstractScraper):
    """
    Ad-hoc solution to https://github.com/hhursev/recipe-scrapers/issues/376
    Let's see if it stands the test of time and reevaluate.
    """
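(For context, a minimal sketch of the multi-request pattern this class represents: deriving an API URL from the page URL and fetching the recipe data as JSON rather than parsing the HTML. The endpoint path and response shape below are illustrative assumptions, not the actual implementation.)

import requests

# Hypothetical sketch only: the API endpoint and JSON shape are assumptions.
def fetch_gousto_recipe_json(page_url, timeout=10):
    # Derive the recipe slug from the last segment of the page URL.
    slug = page_url.rstrip("/").rsplit("/", 1)[-1]
    api_url = f"https://production-api.gousto.co.uk/cmsreadbroker/v1/recipe/{slug}"
    # This is a second network request beyond fetching the page itself,
    # which is what makes it a "multi-request" scraper.
    response = requests.get(api_url, timeout=timeout)
    response.raise_for_status()
    return response.json()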
Collaborator commented on the docstring above:
Possibly time to discuss some of that re-evaluation?

It'd be nice to imagine a future where recipe webpages include relevant metadata on-page for user-agents to collect easily.

What's less clear to me is whether that's already happening as a trend (allowing us to de-prioritize multi-request scrapers) or whether sites might become more evasive over time (meaning that we could help move toward that goal by permitting multi-request scrapers).

...with developer, dependency and ecosystem implications in each case.

@hhursev (Owner, Author) replied:

At this point in time I'm leaning towards an "Offline Use Only" note in README.md and on scrape_me invocation.

Also, updating scrape_html to auto-pick the scraper based on the <link rel="canonical" ...> tag in the HTML.

In short: de-prioritizing multi-request scrapers.

Our package would aim to be good once it gets its hands on valid HTML from the sites listed, not playing it clever by bypassing browser/JavaScript checks or requirements.
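(A minimal sketch of what that auto-picking could look like. The SCRAPERS host-to-scraper mapping and the constructor call are assumptions for illustration, not the package's actual API.)

from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical sketch: dispatch on the canonical URL embedded in the HTML.
# SCRAPERS stands in for a host-to-scraper registry (the name is assumed).
def scrape_html(html, org_url=None):
    link = BeautifulSoup(html, "html.parser").find("link", rel="canonical")
    url = link["href"] if link else org_url
    if url is None:
        raise ValueError('No <link rel="canonical"> found; pass org_url instead.')
    # Normalize the host so it matches the registry keys.
    host = urlparse(url).netloc.removeprefix("www.")
    return SCRAPERS[host](url=url, html=html)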

Collaborator replied:

Ok, yep - reducing the interface to scrape_html only would seem to have a bunch of benefits.

It means we could require/enforce decoded str input, and we'd have fewer dependencies and modules related to networking.

The main drawback I have in mind is that rel="canonical" isn't found on all pages - so perhaps an optional parameter to provide the URL when it isn't available, warning if it differs from a canonical URL that is present in the HTML?
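(That fallback-plus-warning behaviour might look something like this; the function and parameter names here are assumptions, sketching the suggestion rather than an agreed design.)

import warnings

# Hypothetical sketch of the suggested fallback: prefer the canonical URL
# found in the HTML, accept a caller-supplied org_url when it is missing,
# and warn when the two disagree.
def resolve_url(canonical_url, org_url=None):
    if canonical_url is None and org_url is None:
        raise ValueError("HTML has no canonical link; please supply org_url.")
    if canonical_url and org_url and canonical_url != org_url:
        warnings.warn(
            f"org_url {org_url!r} differs from canonical URL {canonical_url!r}; "
            "using the canonical URL."
        )
    return canonical_url or org_url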
