Fixup: repair sallys-blog to match updated website design #744
* Added support for sallys-blog.de
* Added missing test for sallys-blog.de total time
* Fixed formatting and string handling of sallys-blog.de scraper
* Removed unused dependency from sallys-blog.de scraper

(cherry picked from commit 516e7f4)
…ngraph image property is found on the page
@@ -47,6 +47,6 @@ def decorated_method_wrapper(self, *args, **kwargs):
             image = self.soup.find(
                 "meta", {"property": "og:image", "content": True}
             )
-            return image.get("content")
+            return image.get("content") if image else None
👍
        return normalize_string(self.soup.head.find("title").get_text())

    def image(self):
        raise NotImplementedError()  # todo: probably better to return URLs than base64 content
fwiw I'm ok with something (kinda-ugly) like:
    from urllib.parse import unquote
    .....
    def image(self):
        image_element = self.soup.find("div", {"class": "images-wrap"}).findAll("img", {"sizes": "100vw"})[0]
        image_src = image_element["src"]
        image_url = unquote(image_src).split("url=")[1].split("&")[0]
        return image_url
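
For comparison, here is a slightly more defensive sketch of the URL-unwrapping step, using the standard library's query-string parser instead of splitting on `"url="` and `"&"`. It assumes the `src` attribute carries the real image address in a `url` query parameter, as the snippet above implies. The helper name `extract_wrapped_image_url` and the sample `src` value are hypothetical and not taken from the site or the scraper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_wrapped_image_url(image_src: str) -> Optional[str]:
    """Return the decoded `url` query parameter of image_src, or None if absent.

    Hypothetical helper: assumes the real image address is wrapped in a
    `url=` query parameter of the <img> src attribute.
    """
    query = urlsplit(image_src).query
    # parse_qs percent-decodes values, so no separate unquote() call is needed.
    values = parse_qs(query).get("url")
    return values[0] if values else None


# Illustrative src value only; the real attribute on the site may differ.
sample_src = "/_next/image?url=https%3A%2F%2Fexample.com%2Fpic.jpg&w=1920&q=75"
print(extract_wrapped_image_url(sample_src))
```

Because the query string is parsed before decoding, an encoded `&` inside the wrapped URL is not mistaken for a parameter separator, and a missing `url` parameter yields `None` rather than an `IndexError`.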
Nice! I hadn't noticed those image links. That sounds like a good approach.
Resolves #739.