UnicodeDecodeError while parsing feed #277

Closed
samuelclay opened this issue Apr 29, 2021 · 2 comments

Here's a feed that raises a UnicodeDecodeError (similar to #273, but while decoding): http://feed.informer.com/digests/XDOCBDJCK3/feeder.atom. The feed doesn't validate, but the error should probably still be reported as a bozo exception instead of propagating to the caller.

>>> import feedparser
>>> feedparser.__version__
'6.0.2'
>>> feedparser.parse('http://feed.informer.com/digests/XDOCBDJCK3/feeder.atom')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/feedparser/api.py", line 255, in parse
    saxparser.parse(source)
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python3.9/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/src/python/Modules/pyexpat.c", line 461, in EndElement
  File "/usr/local/lib/python3.9/xml/sax/expatreader.py", line 381, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "/usr/local/lib/python3.9/site-packages/feedparser/parsers/strict.py", line 124, in endElementNS
    self.unknown_endtag(localname)
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 320, in unknown_endtag
    method()
  File "/usr/local/lib/python3.9/site-packages/feedparser/namespaces/mediarss.py", line 58, in _end_media_title
    self._end_title()
  File "/usr/local/lib/python3.9/site-packages/feedparser/namespaces/_base.py", line 384, in _end_title
    value = self.pop_content('title')
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 630, in pop_content
    value = self.pop(tag)
  File "/usr/local/lib/python3.9/site-packages/feedparser/mixin.py", line 508, in pop
    output = base64.decodebytes(output.encode('utf8')).decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 1: invalid continuation byte
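
The last frame of the traceback can be reproduced with the standard library alone. This is a minimal sketch, assuming the title content was treated as base64 and the decoded bytes were not valid UTF-8. `decode_base64_text` is a hypothetical helper for illustration; it is not part of feedparser and not necessarily how the library handles this case.

```python
import base64
import binascii


def decode_base64_text(text):
    """Hypothetical helper: return (decoded_text, error).

    error is None on success. On failure the caller gets a best-effort
    string plus the exception, which could then be recorded (for
    example as a bozo exception) instead of being raised.
    """
    try:
        raw = base64.decodebytes(text.encode("utf8"))
    except binascii.Error as exc:
        # Not valid base64 at all: hand back the input unchanged.
        return text, exc
    try:
        return raw.decode("utf8"), None
    except UnicodeDecodeError as exc:
        # Valid base64, but the bytes are not UTF-8 text.
        return raw.decode("utf8", errors="replace"), exc


# Bytes with a stray 0xe7 at position 1, as in the traceback above.
payload = base64.encodebytes(b"a\xe7bcd").decode("ascii")

try:
    # Same expression as mixin.py line 508 in the traceback.
    base64.decodebytes(payload.encode("utf8")).decode("utf8")
except UnicodeDecodeError as exc:
    print("unguarded:", exc)

value, error = decode_base64_text(payload)
print("guarded:", repr(value), type(error).__name__)
```

Returning the exception alongside a replacement-character string is only one possible design; dropping the value or keeping the raw bytes would be equally defensible.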

kurtmckee commented Jun 13, 2021

Thanks Samuel, I'll work to get this fixed and released as a hotfix.

kurtmckee added a commit that referenced this issue Jun 13, 2021

kurtmckee commented Jun 14, 2021

This is fixed in feedparser 6.0.4. Thanks for reporting this, Samuel!
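
For anyone who cannot upgrade to 6.0.4 yet, a caller-side guard along these lines might serve as a stopgap. It is a sketch only: `parse_or_bozo` is a hypothetical wrapper, and the plain dict it returns merely imitates the `bozo` and `bozo_exception` keys of a real feedparser result (it does not support attribute access).

```python
def parse_or_bozo(source, parse):
    """Call parse(source); turn a UnicodeDecodeError into a bozo-style result.

    `parse` would typically be feedparser.parse. It is passed in as an
    argument so this sketch has no hard dependency on the library.
    """
    try:
        return parse(source)
    except UnicodeDecodeError as exc:
        # Imitates the keys feedparser sets for malformed feeds;
        # this is a plain dict, not a FeedParserDict.
        return {"bozo": True, "bozo_exception": exc, "entries": []}
```

Usage would look like `result = parse_or_bozo(url, feedparser.parse)`, followed by a check of `result["bozo"]` before reading entries.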
