TypeError when namespaces contain the key "self" #216

Closed
Leyard opened this issue Mar 19, 2021 · 2 comments · Fixed by #217
Labels
S: triage Issue needs triage.

Comments

@Leyard

Leyard commented Mar 19, 2021

I was using bs4/soupsieve to parse some XML files from the SEC website. Here is my MWE:

import requests
from bs4 import BeautifulSoup

url = "https://www.sec.gov/Archives/edgar/data/1031235/000156459017010923/self-20170331.xml"
r = requests.get(url)

soup = BeautifulSoup(r.content, "xml")
print(soup.select("identifier"))

This had worked smoothly until I hit a strange TypeError when using a CSS selector:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-12dc7c20e133> in <module>
----> 1 soup.select("identifier")

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in select(self, selector, namespaces, limit, **kwargs)
   1867             )
   1868
-> 1869         results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
   1870
   1871         # We do this because it's more consistent and because

~/opt/anaconda3/lib/python3.8/site-packages/soupsieve/__init__.py in select(select, tag, namespaces, limit, flags, **kwargs)
     96     """Select the specified tags."""
     97
---> 98     return compile(select, namespaces, flags, **kwargs).select(tag, limit)
     99
    100

~/opt/anaconda3/lib/python3.8/site-packages/soupsieve/__init__.py in compile(pattern, namespaces, flags, **kwargs)
     45
     46     if namespaces is not None:
---> 47         namespaces = ct.Namespaces(**namespaces)
     48
     49     custom = kwargs.get('custom')

TypeError: __init__() got multiple values for argument 'self'

It turns out that this specific XML file declares a namespace with the prefix "self". Because line 47 of soupsieve/__init__.py unpacks the namespaces dict as keyword arguments, the "self" key collides with the self parameter of __init__, which raises the TypeError.
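The collision can be reproduced without bs4 or soupsieve. The sketch below is a minimal illustration, assuming a stand-in class (here called Namespaces, a hypothetical name, not soupsieve's actual class) whose __init__ accepts keyword arguments:

```python
class Namespaces:
    """Hypothetical stand-in for a type whose __init__ takes **kwargs."""

    def __init__(self, **kwargs):
        self._d = dict(kwargs)


# A namespace mapping like the one bs4 collected, trimmed to two entries.
namespaces = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "self": "http://globalselfstorageinc.com/20170331",
}

error_message = None
try:
    # Unpacking turns the "self" key into a keyword argument named self,
    # which clashes with the instance that Python already binds to self.
    Namespaces(**namespaces)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Any key that matches a named parameter of the callee would clash in the same way; "self" is just the one that every method has.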

In [11]: soup._namespaces
Out[11]:
{'xml': 'http://www.w3.org/XML/1998/namespace',
 'utr': 'http://www.xbrl.org/2009/utr',
 'iso4217': 'http://www.xbrl.org/2003/iso4217',
 'self': 'http://globalselfstorageinc.com/20170331',
 'xbrll': 'http://www.xbrl.org/2003/linkbase',
 'xlink': 'http://www.w3.org/1999/xlink',
 'nonnum': 'http://www.xbrl.org/dtr/type/non-numeric',
 'num': 'http://www.xbrl.org/dtr/type/numeric',
 'xbrldt': 'http://xbrl.org/2005/xbrldt',
 'us-types': 'http://fasb.org/us-types/2016-01-31',
 'us-gaap': 'http://fasb.org/us-gaap/2016-01-31',
 'dei': 'http://xbrl.sec.gov/dei/2014-01-31',
 'country': 'http://xbrl.sec.gov/country/2016-01-31',
 'currency': 'http://xbrl.sec.gov/currency/2016-01-31',
 'exch': 'http://xbrl.sec.gov/exch/2016-01-31',
 'invest': 'http://xbrl.sec.gov/invest/2013-01-31',
 'stpr': 'http://xbrl.sec.gov/stpr/2011-01-31',
 'sic': 'http://xbrl.sec.gov/sic/2011-01-31',
 'naics': 'http://xbrl.sec.gov/naics/2011-01-31',
 'xbrldi': 'http://xbrl.org/2006/xbrldi',
 'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}

I'm not sure whether this counts as a bug in soupsieve or whether I should handle it on my side. Suggestions for a solution are welcome.

@facelessuser
Owner

Yup, this is a bug. I have a fix in #217. I'm not quite sure why I was using kwargs for this. I absolutely don't need it, especially if it can conflict with self.
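As a rough illustration of the general idea (not the actual change in #217), a mapping type can take its data as a single positional argument instead of keyword arguments, so no key can clash with a parameter name. The class below is a hypothetical sketch, not soupsieve's code:

```python
from collections.abc import Mapping


class Namespaces(Mapping):
    """Hypothetical read-only mapping built from one positional dict."""

    def __init__(self, data=None):
        # Copy the input so later changes to the caller's dict have no effect.
        self._d = dict(data or {})

    def __getitem__(self, key):
        return self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


# The dict is passed as a value, so "self" is just another key.
ns = Namespaces({"self": "http://globalselfstorageinc.com/20170331"})
print(ns["self"])
```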

After the fix your example script runs fine:

$soupsieve git:(master) ✗ python3 bug.py
[<identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier 
scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>, <identifier scheme="http://www.sec.gov/CIK">0001031235</identifier>]

@facelessuser
Owner

Thanks for the bug report! I've tagged a new release 2.2.1. It should be available shortly.
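For anyone who cannot upgrade right away, one possible workaround is to pass an explicit namespaces dict that omits the offending prefix. This is a sketch under assumptions: the helper name drop_prefixes is hypothetical, and it relies on bs4's select() forwarding a caller-supplied namespaces argument, as the signature in the traceback above suggests. Selectors that use the dropped prefix (for example self|SomeTag) would no longer resolve.

```python
def drop_prefixes(namespaces, reserved=("self",)):
    """Return a copy of the mapping without the given prefixes.

    Hypothetical helper; the input mapping is left unmodified.
    """
    return {k: v for k, v in namespaces.items() if k not in reserved}


example = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "self": "http://globalselfstorageinc.com/20170331",
}
safe = drop_prefixes(example)
print(sorted(safe))

# Assumed usage with the soup object from the MWE above:
#     soup.select("identifier", namespaces=drop_prefixes(soup._namespaces))
```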
