[BUG] Support for custom Python environment that ignore PEP 3120 #114

Closed
kivhub opened this issue Sep 22, 2021 · 6 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@kivhub

kivhub commented Sep 22, 2021

Describe the bug
When using the requests library with charset-normalizer, I get an error when calling Python via a User-Defined Transform in SAP BODS:

File "EXPRESSION", line 6, in <module>
File "c:\program files\python39\lib\site-packages\requests\__init__.py", line 48, in <module>
from charset_normalizer import __version__ as charset_normalizer_version
File "c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py", line 11
SyntaxError: Non-ASCII character '\xd1' in file c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py on line 12, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.

I am not able to declare a source-code encoding with a magic comment (on the first or second line of the file), probably because the application modifies the script itself: adding # -*- coding: utf-8 -*- doesn't help. Setting the environment variable PYTHONUTF8=1 doesn't help either.
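For comparison, this is how a conforming Python 3 interpreter picks a file's source encoding: a coding cookie is honoured only on line 1, or on line 2 when line 1 is a comment or blank (PEP 263), and without a cookie UTF-8 is assumed (PEP 3120). A Python 2 interpreter falls back to ASCII instead. The standard library's `tokenize.detect_encoding` applies the same rules; the helper `declared_encoding` and the sample sources below are made up for illustration.

```python
import io
import tokenize

def declared_encoding(source):
    """Return the encoding a PEP 263 / PEP 3120 conforming reader would use for `source` (bytes)."""
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

samples = {
    "no cookie (PEP 3120 default)": b"x = 1\n",
    "cookie on line 1": b"# -*- coding: cp1252 -*-\nx = 1\n",
    "cookie on line 2": b"#!/usr/bin/env python\n# -*- coding: cp1252 -*-\nx = 1\n",
    # Only the first two lines are inspected, so this cookie is never seen.
    "cookie on line 3 (ignored)": b"#!/usr/bin/env python\n\n# -*- coding: cp1252 -*-\n",
}

for label, source in samples.items():
    print(label, "->", declared_encoding(source))
```

The first and last cases come out as `utf-8`, the middle two as `cp1252`.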

To Reproduce
I cannot provide code to reproduce the issue; it only occurs when Python is called via a User-Defined Transform in SAP BODS.
Please check: apache/superset#15631
This could be the same problem: https://stackoverflow.com/questions/68594538/syntaxerror-non-ascii-character-xd1-in-file-charset-normalizer-init-py-i

Expected behavior
No error. With the requests version that uses the chardet library, there is no problem. Maybe avoiding non-ASCII characters in __init__.py would help?

Logs
Please see the bug description.

Desktop (please complete the following information):

  • OS: Windows Server 2016
  • Python version 3.9.6
  • Package version 2.0.6
  • Requests version 2.26.0

Additional context
N/A

kivhub added the bug and help wanted labels Sep 22, 2021
@Ousret
Owner

Ousret commented Sep 23, 2021

Hi,

Thanks for the detailed report.
Yes, for some reason, some environments do not use UTF-8 as the default source encoding.

PEP 3120 is being ignored: https://www.python.org/dev/peps/pep-3120/

It's not just the top-level __init__.py that contains non-ASCII characters; assets/__init__.py does too.
Since, according to your tests, the environment also seems to ignore PEP 263 (https://www.python.org/dev/peps/pep-0263/), I do not have a silver bullet for this one.
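To find every affected module in an installed copy of the package, scanning the `.py` files for bytes above 0x7F is enough. This is a hypothetical helper (`non_ascii_sources` is not part of charset-normalizer), written so that it also runs under Python 2.7:

```python
import os

def non_ascii_sources(root):
    """Yield (path, line_no) for the first line containing a non-ASCII byte in each .py file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:  # read raw bytes; no decoding involved
                for line_no, line in enumerate(fh, start=1):
                    # bytearray() yields ints on both Python 2 and 3
                    if any(byte > 0x7F for byte in bytearray(line)):
                        yield path, line_no
                        break  # report only the first offending line per file
```

Pointing it at the installed charset_normalizer directory would be expected to list the modules mentioned above, along with any others that contain non-ASCII bytes.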

The obvious solution would be to find a proper way to represent UTF-8 characters without writing them literally in str literals.
I am going to think more about this.
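One way to do that, sketched here as an assumption rather than the fix the project adopted, is to store non-ASCII characters as escape sequences so the module itself contains only ASCII bytes. Encoding with the `backslashreplace` error handler produces exactly those escapes; `to_ascii_literal` is a hypothetical helper name.

```python
def to_ascii_literal(text):
    """Return `text` with each non-ASCII character replaced by a Python escape (\\xNN, \\uNNNN)."""
    return text.encode("ascii", "backslashreplace").decode("ascii")

# The sample is itself written with an escape, so this snippet stays pure ASCII.
sample = u"caf\u00e9"
escaped = to_ascii_literal(sample)
print(escaped)  # caf\xe9

# Decoding the escapes again gives back the original string.
assert escaped.encode("ascii").decode("unicode_escape") == sample
```

Pasting the escaped form into a string literal yields the same `str` value at runtime, whatever source encoding the interpreter assumes.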

Open for suggestions/PR.

Ousret changed the title from "[BUG] Non-ASCII character '\xd1' in file ...\lib\site-packages\charset_normalizer\__init__.py" to "[BUG] Support for custom Python environment that ignore PEP 3120" Sep 23, 2021
@Ousret
Owner

Ousret commented Sep 23, 2021

There is something of interest here: https://bugs.python.org/issue29240

@Ousret
Owner

Ousret commented Sep 24, 2021

@kivhub I have found something interesting regarding Python on the NT (Windows) platform.

On Windows, the PYTHONLEGACYWINDOWSFSENCODING environment variable (PEP 529) takes priority over UTF-8 Mode.

https://www.python.org/dev/peps/pep-0540/

What does the following return for you:

import locale
import sys

sys.getfilesystemencoding()
locale.getpreferredencoding()
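A slightly extended version of those checks could look like the sketch below. It is written to run unchanged under both Python 2.7 and 3.x, which matters if the host application embeds an unexpected interpreter; the report keys are just labels chosen here.

```python
import locale
import os
import sys

def encoding_report():
    """Collect interpreter details that affect source and filesystem encoding."""
    return {
        "sys.executable": sys.executable,
        "sys.version_info": tuple(sys.version_info[:3]),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
        # sys.flags.utf8_mode was added in Python 3.7 (PEP 540); None on older versions.
        "sys.flags.utf8_mode": getattr(sys.flags, "utf8_mode", None),
        # Environment variables mentioned in this thread; None when unset.
        "PYTHONUTF8": os.environ.get("PYTHONUTF8"),
        "PYTHONLEGACYWINDOWSFSENCODING": os.environ.get("PYTHONLEGACYWINDOWSFSENCODING"),
    }

# print() with a single pre-formatted argument behaves the same on Python 2.7 and 3.x.
for key, value in sorted(encoding_report().items()):
    print("%s = %r" % (key, value))
```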

And could you test against dev-master to see whether that patch (#116) makes any difference?

@kivhub
Author

kivhub commented Sep 27, 2021

> @kivhub I have found something interesting regarding the NT platform + python.
>
> On Windows, the PYTHONLEGACYWINDOWSFSENCODING environment variable (PEP 529) has the priority over UTF-8 Mode.
>
> https://www.python.org/dev/peps/pep-0540/

Thank you very much for your effort, @Ousret. It looks like the problem is an old version of Python that comes into play with SAP BODS.

> What does return the following for you:
>
> sys.getfilesystemencoding()
> locale.getpreferredencoding()

Python 3.9.6 itself returns UTF-8 for both. But when Python is run via an SAP BODS job, the results are as follows:

('sys.getfilesystemencoding()', 'mbcs')
('locale.getpreferredencoding()', 'cp1252')
('sys.version_info.major', 2)
('sys.version_info.minor', 7)

The Python 3 libraries are used via:

import sys

# This works (requests + chardet)
sys.path.insert(0, r'c:\program files\python37\lib\site-packages')

# This doesn't work (requests + charset-normalizer)
sys.path.insert(0, r'c:\program files\python39\lib\site-packages')
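A hedged way to catch this kind of mix-up early is to compare the `pythonXY` folder name of each `site-packages` entry against the running interpreter. This heuristic assumes the default Windows install layout (e.g. `c:\program files\python39`) or the POSIX `pythonX.Y` layout; `foreign_site_packages` is a hypothetical helper, kept Python 2.7 compatible.

```python
import re
import sys

# Matches folder names such as "python39", "python310" or "python3.9" (heuristic only).
_VERSION_DIR = re.compile(r"python(\d)\.?(\d+)", re.IGNORECASE)

def foreign_site_packages(paths=None, version=None):
    """Return site-packages entries whose pythonXY folder doesn't match `version`."""
    paths = sys.path if paths is None else paths
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    mismatched = []
    for entry in paths:
        if "site-packages" not in entry.lower():
            continue  # only package directories are of interest
        match = _VERSION_DIR.search(entry)
        if match and (int(match.group(1)), int(match.group(2))) != version:
            mismatched.append(entry)
    return mismatched
```

Under a Python 2.7 interpreter, calling `foreign_site_packages()` after the `sys.path.insert` above would be expected to return the `python39` entry.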

> And could you test against dev-master to see if that patch does anything at all #116

I replaced the contents of these two files and, after changing the path to:

sys.path.insert(0, 'c:\program files\python39\lib\site-packages')

I am getting this error message:

File "EXPRESSION", line 7, in <module>
File "c:\program files\python39\lib\site-packages\requests\__init__.py", line 48, in <module>
from charset_normalizer import __version__ as charset_normalizer_version
File "c:\program files\python39\lib\site-packages\charset_normalizer\__init__.py", line 20, in <module>
from .api import from_bytes, from_fp, from_path, normalize
File "c:\program files\python39\lib\site-packages\charset_normalizer\api.py", line 38
sequences: bytes,
^
SyntaxError: invalid syntax.

@kivhub
Author

kivhub commented Sep 27, 2021

I have an update. After moving the sys.path.insert call before all the imports:

sys.path.insert(0, 'c:\program files\python39\lib\site-packages')
import json
import sys
import os
import requests
from datetime import datetime

the code runs fine without any errors.
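One plausible explanation for why the order matters, not confirmed in this thread, is Python's module cache: once a module is in `sys.modules`, later `import` statements reuse it and never search `sys.path` again. The Python 3 sketch below shows that effect with two temporary directories; the module name `demo_cached_mod` is invented for the demo.

```python
import importlib
import os
import shutil
import sys
import tempfile

MODULE = "demo_cached_mod"  # hypothetical module name used only for this demo

def write_module(directory, value):
    """Create directory/demo_cached_mod.py defining VALUE."""
    os.makedirs(directory)
    with open(os.path.join(directory, MODULE + ".py"), "w") as fh:
        fh.write("VALUE = %r\n" % value)

def demo_import_cache():
    base = tempfile.mkdtemp()
    first, second = os.path.join(base, "first"), os.path.join(base, "second")
    write_module(first, "first")
    write_module(second, "second")
    importlib.invalidate_caches()  # recommended after creating modules at runtime
    try:
        sys.path.insert(0, first)
        before = importlib.import_module(MODULE).VALUE  # found in `first`

        # Prepending another directory now changes nothing: the import is
        # answered from sys.modules without searching sys.path again.
        sys.path.insert(0, second)
        after = importlib.import_module(MODULE).VALUE

        # Only a fresh import (cache entry removed) sees the new path order.
        del sys.modules[MODULE]
        fresh = importlib.import_module(MODULE).VALUE
        return before, after, fresh
    finally:
        for entry in (first, second):
            if entry in sys.path:
                sys.path.remove(entry)
        sys.modules.pop(MODULE, None)
        shutil.rmtree(base)

print(demo_import_cache())
```

Running it prints `('first', 'first', 'second')`: a path inserted after the first import is ignored until the cached entry is removed.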

@Ousret
Owner

Ousret commented Sep 27, 2021

I have not been able to make Python trigger the decode error on an NT platform.

('sys.getfilesystemencoding()', 'mbcs')
('locale.getpreferredencoding()', 'cp1252')
('sys.version_info.major', 2)  # ??
('sys.version_info.minor', 7)

Well, from the look of it, your setup invokes Python 2.7, and from that standpoint there is nothing I can do.

The "SyntaxError: invalid syntax" is a pretty good indicator: api.py uses Python 3 function annotations (sequences: bytes), which Python 2.7 cannot parse.
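Because Python parses a whole file before executing any of it, a `SyntaxError` like this cannot be caught from inside the failing module. What the calling script can do is check the interpreter version before `import requests` and stop with a readable message. A minimal sketch; the 3.6 minimum is chosen only for illustration, and the function avoids f-strings and other Python 3-only syntax so that Python 2.7 can still parse the file that contains it.

```python
import sys

def require_python(minimum=(3, 6)):
    """Fail fast with a readable message instead of a SyntaxError deep inside a package."""
    current = tuple(sys.version_info[:2])
    if current < tuple(minimum):
        raise RuntimeError(
            "Python %d.%d or newer is required, but %s is Python %d.%d"
            % (minimum[0], minimum[1], sys.executable, current[0], current[1])
        )
    return current

# Call this before `import requests`, so an older interpreter stops here.
require_python((3, 6))
```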

I would recommend altering the PATH outside of Python.
I am closing this, as there is nothing that can be done from here.
