Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDAL-level SystemError for some users #1167

Open
phargogh opened this issue Feb 4, 2023 · 5 comments · May be fixed by #1268
Open

GDAL-level SystemError for some users #1167

phargogh opened this issue Feb 4, 2023 · 5 comments · May be fixed by #1268
Assignees
Labels
bug Something isn't working on hold There's a reason we're not working on this yet

Comments

@phargogh
Copy link
Member

phargogh commented Feb 4, 2023

We have had a couple of issues on the forums now involving some kind of low-level exception in GDAL where we have almost no extra information. Examples include:

The issue appears to be happening at various times, including while reading a raster and while registering drivers.

@phargogh phargogh added bug Something isn't working in progress This issue is actively being worked on labels Feb 4, 2023
@phargogh phargogh self-assigned this Feb 4, 2023
phargogh added a commit to phargogh/invest that referenced this issue Feb 4, 2023
I'm hoping that this will allow us to capture more information about
what's going on.  RE:natcap#1167
@phargogh
Copy link
Member Author

phargogh commented Feb 4, 2023

I wrote back to the latest thread with a development build that uses GDAL's logging handler instead of the one that we've been using. A big apparent advantage of GDAL's handler is that they've (optionally) enabled their own internal debug logging to be exposed through the python logging, just at the logging.DEBUG level.

phargogh added a commit to phargogh/invest that referenced this issue Mar 20, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 20, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 22, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 22, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 22, 2023
Also adding CPL_DEBUG (copied from GDAL logging example) to increase
verbosity of the gdal logging, and mirroring the log levels that GDAL
has. RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Mar 23, 2023
The user is re-running the model on the same workspace and while the
error message is consistent, I'm not getting it in the same point of
execution in the model.

RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Mar 23, 2023
There is a possibility that the issue here is UTF-16 strings being
provided to GDAL.
RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Mar 23, 2023
@phargogh
Copy link
Member Author

Observation: When converting a UTF-16-encoded bytestring to str, you get a similar error.

>>> len('foo'.encode('utf-16').decode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

The forums posts linked have a variety of different specific exceptions. Maybe this is related to one or more of them?

dba6ab4 tests this suspicion by examining the file encoding of any files provided by the user in an args dict.

phargogh added a commit to phargogh/invest that referenced this issue Mar 27, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 27, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 27, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 28, 2023
Now we don't return if there are an invalid number of parameters, just
log what we know and put in an obvious placeholder on what we don't
know. RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Mar 29, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 29, 2023
It turns out this can be configured via an environment variable.
RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Mar 29, 2023
@phargogh
Copy link
Member Author

After some digging into the python documentation and some trial-and-error, I believe this exception is being caused by how python is handling the user's default locale and filesystem encoding. The solution (PR coming soon) is to force python to use UTF-8 strings. Since GDAL expects UTF-8 filepaths, this is the most natural solution and should guarantee that we are working with UTF-8 paths at all times.

A few further notes from discovery and repair, for future us:

  • The user's workspace was in their home directory
  • The user's home directory contained the character ö.
  • The exception reported (UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xf6 in position 60) matches the byte value for ö in the Latin-1 encoding, so we can assume that the user's locale or filesystem encoding is latin-1.
  • Python was reading the user's filesystem encoding (on a Windows computer) as utf-8, so we can conclude that the user's locale is something other than UTF-8, C or POSIX (unlikely because it's a windows computer). The user's locale must therefore be causing some low-level files to be encoded as latin-1, which then causes errors in GDAL which expects UTF-8.
  • Forcing the python application to use UTF-8 Mode resolved the issue on the user's computer.

For anyone experiencing this issue on InVEST < 3.13.1 (I'm guessing that this will be the next release), setting the environment variable PYTHONUTF8=1 for all users and then relaunching the workbench should resolve the issue.

This environment variable will be necessary for at least the next 4 years, until Python 3.15 makes UTF-8 mode the default.

phargogh added a commit to phargogh/invest that referenced this issue Mar 29, 2023
@phargogh phargogh linked a pull request Mar 29, 2023 that will close this issue
3 tasks
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
Refactor of how we create the arguments for spawning the InVEST process for clarity and brevity. RE:natcap#1167

Co-authored-by: Emily Soth <43770515+emlys@users.noreply.github.com>
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
This will fix the tests hanging.

RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 30, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 31, 2023
phargogh added a commit to phargogh/invest that referenced this issue Mar 31, 2023
phargogh added a commit to phargogh/invest that referenced this issue Apr 6, 2023
@phargogh phargogh added this to the 3.13.1 milestone Apr 7, 2023
phargogh added a commit to phargogh/invest that referenced this issue Apr 11, 2023
phargogh added a commit to phargogh/invest that referenced this issue Apr 13, 2023
phargogh added a commit to phargogh/invest that referenced this issue Apr 15, 2023
@phargogh
Copy link
Member Author

phargogh commented Apr 15, 2023

OK, I don't have a solution, exactly, but I have been able to reproduce this on a Windows VM using Windows 10 insider edition, from the UK and German versions of Windows. The errors are the same in both, but the UK one is much easier for me to navigate :)

Here's what I've gathered so far:

  • In both cases, the python-detected default encoding is cp1252, which is normal for Windows outside the US. The migration to UTF-8 has been slow.
  • The root cause of the exception is a UnicodeDecodeError that is functionally equivalent to:
    >>> 'ötzi'.encode('cp1252').decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 0: invalid start byte
    
  • Because the actual reported exception is a SystemError, we can infer that the true exception is being called from the python C API.
  • Because this is being reported in our GDAL logging handler, I am inferring that the GDAL SWIG wrapper is mishandling a string.

Furthermore:

  • These exceptions only happen when the application has been installed on Windows for this one user. Installing for all users (which puts the application in C:\Program Files) makes the exception go away.
  • When InVEST is installed for all users but the workspace is within the user's home directory, the errors are absent. This makes me think that InVEST's paths are fine, and that there's something happening within GDAL causing this error.

Workaround

Install InVEST for all users at install time.

phargogh added a commit to phargogh/invest that referenced this issue Apr 21, 2023
phargogh added a commit to phargogh/invest that referenced this issue Apr 21, 2023
Mamba isn't able to resolve our builds with python 3.11.
RE:natcap#1167
phargogh added a commit to phargogh/invest that referenced this issue Apr 21, 2023
phargogh added a commit to phargogh/invest that referenced this issue Apr 21, 2023
@phargogh phargogh modified the milestones: 3.14.0, 3.15.0 Aug 24, 2023
@dcdenu4 dcdenu4 added on hold There's a reason we're not working on this yet and removed in progress This issue is actively being worked on labels Oct 26, 2023
@phargogh
Copy link
Member Author

This is on hold because I haven't had a chance to dig into it for a while, and is mitigated by being sure to install InVEST for all users on the computer.

@dcdenu4 dcdenu4 removed this from the 3.15.0 milestone Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working on hold There's a reason we're not working on this yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants