Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[py3] Fix string_ext unpacking #22

Merged
merged 1 commit into from Sep 23, 2019
Merged

[py3] Fix string_ext unpacking #22

merged 1 commit into from Sep 23, 2019

Conversation

jdost
Copy link
Contributor

@jdost jdost commented Sep 20, 2019

Background

The behavior of iterating over a bytestring differs between python2 and python3:

Python 3.7.4 (default, Jul 16 2019, 07:12:58) 
>>> f = b'asdf1234'
>>> [x for x in f]
[97, 115, 100, 102, 49, 50, 51, 52]
Python 2.7.16 (default, Mar 11 2019, 18:59:25) 
>>> f = b'asdf1234'
>>> [x for x in f]
['a', 's', 'd', 'f', '1', '2', '3', '4']

Problem

Reproduced the experienced failure around unpacking the string_ext field type. The failure is due to the above when translating the "packed" bytestring into a list of ints (based on the spec). The reproduction test fails (without the fixes in _unpacker) as:

py/tests/test_string.py:17: 
    def test_string_ext_unpack():
>       assert unpack(b'\x83k\x0b\x00hello world') == \
            [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]

py/erlpack/_unpacker.pyx:30: in erlpack._unpacker.ErlangTermDecoder.loads
    return self.decode_part(bytes, offset + 1)[0]
py/erlpack/_unpacker.pyx:72: in erlpack._unpacker.ErlangTermDecoder.decode_part
    return self.decode_k(bytes, offset + 1)
py/erlpack/_unpacker.pyx:200:
>   st = [ord(x) for x in st]

E   TypeError: ord() expected string of length 1, but int found

Solution

Using a behavior check for determining the handling (byte_elements_are_ints = isinstance(b'a'[0], int)) allows for modifying the translation behavior based on the underlying iterator behavior (so this avoids trying to assert based on system versions). There may be some speed optimizations around this and using this check as a global detector or wrapping the translation into a private function, but this is the least intrusive solution.

Python3 treats an iterator over a bytestring as a collection of ints, so
it doesn't need to cast each char to int using `ord` as in python2.
This adds that compatibility layer and includes a test to capture this
scenario.
@gmjosack gmjosack merged commit a01001b into master Sep 23, 2019
@gmjosack gmjosack deleted the py3/fix-string-ext branch September 23, 2019 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants