Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeDecodeError when using MsgPackSerializer #834

Open
nottmtt opened this issue Apr 1, 2024 · 7 comments
Open

UnicodeDecodeError when using MsgPackSerializer #834

nottmtt opened this issue Apr 1, 2024 · 7 comments

Comments

@nottmtt
Copy link

nottmtt commented Apr 1, 2024

Hi!

When using MsgPack as the (de)serializer for aiocache (no matter the backend), the following error occurs when reading an item from the cache:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

(I'm using Python 3.12.2, aiocache 0.12.2, aiomcache 0.8.1, redis 4.6.0 and msgpack 1.0.8.)

The error seems to stem from the fact that most backends (prematurely?) decode values from the cache before passing them to the (de)serializer (see backends/memcached.py:24). Replacing this line with return value makes it work (for MsgPack, at least).

But I guess simply removing the value decoding logic isn't a solution. If anyone can give me a clue of what's going wrong here (besides the value.decode()), I'll gladly create a merge request with (my attempt at) a fix.

Minimal test case (using MemcachedCache, use RedisCache if you wish to):

import asyncio

from aiocache               import MemcachedCache
from aiocache.serializers   import MsgPackSerializer

async def test():
    cache = MemcachedCache(
        serializer = MsgPackSerializer(),
    )

    await cache.set("test_case", { "key1": "test", "key2": 3 })

    test_case = await cache.get("test_case") # UnicodeDecodeError here
    print(test_case)

if __name__ == "__main__":
    asyncio.run(test())

Full error:

Traceback (most recent call last):
  File "/home/spike/aiocache_test.py", line 17, in <module>
    asyncio.run(test())
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/spike/aiocache_test.py", line 13, in test
    test_case = await cache.get("test_case")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spike/.cache/pypoetry/virtualenvs/api--KCnhHZq-py3.12/lib/python3.12/site-packages/aiocache/base.py", line 64, in _enabled
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spike/.cache/pypoetry/virtualenvs/api--KCnhHZq-py3.12/lib/python3.12/site-packages/aiocache/base.py", line 48, in _timeout
    return await asyncio.wait_for(func(self, *args, **kwargs), timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/spike/.cache/pypoetry/virtualenvs/api--KCnhHZq-py3.12/lib/python3.12/site-packages/aiocache/base.py", line 78, in _plugins
    ret = await func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spike/.cache/pypoetry/virtualenvs/api--KCnhHZq-py3.12/lib/python3.12/site-packages/aiocache/base.py", line 199, in get
    value = loads(await self._get(ns_key, encoding=self.serializer.encoding, _conn=_conn))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spike/.cache/pypoetry/virtualenvs/api--KCnhHZq-py3.12/lib/python3.12/site-packages/aiocache/backends/memcached.py", line 77, in _get
    return value.decode(encoding)
           ^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

Thanks a lot in advance :)

@nottmtt
Copy link
Author

nottmtt commented Apr 1, 2024

By the way, I also tried to use encoding = None when initializing MemcachedCache (or RedisCache) which works, but reading from the cache gives back bytes instead of strings (which is kind of annoying for most use-cases).

@Dreamsorcerer
Copy link
Member

Possibly the same as #515?

If not, then it'd probably be a good idea to create a test in a PR that reproduces the problem in CI.

@nottmtt
Copy link
Author

nottmtt commented Apr 1, 2024

Possibly the same as #515?

Yeah, #515 is probably related, but UnicodeDecodeError isn't raised when using PickleSerializer instead of MsgPackSerializer. See for yourself:

import asyncio

from aiocache               import MemcachedCache
from aiocache.serializers   import PickleSerializer

async def test():
    cache = MemcachedCache(
        serializer = PickleSerializer(),
    )

    await cache.set("test_case", { "key1": "test", "key2": 3 })

    test_case = await cache.get("test_case")
    print(test_case)

if __name__ == "__main__":
    asyncio.run(test())

This code works fine.

If not, then it'd probably be a good idea to create a test in a PR that reproduces the problem in CI.

I'll try and add that.

@nottmtt
Copy link
Author

nottmtt commented Apr 1, 2024

Acceptance tests for MsgPackSerializer added (#836).

Please let me know if you have any insight on the issue, I'd be glad to help.

@Dreamsorcerer
Copy link
Member

I guess the design currently is that a serialiser should use strings? Though I don't understand how this serialiser got added when it doesn't work with anything... (the original tests don't actually test it with any caches: https://github.com/aio-libs/aiocache/pull/370/files#diff-14d53e5086ad36720b5073972f662da656b46898f2f8bdcdd06bb2fd9c2df9f5R86).

I guess the simple fix for now is to add .encode()/.decode() into the serialiser? So that it works with strings.

Then maybe a better refactoring can be done to avoid the overhead in future, though it's probably best we get #512 done before tackling that (which also requires some other tasks to be complete before that can be done: https://github.com/aio-libs/aiocache/milestone/8).

@nottmtt
Copy link
Author

nottmtt commented Apr 2, 2024

I tried adding the following to MsgPackSerializer's dumps/loads methods, but the UnicodeDecodeError is still raised. Is that the workaround you had in mind?

diff --git a/aiocache/serializers/serializers.py b/aiocache/serializers/serializers.py
index 39a67e6..a3fcce9 100644
--- a/aiocache/serializers/serializers.py
+++ b/aiocache/serializers/serializers.py
@@ -184,7 +184,7 @@ class MsgPackSerializer(BaseSerializer):
         :param value: obj
         :returns: bytes
         """
-        return msgpack.dumps(value)
+        return msgpack.dumps(value).decode("utf-8")

     def loads(self, value):
         """
@@ -196,4 +196,4 @@ class MsgPackSerializer(BaseSerializer):
         raw = False if self.encoding == "utf-8" else True
         if value is None:
             return None
-        return msgpack.loads(value, raw=raw, use_list=self.use_list)
+        return msgpack.loads(value.encode("utf-8"), raw=raw, use_list=self.use_list)

@Dreamsorcerer
Copy link
Member

Yeah, that's what I was thinking. Thinking about it more, the problem is probably that we are decoding the raw bytes first, and that is actually in some binary form, so the decode operation won't work. I guess we'd need some way to disable the decoding (or make the decoding the responsibility of the serialiser maybe?). I suspect such changes won't be backwards compatible and will need to go to v1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants