New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement encode/decode hooks that allow to define custom serialization formats #209
Conversation
This commit introduces several hooks for encoding/decoding functions: - ujson.encode(pre_encode_hook=func) - ujson.decode(object_hook=func) - ujson.decode(string_hook=func) These hooks may be used for defining custom JSON representations for arbitray object types. The pre_encode_hook is similar to the standard JSONSerializer.default() method. But unlike the default() method, the pre_encode_hook() is called before any other encoding attempts, and thus the hook allows to override some ujson behavior, like the __json__() method or datetime format. The object_hook semantics is similar to the standard json.loads(object_hook). It allows to deserialize JS objects into specific Python class instances instead of generic dictionaries. The string_hook is like object_hook, but called for every parsed string, and thus it allows to deserialize strings into Python objects. This may be useful for dates and other objects commonly represented as strings. This is the very first prototype implementation, so more commits will follow.
Hi Dmitry, Thanks for the PR. Have you made pre and post benchmarks when this is not used? |
Just made the benchmarks, and got the following results: Python2/pre: Python2/post: Python3/pre: Python3/post: Full logs are available here: The difference is less than 1%, so I can conclude that there is no significant impact on the performance. |
problem: basically the NULL pointer dereference When an exception happens in a Python function, you get NULL from PyObject_CallFunctionObjArgs() (instead of a valid PyObject pointer). The value then is dereferenced and you get in trouble. So this commit adds the NULL check with corresponding unit tests. Also it adds few minor fixups: - Don't execute internal ujson's API callbacks (callPreEncodeHook() and callObjectHook()) when these hooks are not actually passed from the Python code. This is a very minor optimization, saves one C function call per object. - Get rid of the GET_EP() macro. It is used only once and therefore is redundant.
The pre_encode_hook() is called for every single JSON value. This is slow, especially if you have a lot of objects and complicated hook function. This performance overhead may be reduced if you don't call the pre_encode_hook() for primitive types like Number and String (since usually you don't need to handle these types). So this commits makes this behavior default. By default you get good performance, but the hook is not called for primitive types. and this behavior can be disabled like this: ujson.encode(pre_encode_primitive=True)
I think we should just then change |
Or |
👍 I would really like to use |
Excellent!
|
Why this PR is not merged yet? Is anybody here??? |
Hello? |
Hi guys. I have bad news: these changes should NOT be used, because there is a memory leak in there. I didn't investigate it properly, but what I know is that recently we discovered a memory leak on a project where ujson (with this custom-hooks-feature) is involved. And basically we disabled ujson completely and the problem gone. And since ujson was previously tested by a lot of users, and there was no memory leak, then I can conclude that the problem is introduced by me. So this pull request is buggy. Don't use it in production. And unfortunately, I'm not going to fix it, because priorities are changed, and we don't use ujson anymore. So I don't have any real project where I could test it propertly (and I don't have time to join new projects). So guys, sorry, but I'm closing this pull request. |
This pull request proposes encoding/decoding hooks, that is,custom Python functions that you may pass to
ujson.encode()
/ujson.decode()
and transform objects on the fly during the encoding/decoding process.Basically this is a customization feature. Especially useful to override certain behavior already implemented by the ujson code, like
datetime
serialization or__json__()
method handling (see below).The hook behavior is similar to
json.dumps(default=fn)
andjson.loads(object_hook=fn)
semantics, you just pass a function that takes an object and possibly replaces it with another object. But unlike thedefault
hook (that is called at the end) I implemented thepre_encode_hook
(that is called before any other transformations are made), and this is done intentionally to allow user to override certain ujson's behavior inside the hook.See the
README.rst
andtests.py
for usage examples.Motivation:
Customization features are demanded for long time, see Add support for custom encoders #124, Support for custom date serialization? #139.
People are making their own forks and do minor changes in the C code to customize serialization for various types. That should be solved by extending the Python API.
Hooks may be a good tradeoff solution (not as good as an object-oriented API where you can override methods and so on, but still ok).
The current ujson's implementation translates
date
anddatetime
objects into timestamps, and there is no option for switching to another format (like ISO 8601).Actually the format is not defined by JSON, so any hardcoded implementation is doomed.
I guess the pull request Serialise datetime and date objects #201 is related here: it defines another hard-coded format, so it doesn't solve this problem, but still may be useful as a performance optimization for this specific format.
The
__json__()
method also has a problem: ujson expects this method to return a string containing valid JSON, while the usual approach (or at least how I know it) is to return an object like a dictionary from the__json__()
method.Recently I had noticeable issues because of that when I tried to switch to ujson on one of projects. And the
toDict()
method is not always the solution, since you don't always return a dictionary. And I think generally the ujson should try to be compatible with already existing code (ideally, a drop-in replacement), so there should be an option for changing the__json__()
semantics without changing tons of ORM classes that implement this method.And if you change the ujson's behavior to this "usual" approach, you will break the backwards compatibility. Again, there is no standard option here, so it would be better to let people to override this behavior with an arbitrary implementation.
So, now I'm asking someone to review the changes and make some comments.
Especially pay some attention to reference counting, I'm not sure that I handled it correctly.
Thanks!