JSON extra uses orjson instead of ujson #599
setup.py
Outdated
@@ -98,7 +98,7 @@ def extra(self):
 'dataclasses>=0.6;python_version<"3.7"'
 ],
 extras_require={
-'ujson': ['ujson>=1.35'],
+'json': ['orjson>=2.0.6,<3;platform_python_implementation=="CPython"'],
I haven't tested this on PyPy, but the intent is to allow users to unconditionally install pydantic[json] without an issue.
> I haven't tested this on PyPy, but the intent is to allow users to unconditionally install pydantic[json] without an issue.
I think this might be confusing: it might make someone think that they need to install pydantic[json] for any JSON support. I would vote to have it as 'orjson': ['orjson>=2.0.6,<3;platform_python_implementation=="CPython"']
OK. The intent was that the name of the extra wouldn't change if the implementation did. I've changed it to orjson.
Codecov Report
@@ Coverage Diff @@
## master #599 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 15 15
Lines 2628 2628
Branches 516 516
=====================================
Hits 2628 2628
Looks good in principle. Thoughts/questions:
I'm guessing the date & datetime output format might change, which could break things for lots of people. Is the performance improvement worth that?
orjson can be used to serialize as well (if not pretty-printing) but it requires outputting
It looks like there's actually no difference in datetime output. datetime's isoformat output seems the same as RFC 3339 and pydantic's tests pass either way.
In terms of installing by default, the requirement in setup.py should be:
orjson>=2.0.6,<3;platform_python_implementation=="CPython" and platform_machine=="x86_64" and python_version<"3.8"
PyPy, ARM, and 3.8 could fall back to json. I think that would work fine. There are PyPI wheels for Linux, macOS, and Windows for 3.5, 3.6, and 3.7.
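A minimal sketch of the fallback described above, assuming orjson is an optional dependency that may be missing on some platforms (PyPy, ARM, 3.8). The helper names `json_loads` and `json_dumps_bytes` are hypothetical and not part of pydantic; the sketch only illustrates preferring orjson when it imports and using the standard library otherwise.

```python
import json

try:
    import orjson  # optional; only present where a wheel could be installed
except ImportError:
    orjson = None


def json_loads(data):
    """Parse JSON with orjson when it is available, else the standard library."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


def json_dumps_bytes(obj):
    """Serialize to UTF-8 bytes, which is the type orjson.dumps returns."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Hypothetical fallback: encode the stdlib str output to match the return type.
    return json.dumps(obj).encode("utf-8")
```

The two backends can differ in whitespace, so comparing parsed results is safer than comparing the raw output.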
All makes sense to me. Given that orjson doesn't work with some platforms, let's leave it as an optional dependency for now. We should remove
I'm less sure about returning bytes vs. str. I understand there would be a performance improvement to using bytes, especially as in most contexts bytes will be required in the end. But this would be a breaking change and very unnatural for lots of people who are used to
I mostly agree with @samuelcolvin, with the following caveats:
OK, well, is there anything further that should be done as part of this PR?
Yes. On the name of the optional dependency let's use "
Let's change
Nice! I think it would be interesting to also support orjson. I agree with you both, and with the two concerns/caveats from @dmontagu.
I've renamed the extra to
This pull request introduces 1 alert when merging 156b77c into 3cdbbae - view on LGTM.com new alerts:
This is the minimum change to replace ujson. There are additional calls to json.loads that may be modified. The extra is renamed 'fast-json' to match 'email' and avoid further changes.
Hi, I think it would be best to fix everything at the same time in the same PR. Would you be willing to implement the features requested above?
After further consideration, I have some other concerns about using orjson in pydantic:
Individually all these things are fine and understandable, but together they concern me. Let me make it clear: I'm not accusing @ijl of anything; I'm 99% certain that his/her intentions are honourable. However, given all the pointers above, I can't help articles like this coming to mind. I understand that (mostly through FastAPI) pydantic is used by some fairly big organisations, including Microsoft, making it a target for hackers. I don't want to be (inadvertently) responsible for a security breach, or do something which increases that possibility unduly. I therefore won't be accepting this PR. Sorry.
I'm at a loss. That's an extravagant claim I would never have expected.
I think this decision needs to be reviewed. ujson's last release was 3.5 years ago. The fastest and best-maintained Python JSON library right now is orjson. I think it would be quite beneficial to the community if there were a blessed way to use it with pydantic & FastAPI @samuelcolvin @tiangolo
I'd still be prepared to migrate to orjson, but somehow my concerns above would need to be resolved. @pablogamboa, other than a long delay since the last release, do you have any specific problems with ujson?
@samuelcolvin the biggest problem for me with ujson is that you cannot extend it and it has so-so support for floats (you can see they have a few issues open around it). orjson with its
I think that there's a very interesting synergy between the Rust and Python communities and that orjson is among the first packages that will dominate Python because of Rust's ridiculous performance. Yes, ijl is relatively unknown in the Python community, but he's really onto something with orjson.
I 100% agree that orjson looks great; that's why I created #589. I'm also entirely with you on the synergy between Rust and Python; there was a great talk on this at EuroPython this year from a core Python maintainer. If I were starting pydantic from scratch today I'd definitely use orjson. However, there's a higher threshold of confidence required to migrate to a new dependency when pydantic is already being downloaded 200k times a month. Many users don't have time to go through every dependency change in every package update, so they have to outsource that part of security to package maintainers and trust them to make sensible decisions. I simply can't switch to orjson without some more openness from the maintainer or a (kind of hostile) fork, which I'm not prepared to do. In summary, nothing has changed since my comment above, so neither has my decision.
Yep, understood, I didn't ask you to switch though! I asked for a blessed way to use it, i.e. make it configurable by the user which JSON library to use. That'd be an amazing feature. For example, Flask's JSONEncoder is a great, easy way to use a custom encoder in the framework (and my biggest gripe with ujson!)
Makes sense. That sounds like it might be possible. Perhaps using orjson if a specific environment variable is set?
An env variable would definitely work; another approach could be an API that allows you to override dumps/loads somehow. This would be a deadly addition! Thanks for listening, @samuelcolvin
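One way the override idea could look, as a rough sketch only: `JsonHooks` and `compact_dumps` are hypothetical names invented for this illustration, not an existing pydantic API. It assumes the library would route every parse and serialize call through a pair of user-replaceable callables that default to the standard library.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class JsonHooks:
    """Hypothetical holder for user-replaceable JSON functions."""

    loads: Callable[..., Any] = json.loads
    dumps: Callable[..., str] = json.dumps


def compact_dumps(obj: Any) -> str:
    """Example custom encoder: stdlib json without extra whitespace."""
    return json.dumps(obj, separators=(",", ":"))


# Defaults use the standard library; a user swaps in any compatible callable.
default_hooks = JsonHooks()
custom_hooks = JsonHooks(dumps=compact_dumps)
```

A user who wants orjson could pass `orjson.loads` and a small wrapper that decodes the bytes from `orjson.dumps`, which would keep the choice of library out of the package itself.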
Makes more sense, perhaps we could have
#589