Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support 'httpx.json = ...' #1352

Closed
wants to merge 13 commits into from
Closed

Support 'httpx.json = ...' #1352

wants to merge 13 commits into from

Conversation

tomchristie
Copy link
Member

Closes #717

Interested to see what folks think of this.
Adds support for eg...

import httpx
import orjson

# Globally switch out the JSON implementation used by httpx.
httpx.json = orjson

@tomchristie tomchristie added the enhancement New feature or request label Oct 8, 2020
@tomchristie tomchristie mentioned this pull request Oct 8, 2020
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change footprint here sounds reasonable. :)

This won't allow per-client customization of JSON lib, which for an end user scenario doesn't sound necessary anyway, so all good.

However I'm not sure how this will play out in a stack of libraries scenario. I guess we want libraries to absolutely stay away from this knob, so that end users always remain the sole decision makers of what JSON library they want to use?

@tomchristie
Copy link
Member Author

Interesting point here...

The orjson.dumps(...) API returns bytes, while stdlib json.dumps() returns str.

To migrate from the standard library, the largest difference is that orjson.dumps returns bytes and json.dumps returns a str. Users with dict objects using non-str keys should specify option=orjson.OPT_NON_STR_KEYS. sort_keys is replaced by option=orjson.OPT_SORT_KEYS. indent is replaced by option=orjson.OPT_INDENT_2 and other levels of indentation are not supported.

If we're going to go for this approach we'd probably want to support either case.

httpx/_content.py Outdated Show resolved Hide resolved
@eseglem
Copy link

eseglem commented Oct 8, 2020

I was concerned about being able to overload each function independently because orjson returns bytes, as . I had initially thought something like this might be better:

import httpx
import orjson

# From: https://pydantic-docs.helpmanual.io/usage/exporting_models/#custom-json-deserialisation
def orjson_dumps(v, *, default):
    # orjson.dumps returns bytes, to match standard json.dumps we need to decode
    return orjson.dumps(v, default=default).decode()

# Globally switch out the JSON implementation used by httpx.
httpx.json_loads = orjson.loads
https.json_dumps = orjson_dumps

But #1344 (comment) shows how to do it as a class. Seems reasonable to do it that way.

If you wrap the dumps() in an ensure_bytes function, it would remove the need for the special orjson function though. Since the body needs to end up bytes anyways, right? Does it make sense to put that logic in to httpx? I would like to see it, but it seems to be a special case just for orjson. Would like to prevent the need to bytes > str > bytes though.

@dmig-alarstudios
Copy link

I'd recommend cherry-picking commit 16cac58, as unneeded decode-encode slows down encode operation around 6-8 times on my machine.

@cdeler
Copy link
Member

cdeler commented Oct 11, 2020

I'm sorry for that, looks like other reviewers already mentioned that, but having replaced json by X-lib, you relied that X provides you with the same interface

I'd say you might explicitly mention that , specifying the abstract marshaller (with one default implementation which use json)

Any httpx user can use their jsonificator, implementing the interface.

There is another question (may be naive). Specifying active json marshaller globally, you indirectly pass this dependency to all the created clients. The question is: may it be reasonable to pass json-marshaller as an optional kwarg to an (Async)Client (with default StdlibJsonMarshaller?

Update:
@florimondmanca wrote

This won't allow per-client customization of JSON lib, which for an end user scenario doesn't sound necessary anyway, so all good.

so the second question had been answered before I wrote it

@cdeler
Copy link
Member

cdeler commented Oct 21, 2020

Not to be "just reject, not propose" , I created a gist with my proposal how to implement httpx.json

https://gist.github.com/cdeler/47dd0cc83c1fa0807aa7912cfa322f9e

If you want to use your own json library, you need to inherit JsonMarshaller, then set httpx.json.json = YourJsonMarshallerClass() (well, naming is not brilliant)

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Some extra docs for this maybe? Happy to review those as a follow-up, but I figure we might as well add a small section to Advanced Usage in this PR…

@florimondmanca
Copy link
Member

Just pushed some docs. Curious if anyone has any comments, otherwise I'm good with merging this. :-)

@dmig-alarstudios
Copy link

2 typos in the second example, nothing else otherwise

docs/advanced.md Outdated Show resolved Hide resolved
docs/advanced.md Outdated Show resolved Hide resolved
@tomchristie tomchristie mentioned this pull request Nov 30, 2020
3 tasks
@gtors
Copy link

gtors commented Feb 10, 2021

Ad-hoc solution for those of you who are waiting for this feature:

import datetime
import functools
import json

try:
    import pydantic
except ImportError:
    pydantic = None

def _default(obj):
    # For Date Time string spec, see ECMA 262
    # https://ecma-international.org/ecma-262/5.1/#sec-15.9.1.15
    if isinstance(obj, datetime.datetime):
        representation = obj.isoformat()
        if representation.endswith("+00:00"):
            representation = representation[:-6] + "Z"
        return representation
    elif pydantic and isinstance(obj, pydantic.BaseModel):
        return obj.dict()
    elif isinstance(obj, datetime.date):
        return obj.isoformat()
    elif isinstance(obj, datetime.time):
        return obj.isoformat()

    raise TypeError(f"Can't serialize {obj}")

def patch_httpx():
    import httpx._content

    httpx._content.json_dumps = functools.partial(json.dumps, default=_default)

if __name__ == '__main__':
    patch_httpx()
    # ... init your app here ...

@tomchristie tomchristie added this to the v0.17 milestone Feb 17, 2021
@tomchristie
Copy link
Member Author

Added the 0.17 milestone for this, so we can review having it in for that (or not).

@florimondmanca florimondmanca removed this from the v0.17 milestone Feb 28, 2021
@lundberg
Copy link
Contributor

lundberg commented Mar 3, 2021

Kind of late to the discussion, but this one aligns with my suggestion in #1362 about adding support for global configuration to httpx, e.g httpx.configure(mounts=..., jsonlib=..., ...).

@darkevilmac
Copy link

I've recently started switching over some of my codebase from requests and moving to HTTPX - only thing that's been bugging me has been the explicit calls to orjson all over the place.

Is this something that's still being considered or are there no plans to merge this any time soon?

Just don't want to be putting things that are too hacky into production code anytime soon so having a supported way to do this would be great.

@tomchristie
Copy link
Member Author

tomchristie commented May 21, 2021

I don't know. I've been leaving it here because I'm not super keen on it, since it really is a sneaky little global hack.

I'm also not necessarily convinced that allowing our users to switch out the JSON parsing implementation is a path we want to help folks go down, since you're making a trade-off on "various differences in edge cases of behaviour" vs. "raw parsing speed is quicker", where it's very often not the case that "quicker parsing" is actually materially important. (Because eg. you're likely actually constrained by the network in the HTTP request, rather than the JSON parsing.)

I suppose having a real-world example case of "this HTTP request + JSON parsing round-trip takes X-amount-of-time if we allow support orjson patching, and takes Y-amount-of-time if we don't" might be a good way of persuading me that there's a genuinely important use-case that outweighs the somewhat awkward global approach.

@darkevilmac
Copy link

darkevilmac commented May 21, 2021

Unfortunately I can't give a super large amount of details, but in very generalized terms the codebase I work with passes around large amounts of data between different internal and external APIs. So any time I can save in the serialization and de-serialization of data ends up adding up pretty quick for me. That was the main reason I switched to an alternative json lib in the first place,

The current solution of just manually calling orjson is fine - it would just be nicer to have something a bit cleaner.

@aviramha
Copy link

aviramha commented Jul 1, 2021

@tomchristie the main annoyance for me of explicitly calling orjson on my own, is that I also need to set content type. Maybe the solution should be that json accepts bytes? if it's bytes, it assumes it's an already encoded json.

@tomchristie
Copy link
Member Author

the main annoyance for me of explicitly calling orjson on my own, is that I also need to set content type.

One alternate pattern that we could highlight as an option...

c = httpx.Client(headers={"Content-Type": "application/json"})

Then whenever you're sending json, you'd use...

r = c.post(url, content=ojson.dumps(data))

@aviramha
Copy link

aviramha commented Jul 1, 2021

But then it would mark also get requests with that content type, no?

@tomchristie
Copy link
Member Author

Fair point.

@dreinon
Copy link

dreinon commented Dec 11, 2021

Hi! May I ask what's the status of this? We switched from requests to httpx in the supabase python client supabase-py, and since requests has preference for simple-json lib if installed, then installing that library we could allow serializing objects of types as decimal.Decimal for instance.
Since it was updated to httpx, there is no preference anymore and users won't be able to serialize this. I personally would find interesting to allow this preference, or at least, be able to specify a custom json lib like this PR proposes.

Thanks!

@stale
Copy link

stale bot commented Feb 19, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 19, 2022
@stale stale bot closed this Feb 27, 2022
@acnebs
Copy link

acnebs commented Nov 28, 2022

I don't know. I've been leaving it here because I'm not super keen on it, since it really is a sneaky little global hack.

I'm also not necessarily convinced that allowing our users to switch out the JSON parsing implementation is a path we want to help folks go down, since you're making a trade-off on "various differences in edge cases of behaviour" vs. "raw parsing speed is quicker", where it's very often not the case that "quicker parsing" is actually materially important. (Because eg. you're likely actually constrained by the network in the HTTP request, rather than the JSON parsing.)

I suppose having a real-world example case of "this HTTP request + JSON parsing round-trip takes X-amount-of-time if we allow support orjson patching, and takes Y-amount-of-time if we don't" might be a good way of persuading me that there's a genuinely important use-case that outweighs the somewhat awkward global approach.

I think this reasoning is a bit backwards. It's fairly hard to know where bottlenecks are going to be, and it seems clear from this discussion that nobody has actually thoroughly benchmarked this against many different possible use-cases. Given that is the case, not giving users the escape hatches they need to adjust to such bottlenecks when they find them (with their own testing) seems like poor library design.

As a fairly clear example of this: the CPython json lib ain't fast, so there are definitely situations (e.g. making a request localhost with 50MB of deeply nested JSON) where json is going to be much slower than the actual network request.

FWIW I just now decided to not pick httpx over aiohttp even though I wanted to because this is not an option in the lib.

@chickeaterbanana
Copy link

IMHO the biggest feature of overwriting json library isn't so much the performance it is more the flexibility. For example add the capability to dump/load Decimal (the standard lib json can do this as well with a parameter cls but without overwriting the default encode/decode function of httpx its not possible to use this feature of the standard lib). The same is true for uuid, datetime and of cause all user-defined types.

@Kludex Kludex reopened this Dec 29, 2022
@Kludex Kludex removed the wontfix label Dec 29, 2022
@Kludex Kludex closed this Dec 29, 2022
@Kludex Kludex added the wontfix label Dec 29, 2022
@tomchristie tomchristie deleted the override-json branch December 29, 2022 16:47
@xbeastx xbeastx mentioned this pull request Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom JSON library