
json parameter now accepts a bytes instance for pre-serialized json. #1730

Closed
wants to merge 1 commit

Conversation

@aviramha commented Jul 1, 2021

This is a take on #1352 #717 . I think it offers a pretty good way to provide easy functionality, without global hacks.

@tomchristie (Member)

Appealing in some ways, but also a bit awkwardly nuanced, given that Python code is often a little careless about when bytes vs. str is used.

For example, these two calls look pretty much identical on the surface...

import httpx, json

httpx.post(url, json=json.dumps(data))

import httpx, orjson

httpx.post(url, json=orjson.dumps(data))

But they would actually result in completely different behaviour. (One happens to produce a str, the other bytes.)
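The str/bytes split is easy to verify with a stdlib-only sketch (the bytes case here merely stands in for what a bytes-producing encoder such as orjson.dumps returns):

```python
import json

data = {"id": 1, "name": "example"}

as_str = json.dumps(data)             # stdlib encoder: returns str
as_bytes = json.dumps(data).encode()  # stand-in for orjson.dumps(data), which returns bytes

print(type(as_str))    # <class 'str'>
print(type(as_bytes))  # <class 'bytes'>
```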

@aviramha (Author) commented Jul 2, 2021

It's a good point.
How about creating a new class that inherits from bytes for marking this use case?

class SerializedJSON(bytes):
    pass

httpx.post(url, json=SerializedJSON(json.dumps("asdsad").encode()))

This is a real blocker for us so I'm a bit pushy :p
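A minimal sketch of how such a marker type could be dispatched on (hypothetical: `encode_json_argument` is not httpx API, just an illustration of the branch the request encoder would need):

```python
import json
from typing import Any

class SerializedJSON(bytes):
    """Marker subclass: the payload is already JSON-encoded."""

def encode_json_argument(value: Any) -> bytes:
    # Hypothetical helper: pass pre-serialized payloads through untouched,
    # otherwise fall back to the stdlib encoder.
    if isinstance(value, SerializedJSON):
        return bytes(value)
    return json.dumps(value).encode("utf-8")

body = encode_json_argument(SerializedJSON(b'{"a": 1}'))
```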

@tomchristie (Member)

This is a real blocker for us

Okay, noted.

So... this PR only addresses the request encoding, but wouldn't alter response.json().
Do you need both?

Also, do you have any kind of metrics that have informed this use case? E.g. how large are the JSON documents that you're posting? Do you have a good demo example showing that the serialization time is significant vs. the actual network time it takes to upload?

Typically the network would be the limiting factor anyway, but if we've got good evidence that that's not the case for some reasonable use cases, then that'd be a valid reason to prioritise #1352 / #717.

@aviramha (Author) commented Jul 2, 2021

We have dictionaries sized 100-500 kilobytes. Those take about 10x the time using the stdlib json module, and converting the complex object to a dictionary itself takes a huge amount of time, compared with a serializer that does it on the fly. I think any orjson dataclass benchmark is a good example of the performance gain we get from using it.

@tomchristie (Member) commented Jul 2, 2021

I think any benchmark of dataclass in orjson is a good example for our performance gain from using it.

Very possibly. Although it is relevant to know if the JSON serialisation speed is significant vs. the actual network sending itself. Typically the network speed is the actual limiting factor, and improvements in serialisation speed might look relevant if measured in isolation, but might not necessarily be relevant in the bigger picture.

I'm not asserting that is or isn't the case here, but I haven't seen any definitive evidence either one way or the other.

A good test example might be to measure a large JSON request being posted to the httpbin service, and seeing what impact orjson has there, because that'd be a more meaningful metric?...

(I'm absolutely open to being persuaded here, I'm just pushing back until we've really made the case for it.)

@aviramha (Author) commented Jul 2, 2021

Regarding response.json, this can easily be bypassed by using response.content, unlike the request side, where it creates a very annoying pattern.
My last comment was from my phone, so I didn't give a good enough example.
Take the following:

import json
import orjson
from dataclasses import dataclass, asdict

@dataclass
class A:
    a: str

@dataclass
class B:
    a: A
    b: A
    c: A

b = B(a=A(a="Asd"), b=A(a="asd"), c=A(a="lasd"))

%timeit orjson.dumps(asdict(b))
11.8 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit json.dumps(asdict(b))
15.6 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit orjson.dumps(b)
479 ns ± 2.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This example shows that it's not about which library serializes a dict faster (although orjson does that better too), but the fact that serialization happens in one operation, rather than conversion followed by serialization.
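On the response side, the bypass mentioned above is a one-liner, since response.content is the raw bytes of the body (stdlib sketch; orjson.loads would be a drop-in replacement, as it accepts bytes directly):

```python
import json

# What httpx exposes via response.content: the raw body bytes.
content = b'{"id": 7, "ok": true}'

data = json.loads(content)      # roughly what response.json() does
# data = orjson.loads(content)  # drop-in third-party alternative

print(data)  # {'id': 7, 'ok': True}
```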

@aviramha (Author) commented Jul 2, 2021

I actually disagree that serialization shouldn't be a factor on its own even if it's insignificant relative to the network limit. If you can cut 30 ms from either side, why wouldn't it matter? Especially when the 30 ms (as an example, of course) is CPU bound rather than IO bound, so you actually save "CPU money"...
In our scenario (microservice communication) the network time is 1-5 ms, but serialization of complex objects requires a function similar to FastAPI's jsonable_encoder, which can take 500 ms on its own. We could probably write a better jsonable_encoder ourselves, but orjson does the conversion and serialization all in one pass, avoiding a lot of round trips between C and Python.

@aviramha (Author) commented Jul 2, 2021

Here's a benchmark of a local httpbin:

from typing import List

import dataclasses

import httpx  # missing from the original snippet
import orjson


@dataclasses.dataclass
class Member:
    id: int
    active: bool


@dataclasses.dataclass
class Object:
    id: int
    name: str
    members: List[Member]


objects_as_dataclass = [
    Object(i, str(i) * 3, [Member(j, True) for j in range(0, 10)])
    for i in range(100000, 102000)
]

%timeit httpx.post("http://localhost:9001/post", content=orjson.dumps(objects_as_dataclass))
247 ms ± 5.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit httpx.post("http://localhost:9001/post", json=[dataclasses.asdict(obj) for obj in objects_as_dataclass])
371 ms ± 8.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@aviramha (Author) commented Jul 2, 2021

(I'm absolutely open to being persuaded here, I'm just pushing back until we've really made the case for it.)

Just to be clear, I'm entirely up for your scrutiny. You (and the whole Encode team) are doing an amazing job keeping a very high standard across your set of libraries, and that cannot happen without being strict and requiring proven real-life examples rather than niche cases that complicate the codebase. I wish more open source groups were as professional.

@tomchristie (Member) commented Jul 2, 2021

Fantastic, thanks!

Can we also see the following?...

%timeit httpx.post("http://localhost:9001/post", json=[orjson.dumps(dataclasses.asdict(obj)) for obj in objects_as_dataclass])

It's interesting to me if the upshot of this is that the primary motivation here is "orjson can directly serialize dataclass instances, which is fast". (Which is a somewhat different motivation to "orjson is faster than json")

@tomchristie (Member)

I think the sensible outcome of this is likely to be that Client(...) accepts a jsonlib=... argument, and have the Request(...) and Response(...) classes also do the same.

I'm not wild about the global approach in #1352, so if we've got a proven-enough use case here, then let's do this properly, right?

@aviramha (Author) commented Jul 2, 2021

The example you wrote doesn't make sense, so I assume you meant something like this:

%timeit httpx.post("http://localhost:9001/post", content=orjson.dumps([dataclasses.asdict(obj) for obj in objects_as_dataclass]))
346 ms ± 9.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I think a jsonlib argument is not sufficient, as different JSON libraries may have different options (for example, we use orjson's numpy options, which can be passed to dumps).
What about accepting json_dumps and json_loads? Maybe a NamedTuple of both functions?
Let me know what you prefer and I'll alter the PR to match.

@aviramha (Author) commented Jul 2, 2021

imo the signature should be something like

def json_encode(json: Any) -> bytes: ...

def json_decode(data: Union[str, bytes]) -> Any: ...
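A sketch of what those two hooks could look like when paired up, with stdlib defaults (the `JSONCodec` name and the pairing are assumptions for illustration, not an agreed API):

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Union

@dataclass
class JSONCodec:
    # Hypothetical container pairing the two proposed hooks.
    encode: Callable[[Any], bytes]
    decode: Callable[[Union[str, bytes]], Any]

def default_encode(obj: Any) -> bytes:
    return json.dumps(obj).encode("utf-8")

stdlib_codec = JSONCodec(encode=default_encode, decode=json.loads)
# An orjson-backed codec would then simply be:
# orjson_codec = JSONCodec(encode=orjson.dumps, decode=orjson.loads)

roundtrip = stdlib_codec.decode(stdlib_codec.encode({"x": 1}))
```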

@tomchristie (Member)

I think a jsonlib argument is not sufficient, as different json libraries might have different options

The docs on https://github.com/encode/httpx/pull/1352/files#diff-0f0c40de44a0841ed5180f16d66212d1cece2a7d4400789007c2f73bf8432622 detail how you'd be able to handle that.

(Tho #1352 proposes a global approach)

@aviramha (Author) commented Jul 5, 2021

I'll be a bit picky and say that it'll force double encoding... but OK 😅

@tomchristie (Member)

So, all of the proposed options still leave me with an awkward, niggly uncomfortableness.

I've also got the seed of some thoughts in #1740, which again is pushing me away from wanting to add extra API in this area.

Right now, the way I'd work with your use case is probably something like this...

class CustomClient(httpx.Client):
    def request(self, *args, **kwargs):
        if 'json' in kwargs:
            kwargs['content'] = orjson.dumps(kwargs.pop('json'))
            kwargs.setdefault('headers', {})
            kwargs['headers']['Content-Type'] = 'application/json'
        return super().request(*args, **kwargs)

It's also interesting, having talked through your requirements here, that the primary benefit orjson gives you is not the faster JSON rendering it provides (which is noticeable, but not the biggest factor), but the fact that it renders JSON directly from dataclass instances. (Which is ~kinda different.)

@bentheiii

I think the sensible outcome of this is likely to be that Client(...) accepts a jsonlib=... argument, and have the Request(...) and Response(...) classes also do the same.

This seems like a good solution, though I think it would be better if the jsonlib argument were split into two arguments: json_serializer and json_deserializer. This is how sqlalchemy, pydantic, and flask handle custom serialization.

@stale stale bot commented Feb 19, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 19, 2022
@stale stale bot closed this Feb 27, 2022
@tscheburaschka

I know this might be outdated, but I thought I'd add one more use case here.
When switching from requests to httpx for sending time-series data, I realized that the JSON encoder happily allows NaN and infinity as float values (basically by not altering the default value of the allow_nan parameter of stdlib json.dump), in contrast to requests' standard behaviour.
So an easy way to control the JSON encoding options (even without changing the library altogether) would be a convenient feature here. Until now I have had to resort to nasty patching or heavyweight subclassing that adds more stack layers.
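This is reproducible with the stdlib alone: json.dumps emits the non-standard NaN token by default, and raises ValueError only when allow_nan=False is passed (a minimal demonstration):

```python
import json

payload = {"value": float("nan")}

# Default: emits the non-standard NaN token, which is not valid JSON.
print(json.dumps(payload))  # {"value": NaN}

# Opting into strict behaviour raises instead of producing invalid output.
try:
    json.dumps(payload, allow_nan=False)
except ValueError:
    print("rejected out-of-range float")
```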

@Kludex Kludex reopened this Dec 29, 2022
@Kludex Kludex added wontfix and removed wontfix labels Dec 29, 2022
@Kludex Kludex closed this Dec 29, 2022
@zanieb zanieb mentioned this pull request Aug 1, 2023