Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP REST bridge should use Autobahn's binary enabled JSON serializer #2089

Open
daniel-heg opened this issue Aug 21, 2023 · 11 comments
Open

Comments

@daniel-heg
Copy link

Hi,
first of all, thank you for Crossbar and the whole WAMP Protocol.

I have a problem when I try to call wamp.session.get via the HTTP Caller with the following request:

{
    "procedure": "wamp.session.get",
    "args": [<someid>]
}

The call results in a network timeout and the following error log on the router:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/crossbar/bridge/rest/common.py", line 233, in render
    return self._render_request(request)
  File "/usr/local/lib/python3.9/site-packages/crossbar/bridge/rest/common.py", line 480, in _render_request
    d = self._process(request, event)
  File "/usr/local/lib/python3.9/site-packages/crossbar/bridge/rest/caller.py", line 60, in _process
    return d.addCallbacks(on_call_ok, on_call_error)
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 477, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/usr/local/lib/python3.9/site-packages/crossbar/bridge/rest/caller.py", line 50, in on_call_ok
    body = dump_json(res, True).encode('utf8')
  File "/usr/local/lib/python3.9/site-packages/crossbar/_util.py", line 84, in dump_json
    return json.dumps(obj, separators=(',', ':'), ensure_ascii=False)
  File "/usr/local/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
builtins.TypeError: Object of type bytes is not JSON serializable

I am not sure what classname is is being serialized.
According to the example in the source here there are no obvious classnames to me.

I am using a dynamic authenticatior similar to the provided examples, if that helps.

@oberstet
Copy link
Contributor

Hi!

The call results in a network timeout and the following error log on the router:

ok, that's not ok;)

I am not sure what classname is is being serialized.

class bytes - which is a bit strange since the result returned from the meta API isn't just 1 bytes object, but a structured one.

IOW, it's important to recognize, you are using 2 advanced features in combination: WAMP meta API and accessing that WAMP API over HTTP/REST via the bridge.

that should work of course - but you run into an issue.

the 1st piece is the WAMP meta API itself, which is here:

def session_get(self, session_id: int, details=None) -> Optional[Dict[str, Any]]:

I would hence first test your exact situation and exact session ID which you want to get info on using the WAMP meta API over regular WAMP - take out the REST bridge as source of error. and check if the returned information contains any binary items.

what client language are you using? you can use/adjust this python example:

https://github.com/crossbario/crossbar-examples/blob/master/metaapi/monitor-sessions.py

@daniel-heg
Copy link
Author

Thanks for the answer and the example!

I could finally determine my problem, which was entirely my fault.
You were right, I was passing a bytes object to the authextra dict with my custom authenticator, which then in return could not be serialized via json.dumps().
In my case it was entirely unneccesary to pass it as a bytes object.

I found the culprit in the end by overwriting the JSONEncoder default encoder via json.JSONEncoder.default = ....
If anyone needs a very quick and dirty way to make this work with non-string values inside authextra, then that method might be a way.

However, a catch of these serialization errors is probably necessary, so that no network timeout occurs on the client side.

Thanks again!

@oberstet
Copy link
Contributor

oberstet commented Aug 21, 2023

I found the culprit in the end by overwriting...

ahhh, right, ok! this is interesting .. because:

you might or might not have noticed that WAMP itself (both in Crossbar.io and in Autobahn) is able to use binaries in JSON (wamp.2.json) - and this works transparently

it will automatically translate binary values to hex encoded strings for app payload items - or also authextra

the code for this autotranslation is here:

https://github.com/crossbario/autobahn-python/blob/master/autobahn/wamp/serializer.py#L436

this JSON binary autotranslation is described in the WAMP spec here https://wamp-proto.org/wamp_ap_latest_ietf.html#name-binary-support-in-json

BUT: for reasons unclear / historical the REST bridge in Crossbar.io does NOT use above function, but uses:

from crossbar._util import dump_json

def dump_json(obj, minified=True):


so to sum up, the actual bug is: the REST bridge should use the Autobahn JSON serializer!

but in addition to that root issue, I also agree with

However, a catch of these serialization errors is probably necessary, so that no network timeout occurs on the client side.

the exception should be caught in the bridge, and an "internal error" http 500 should result

@oberstet oberstet changed the title wamp.session.get call via HTTP Bridge Caller results in error HTTP REST bridge should use Autobahn's binary enabled JSON serializer Aug 21, 2023
@daniel-heg
Copy link
Author

Maybe nice to know for anyone running into this problem:
I had to reapply my hack with TLS enabled, because args[0]["transport"]["channel_id"]["tls-unique"] is also a bytes object.

@oberstet
Copy link
Contributor

oberstet commented Aug 21, 2023

ok, I've dug a bit deeper:

  1. what I wrote rgd the REST bridge should - in general - use Autobahn's serializer. this is correct, but this is already filed here (so this issue is a actually a duplicate): REST bridge should use our own JSON serializer #1079

  2. however, while application payloads MAY use binary values, and for such payloads the JSON auto-translation to hex encoding should be done (hence issue REST bridge should use our own JSON serializer #1079), this is NOT the case for elements of WAMP message outside the application payload MUST only use these types (which don't include binary!): https://wamp-proto.org/wamp_bp_latest_ietf.html#section-2.2-2


thus: if you had used a binary value inside authextra, that was wrong in the first place, and the question hence is: why doesn't the router catch that already when authenticating, denying the authenticating client altogether.


my lesson from all of this is (again): we've reached a point where the protocol and implementation is complex enough that we really need more test coverage for all these things;)

@oberstet
Copy link
Contributor

I had to reapply my hack with TLS enabled, because args[0]["transport"]["channel_id"]["tls-unique"] is also a bytes object.

everything outside application payloads (of calls, events, ..) in WAMP messages on the wire must not use binary. I've pinged @ecorm to double check. if so, and if it goes out onto the wire in crossbar, then this is also a bug:(

@ecorm
Copy link

ecorm commented Aug 21, 2023

It depends on how one interprets the terms types and string in this Section 2.2 passage:

A message serialization format is assumed that (at least) provides the following types:

  • integer (non-negative)
  • string (UTF-8 encoded Unicode)
  • bool
  • list
  • dict (with string keys)

WAMP itself only uses the above types...

NUL is a valid character that may appear anywhere in a Unicode string. So when we prefix a NUL character to Base64-encoded binary data, the result is still a valid UTF-8 string.

If "string" means any "UTF-8 string appearing over the wire", then the special way that WAMP converts binary data into strings for JSON serialization could be considered a "string" type.

If "type" means the type at the WAMP layer after performing UTF-8-to-binary-data conversions, then I guess that would exclude the possibility of using binary data in the non-payload parts of WAMP messages.

@oberstet
Copy link
Contributor

thanks for checking, and for your thoughts!

It depends on how one interprets .. this Section 2.2 passage:

agreed, I was afraid that might be the case;)

what is also missing (unless I overlooked it in the current spec text) is any explanation / motivation for this, which might give the reader additional context to disambiguate, which is one reason I am big fan of "use cases" and "motivations" and "goals" in tech specs.

IOW: we are missing sth in the spec.


having said that, here is some original motivation:

all WAMP message elements other than arg + kwargs (the actual app payload) are "meta" and "control" in some way

and hence those WAMP message elements might need to be or potentially can be interpreted by e.g. a router

and hence are kinda "important", and it makes sense to keep things simple and "least common denominator" from serialization PoV, and remove potential issues in that part

now that is the motivation and goal.

here is the "means" of achieving that: only use types which are directly present in all serializers, and only types which perfectly round-trip across for a single serializer and between serializers

and those types mentioned in section 2.2. are exactly that.

e.g. even though some serializers directly support binary types, not all do of course (well, and our JSON binary encoding is working perfectly, it is not a commonly used standard outside WAMP)

here is another one: IEEE 754 floats do NOT roundtrip even for a single implementation or serializer. in fact, the IEEE 754 standard leaves open big areas undefined. same as with C.


So when we prefix a NUL character to Base64-encoded binary data, the result is still a valid UTF-8 string.

I am almost sure the following is true:

NUL is a valid unicode character, which "ends" a string, but which may appear zero, one or multiple times in unicode strings, and unicode implementations will only "consume" the string up to the first occurrence of NUL

but "consume" .. is of course where the "details" begin;)

eg consider this:

(cpy311_7) oberstet@intel-nuci7:~/scm/crossbario/crossbar$ python3
Python 3.11.1 (main, Jan 15 2023, 12:28:34) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> chr(65)
'A'
>>> type(chr(65))
<class 'str'>
>>> chr(0)
'\x00'
>>> type(chr(0))
<class 'str'>
>>> s1 = chr(65) + chr(65) + chr(0)
>>> s2 = chr(65) + chr(0) + chr(65)
>>> type(s1)
<class 'str'>
>>> type(s2)
<class 'str'>
>>> len(s1)
3
>>> len(s2)
3
>>> print(s1)
AA
>>> print(s2)
AA
>>> print(s1 + '-world')
AA-world
>>> print(s2 + '-world')
AA-world
>>> len(s1 + '-world')
9
>>> len(s2 + '-world')
9
>>> s3 = chr(65) + chr(65)
>>> print(s3 + '-world')
AA-world
>>> len(s3 + '-world')
8
>>> 

this is different from how plain C would treat above.

surprised? I knew that .. and I hate it: why can't people write precise and complete specs?

sure, gödel, incompleteness and all that. but above could have been properly defined, similar for IEEE 754.

of course it never will be ... we'll be stuck with "\0" means "something" .. or not. anyways;) it works. good enough.

@ecorm
Copy link

ecorm commented Aug 21, 2023

having said that, here is some original motivation:
...snip...
here is the "means" of achieving that: only use types which are directly present in all serializers, and only types which perfectly round-trip across for a single serializer and between serializers

Your motivation makes perfect sense and I agree with it.

e.g. even though some serializers directly support binary types, not all do of course (well, and our JSON binary encoding is working perfectly, it is not a commonly used standard outside WAMP)

WAMP's JSON binary encoding scheme was a bit problematic for me: The serialization library already had an option to Base64-encode binary data, but I had to hack it so that the NUL character could be read/written before the Base64 portion. Fortunately, the library provided enough hooks so that it didn't require me to modify its source code. But the hack is a bit ugly because I ended up implementing my own context stack in parallel to the one used internally by the serialization library.

here is another one: IEEE 754 floats do NOT roundtrip even for a single implementation or serializer. in fact, the IEEE 754 standard leaves open big areas undefined. same as with C.

For JSON, IEEE 754 floats can roundtrip if you use a library that takes care to output sufficient decimal digits (and handle inf and NaN). Of course, that only works if you're in control of all peers.

I think what you meant is that you cannot rely on IEEE 754 float serialization/deserialzation to be lossless in general, when multiple implementations are involved.

NUL is a valid unicode character, which "ends" a string, but which may appear zero, one or multiple times in unicode strings, and unicode implementations will only "consume" the string up to the first occurrence of NUL

As far as I can tell, the Unicode standard itself does not mandate how the NUL character is supposed to be used, other than it being a "string terminator". From p. 916 of the specification:

U+0000 null may be used as a Unicode string terminator, as in the C language. Such usage
is outside the scope of the Unicode Standard, which does not require any particular formal
language representation of a string or any particular usage of null.

@oberstet
Copy link
Contributor

thanks again for your thoughts! I fully agree rgd binary and unicode. and thanks for digging out the unicode spec text. interesting!


rgd IEEE 754 (fav topic of mine;), things are more complicated I'm afraid, or let's say, I failed to communicate what I really meant. let me try again (sorry, character defect ... I have to;)

For JSON, IEEE 754 floats can roundtrip if you use a library that takes care to output sufficient decimal digits (and handle inf and NaN). Of course, that only works if you're in control of all peers.
I think what you meant is that you cannot rely on IEEE 754 float serialization/deserialzation to be lossless in general, when multiple implementations are involved.

sure, between FLOAT_MIN and FLOAT_MAX, there are countable infinite many rational numbers (and uncountable infinite many real, but those are outside finite computers anyways), BUT only a finite subset of those numbers have a binary-float representation with the finite bits of exponent/mantissa in ANY case - this is the expected part.

luckily, computing devices are also (usually) using binary floats, so there isn't a loss at that leg.

BUT: NaN and inf are implementation defined in most of their bits.

this was done intentionally - the designers intended to allow implementation to use the additional / undefined / implementation defined bits to signal the source or reason of NaN or inf.

fwiw, here is how chatgpt 3.5 fails to get it, but then admits it is the case:

Q:

grafik

Correction:

grafik

@ecorm
Copy link

ecorm commented Aug 21, 2023

ChatGPT's correction seems like it was written by a human, which I guess it means that it passed the Turing Test in that particular case. 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants