Does simdjson parse escaped invalid UTF-8 bytes? #2125
-
I'm currently using Boost.JSON to send JSON messages, and I'm thinking about using this library to read JSON while keeping Boost.JSON for writing. However, as part of my program, Boost.JSON writes out arbitrary bytes as escaped character sequences. I know that simdjson doesn't accept invalid UTF-8, but will the library correctly parse escaped bytes into native bytes?
-
simdjson supports the full JSON RFC, including escaped characters inside strings.
-
@lemire yep. And even if we didn't validate Unicode escape sequences to ensure they were in the proper range, there are binary sequences that just can't be represented by either valid UTF-8 or Unicode escape sequences translated to UTF-8. e.g.
-
@DUOLabs333 I just realized, you can disable UTF-8 validation with a #define, and that might let you do binary data in strings without losing any other functionality. I think we are 8-bit clean without UTF-8 validation with the exception of
@jkeiser We validate Unicode characters during decoding because we guarantee UTF-8 output.
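To make the consequence of that guarantee concrete, here is a minimal standalone sketch of how a code point taken from a `\uXXXX` escape becomes UTF-8 bytes. The helper name `encode_utf8` is hypothetical and is not part of simdjson; it only illustrates the standard UTF-8 encoding rules. Under these rules an escape such as `\u00ff` yields the two bytes `0xC3 0xBF`, not the single raw byte `0xFF`, so escapes cannot be used to round-trip arbitrary bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a simdjson API): encode one Unicode code point
// as UTF-8, as any parser that guarantees UTF-8 output has to do when it
// materializes a \uXXXX escape.
inline std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);              // beyond the Unicode range
    assert(cp < 0xD800 || cp > 0xDFFF);  // lone surrogates are not encodable
    std::string out;
    if (cp < 0x80) {                     // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {             // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {           // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                             // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling `encode_utf8(0xFF)` returns a two-byte string, which is why a writer that escapes the raw byte `0xFF` as `\u00ff` will not get that byte back from a UTF-8-producing reader.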
@DUOLabs333 You can put base64 data in a string; it is a safe solution. It will probably result in a smaller payload and faster decoding.
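As a rough illustration of the base64 route, the sketch below encodes arbitrary bytes into an ASCII string that is safe to place in a JSON string, and decodes it again on the reading side. The function names `base64_encode` and `base64_decode` are hypothetical stand-ins written for this example, not simdjson or Boost.JSON APIs, and the decoder only checks its input with `assert`, so a production version would need real error handling. On size: base64 spends four characters per three input bytes, while a `\u00XX` escape spends six characters per escaped unit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 alphabet (RFC 4648, with '=' padding).
static const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: turn raw bytes into a JSON-safe ASCII string.
inline std::string base64_encode(const std::vector<std::uint8_t>& in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups
        std::uint32_t n = (static_cast<std::uint32_t>(in[i]) << 16) |
                          (static_cast<std::uint32_t>(in[i + 1]) << 8) |
                          in[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    if (i + 1 == in.size()) {            // one trailing byte
        std::uint32_t n = static_cast<std::uint32_t>(in[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (i + 2 == in.size()) {     // two trailing bytes
        std::uint32_t n = (static_cast<std::uint32_t>(in[i]) << 16) |
                          (static_cast<std::uint32_t>(in[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: recover the raw bytes from a padded base64 string.
inline std::vector<std::uint8_t> base64_decode(const std::string& in) {
    assert(in.size() % 4 == 0);          // sketch: padded input only
    std::vector<std::uint8_t> out;
    std::uint32_t buf = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;             // padding marks the end
        std::size_t v = kAlphabet.find(c);
        assert(v != std::string::npos);  // sketch: no real error handling
        buf = (buf << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {                 // a full byte is available
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buf >> bits) & 0xFF));
        }
    }
    return out;
}
```

The writer would store the result of `base64_encode` as an ordinary JSON string value; the reader parses the JSON as usual, since the string is plain ASCII and therefore valid UTF-8, and then calls `base64_decode` on the string it gets back.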