JVM integration with InputStream and OutputStream #1569

Merged · 7 commits · Sep 3, 2021

Conversation

sandwwraith (Member)

No description provided.

@qwwdfsad (Contributor) left a comment

I haven't taken a deep look into decoding yet, but it seems there are already enough actionable points here


internal class JsonToWriterStringBuilder(private val writer: Writer) : JsonStringBuilder(
// maybe this can also be taken from the pool, but currently initial char array size there is 128, which is too low.
CharArray(BATCH_SIZE)
Contributor commented:

It seems to be a pretty huge allocation (e.g. it always misses the TLAB). Could you please ensure it doesn't dominate small-object serialization?

If so, it's worth either reducing its size or pooling a few instances of it
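
For illustration, a minimal sketch of the pooling idea, assuming a simple synchronized pool. The object name, capacity, and BATCH_SIZE value here are made up for the example and are not the PR's actual implementation:

// Illustrative only: keep a few large char arrays around instead of allocating
// a fresh (TLAB-missing) array on every serialization call.
internal object CharArrayPool {
    private const val BATCH_SIZE = 16 * 1024   // assumed size, for the sketch only
    private const val MAX_POOLED = 4           // keep at most a few instances around
    private val arrays = ArrayDeque<CharArray>()

    // Reuse a pooled array when available, otherwise pay for one large allocation.
    fun take(): CharArray =
        synchronized(this) { arrays.removeLastOrNull() } ?: CharArray(BATCH_SIZE)

    // Return the array so the next serialization call can skip the allocation.
    fun release(array: CharArray) {
        synchronized(this) {
            if (arrays.size < MAX_POOLED) arrays.addLast(array)
        }
    }
}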

@chris-hatton commented on Jul 1, 2021

Question regarding the behaviour of this change:
Will decodeFromStream behave so that I can call it serially (repeatedly, non-overlapping) against an open stream, each time consuming only as much of the stream as is necessary to form a complete object?

For example; given this stream carrying two distinct JSON objects:

{"someKey":"someValue1"}{"someKey":"someValue2"}

Could I call decodeFromStream() twice, to get both objects?
This is the characteristic I am looking for, to be able to read a Flow<T> from a long-lived HTTP response stream.
Thanks for your efforts @sandwwraith 🙏 This is a hotly awaited improvement.

@BenWoodworth (Contributor)

I'm curious: was using Reader/Writer considered (instead of Streams)? That would've been my first thought for a JSON/string format, and it avoids dealing with character encodings.

@sandwwraith (Member, Author)

@BenWoodworth Reader/Writer is used internally. The API exposes methods with Input/OutputStream because that is more versatile and allows implementing charset-specific parsers in the future

@chris-hatton Yes, I think we can do it — I'll add a test for it
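
For reference, a short sketch of this multi-object pattern. It uses decodeToSequence, which later kotlinx-serialization-json releases provide for lazily reading several values from one stream; that API is not part of this PR, and the Item class and whitespace-separated input are made up for the example:

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeToSequence

@Serializable
data class Item(val someKey: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // Two JSON objects arriving over a single stream, as in the question above.
    val stream = """{"someKey":"someValue1"} {"someKey":"someValue2"}""".byteInputStream()

    // Decodes one object at a time as the stream is consumed, which is the
    // kind of source a Flow<T> over a long-lived response could wrap.
    val items: Sequence<Item> = Json.decodeToSequence(stream)
    items.forEach(::println) // Item(someKey=someValue1), Item(someKey=someValue2)
}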

@qwwdfsad (Contributor) left a comment

I've dug through the profile and it seems that the JVM cannot properly optimize string access across the CharSequence interface, especially in small hot methods. definitelyNotEof is also a heavy hitter for such functions.

I've tried to tweak it here and there, but it's quite hard to ensure all the invariants with the existing limitations.

I'd suggest you do the following:

Extract a base JsonLexer whose only state is currentPosition, plus utility functions for the slow paths: skipElement, the various fail functions, and maybe boolean/number consumption (the char sequence then becomes an input parameter of the function). Everything else is copy-pasted between the streaming and string implementations.

At that point the performance model is quite clear and the expected degradation should be insignificant (educated guess -- 2-4%).
Then you can start commonizing (handling via the CharSequence interface in the base JSON lexer) the parts of parsing where the compiler is smart enough to optimize everything away.

I expect that the biggest offenders (things you cannot commonize) will be just a few functions that were written in a compact and polished manner -- skipWhitespaces, tryConsumeComma, canConsumeValue and peekNextToken. Everything else will probably work well via CharSequence, and the amount of duplicated code we have to maintain will stay quite isolated.
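
A rough structural sketch of this suggestion. All names and bodies below are illustrative guesses, not the PR's actual code; the point is only the split between shared slow-path state/helpers and per-implementation hot paths:

// Illustrative only: the base class holds the single piece of state plus slow-path helpers.
internal abstract class AbstractJsonLexer {
    protected var currentPosition: Int = 0

    // Slow paths take the char sequence as a parameter, so the CharSequence
    // dispatch cost is paid only off the hot path.
    protected fun fail(message: String, source: CharSequence): Nothing =
        throw IllegalArgumentException("$message at position $currentPosition in $source")

    // Hot, compact methods (skipWhitespaces, tryConsumeComma, canConsumeValue,
    // peekNextToken) stay abstract and are duplicated per implementation so the
    // JIT sees a concrete receiver type.
    abstract fun tryConsumeComma(): Boolean
}

// String-backed lexer: hot paths operate directly on String.
internal class StringJsonLexer(private val source: String) : AbstractJsonLexer() {
    override fun tryConsumeComma(): Boolean {
        while (currentPosition < source.length && source[currentPosition].isWhitespace()) currentPosition++
        if (currentPosition < source.length && source[currentPosition] == ',') { currentPosition++; return true }
        return false
    }
}
// The stream-backed lexer would duplicate the same hot paths against its refillable CharArray buffer.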

@qwwdfsad (Contributor) left a comment

Good to go 🚀

Please don't forget to file issues for future improvements -- UTF-8 parsing and multishot streams

import java.io.*

/**
* Serializes the [value] with [serializer] into a [stream] using JSON format and UTF-8 encoding..
Contributor commented:

Redundant dot
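
For context, a minimal usage sketch of the API being documented here, using the encodeToStream/decodeFromStream extensions this PR adds; the Message class and the byte-array streams are made up for the example:

import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeFromStream
import kotlinx.serialization.json.encodeToStream

@Serializable
data class Message(val someKey: String)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    val out = ByteArrayOutputStream()
    // Writes UTF-8 encoded JSON to the stream without materializing the whole String first.
    Json.encodeToStream(Message("someValue1"), out)

    val decoded: Message = Json.decodeFromStream(ByteArrayInputStream(out.toByteArray()))
    println(decoded) // Message(someKey=someValue1)
}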

return oldSize
}

private fun dumpAndReset(sz: Int = size) {
Contributor commented:

[nit] it's just flush :)

@sandwwraith (Member, Author)

#1662

@sandwwraith deleted the jvm-streams-integration branch on September 6, 2021 at 11:50
@slavonnet

The String serializer works OK on big JSON, but with streams I get a random EOF exception; in a snippet I replaced the String encoder with the Stream encoder in the converter factory. Small JSON is fine. The buffer gets rewritten when gzip writes everything.
