ArrayIndexOutOfBoundsException when using Json.decodeFromStream #1709

Closed
bishiboosh opened this issue Sep 30, 2021 · 9 comments

@bishiboosh

When using Json.decodeFromStream, the following exception occurs:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 16384 out of bounds for length 16384
	at kotlinx.serialization.json.internal.ArrayAsSequence.get(JsonLexerJvm.kt:26)
	at kotlinx.serialization.json.internal.ArrayAsSequence.charAt(JsonLexerJvm.kt:23)
	at kotlinx.serialization.json.internal.AbstractJsonLexer.consumeString(AbstractJsonLexer.kt:331)
	at kotlinx.serialization.json.internal.ReaderJsonLexer.consumeKeyString(JsonLexerJvm.kt:149)
	at kotlinx.serialization.json.internal.AbstractJsonLexer.consumeString(AbstractJsonLexer.kt:308)
	at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeString(StreamingJsonDecoder.kt:243)
	at kotlinx.serialization.internal.StringSerializer.deserialize(Primitives.kt:142)
	at kotlinx.serialization.internal.StringSerializer.deserialize(Primitives.kt:138)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
	at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:35)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeNullableSerializableElement(AbstractDecoder.kt:79)
	at eu.sweetlygeek.serialization.error.Video$$serializer.deserialize(model.kt:28)
	at eu.sweetlygeek.serialization.error.Video$$serializer.deserialize(model.kt:28)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
	at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:35)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeNullableSerializableElement(AbstractDecoder.kt:79)
	at eu.sweetlygeek.serialization.error.Article$$serializer.deserialize(model.kt:7)
	at eu.sweetlygeek.serialization.error.Article$$serializer.deserialize(model.kt:7)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
	at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:35)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableElement(AbstractDecoder.kt:70)
	at kotlinx.serialization.encoding.CompositeDecoder$DefaultImpls.decodeSerializableElement$default(Decoding.kt:535)
	at kotlinx.serialization.internal.ListLikeSerializer.readElement(CollectionSerializers.kt:80)
	at kotlinx.serialization.internal.AbstractCollectionSerializer.readElement$default(CollectionSerializers.kt:51)
	at kotlinx.serialization.internal.AbstractCollectionSerializer.merge(CollectionSerializers.kt:36)
	at kotlinx.serialization.internal.AbstractCollectionSerializer.deserialize(CollectionSerializers.kt:43)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
	at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:35)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableElement(AbstractDecoder.kt:70)
	at eu.sweetlygeek.serialization.error.ResultPage$$serializer.deserialize(model.kt:48)
	at eu.sweetlygeek.serialization.error.ResultPage$$serializer.deserialize(model.kt:48)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
	at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:35)
	at kotlinx.serialization.json.JvmStreamsKt.decodeFromStream(JvmStreams.kt:65)
	at eu.sweetlygeek.serialization.error.MainKt.main(main.kt:11)
	at eu.sweetlygeek.serialization.error.MainKt.main(main.kt)

Process finished with exit code 1

This is a bug I caught while using randomly generated data in unit tests, so the data needed to reproduce it is unfortunately a bit complex.

From what I've been able to gather (though my intuition may be wrong), this happens in the following code block in AbstractJsonLexer.kt:

while (char != STRING) {
    if (char == STRING_ESC) {
        usedAppend = true
        currentPosition = appendEscape(lastPosition, currentPosition)
        lastPosition = currentPosition
    } else if (++currentPosition >= source.length) {
        usedAppend = true
        // end of chunk
        appendRange(lastPosition, currentPosition)
        currentPosition = definitelyNotEof(currentPosition)
        if (currentPosition == -1)
            fail("EOF", currentPosition)
        lastPosition = currentPosition
    }
    char = source[currentPosition]
}

It seems to happen when char is STRING_ESC (so we're waiting to see what has been escaped) while we're also at the end of the chunk: because we enter the first branch of the if, the next chunk is never loaded.
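A minimal sketch of a chunk-safe scanning loop, assuming a simplified lexer that is not kotlinx.serialization's actual code: the buffer is a fixed-size CharArray refilled from a java.io.Reader, and only a few two-character escapes are handled. The class and method names (ChunkedStringScanner, refill, next, readString) are hypothetical. What it illustrates is that every read, including the one right after an escape, goes through a bounds check that refills the buffer, so a position equal to the buffer length is never used as an index.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical, simplified chunked lexer (not kotlinx.serialization's code).
// Supports only the two-char escapes \" \\ \/ \n \t.
class ChunkedStringScanner(private val reader: Reader, chunkSize: Int) {
    private val buf = CharArray(chunkSize)
    private var len = 0 // number of valid chars currently in buf
    private var pos = 0 // index of the next char to consume

    // Loads the next chunk into buf; returns false at end of input.
    private fun refill(): Boolean {
        len = reader.read(buf).coerceAtLeast(0)
        pos = 0
        return len > 0
    }

    // Every read goes through this bounds check, so pos == len (one past the
    // last valid index) triggers a refill instead of an out-of-bounds access.
    private fun next(): Char {
        if (pos >= len && !refill()) error("EOF")
        return buf[pos++]
    }

    // Reads a JSON string body; the opening quote is assumed already consumed.
    fun readString(): String {
        val sb = StringBuilder()
        var c = next()
        while (c != '"') {
            if (c == '\\') {
                // The escaped char may be the first char of the next chunk;
                // next() refills before reading it
                sb.append(
                    when (val e = next()) {
                        'n' -> '\n'
                        't' -> '\t'
                        '"', '\\', '/' -> e
                        else -> error("Unsupported escape \\$e")
                    }
                )
            } else {
                sb.append(c)
            }
            // Also safe when the escape pair was the last thing in the chunk
            c = next()
        }
        return sb.toString()
    }
}

fun main() {
    // String body abc\"de" read in 5-char chunks: the first chunk ends right
    // after the escape, like position 16384 in a 16384-char buffer.
    val input = "abc\\\"de\""
    println(ChunkedStringScanner(StringReader(input), 5).readString()) // abc"de
}
```

Funnelling every read through one helper removes the asymmetry in the quoted loop, where the ordinary-character branch refills the chunk but the escape branch does not; the actual fix in #1706 may be structured differently.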

To Reproduce
I've made a sample project with the offending data at https://github.com/bishiboosh/kotlin-serialization-error; the error can be reproduced by running the main method in main.kt.

Expected behavior
When using Json.decodeFromString, no error occurs, so decodeFromStream should not have an issue with this JSON either.

Environment

  • Kotlin version: 1.5.31
  • Library version: 1.3.0
  • Kotlin platforms: JVM
  • Gradle version: 7.2
@bishiboosh
Author

(Again, sorry for the rather long JSON reproduction; my knowledge of JSON and serialization wasn't good enough to figure out a simpler one.)

@rgmz

rgmz commented Oct 26, 2021

I'm also experiencing this issue, with the same versions as @bishiboosh.

I can't upload a full reproducer to GitHub at this time, but a basic outline is that the issue occurs when I attempt to parse this file: https://github.com/spdx/license-list-data/blob/master/json/details/GPL-1.0.json.

Perhaps the length, or the presence of escapes and Unicode characters in a few of the fields, is what's causing the issue? @bishiboosh encountered this with randomly generated data, so "long, complex strings" could be the common factor.

while (char != STRING) {
    if (char == STRING_ESC) {
        usedAppend = true
        currentPosition = appendEscape(lastPosition, currentPosition)
        lastPosition = currentPosition
    } else if (++currentPosition >= source.length) {
        usedAppend = true
        // end of chunk
        appendRange(lastPosition, currentPosition)
        currentPosition = definitelyNotEof(currentPosition)
        if (currentPosition == -1)
            fail("EOF", currentPosition)
        lastPosition = currentPosition
    }
    char = source[currentPosition]
}

@rgmz

rgmz commented Oct 27, 2021

I added some debugging statements to the portion of AbstractJsonLexer linked above. Here's the output:

... (many, many lines)
# last successful parse
(line 318) Start of while loop [char=\, currentPosition=16376, lastPosition=16367]
(line 318) if(char == STRING_ESC) [char=\, STRING_ESC=\]
(line 324) end if [usedAppend=true, currentPosition=16382, lastPosition=16382]
(line 336) end while [char=\, currentPosition=16382, source.length=16384]
# parse that fails
(line 318) Start of while loop [char=\, currentPosition=16382, lastPosition=16382]
(line 318) if(char == STRING_ESC) [char=\, STRING_ESC=\]
(line 324) end if [usedAppend=true, currentPosition=16384, lastPosition=16384]
(line 336) end while [char=\, currentPosition=16384, source.length=16384]

java.lang.ArrayIndexOutOfBoundsException: Index 16384 out of bounds for length 16384
	at kotlinx.serialization.json.internal.ArrayAsSequence.get(JsonLexerJvm.kt:26) ~[kotlinx-serialization-json-jvm-5.1.0-SNAPSHOT.jar:na]
	at kotlinx.serialization.json.internal.ArrayAsSequence.charAt(JsonLexerJvm.kt:23) ~[kotlinx-serialization-json-jvm-5.1.0-SNAPSHOT.jar:na]

@rgmz

rgmz commented Oct 27, 2021

Okay, here's the specific text from GPL-1.0.json that causes this. The exception occurs when attempting to parse this part of the licenseTextHtml field:

\    \div class\\optional-license-text\\\      \p\\        GNU GENERAL PUBLIC LICENSE\br /\\\        Version 1, February 1989\      \/p\\\    \/div\\    \p\\      Copyright (C) 1989 Free Software Foundation, Inc. 51\      Franklin St, Fifth Floor, Boston, MA 02110-1301 USA\    \/p\\\    \p\\      Everyone is permitted to copy and distribute verbatim copies\      of this license document, but changing it is not allowed.\    \/p\\\    \p\\      Preamble\    \/p\\\    \p\\      The license agreements of most software companies try to keep\      users at the mercy of those companies. By contrast, our General\      Public License is intended to guarantee your freedom to share\      and change free software--to make sure the software is free for\      all its users. The General Public License applies to the Free\      Software Foundation\apos;s software and to any other program whose\      authors commit to using it. You can use it for your programs, too.\    \/p\\\    \p\\      When we speak of free software, we are referring to freedom, not\      price. Specifically, the General Public License is designed to\      make sure that you have the freedom to give away or sell copies\      of free software, that you receive source code or can get it if\      you want it, that you can change the software or use pieces of it\      in new free programs; and that you know you can do these things.\    \/p\\\    \p\\      To protect your rights, we need to make restrictions that forbid\      anyone to deny you these rights or to ask you to surrender the\      rights. These restrictions translate to certain responsibilities for\      you if you distribute copies of the software, or if you modify it.\    \/p\\\    \p\\      For example, if you distribute copies of a such a program, whether\      gratis or for a fee, you must give the recipients all the rights\      that you have. You must make sure that they, too, receive or\      can get the source code. 
And you must tell them their rights.\    \/p\\\    \p\\      We protect your rights with two steps: (1) copyright the\      software, and (2) offer you this license which gives you legal\      permission to copy, distribute and/or modify the software.\    \/p\\\    \p\\      Also, for each author\apos;s protection and ours, we want to make\      certain that everyone understands that there is no warranty for\      this free software. If the software is modified by someone else\      and passed on, we want its recipients to know that what they\      have is not the original, so that any problems introduced by\      others will not reflect on the original authors\apos; reputations.\    \/p\\\    \p\\      The precise terms and conditions for copying,\      distribution and modification follow.\    \/p\\\    \p\\      GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS\      FOR COPYING, DISTRIBUTION AND MODIFICATION\    \/p\\\\ul style\\list-style:none\\\\li\\        \var class\\replacable-license-text\\0.\/var\\        This License Agreement applies to any program or other work which\        contains a notice placed by the copyright holder saying it may\        be distributed under the terms of this General Public License.\        The \quot;Program\quot;, below, refers to any such program or work, and\        a \quot;work based on the Program\quot; means either the Program or any\        work containing the Program or a portion of it, either verbatim\        or with modifications. Each licensee is addressed as \quot;you\quot;.\      \/li\\\li\\        \var class\\

The next part of the line, where it's failing, is (after <var class=\):

<var class=\"replacable-license-text\">1.<
# actual source
\u003cvar class\u003d\"replacable-license-text\"\u003e1.\u003c/var\u003e

It's still not clear why this is failing. It appears to be just the escape before the double quote, which is present in many other places. I notice that the serializer is adding extra escapes (\), so perhaps those additional characters are throwing off the character count?
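The actual source fragment uses \uXXXX escapes (\u003c for <, \u003d for =, \u003e for >). Each occupies six characters in the raw JSON but decodes to a single character, while \" occupies two, so offsets in the raw text and in the decoded string drift apart; the index in the trace refers to the raw character buffer. A minimal sketch, assuming only \uXXXX and a few two-character escapes need handling; decodeJsonEscapes is a hypothetical helper, not part of kotlinx.serialization.

```kotlin
// Hypothetical helper: decodes \uXXXX plus the two-char escapes \" \\ \/.
// Other escapes are rejected to keep the sketch short.
fun decodeJsonEscapes(raw: String): String {
    val sb = StringBuilder()
    var i = 0
    while (i < raw.length) {
        val c = raw[i]
        if (c != '\\') {
            sb.append(c)
            i++
            continue
        }
        require(i + 1 < raw.length) { "Dangling backslash at $i" }
        when (val e = raw[i + 1]) {
            'u' -> {
                require(i + 6 <= raw.length) { "Truncated \\u escape at $i" }
                // Six raw chars (\uXXXX) become one decoded char
                sb.append(Char(raw.substring(i + 2, i + 6).toInt(16)))
                i += 6
            }
            '"', '\\', '/' -> {
                // Two raw chars become one decoded char
                sb.append(e)
                i += 2
            }
            else -> error("Unsupported escape \\$e at $i")
        }
    }
    return sb.toString()
}

fun main() {
    // The raw fragment from GPL-1.0.json (Kotlin raw string: backslashes kept as-is)
    val raw = """\u003cvar class\u003d\"replacable-license-text\"\u003e1.\u003c/var\u003e"""
    val decoded = decodeJsonEscapes(raw)
    println(decoded)                              // <var class="replacable-license-text">1.</var>
    println("${raw.length} -> ${decoded.length}") // 72 -> 45
}
```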

@rgmz

rgmz commented Oct 27, 2021

Here's a strange issue that I encountered while debugging this.

Given the following code, where /tmp/GPL-1.0.json is the GPL-1.0.json file linked above, I get the aforementioned error.

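    import java.nio.file.Path
    import kotlin.io.path.inputStream
    import kotlinx.serialization.ExperimentalSerializationApi
    import kotlinx.serialization.Serializable
    import kotlinx.serialization.json.Json
    import kotlinx.serialization.json.decodeFromStream

    // Skip keys that Detail doesn't declare (e.g. isDeprecatedLicenseId)
    val json = Json {
        ignoreUnknownKeys = true
    }

    // Hypothetical minimal model of a crossRef entry: only `url` is mapped,
    // the remaining keys are skipped thanks to ignoreUnknownKeys
    @Serializable
    data class LicenseCrossRef(val url: String)

    @Serializable
    data class Detail(
        val licenseId: String,
        val name: String,
        val licenseText: String,
        val licenseTextHtml: String,
        val standardLicenseHeader: String?,
        val standardLicenseHeaderHtml: String,
        val licenseComments: String?,
        val crossRef: List<LicenseCrossRef>,
        val seeAlso: List<String>
    )

    // decodeFromStream is marked @ExperimentalSerializationApi in 1.3.0
    @OptIn(ExperimentalSerializationApi::class)
    fun main() {
        val detail: Detail = Path.of("/tmp/GPL-1.0.json").inputStream().use {
            json.decodeFromStream(it)
        }
    }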

However, if I remove any field from the GPL-1.0.json file (e.g. delete isDeprecatedLicenseId), I get a different, expected error:

Caused by: kotlinx.serialization.MissingFieldException: Field 'licenseComments' is required for type with serial name 'com.example.Detail', but it was missing
	at kotlinx.serialization.internal.PluginExceptionsKt.throwMissingFieldException(PluginExceptions.kt:20) ~[kotlinx-serialization-core-jvm-5.1.0-SNAPSHOT.jar:5.1.0-SNAPSHOT]

Edit: I just noticed that both my issue and bishiboosh's happen at index 16384, which is especially curious. I don't yet understand the significance of 16384 in this context, but given that 16 KiB = 16,384 bytes, I'm guessing it could be a default buffer size for the input stream.

e.g. https://www.baeldung.com/java-buffered-reader, https://stackoverflow.com/q/37680780
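The trace reports a char array of length 16384 (ArrayAsSequence), so the boundary appears to be the lexer's own read buffer, counted in chars; that is an inference from the trace rather than something confirmed in the thread. A minimal sketch, assuming a 16384-char buffer filled with java.io.Reader.read, that builds a hypothetical test document whose \" escape occupies the last two slots of the first chunk, matching the debug log where an escape starting at 16382 leaves currentPosition at 16384. The names CHUNK, jsonWithEscapeAtChunkEnd and chunks are hypothetical, and the real lexer may not split its input at exactly these offsets.

```kotlin
import java.io.StringReader

const val CHUNK = 16384 // buffer length reported in the stack trace

// Hypothetical test-input builder: places the two-char escape \" so that it
// fills the last two slots of the first chunk.
fun jsonWithEscapeAtChunkEnd(chunk: Int = CHUNK): String {
    val prefix = "{\"s\":\""
    val escapeStart = chunk - 2
    return prefix + "a".repeat(escapeStart - prefix.length) + "\\\"" + "\"}"
}

// Splits text into fixed-size chunks, the way a buffered reader would fill them.
fun chunks(text: String, size: Int = CHUNK): List<String> {
    val reader = StringReader(text)
    val buf = CharArray(size)
    val out = mutableListOf<String>()
    while (true) {
        val n = reader.read(buf)
        if (n == -1) break
        out += String(buf, 0, n)
    }
    return out
}

fun main() {
    val parts = chunks(jsonWithEscapeAtChunkEnd())
    println(parts.size)           // 2
    println(parts[0].takeLast(2)) // \"
    println(parts[1])             // "}
}
```

Feeding such a document to decodeFromStream in a regression test would exercise the escape-at-chunk-end path directly instead of relying on randomly generated data.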

@sandwwraith
Member

I think #1706 should fix this. You can check out the branch and test it, or I'll check with the test case you provided a bit later.

@sandwwraith sandwwraith self-assigned this Oct 27, 2021
@bishiboosh
Author

@sandwwraith I'm not in any rush, so I'll let you add the test case and I'll test once the next release appears.

@rgmz
Copy link

rgmz commented Oct 27, 2021

@sandwwraith I built that locally and it works without issue 🎉 :

Decoding GPL-1.0.json from stream...
Successfully decoded license: Detail(licenseId=GPL-1.0, name=GNU General Public License v1.0 only, licenseText=GNU GENERAL PUBLIC LICENSE
Version 1, February 1989

Copyright (C) 1989 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
...

@qwwdfsad
Member

Fixed in 1.3.0
