Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBOR Feature Drop: COSE #2412

Open
wants to merge 43 commits into
base: dev
Choose a base branch
from

Conversation

JesusMcCloud
Copy link

This PR obsoletes #2371 and #2359 as it contains the features of both PRs and many more.

Specifically, this PR contains all feature required to serialize and parse COSE-compliant CBOR (thanks to @nodh). While some canonicalization steps (such as sorting keys) still needs to be performed manually. It does get the job done quite well. Namely, we have successfully used the features introduced here to create and validate ISO/IEC 18013-5:2021-compliant mobile driving license data.

This PR introduces the following features to the CBOR format:

  • Serial Labels
  • Tagging of keys and values
  • Definite length encoding (this is the largest change, as it effectively makes the cbor encoder two-pass)
  • Globally preferring major type 2 for byte array encoding
  • Encodes null complex objects as empty map (to be COSE-compliant)

@JesusMcCloud JesusMcCloud changed the base branch from master to dev August 18, 2023 11:18
@sandwwraith
Copy link
Member

Hi, thanks for your PR! Just as a quick heads-up so you know I've seen it — I'm going on vacation soon, so I'll be able to properly review it in September.

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I appreciate the amount of work you've done — especially for the tests. I didn't look into the implementation (yet), focusing on surface API for now.

Regarding failure on CI — it's connected to the fact that Long.toUnsignedString was added only in Android API 26. However, it is possible to use this declaration on Android with desugaring enabled, so there shouldn't be any problems. You have to crate a special annotation @SuppressAnimalSniffer (it already exists for core and json, but not for cbor, see e.g. here:

@SuppressAnimalSniffer // Long.toUnsignedString(long)
) and use it in configuration here:
switch (name) {

@SerialInfo
@Target(AnnotationTarget.PROPERTY)
@ExperimentalSerializationApi
public annotation class ValueTags(@OptIn(ExperimentalUnsignedTypes::class) vararg val tags: ULong) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it is recommended to propagate opt-in in the libraries (see here: https://kotlinlang.org/docs/opt-in-requirements.html#propagating-opt-in and here: https://kotlinlang.org/docs/jvm-api-guidelines-backward-compatibility.html#the-requiresoptin-annotation). The problem here is, while ULong is already stable, ULongArray which is automatically inserted by vararg is not.

I'm also not sure if it is very common for a property to have multiple tags. Have you considered using @Repeatable annotation instead of vararg (it works only with Kotlin 1.9.0 though, because of the #2099) or not providing such functionality at all?
I don't mind propagating ExperimentalUnsignedTypes, however, if this is necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! To be frank, I did not even think about@Repeatable, because vararg seemed intuitive to use with minimum boilerplate when defining serializable data types using multiple tags, as the declaration directly reflects what's happening in the resulting CBOR byte string.

We discussed this internally and neither of us has a string opinion on either way of providing this feature. So if @Repeatable is preferred, we'll, of course, go with that!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try it out? If it works for you, I think it may be better that way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After starting to refactor KeyTags and ValueTags to @Repeatable, I notices that this would make KeyTags and ValueTags behave differently from the @CborArray annotation, which requires the vararg syntax to make sense. IMO this change would therefore only make tagging inconsistent, resulting in API inconsistencies and more diverse (and hence a little more complex) code internally.

Therefore, I am now against this change. Of course, if you still prefer it we'll comply.

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the test thoroughly (to understand if there is a need for more test cases), but I highlighted all of the problematic moments from my perspective. Don't forget to press 'show hidden conversations' on Github :)

* Note that `equals()` and `hashCode()` only use `value`, not `serialized`.
*/
@Serializable(with = ByteStringWrapperSerializer::class)
public class ByteStringWrapper<T>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: it seems you need to update apiDump since this class is no longer data

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, done!

@Serializable(with = ByteStringWrapperSerializer::class)
public class ByteStringWrapper<T>(
public val value: T,
public val serialized: ByteArray = byteArrayOf()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've skimmed through the diff and can't find any usages of this value. Is there any reason to be a val? In what cases will the user be interested in accessing it? Also, it is very easy to create inconsistent data with this approach (e.g. ByteStringWrapper(original.value, byteArrayOf(garbage))). If it is not necessary to have it, this class can be a value class. Or even no class would be needed, as this can be handled with yet another @SerialInfo annotation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nodh can you take care of this, please?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll use the CBOR serialization for ISO 18013-5 compliaent mobile driving licences. There verifiers need to check that (the digest of) the serialized byte values appear in another structure (signed by the issuer). So in that case its quite useful to have the bytes as they were serialized (and parsed) in the deserialized ByteStringWrapper object too.

internal val writeValueTags: Boolean,
internal val verifyKeyTags: Boolean,
internal val verifyValueTags: Boolean,
internal val writeDefiniteLengths: Boolean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not documented as @param

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, done!

@SerialInfo
@Target(AnnotationTarget.CLASS)
@ExperimentalSerializationApi
public annotation class CborArray(@OptIn(ExperimentalUnsignedTypes::class) vararg val tag: ULong)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vararg val tag is still undocumented though

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry! done!

readProperties++
descriptor.getElementIndexOrThrow(elemName)
descriptor.getElementIndexOrThrow(elemName).also { index ->
if (cbor.verifyKeyTags) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of duplicated code can be extracted to function (perhaps local)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} else {
if (cbor.verifyKeyTags) {
descriptor.getKeyTags(index)?.let { keyTags ->
if (!(keyTags contentEquals tags)) throw CborDecodingException("CBOR tags $tags do not match declared tags $keyTags")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General observation (for all throw expressions): It is would be much easier to debug errors if they also included locations. Here we don't have a json path analog, but appending descriptor.toString() will already make the job easier significantly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done where we have access to the descriptor (which we do not always have)

readByte()
}

return (if (collectedTags.isEmpty()) null else collectedTags.toULongArray()).also { collected ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO also creates unnecessary nesting here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to be transformed into an ULongArray for comparison and for returning it. the alternative of adding a temp variable just for two accesses seems a bit cumbersome to me


return (if (collectedTags.isEmpty()) null else collectedTags.toULongArray()).also { collected ->
tags?.let {
if (!it.contentEquals(collected)) throw CborDecodingException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that contentEquals has both nullable receiver and parameter. It results in behavior like this: intArrayOf(1,2,3).contentEquals(null) -> false. I'm not sure it is what you intended to have here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to compare if tags are actually set. Otherwise, we don't care.

I'll also add that comment to the code.

@@ -633,9 +952,55 @@ private fun Iterable<ByteArray>.flatten(): ByteArray {

@OptIn(ExperimentalSerializationApi::class)
private fun SerialDescriptor.isByteString(index: Int): Boolean {
return getElementAnnotations(index).find { it is ByteString } != null
return kotlin.runCatching { getElementAnnotations(index).find { it is ByteString } != null }.getOrDefault(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why runCatching is needed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only a theoretical possibility, but an IndexOutOfBoundsException could be thrown. I'll remove it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, such a call fails for the WASM target, even though we use runCatching. I have no explanation

@develar
Copy link

develar commented Dec 17, 2023

Globally preferring major type 2 for byte array encoding

It would be awesome, as currently it's somewhat surprising that the default encoded size of byte arrays is nearly twice as large.

@JesusMcCloud
Copy link
Author

WASM still fails, because even runCatching does not catch a runtimeException for an expected IndexOutOfBoundsException when checking for tags. Help would be much appreachiated.

@JesusMcCloud
Copy link
Author

JesusMcCloud commented Jan 12, 2024

WASM still fails, because even runCatching does not catch a runtimeException for an expected IndexOutOfBoundsException when checking for tags. Help would be much appreachiated.

Here are the details:

  Current test: testOpenPolymorphism
  Process output:
   wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:1
  
  
  RuntimeError: array element access out of bounds
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlin.Array.get (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[2207]:0x4247e)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.descriptors.SerialDescriptorImpl.getElementAnnotations (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[3974]:0x595a3)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.getKeyTags (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5369]:0x6b6e2)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.Preamble.encode (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5180]:0x68fa4)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.Token.encode (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5173]:0x68d56)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.CborWriter.encode (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5294]:0x6a0c3)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.Cbor.encodeToByteArray (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5042]:0x6743b)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.CborPolymorphismTest.testOpenPolymorphism (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[6111]:0x79d2a)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlin.wasm.internal.testContainer$kotlinx.serialization.cbor test fun$CborPolymorphismTest test fun$testOpenPolymorphism test fun.invoke (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[6715]:0x8440e)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlin.test.TeamcityAdapter$test$lambda.invoke (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5605]:0x6f4a9)

EDIT: seems it's already been reported and there's really not much we can do: https://youtrack.jetbrains.com/issue/KT-59081/WASM-Cant-catch-index-out-of-bounds-and-divide-by-0-exceptions

@sandwwraith sandwwraith self-assigned this Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants