Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve UnsafeByteStringOperations APIs #259

Open
joffrey-bion opened this issue Feb 13, 2024 · 3 comments
Open

Improve UnsafeByteStringOperations APIs #259

joffrey-bion opened this issue Feb 13, 2024 · 3 comments

Comments

@joffrey-bion
Copy link

joffrey-bion commented Feb 13, 2024

These unsafe operations are really needed for any zero-copy conversion between ByteString and other immutable APIs, or between representations that the developer knows will not be mutated during their lifespan.

The opt-in annotation should be enough to deter developers from using them, so I believe it's worth making their usage slightly more comfortable. I see 2 main issues right now:

  • the object wrapper is not very Kotlinesque, extensions would be better
  • withByteArrayUnsafe doesn't return the lambda's result, so it's not convenient to use for conversions. It requires capturing the variable outside the lambda, and this can only be deemed safe by checking the current implementation, which is really not nice (we're already using unsafe APIs, we don't want to rely on implementation details on top of that).

So I would simply suggest the following:

  • remove the UnsafeByteStringOperations wrapper
  • mark ByteString.Companion.wrap() and ByteString.getBackingArrayReference with the @UnsafeByteStringApi annotation
  • make ByteString.Companion.wrap() and ByteString.getBackingArrayReference public
  • rename ByteString.Companion.wrap(array) to wrapUnsafe, or turn it into an extension ByteArray.asByteStringUnsafe (with the conventional as- prefix for methods that just wrap and keep the original instance without copy). If we go the extension route, we could also consider turning the "safe" ByteString(ByteArray) constructor into the extension ByteArray.toByteString() (see Replace ByteString(ByteArray) constructor with ByteArray.toByteString() #260).

I believe getBackingArrayReference is already a clear indication that it's unsafe because we're accessing the backing array, so with the unsafe annotation I don't think we would need to rename it differently. But we could go as far as naming it unsafeBackingArrayReference().

@fzhinkin
Copy link
Collaborator

Reasoning behind the current API:

Use of the unsafe API has some implications, and unless a developer understands what they are doing, it's better to abstain from using it.
The not-very-Kotlinesque wrapper was deliberately chosen to make its use less comfortable. Those who need it will use it anyway, and for those who don't, it'll be yet another barrier stopping them from using the unsafe API.

Yet another aspect: unsafe API is tightly bound to the ByteString implementation, and when we decide to change something (or to add yet another ByteString implementation, like rope-like byte strings allowing faster concatenation), some client code will be broken. Having a pretty inconvenient API is a way to ensure that not so many folks will occasionally start using the API.

@joffrey-bion
Copy link
Author

joffrey-bion commented Feb 13, 2024

I understand this reasoning, and my point is that the annotation is the Kotlin mechanism to express this. I agree that the clunky API is another barrier, but I disagree that this is the right solution given the extent to which these APIs will be needed (see my other message below).

Also, this is not the only problem here. The absence of return value from withByteArrayUnsafe forces me to use it in a way that is unsafe for another reason than just accessing the backing array: I'm now relying on the fact that the array that is given to the lambda can be safely captured outside for a longer scope than the lambda:

@OptIn(UnsafeByteStringApi::class)
private fun ByteString.toNSData(): NSData {
    lateinit var nsData: NSData
    UnsafeByteStringOperations.withByteArrayUnsafe(this) {
        nsData = it.toNSData()
    }
    return nsData
}

@OptIn(ExperimentalForeignApi::class)
private fun ByteArray.toNSData(): NSData = memScoped {
    NSData.create(bytes = allocArrayOf(this@toNSData), length = this@toNSData.size.convert())
}

This is not necessary, so it would be nice to have a getBackingArrayReference equivalent at least, or a generic return type.

unsafe API is tightly bound to the ByteString implementation, and when we decide to change something (or to add yet another ByteString implementation, like rope-like byte strings allowing faster concatenation), some client code will be broken

That is fair, then maybe we're just missing built-in conversions between ByteString and the standard platform types in the kotlinx-io-bytestring library 😄:
#266
#268
#269

But the library cannot cover all possible external representations.

@joffrey-bion
Copy link
Author

joffrey-bion commented Feb 13, 2024

Use of the unsafe API has some implications, and unless a developer understands what they are doing, it's better to abstain from using it.

Actually I think I would even go as far as saying these unsafe APIs will be needed way more often than this sentence implies. In particular the ByteString.wrap() one, and now I think I would almost argue against an opt-in annotation altogether for ByteString.wrap().

Any API that currently returns a ByteArray for the lack of better immutable bytes abstraction could (should) be wrapped very rightfully with a ByteString-returning equivalent. There is no reason to pay a copy cost from the ByteArray to the ByteString in these cases by using ByteString(ByteArray), so using the ByteString.Companion.wrap() is the correct way to go, albeit clunky.

There are many exampes in the JDK or popular libraries:

  • MessageDigest.digest() from the JDK
  • DigestUtils.sha256(String) from Apache Commons
  • Ktor's web socket Frame.readBytes() returns a ByteArray that will never change (the copy is already done inside this function).
  • The stdlib's String.encodeToByteArray() (you had to use ByteString.wrap() in kotlinx-io's implementation of encodeToByteString by the way). This was covered by kotlinx-io to eliminate this specific case from the user's needs, but it still belongs to the examples list as a general demonstration of why we need this API.

While I do believe that kotlinx-io should deal with all of these API adaptations for the Kotlin stdlib, and that Ktor will probably switch to kotlinx-io and fix the API on their side, I guess we'll agree that it's unreasonable to expect kotlinx-io to cover third-party libraries or the entire JDK API surface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants