Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ByteString implement Iterable #294

Open
mgroth0 opened this issue Apr 11, 2024 · 5 comments
Open

Make ByteString implement Iterable #294

mgroth0 opened this issue Apr 11, 2024 · 5 comments

Comments

@mgroth0
Copy link

mgroth0 commented Apr 11, 2024

Why does ByteString not implement Iterable? Was it a design decision, or was there just never a reason to?

It seems to me like it would be incredibly useful and 100% safe.

@JakeWharton
Copy link

On the JVM this would incur boxing and a ton of virtual dispatching making the performance pretty awful. What are you trying to do?

@mgroth0
Copy link
Author

mgroth0 commented Apr 13, 2024

Thanks for explaining. If I understand correctly, implementing Iterable would only incur boxing when the iterator method is called.

I don't have one specific use case, but it seems all around practical because we immediately can use entire suites of extensions methods on Iterable such as Kotlin Collections from the standard library. Also, I like to make classes that take Iterable constructor parameters. Then I build a list or set to get a safe immutable copy inside the class.

I think the particular thing I noticed was that there is no existing forEach extension for ByteString. I could quickly write it myself, but it just gave me this idea because implementing Iterable would make it work out of the box in convenient ways.

I also feel convinced because since ByteString is immutable, it is particularly more safe to iterate than bytearray (even concurrently).

On the topic of performance, I try to first write code that is safe, easy to read, and easy to maintain and try not to let performance preocupy me too much since in many real life scenarios these performance differences will be negligable. Also what's the point of computer hardware getting better if we don't let it shave off a few lines of code for us?

If the consensus to implementing Iterable is not to, I would consider an alternative solution maybe to be extension methods to match what we have in the standard library for ByteArray or String.

@JakeWharton
Copy link

If I understand correctly, implementing Iterable would only incur boxing when the iterator method is called.

Every call to Iterator#next() would box, and then immediately unbox because it has to travel through the interface.

Kotlin does feature iterable specializations that avoid the boxing (ByteIterable), but I don't recall the for/in syntax using them. And certainly none of the transforming operators use them as you mention below.

I don't have one specific use case, but it seems all around practical because we immediately can use entire suites of extensions methods on Iterable such as Kotlin Collections from the standard library. Also, I like to make classes that take Iterable constructor parameters. Then I build a list or set to get a safe immutable copy inside the class.

For actual collections of objects this makes sense. Unfortunately for bytes you are much better off treating them as large units. ByteString is already immutable and represents a chunk which as a whole semantically represents something. Additionally, bytes are typically not processed using the operators one would normally use on other iterable things. Sometimes you do, sure, but it's very rare, and in those cases you need to operate directly on the bytes rather than through a (comparatively) heavy abstraction like Iterable.

In the 10 years of Okio we have not needed to iterate a ByteString, at least not in a way that couldn't be done by shallow-copying its backing segments to a Buffer and operating on that directly or its backing ByteArrays via UnsafeCursor.

I think the particular thing I noticed was that there is no existing forEach extension for ByteString. I could quickly write it myself, but it just gave me this idea because implementing Iterable would make it work out of the box in convenient ways.

The need to iterate bytes like this is not entirely uncommon, but if you're operating at the level of bytes you generally want to ingest with as little overhead as possible because you're interpreting those bytes as gzip data, png data, mp3 data, or something else. You aren't looking at each byte one-by-one and doing something with it individually.

On the topic of performance, I try to first write code that is safe, easy to read, and easy to maintain and try not to let performance preocupy me too much since in many real life scenarios these performance differences will be negligible.

This is a good strategy for high-level code which deals with objects and things. Correctness first, performance second.

I'm not sure it makes sense at this layer of abstraction, however. Concrete use-cases of what you are trying to accomplish would go a long way in helping us understand if this is the right strategy or if there's a better one. Speculatively adding the feature, though, is unlikely to yield great results.

Also what's the point of computer hardware getting better if we don't let it shave off a few lines of code for us?

I'm not sure when you started using computers, but computers today are perceptively slower than older ones! Using programs on my 333Mhz PC twenty-five years ago was a much more responsive and snappy experience than my M3 Mac.

The pace at which we've introduced indirection and abstraction at every layer has outpaced Moore's law (which itself has mostly died). The laws of thermodynamics and the speed of light remain fixed, and we're smack up against them now.

The ability to process things efficiently, especially those represented in low-level constructs like a series of bytes, has actually become a competitive advantage for apps.

@qwwdfsad
Copy link
Member

qwwdfsad commented Apr 13, 2024

Nice discussion!

but I don't recall the for/in syntax using them.

They should, but for now it is a hidden knowledge for no reason. Filed https://youtrack.jetbrains.com/issue/KT-67379/

Example
class ByteString : Iterable<Byte> {  
  
    class BIter(private val arr: ByteArray) : ByteIterator() {  
        private var idx = -1  
        override fun nextByte(): Byte = arr[++idx]  
  
        override fun hasNext(): Boolean = idx < arr.size - 1  
    }  
  
    // Experiment: replace ByteIterator with Iterator<Byte> and see the bytecode difference  
    override fun iterator(): ByteIterator = BIter(byteArrayOf(1, 2, 3))  
}  
 
@Test
fun sample() {
    var sum: Int = 0
    for (b in ByteString()) {
        sum += b
    }
}

I'm not sure it makes sense at this layer of abstraction, however. Concrete use-cases of what you are trying to accomplish would go a long way in helping us understand if this is the right strategy or if there's a better one. Speculatively adding the feature, though, is unlikely to yield great results.

This is quite a subtle topic, especially if/when we are going to move ByteString into the standard library and have kotlin.String and kotlin.ByteString along each other, where it is reasonable to expect them to be more or less aligned.

On the one hand, for passer-by usages (and if we are successful, we expect plenty of them!), the impedance mismatch would be quite noticeable -- why for String a user can iterate over characters, while for ByteString they cannot iterate over their bytes/ascii characters? (if there will be such a need at all) It's hard to explain either -- boxing considerations is not a lingua we expect everyone to understand and reason with. Having non-included batteries in the standard library is always questionable.

On the other hand, at the very core, the API is quite specialized, and exposing inherently suboptimal operations is a questionable thing to do. Even if boxing is optimizeable via ByteIterator, the iterator allocation is not (and, more importantly, byte-by-byte processing becomes more straightforward than batch processing that we encourage). This is a perfect use case for https://youtrack.jetbrains.com/issue/KT-33851/iterator-less-for-statement-operator-convention though

@qwwdfsad
Copy link
Member

The ability to process things efficiently, especially those represented in low-level constructs like a series of bytes, has actually become a competitive advantage for apps.

I share this sentiment: for apps, for tools, for IDEs, for everything, to be honest.

This is the topic that puzzles me regularly -- from one perspective, we can only provide quite tailored APIs that are hard to misuse ([lack of] byte iteration is a nice example!); it might help, but comes with its own price. From the other perspective, being efficient cannot really be an afterthought for an app, and no amount of well-thought API can help it if it's not a consideration in the first place. All the convenience extensions will be written anyway if they are not here.

The place on the spectrum from "efficient low-level API" and "convenient yet probably slower API" is really elusive, esp. for IO layer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants