Why is `Decoder::decode` returning a `Cow`? #512

ok-nick · 2022-11-23T07:08:15Z

According to:

Lines 81 to 90 in 92de42e

    
               pub fn decode<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>> { 
        
                   #[cfg(not(feature = "encoding"))] 
        
                   let decoded = Ok(Cow::Borrowed(std::str::from_utf8(bytes)?)); 
        
                   #[cfg(feature = "encoding")] 
        
                   let decoded = decode(bytes, self.encoding); 
        
                   decoded 
        
               } 
        
           }

Decoder::decode returns an owned Cow when feature encoding is enabled, otherwise a borrowed one. How come there aren't two definitions for decode, one defined for feature encoding that returns an owned String and one defined without the feature that returns a borrowed str?

I ran across this issue trying to return a borrowed string (or bytes) from a function, but unfortunately the Cow is owned, so I couldn't. My only solution would be to return the owned Cow, then further pattern match on that to get the borrowed value out. If I wanted to convert the Cow to bytes, it would be even more of a hassle.

On a side note, would it be useful to have a function to return the raw bytes rather than checking its valid UTF-8? I took a quick peek around the function and it seems quick-xml has an additional abstraction that pipes encodings to encoding_rs. Maybe the user should handle it themselves for more flexibility? Or perhaps quick-xml should be split into a lower-level crate that handles the bare bones of XML parsing, then have an abstraction on top for this kind of flexibility?

The text was updated successfully, but these errors were encountered:

Mingun · 2022-11-23T10:19:08Z

How come there aren't two definitions for decode, one defined for feature encoding that returns an owned String and one defined without the feature that returns a borrowed str?

That is how it was implemented before and we specially change it in #180 because API that changes depending on the feature flags is hard to use -- features are additive and different dependencies can enable different set of features.

I ran across this issue trying to return a borrowed string (or bytes) from a function, but unfortunately the Cow is owned, so I couldn't. My only solution would be to return the owned Cow, then further pattern match on that to get the borrowed value out. If I wanted to convert the Cow to bytes, it would be even more of a hassle.

If you get an owned Cow that means that some decoding was performed which requires allocation. I cannot understand how you expect to get a borrowed string here. That is only possible if memory would be owned by decode function (which is impossible) or it will leak.

On a side note, would it be useful to have a function to return the raw bytes rather than checking its valid UTF-8?

That is how it worked internally, decoding is done only when you get content of events, which require you to manually call corresponding function. So if you don't need content you'll not pay for decoding (but this is dangerous because you can pass invalid documents). I cannot imagine a situation where you'll need raw bytes of XML stream.

Or perhaps quick-xml should be split into a lower-level crate that handles the bare bones of XML parsing, then have an abstraction on top for this kind of flexibility?

Parsing of XML should be done after decoding. XML is defined in terms of characters, not bytes. Right now we do the opposite and this is the case why UTF-16 and ISO-2022-JP are not supported. For the other supported encodings the are no difference in practice, because them are ASCII-compatible and all XML markup characters are ASCII.

ok-nick · 2022-11-23T18:42:37Z

I was responding when I realized read_to_text internally calls read_to_end and then decode. The bytes are technically exposed through this function so I can avoid the Cow.

If you get an owned Cow that means that some decoding was performed which requires allocation. I cannot understand how you expect to get a borrowed string here. That is only possible if memory would be owned by decode function (which is impossible) or it will leak.

Sorry, I should've been a little more clear on this. I had an owned Cow that contains a borrowed value, so Cow::Borrowed(...).

Mingun added question encoding Issues related to support of various encodings of the XML documents labels Nov 23, 2022

ok-nick closed this as completed Nov 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why is `Decoder::decode` returning a `Cow`? #512

Why is `Decoder::decode` returning a `Cow`? #512

ok-nick commented Nov 23, 2022

Mingun commented Nov 23, 2022

ok-nick commented Nov 23, 2022

Why is Decoder::decode returning a Cow? #512

Why is Decoder::decode returning a Cow? #512

Comments

ok-nick commented Nov 23, 2022

Mingun commented Nov 23, 2022

ok-nick commented Nov 23, 2022

Why is `Decoder::decode` returning a `Cow`? #512

Why is `Decoder::decode` returning a `Cow`? #512