Fix CDATA handling #374

Closed
wants to merge 6 commits into from

Mingun (Collaborator) commented Mar 21, 2022

Fixes #311, closes #370.

The CData event now has its own dedicated type, BytesCData, instead of BytesText. The new type can be converted to BytesText by calling the .escape() method, so migration should be easy. It is still worth reviewing each usage, because you usually want unescaped data, and a CDATA section already provides it.
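To illustrate what such a conversion has to do, here is a minimal standalone sketch. It does not use the crate: `escape_cdata_content` is a hypothetical helper, and it only shows the idea that raw CDATA content must have its markup characters escaped before it can be stored as ordinary text.

```rust
/// Hypothetical helper, not part of the crate: escapes the raw content of a
/// CDATA section so that it can be written as ordinary XML text.
fn escape_cdata_content(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '\'' => out.push_str("&apos;"),
            '"' => out.push_str("&quot;"),
            // Every other character is copied through unchanged.
            other => out.push(other),
        }
    }
    out
}
```

For example, `escape_cdata_content("a < b && c")` yields `a &lt; b &amp;&amp; c`. Code that previously unescaped the text of a CData event would, after such a conversion, get back the original CDATA content.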

The API is a bit dangerous because it does not check for the forbidden sequence ]]>: you can create a BytesCData that contains it. Because this is a general problem shared with the other event content types, I have left it as is for now. The API should be fixed in another PR.
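One possible shape of such a check is sketched below. Both function names are hypothetical and nothing here is part of this PR. The second function uses the common workaround of splitting every `]]>` across two adjacent CDATA sections, which is one option among several (rejecting the input is another).

```rust
/// Hypothetical check: content is safe for a single CDATA section only if it
/// does not contain the section terminator.
fn is_valid_cdata_content(content: &str) -> bool {
    !content.contains("]]>")
}

/// Hypothetical writer: wraps `content` in CDATA, splitting each `]]>` so that
/// `]]` ends one section and `>` starts the next one.
fn wrap_in_cdata(content: &str) -> String {
    format!("<![CDATA[{}]]>", content.replace("]]>", "]]]]><![CDATA[>"))
}
```

With this sketch, `wrap_in_cdata("a]]>b")` produces `<![CDATA[a]]]]><![CDATA[>b]]>`, which a parser reads back as the two pieces `a]]` and `>b`.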

This PR changes how raw bytes are deserialized, for example into a Vec<u8>. Previously such a field always received the unmodified content from the reader, which meant you could treat raw byte types as a way to access the raw reader content from your structs. This is a dangerous practice, because other formats do not behave like this. In fact, I would prefer to prohibit (de)serializing (into) Vec<u8> and similar types altogether. That is conceptually right, but perhaps impractical. So bytes are now deserialized the same way as strings: text content is unescaped, and CDATA content is left unchanged.
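The rule can be modelled in a few lines. This is a standalone sketch, not the deserializer code from this PR: the `Content` enum, `unescape`, and `to_bytes` are hypothetical names. As simplifying assumptions, the sketch reports unknown entities as errors and does not handle numeric character references.

```rust
/// Hypothetical model of the two kinds of character content a reader can yield.
enum Content<'a> {
    Text(&'a str),
    CData(&'a str),
}

/// Replaces the five predefined XML entities. Anything else is an error here.
fn unescape(raw: &str) -> Result<String, String> {
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let tail = &rest[amp + 1..];
        let semi = tail
            .find(';')
            .ok_or_else(|| "unterminated entity".to_string())?;
        match &tail[..semi] {
            "lt" => out.push('<'),
            "gt" => out.push('>'),
            "amp" => out.push('&'),
            "apos" => out.push('\''),
            "quot" => out.push('"'),
            other => return Err(format!("unsupported entity: &{};", other)),
        }
        rest = &tail[semi + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Bytes follow the same rule as strings: text is unescaped, CDATA is copied.
fn to_bytes(content: Content<'_>) -> Result<Vec<u8>, String> {
    match content {
        Content::Text(raw) => unescape(raw).map(String::into_bytes),
        Content::CData(raw) => Ok(raw.as_bytes().to_vec()),
    }
}
```

Under this model, the text `a &lt; b` becomes the bytes of `a < b`, while the same characters inside a CDATA section stay as `a &lt; b`.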

I'm thinking about providing an encoding handler to the (de)serializer, which would be used to encode / decode binary data to and from the strings stored in the XML (for example hex or base64), and which by default (handler == None) would emit an error on an attempt to (de)serialize binary data. But that is for another PR.
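A rough sketch of how such a handler might look on the serializing side follows. Every name in it (`BinaryEncoder`, `hex_encode`, `serialize_bytes`) is an assumption made for illustration, since no such API exists in this PR.

```rust
/// Hypothetical handler type: turns binary data into a string that can be
/// stored in XML.
type BinaryEncoder = fn(&[u8]) -> String;

/// One possible handler: lowercase hexadecimal, two digits per byte.
fn hex_encode(data: &[u8]) -> String {
    data.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Without a handler (`None`, the proposed default) binary data is rejected.
fn serialize_bytes(data: &[u8], handler: Option<BinaryEncoder>) -> Result<String, String> {
    match handler {
        Some(encode) => Ok(encode(data)),
        None => Err("binary data requires an encoding handler".to_string()),
    }
}
```

Passing `Some(hex_encode)` for the bytes `0xde 0xad` gives `dead`, while passing `None` returns an error. A deserializer would need the matching decoder in the same place.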

Unescaping takes into account only the default XML entities, namely &lt;, &gt;, &amp;, &apos;, and &quot;. Adding support for document-defined entities is a matter for another PR.

Mingun and others added 5 commits February 5, 2022 22:08
Now debug representation is readable
failures:
    de::tests::trivial::struct_::cdata::byte_buf
    de::tests::trivial::struct_::cdata::char_
    de::tests::trivial::struct_::cdata::f32_
    de::tests::trivial::struct_::cdata::f64_
    de::tests::trivial::struct_::cdata::false_
    de::tests::trivial::struct_::cdata::i128_
    de::tests::trivial::struct_::cdata::i16_
    de::tests::trivial::struct_::cdata::i32_
    de::tests::trivial::struct_::cdata::i64_
    de::tests::trivial::struct_::cdata::i8_
    de::tests::trivial::struct_::cdata::isize_
    de::tests::trivial::struct_::cdata::string
    de::tests::trivial::struct_::cdata::true_
    de::tests::trivial::struct_::cdata::u128_
    de::tests::trivial::struct_::cdata::u16_
    de::tests::trivial::struct_::cdata::u32_
    de::tests::trivial::struct_::cdata::u64_
    de::tests::trivial::struct_::cdata::u8_
    de::tests::trivial::struct_::cdata::usize_
    de::tests::trivial::struct_::text::byte_buf
… trivial tests

failures:
    de::tests::trivial::struct_::text::byte_buf
    de::tests::trivial::struct_::cdata::byte_buf
…tead of `BytesText`

This commit reverts the changes from 85f9f68
Mingun (Collaborator, Author) commented May 3, 2022

Merged in Mingun/fast-xml@9795a68

Successfully merging this pull request may close these issues.

CDATA deserialization