Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML deserializate failed when text contains &()_+-=; #719

Open
Xuanwo opened this issue Feb 27, 2024 · 6 comments
Open

XML deserializate failed when text contains &()_+-=; #719

Xuanwo opened this issue Feb 27, 2024 · 6 comments
Labels
question serde Issues related to mapping from Rust types to XML

Comments

@Xuanwo
Copy link

Xuanwo commented Feb 27, 2024

Hi, I'm from Apache OpenDAL Community. I found an interesting cases that quick-xml failed to deserializate content like &()_+-=;.

Context

We are developing WebDAV support with JFrog Artifactory. I'm not sure how JFrog implements WebDAV support, but it seems they do not correctly escape the XML content.

We found that jfrog will return invalid content like

<D:href>/artifactory/example-repo-local/bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!%40%23%24%25%5E%26()_%2B-%3D%3B'%2C.txt/bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!@%23$%25%5E&()_+-=;',.txt</D:href>

quick-xml will recongize &()_+-=; as an XML entity thus lead to deserializate failure:

test panicked: stat must succeed: Unexpected (persistent) at stat => deserialize xml

Context:
   service: webdav
   path: 2c9433ab-450c-49e9-bb88-703cb2f327a0 !@#$%^&()_+-=;',.txt

Source:
   Error while escaping character at range 238..244: Unrecognized escape symbol: "()_+-=": Error while escaping character at range 238..244: Unrecognized escape symbol: "()_+-=": Error while escaping character at range 238..244: Unrecognized escape symbol: "()_+-="

To Reproduce

deserializate content like

<?xml version=\"1.0\" encoding=\"utf-8\" ?>
<D:multistatus xmlns:D=\"DAV:\" xmlns:ns0=\"DAV:\">
<D:response xmlns:lp2=\"http://apache.org/dav/props/\" xmlns:lp1=\"DAV:\">
<D:href>/artifactory/example-repo-local/bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!%40%23%24%25%5E%26()_%2B-%3D%3B'%2C.txt/bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!@%23$%25%5E&()_+-=;',.txt</D:href>
<D:propstat><D:prop>
<lp1:getcontentlength>2066093</lp1:getcontentlength>
<lp1:getcontenttype>text/plain</lp1:getcontenttype>
<lp1:resourcetype/><lp1:getlastmodified>Tue, 27 Feb 2024 07:13:54 GMT</lp1:getlastmodified>
<lp1:getetag>W/\"2066093-Tue, 27 Feb 2024 07:13:54 GMT\"</lp1:getetag>
<lp1:creationdate>2024-02-27T07:13:54Z</lp1:creationdate>\n<D:displayname><![CDATA[aba0a24a-3a2b-48e2-ae65-ef2dff027458 !@#$%^&()_+-=;',.txt]]></D:displayname>
<D:source></D:source>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>\n</D:response>\n</D:multistatus>

into

#[derive(Deserialize, Debug, PartialEq, Eq, Clone)]
pub struct MultistatusOptional {
    pub response: Option<Vec<ListOpResponse>>,
}

#[derive(Deserialize, Debug, PartialEq, Eq, Clone)]
pub struct ListOpResponse {
    pub href: String,
    pub propstat: Propstat,
}

#[derive(Deserialize, Debug, PartialEq, Eq, Clone)]
pub struct Propstat {
    pub prop: Prop,
    pub status: String,
}

#[derive(Deserialize, Debug, PartialEq, Eq, Clone)]
pub struct Prop {
    #[serde(default)]
    pub displayname: String,
    pub getlastmodified: String,
    pub getetag: Option<String>,
    pub getcontentlength: Option<String>,
    pub getcontenttype: Option<String>,
    pub resourcetype: ResourceTypeContainer,
}

#[derive(Deserialize, Debug, PartialEq, Eq, Clone)]
pub struct ResourceTypeContainer {
    #[serde(rename = "$value")]
    pub value: Option<ResourceType>,
}

#[derive(Deserialize, Debug, PartialEq, Eq, Clone)]
#[serde(rename_all = "lowercase")]
pub enum ResourceType {
    Collection,
}

Related Issues


I understand this seems more like an issue for jfrog to address, but I believe it's also valuable to inform the quick-xml maintainer about this situation.

  • What do you think about this case?
  • Does it make sense to skip invliad/unknown XML entity and keep them as is?
  • Can we skip invliad/unknown XML entity and keep them as is?
@Mingun
Copy link
Collaborator

Mingun commented Feb 27, 2024

In the mentioned case should be possible to keep unrecognized entities unexpanded. We have EntityResolver trait and default NoEntityResolver that does nothing.

However, I assume, that entity &()_+-=; was formed just by luck. If there would no ; character, then you'll get UnterminatedEntity which cannot be avoided. So I cannot say that you problem could be completely workarounded by now.

@Mingun Mingun added question serde Issues related to mapping from Rust types to XML labels Feb 27, 2024
@Xuanwo
Copy link
Author

Xuanwo commented Feb 27, 2024

However, I assume, that entity &()_+-=; was formed just by luck.

Yep. This file path was randomly selected for testing purposes to ensure the storage services can handle special characters correctly.

@Xuanwo
Copy link
Author

Xuanwo commented Feb 27, 2024

Our current workaround involves manually escaping the special string before deserialization. Do you think that's a good idea?

let s = s.replace("&()_+-=;", "%26%28%29_%2B-%3D%3B");

However, I suspect we'll still encounter issues like &acbd; if JFrog doesn't address them. And there's nothing we can do about that, right?

@Mingun
Copy link
Collaborator

Mingun commented Feb 27, 2024

Yep. This file path was randomly selected for testing purposes to ensure the storage services can handle special characters correctly.

Are you provide the path with special characters to JFrog API and it forms such XML or that XML was constructed directly? I'm very doubt that such mature project doesn't know how to correctly escape & in XML.

In the example of the provided response you can notice that it is actually contains prefix + url-encoded path + not url-encoded (probably, original) path, where

Part Path
prefix /artifactory/example-repo-local
url-encoded path /bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!%40%23%24%25%5E%26()_%2B-%3D%3B'%2C.txt
original (not url-encoded) path /bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!@%23$%25%5E&()_+-=;',.txt

I doubt that response will contain path partially encoded, seems more like hand-crafted data.

Your workaround in any case is not correct, because it replaces exactly the string &()_+-=;, which as you say just the result of random generation. Probably the intention was to escape all special symbols, but you cannot blindly escape & because you do not need to do that in CDATA sections and in your example you have one.

@Xuanwo
Copy link
Author

Xuanwo commented Feb 28, 2024

Are you provide the path with special characters to JFrog API and it forms such XML or that XML was constructed directly? I'm very doubt that such mature project doesn't know how to correctly escape & in XML.

Yes. We write a file with path /bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!%40%23%24%25%5E%26()_%2B-%3D%3B'%2C.txt and jfrog returns XML contains /bfc28925-81ec-4b69-b494-e2ee0f957ee8/aba0a24a-3a2b-48e2-ae65-ef2dff027458%20!@%23$%25%5E&()_+-=;',.txt.

The response is neither created nor altered by us.

The code could be found at here

let req = req
    .body(AsyncBody::Bytes(Bytes::from(PROPFIND_REQUEST)))
    .map_err(new_request_build_error)?;

// --------- send request to jfrog
let resp = self.client.send(req).await?;
if !resp.status().is_success() {
    return Err(parse_error(resp).await?);
}

// ---------- read the body and concat into a `Bytes`
let bs = resp.into_body().bytes().await?;

// ---------- deserialize it with quick-xml
let result: Multistatus = deserialize_multistatus(&bs)?;

Sorry for let you jumping into our code.

Your workaround in any case is not correct, because it replaces exactly the string &()_+-=;, which as you say just the result of random generation.

You're correct. Is it safe to ignore unrecognized XML entities? Like keep all &abcdd; as is?

@Mingun
Copy link
Collaborator

Mingun commented Feb 28, 2024

The response is neither created nor altered by us.

Then you should create a bug report in JFrog. If the returned path is guaranteed to be the one that you provided in request (but normalized) then you could sanitize the input.

Is it safe to ignore unrecognized XML entities?

This depends on your needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

No branches or pull requests

2 participants