
Getting all inner elements along with their attributes #727

Open
surajchhetry opened this issue Mar 16, 2024 · 3 comments

@surajchhetry

surajchhetry commented Mar 16, 2024

Hi there,
I need help getting all the nested elements along with their attributes. I want to get all the nested elements of <element1>, together with their attributes, from a very large XML file.

<root-1>
<element1>
  <inner-1 id="1234">
       <inner-inner-1 a="abc" b="xyz"></<inner-inner>
  </inner-1>
  <inner-1 id="2345">
       <inner-inner-1 a="abc22" b="xyz44"></<inner-inner>
  </inner-1>

</element1>
</root-1>

<root-1>
<element1>
  <inner-1 id="23x3">
       <inner-inner-1 a="abc1" b="xyz1"></<inner-inner>
  </inner-1>
  <inner-1 id="234215">
       <inner-inner-1 a="abc223" b="xyz474"></<inner-inner>
  </inner-1>

</element1>
</root-1>
  
@Mingun
Collaborator

Mingun commented Mar 16, 2024

It's quite hard to answer such a broad question. Which API do you want to use: Serde, or just raw XML events?

If I understand your example correctly, you have a multi-root document (probably streamed from its source) and you want to deserialize it.

Here is a starting point for Serde deserialization:

use serde::Deserialize;

// <inner-inner-1 a="abc" b="xyz"></<inner-inner>
#[derive(Deserialize)]
struct InnerInner {
  // quick-xml matches attributes by names prefixed with `@`
  #[serde(rename = "@a")]
  a: String,
  #[serde(rename = "@b")]
  b: String,
}

// <inner-1 id="1234">
#[derive(Deserialize)]
struct Inner {
  #[serde(rename = "@id")]
  id: String,

  // I'm not sure what the element name is,
  // the open tag is `<inner-inner-1>` and close is `</inner-inner>`
  #[serde(rename = "inner-inner-1")]
  inner_inner: InnerInner,
}

// <element1>
#[derive(Deserialize)]
struct Element {
  // Every repeated <inner-1> child is collected into the Vec
  #[serde(rename = "inner-1", default)]
  inner: Vec<Inner>,
}

// <root-1>
#[derive(Deserialize)]
struct Root {
  element1: Element,
}

Those types should allow you to deserialize one <root-1> element. To deserialize a stream of values, use the technique described on the official Serde site.
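As a rough illustration of the stream step, here is a minimal std-only sketch that is not the technique from the Serde site but a simpler alternative. The hypothetical `split_roots` helper cuts a concatenated input into one `<root-1>...</root-1>` slice per document using plain string search, so each slice can then be deserialized on its own (with quick-xml, for example via `quick_xml::de::from_str::<Root>(doc)`). It assumes the root elements are never nested or self-closing, that the closing tag never appears inside comments or CDATA, and that the input fits in memory as a `&str`; a truly huge file would need a buffered reader instead.

```rust
/// Hypothetical helper: splits a concatenated stream of `<root>...</root>`
/// documents into one string slice per document.
/// Assumptions: roots are not nested or self-closing, and the closing tag
/// never occurs inside comments or CDATA sections.
fn split_roots<'a>(input: &'a str, root: &str) -> Vec<&'a str> {
    let open = format!("<{}", root);
    let close = format!("</{}>", root);
    let mut docs = Vec::new();
    let mut pos = 0;
    while let Some(rel) = input[pos..].find(&open) {
        let start = pos + rel;
        let after = start + open.len();
        // Reject longer names sharing the prefix, e.g. `<root-10>` vs `<root-1`.
        let boundary = input[after..]
            .chars()
            .next()
            .map_or(false, |c| c == '>' || c == '/' || c.is_whitespace());
        if !boundary {
            pos = after;
            continue;
        }
        match input[after..].find(&close) {
            Some(rel_end) => {
                let end = after + rel_end + close.len();
                docs.push(&input[start..end]);
                pos = end;
            }
            // Incomplete trailing document: stop and let the caller read more.
            None => break,
        }
    }
    docs
}
```

Each returned slice is a complete single-root document, which is the shape a single-value deserializer expects.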

If you find that quick-xml is missing features that prevent you from implementing deserialization, feel free to open an issue or a PR that adds them.

@surajchhetry
Author

Hi @Mingun,
thank you for your answer. Between Serde and XML events, which one is better for processing large files?

@Mingun
Collaborator

Mingun commented Mar 17, 2024

Serde is definitely slower than XML events: about 3x slower, according to the results of the compare project with sample_rss.xml.

You can also try https://lib.rs/crates/xmlserde, which uses quick-xml but provides serde-like derives designed specifically for XML. I plan to add it to the comparison project, but haven't been able to yet: it depends on quick-xml, and we need to find a way to substitute that dependency with the local quick-xml copy.
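To give a feel for what the event-based route involves for the original question (every element nested in `<element1>`, with its attributes), here is a minimal std-only sketch. The `Event` enum, `events`, `parse_attrs` and `nested_with_attrs` are all hypothetical stand-ins for a quick-xml `Reader::read_event` loop, which reports `Event::Start`, `Event::Empty`, `Event::End` and other variants; they are not quick-xml's API. The tokenizer here turns a self-closing tag into a Start followed by an End, skips text, and ignores entities, CDATA and namespaces. It also assumes well-formed input, so the `</<inner-inner>` closing tags in the sample would have to be fixed first.

```rust
/// Hypothetical, simplified stand-in for pull-parser events.
#[derive(Debug, PartialEq)]
enum Event {
    Start { name: String, attrs: Vec<(String, String)> },
    End { name: String },
}

/// Parses `key="value"` pairs; only double-quoted values are supported.
fn parse_attrs(s: &str) -> Vec<(String, String)> {
    let mut attrs = Vec::new();
    let mut rest = s;
    while let Some(eq) = rest.find('=') {
        let key = rest[..eq].trim().to_string();
        let after = &rest[eq + 1..];
        let Some(open) = after.find('"') else { break };
        let Some(len) = after[open + 1..].find('"') else { break };
        attrs.push((key, after[open + 1..open + 1 + len].to_string()));
        rest = &after[open + 1 + len + 1..];
    }
    attrs
}

/// Naive tokenizer: emits Start/End per tag (a self-closing tag yields both).
/// Text, comments and `<?...?>` declarations are skipped; no entity decoding,
/// no CDATA, and no `>` allowed inside attribute values or comments.
fn events(xml: &str) -> Vec<Event> {
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(lt) = rest.find('<') {
        let Some(gt) = rest[lt..].find('>') else { break };
        let tag = &rest[lt + 1..lt + gt];
        rest = &rest[lt + gt + 1..];
        if let Some(name) = tag.strip_prefix('/') {
            out.push(Event::End { name: name.trim().to_string() });
        } else if tag.starts_with('?') || tag.starts_with('!') {
            // Declaration or comment: nothing to report.
        } else {
            let self_closing = tag.ends_with('/');
            let body = tag.trim_end_matches('/');
            let name_end = body.find(char::is_whitespace).unwrap_or(body.len());
            let name = body[..name_end].to_string();
            let attrs = parse_attrs(&body[name_end..]);
            out.push(Event::Start { name: name.clone(), attrs });
            if self_closing {
                out.push(Event::End { name });
            }
        }
    }
    out
}

/// Collects every element nested inside `container` (at any depth) with its
/// attributes, by tracking the nesting depth while consuming events.
fn nested_with_attrs(xml: &str, container: &str) -> Vec<(String, Vec<(String, String)>)> {
    let mut found = Vec::new();
    let mut depth = 0usize; // > 0 while inside `container`
    for ev in events(xml) {
        match ev {
            Event::Start { name, attrs } => {
                if depth > 0 {
                    depth += 1;
                    found.push((name, attrs));
                } else if name == container {
                    depth = 1;
                }
            }
            Event::End { .. } => {
                if depth > 0 {
                    depth -= 1;
                }
            }
        }
    }
    found
}
```

Because nothing is materialized beyond the collected name/attribute pairs, this style never builds the intermediate structs that the Serde approach needs.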


2 participants