Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panick on read_namespaced_event with different buffers #125

Closed
willempx opened this issue Apr 29, 2018 · 13 comments
Closed

panick on read_namespaced_event with different buffers #125

willempx opened this issue Apr 29, 2018 · 13 comments
Labels
bug namespaces Issues related to namespaces support

Comments

@willempx
Copy link

Environment: Debian 9 amd64
Reproducibility: Always
Version: quick_xml 0.12.1. Also known to be reproducible with 0.11.0.
Steps to reproduce: compile and run this:

extern crate quick_xml; // version "0.11.0" or "0.12.1".

fn main()
{
    let xml = r#"<?xml version='1.0'?><a:a xmlns:a='http://example.org/something' xmlns='b:c'><a:d><hello xmlns='x:y:z'><world>earth</world></hello></a:d>"#;
    let mut parser = quick_xml::Reader::from_str(xml); // for quick_xml 0.11.0, use "quick_xml::reader::Reader".
    let mut buf = Vec::new();
    let mut buf_ns = Vec::new();
    for _ in 0..4 {
      let _ = parser.read_namespaced_event(&mut buf, &mut buf_ns);
    }
    let mut buf = Vec::new();
    let mut buf_ns = Vec::new();
    for _ in 0..4 {
        let _ = parser.read_namespaced_event(&mut buf, &mut buf_ns); // <-- panick
    }
}

Acutal result: it panics:

$ RUST_BACKTRACE=1 cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `/home/willem/e/crash-quick-xml/target/debug/crash-quick-xml`
thread 'main' panicked at 'index 29 out of range for slice of length 0', libcore/slice/mod.rs:785:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:380
   3: std::panicking::default_hook
             at libstd/panicking.rs:396
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:576
   5: std::panicking::begin_panic
             at libstd/panicking.rs:537
   6: std::panicking::begin_panic_fmt
             at libstd/panicking.rs:521
   7: rust_begin_unwind
             at libstd/panicking.rs:497
   8: core::panicking::panic_fmt
             at libcore/panicking.rs:71
   9: core::slice::slice_index_len_fail
             at libcore/slice/mod.rs:785
  10: <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index
             at /checkout/src/libcore/slice/mod.rs:916
  11: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at /checkout/src/libcore/slice/mod.rs:767
  12: quick_xml::reader::Namespace::prefix
             at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:855
  13: quick_xml::reader::NamespaceBufferIndex::find_namespace_value::{{closure}}
             at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:902
  14: core::iter::traits::DoubleEndedIterator::rfind::{{closure}}
             at /checkout/src/libcore/iter/traits.rs:580
  15: <core::slice::Iter<'a, T> as core::iter::traits::DoubleEndedIterator>::try_rfold
             at /checkout/src/libcore/slice/mod.rs:1319
  16: core::iter::traits::DoubleEndedIterator::rfind
             at /checkout/src/libcore/iter/traits.rs:579
  17: <core::iter::Rev<I> as core::iter::iterator::Iterator>::find
             at /checkout/src/libcore/iter/mod.rs:459
  18: quick_xml::reader::NamespaceBufferIndex::find_namespace_value
             at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:899
  19: <quick_xml::reader::Reader<B>>::read_namespaced_event
             at /home/willem/.cargo/registry/src/github.com-1ecc6299db9ec823/quick-xml-0.12.1/src/reader.rs:566
  20: crash_quick_xml::main
             at src/main.rs:15
  21: std::rt::lang_start::{{closure}}
             at /checkout/src/libstd/rt.rs:74
  22: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:479
  23: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:102
  24: std::rt::lang_start_internal
             at libstd/panicking.rs:458
             at libstd/panic.rs:358
             at libstd/rt.rs:58
  25: std::rt::lang_start
             at /checkout/src/libstd/rt.rs:74
  26: main
  27: __libc_start_main
  28: _start

Expected result:

  • if the API is not supposed to be used like this, and if it is possible to enforce that with Rust's typesystem: the API is written in such a way that Rust's typesystem forbids to use the library like this
  • if the API is not supposed to be used like this, and if it is not possible to enforce it with Rust's typesystem: the English API specifications state it must not be used like this
  • if the API does not intend to forbid to use it like this: it should not panic.
@willempx
Copy link
Author

I notice now that this is probably the same as #107.

@tafia
Copy link
Owner

tafia commented May 17, 2018

Sorry I haven't answered. Yes it is the same issue (not using the same buffer).
I let this open because I really must figure out a way to prevent people from doing this.

@phillord
Copy link

I am not completely convinced you want to stop people doing this. I have come back to after a while because it really does stop me building a recursive parser straight-forwardly and a parser with a single loop becomes unreadable rapidly. So, it seems a intuitive thing to want to do.

Is there not an easier way of supporting this usage, either with methods that do not require access to the internal buffers, or creation of Events that do not share ownership?

@tafia
Copy link
Owner

tafia commented Jul 17, 2018

I have some time now. I'll work on it.

@phillord
Copy link

phillord commented Jul 17, 2018

To give an example of the sort of thing I am trying to achieve,
consider this code. We use parse which then dispatches to
parse_fred for a particular part of the XML tree. Now I can reduce
the size of my matches, as well as reuse code when the XML structure
reappears at different parts of the tree, potentially with different
meanings.

But this breaks. The mutable borrow of self.buf prevents all
subsequent calls to self. I can package buf and ns_buf up in
what ever way I choose, but I am stuck regardless; I cannot recurse
while a e still exists. Nor can I get new buffers because that
panics.

As always, my rust is very limited, but I have been around the houses
with this one and can see no way out.

I realise that this is extending the bug report beyond it's current
scope, for which apologies; I can put a new issue in if it would be
clearer.

    extern crate quick_xml;

    use quick_xml::events::Event;
    use quick_xml::Reader;
    use std::io::BufRead;

    struct Read<R>
    where
        R: BufRead,
    {
        reader: Reader<R>,
    }

    impl<R: BufRead> Read<R> {
        fn new(reader: Reader<R>) -> Read<R> {
            Read {
                reader: reader,
                buf: Vec::new(),
                ns_buf: Vec::new()
            }
        }

        fn parse(&mut self) {

            loop {
                let mut e = self.reader.read_namespaced_event(&mut self.buf,
                                                              &mut self.ns_buf);

                match e {
                    Ok ((ref ns, Event::Start(ref mut e)))
                        if *ns == Some(b"http://www.example.com") => {
                            match e.local_name() {
                                b"Fred" => {
                                    self.parse_fred();
                                }
                                _ => {}
                            }
                        }
                    _ => {}
                }
            }
        }

        fn parse_fred(&mut self) {
            loop {
                let mut e = self.reader.read_namespaced_event(&mut self.buf,
                                                              &mut self.ns_buf);

                match e {
                    Ok ((ref ns, Event::Start(ref mut e)))
                        if *ns == Some(b"http://www.example.com") => {
                        match e.local_name() {
                            b"George" => {
                                println!("George Inside Fred");
                            }
                            _ => {}
                        }

                    }
                    _ => {}
                }
            }
        }
    }

@phillord
Copy link

After considerable trial and error, I think I have managed to get a
solution for this which works with both read_event and
read_namespaced_event.

It's fairly hairy but at least it works. This could be cleaned up with
a read_owned_event and read_owned_namespaced_event methods.

It involves a lot of copying also, which seems inefficient, but I
cannot see an alternative. This may negate the speed advantage of
quick_xml over xml-rs. It seems a unfortunate outcome for something
which is, ultimately, a code re-organisation.

@willempx I am curious. Your original example is pathological and meant as a reproduction rather than useful. What were you trying to do when you got the bug in the first place?

extern crate quick_xml;

use quick_xml::events::Event;
use quick_xml::Reader;
use std::io::BufRead;

pub struct Read<R>
where
    R: BufRead,
{
    reader: Reader<R>,
    buf: Vec<u8>,
    ns_buf: Vec<u8>
}

impl<R: BufRead> Read<R> {
    pub fn new(reader: Reader<R>) -> Read<R> {
        Read {
            reader: reader,
            buf: Vec::new(),
            ns_buf: Vec::new()
        }
    }

    pub fn parse(&mut self) {

        loop {
            let mut e;
            {
                let e_inner = self.reader.read_event(&mut self.buf);
                match e_inner {
                    Ok(e_inner_inner) => {
                        e = e_inner_inner.into_owned();
                    }
                    Err(_) => {
                        panic!("Panic");
                    }
                }
            }

            match e {
                Event::Start(ref mut e) => {
                    match e.local_name() {
                        b"Fred" => {
                            self.parse_fred();
                        }
                        _ => {}
                    }

                }
                _ => {}
            }
        }
    }

    fn parse_fred(&mut self) {
        loop {
            let mut e = self.reader.read_event(&mut self.buf);

            match e {
                Ok (Event::Start(ref mut e)) => {
                    match e.local_name() {
                        b"George" => {
                            println!("George Inside Fred");
                        }
                        _ => {}
                    }

                }
                _ => {}
            }
        }
    }


    pub fn parse_ns(&mut self) {

        loop {
            let e;
            let o;

            {
                let e_inner = self.reader.read_namespaced_event(&mut self.buf,
                                                                &mut self.ns_buf
                );
                match e_inner {
                    Ok((option, e_inner_inner)) => {
                        e = e_inner_inner.into_owned();
                        o = option.unwrap().to_owned();
                    }
                    Err(_) => {
                        panic!("Panic");
                    }
                }
            }

            match (o, e) {
                (_, Event::Start(ref mut e)) => {
                    match e.local_name() {
                        b"Fred" => {
                            self.parse_ns_fred();
                        }
                        _ => {}
                    }

                }
                _ => {}
            }
        }
    }

    fn parse_ns_fred(&mut self) {
        loop {
            let mut e = self.reader.read_event(&mut self.buf);

            match e {
                Ok (Event::Start(ref mut e)) => {
                    match e.local_name() {
                        b"George" => {
                            println!("George Inside Fred");
                        }
                        _ => {}
                    }

                }
                _ => {}
            }
        }
    }
}

@tafia
Copy link
Owner

tafia commented Jul 19, 2018

Ok so let me try answering.

The core part of this issue will probably be fixed naturally once NLL lands on rust compiler, so I don't think fixing this is particularly urgent (needs to be confirmed tough). Your issue is very similar with issues you may have when working with HashMaps.

See conditional-control-flow

Here is a workaround (showing only the parse fn, nothing changes otherwise) which should still be efficient, allow you to use nested calls BUT if slightly more verbose.

 fn parse(&mut self) {
        enum State {
            ParseFred,
            Ignore,
        }
        loop {
            let state = { 
                let e = self.reader.read_namespaced_event(&mut self.buf,
                                                              &mut self.ns_buf);
                match e { 
                    Ok ((ref ns, Event::Start(ref e))) if *ns == Some(b"http://www.example.com") => {
                        match e.local_name() {
                            b"Fred" => State::ParseFred,
                            _ => State::Ignore,
                        }
                    }
                    _ => State::Ignore,
                }
            };
            match state {
                State::ParseFred => self.parse_fred(),
                State::Ignore => {}, 
            }
        }
    }

@willempx
Copy link
Author

@willempx I am curious. Your original example is pathological and meant as a reproduction rather than useful. What were you trying to do when you got the bug in the first place?

I think I was in a similar scenario as you in #107 : it was a function that parsed an XML and called helper functions to parse parts (e.g. the parse_book_info function calls parse_author_info and also calls parse_publisher_info). The helper functions created their own new buffer (rather than reusing a buffer), causing the crash.

But I was just playing around as a rust beginner (which I still am); do not take what I was doing as advice or a good example. The bug report was just about the apparent mismatch between API specification and implementation (no precondition written in the specification, but still a crash in the implementation).

@phillord
Copy link

@tafia True enough. I think I will use my solution (of copying), which I should be able to back out cleanly and replace with NLL hit stable.

I've got a state enum at the moment -- it something I was hoping to get rid of with some functions! Thanks for all the explanations, especially as my issues are a little tangential to the bug.

@phillord
Copy link

phillord commented Oct 1, 2018

Just to update on this.

I have tried this with NLL on nightly and indeed the example I gave in #125 (comment) does indeed compile fine without complaints (I tried this in July, but rustc crashed out on my code).

Unfortunately, since I wrote originally, I've changed my code organisation and it simply raises other ownership issues. I'll see whether I can resolve these, but for the moment, I have partial success, but no working code!

@vandenoever
Copy link

I'm hit by this bug too. I'm parsing a very complicated XML file and one loop does not suffice.

This is minimal code that panics:

fn main() {
    let content = "<a:a xmlns:a='urn:a'></a:a>";
    let mut r = quick_xml::Reader::from_reader(content.as_bytes());
    let mut buf = Vec::new();
    let mut ns_buf = Vec::new();
    r.read_namespaced_event(&mut buf, &mut ns_buf).unwrap();
    r.read_namespaced_event(&mut buf, &mut ns_buf).unwrap();
    r.read_namespaced_event(&mut buf, &mut ns_buf).unwrap();
    // use different namespace buffer for closing tag
    let mut ns_buf = Vec::new();
    r.read_namespaced_event(&mut buf, &mut ns_buf).unwrap();
}

Here is my use-case. I'm stuck when parsing xml with quick-xml. The challenge is handling ownership with the function read_namespaced_event. This function takes &mut Vec[u8] and places a reference to that in a returned Event. The Event has a &[u8] to the passed buffer. So before read_namespaced_event can be called again, Event has to be dropped.

An element may have two series of child elements, e.g. an optional element <c1/> followed by an number of <c2/> elements.

First there is a loop to read <c1/>. Once <c1/> has been found, a loop is started to read <c2/>. If the first read Event contains a <c2/>, the event must be kept (because it contains the attribute information for the first <c2/>) and moved to the second loop, the one for reading the <c2/> elements.

@tafia
Copy link
Owner

tafia commented Aug 14, 2019

Feel free to share examples of code not working. Thanks for the heads up.

@Mingun Mingun added bug namespaces Issues related to namespaces support labels May 21, 2022
@Mingun
Copy link
Collaborator

Mingun commented Jun 19, 2022

Duplicate of #59

@Mingun Mingun marked this as a duplicate of #59 Jun 19, 2022
@Mingun Mingun closed this as not planned Won't fix, can't repro, duplicate, stale Jun 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug namespaces Issues related to namespaces support
Projects
None yet
Development

No branches or pull requests

5 participants