Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser position tracking #552

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Themayu
Copy link

@Themayu Themayu commented Feb 11, 2023

Parser position tracking

Store an Option<quick_xml::reader::Span> inside each of the quick_xml::events::Bytes* structs. This span is computed by the parser during reading, and passed into the relevant wrap() function during value construction. It can be accessed by calling relevant span() methods, which will clone and return the span (Range, sadly, does not implement Copy due to its status as an Iterator implementation.)

Additionally, the derive implementations of PartialEq on the structs is replaced with a manual implementation of PartialEq. This is to prevent previously true equality comparisons such as bytes == BytesStart::new("...", 4) suddenly changing their behaviour to return false. This does mean that we lose our implementations of StructuralPartialEq, meaning that constants of the relevant types can no longer be used as patterns in match arms. However, since quick_xml exposes no way to construct values of these types at compile time, and since StructuralPartialEq is currently nightly-only and therefore cannot be used in trait bounds in stable Rust, I do not believe this change is sufficiently observable to require a major version bump under semver. Paragraph rendered obsolete by design change.

Other than the above, I do not intend to make any other possibly breaking changes to the public API with this contribution.

Changes

  • Each of the Bytes* structs under quick_xml::events receives a new field, span: crate::reader::Span.
    • If created via parsing, this field will store a value of n..m, representing the start and end positions of that event's contents - for tags, this is the positions of the < and > character; while for text, it is the start and end of the text content, with respect to optional whitespace trimming.
    • If created via the API, this field will store 0..0.
  • Each of the structs receives a new trait, Spanned, which defines methods to retrieve and update the stored span.
  • Altering data for a given event, such as its element name or adding attributes, does not alter the span it is given. This is because such edits do not alter the source it was initially read from.

Progress

  • Events
    • quick_xml::events::BytesStart
    • quick_xml::events::BytesDecl
      • As far as I can tell, this one can just reuse the span inside BytesStart
    • quick_xml::events::BytesEnd
    • quick_xml::events::BytesText
    • quick_xml::events::BytesCData
    • Method on quick_xml::events::Event to return the underlying span if there is one
  • Attributes
    • quick_xml::events::attributes::Attributes
    • quick_xml::events::attributes::Attribute
    • quick_xml::events::attributes::Attr
  • Events
    • quick_xml::errors::Error
    • quick_xml::escape::EscapeError
    • quick_xml::events::attributes::AttrError
  • Namespaces (unsure whether this one makes sense)
    • quick_xml::name::QName
    • quick_xml::name::LocalName
    • quick_xml::name::Prefix
    • quick_xml::name::Namespace
  • Figure out where to add tests for the new functionality
  • Edit Changelog.md with new functionality

PR changelog

  • 2023/02/16 18:31 GMT: Rebased PR source on an incomplete implementation that will be continued.
  • 2023/02/17 07:16 GMT: Restructured Progress section, added mini-changelog to PR.

@Mingun Mingun linked an issue Feb 11, 2023 that may be closed by this pull request
@Themayu
Copy link
Author

Themayu commented Feb 16, 2023

Rebased PR source on an incomplete implementation by @Mingun which I will continue work on. Updated PR details to reflect change in design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A way to get the row/column of a Reader
2 participants