Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More feature-full format_description macro #428

Closed
vincentdephily opened this issue Jan 18, 2022 · 21 comments
Closed

More feature-full format_description macro #428

vincentdephily opened this issue Jan 18, 2022 · 21 comments
Labels
A-format-description Area: format description A-macros Area: macros C-enhancement Category: an enhancement with existing code

Comments

@vincentdephily
Copy link

vincentdephily commented Jan 18, 2022

I'm parsing a rfc3339-ish format with some flexibility (and UTC/Local offset selected out of band). I've ended up with the following incantation:

use time::{
    format_description::{modifier::*, Component, FormatItem::*},
    parsing::Parsed,
};
let mut p = Parsed::new()
    .with_hour_24(0)
    .unwrap()
    .with_minute(0)
    .unwrap()
    .with_second(0)
    .unwrap()
    .with_offset_hour(input_offset) // Work in progress ;)
    .unwrap();
let rest = p.parse_items(
    input_str.as_bytes(),
    &[
        Component(Component::Year(Year::default())),
        Literal(b"-"),
        Component(Component::Month(Month::default())),
        Literal(b"-"),
        Component(Component::Day(Day::default())),
        Optional(&Compound(&[
            First(&[Literal(b"T"), Literal(b" ")]),
            Component(Component::Hour(Hour::default())),
            Literal(b":"),
            Component(Component::Minute(Minute::default())),
            Optional(&Compound(&[
                Literal(b":"),
                Component(Component::Second(Second::default())),
            ])),
        ])),
    ],
)?;

That does the job, but I'd prefer using time::macros::format_description for readability (and maybe compile-time goodness, I'm not sure). But AFAICT it's missing support for optional, first, and compound.

Would it be possible to extend the grammar ?

@jhpratt
Copy link
Member

jhpratt commented Jan 18, 2022

Well…Compound is the whole thing and is already what's output from the macro. Optional and First are the ones without syntactical support. I'm open to any ideas you may have on how to extend the grammar. I've thought about it previously and couldn't come up with a decent solution.

@jhpratt jhpratt added A-format-description Area: format description A-macros Area: macros C-enhancement Category: an enhancement with existing code labels Jan 18, 2022
@vincentdephily
Copy link
Author

AFAIU Compound is only supported at the root ?

What's needed is a form of nesting. Something like [compound [...]], [optional [...]] and [first [...]], with each ... being recursively parsed like the current top-level expression. So my initial example would translate to

[year]-[month]-[day][optional [[first [T ]][hour]:[minute][optional [:[second]]]]]

Note that when recursing, whitespace inside brackets is significant, otherwise [first [T ]] wouldn't work as expected.

This can get a bit verbose, so [compound [...]]/[optional [...]]/[first [...]] could be aliased to [*[...]]/[?[...]]/[![...]] or*[...]/?[...]/![...] or even <...>/(...)/{...} if making the grammar more sigil-heavy is acceptable and if we're confident we won't have more wrapper types :

[year]-[month]-[day][?[[![T ]][hour]:[minute][?[:[second]]]]]

or

[year]-[month]-[day]?[![T ][hour]:[minute]?[:[second]]]

or

[year]-[month]-[day]({T }[hour]:[minute](:[second]))

Actually, looking at that last example I realized that I initially tough optional took a list of formatitems instead of just one.

  • This looks like an assumption that simplifies the grammar nicely, so (...) should translate to Optional(&Compound(&[...])) on the rust side.
  • It'd be nice to have Optional(&[...]) supported on the rust side directly but it's not worth a compat break. Maybe a new Optionals enum variant ?
  • We still need grammar-level support for compound, because first might want to choose between multiple compounds.

@jhpratt
Copy link
Member

jhpratt commented Jan 19, 2022

AFAIU Compound is only supported at the root?

Compound can be arbitrarily nested. I checked and Compound isn't what's actually output by the macro — it's just a slice reference to FormatItems.

Note that when recursing, whitespace inside brackets is significant, otherwise [first [T ]] wouldn't work as expected.

You're assuming literals will always be one character. This certainly isn't the case. The only way around this, realistically, is to have it be quoted. This isn't ideal because now we're forcing people to either escape the quotes or use raw strings. We'd also have to handle escaping within the literal ourselves.

It'd be nice to have Optional(&[...]) supported on the rust side directly but it's not worth a compat break. Maybe a new Optionals enum variant?

Optional(Compound(…)) works currently. The parser could automatically output this as needed.

@vincentdephily
Copy link
Author

I checked and Compound isn't what's actually output by the macro — it's just a slice reference to FormatItems.

Yes that's what I meant, Compound can be nested but only in code, not in grammar.

You're assuming literals will always be one character. This certainly isn't the case. The only way around this, realistically, is to have it be quoted. This isn't ideal because now we're forcing people to either escape the quotes or use raw strings. We'd also have to handle escaping within the literal ourselves.

I did think about multi-character :) The "significant whitespace" rule just avoids some corner-cases and makes recursion easier.

First is the only problematic wrapper regarding multi-char, my idea was that they would be supported via compound: [first [[compound [h: [hour]]][compound [m: [minute]]]]. Other ways to handle this could be to accept multiple arguments: [first [h: [hour]] [m: [minute]] ] or to use adjacency: [first [h: [hour]]][first [m: [minute]]] (which also works with sigil syntax {h: [hour]}{m: [minute]}). I think the multi-bracket [first [...] [...] [...]] version is the cleanest and most intuitive.

The only in-litteral escaping needed is for [ (and (/{/< if we decide to use them), which is already the case of the existing grammar.

I'm not sure how you feel about adding sigils to the grammar ? They enable shorter descriptions and look nicer / more readable, but they make the grammar more complex, aren't as future-proof and don't seem to work well for multi-bracket first.

@jhpratt
Copy link
Member

jhpratt commented Jan 20, 2022

Where would a user need to explicitly state that something is Compound? First takes a slice, and Optional is semantically only a single item regardless — a Compound would be generated from the inner contents as needed without a user needing to specify it.

I think that if you want to have either a T or a space, [first [T] [ ]] is reasonable syntax, rather than combining the two possibilities in a single item (as [first [T ]]).

Anything outside of brackets is a no-go for compatibility reasons. That includes using parenthesis, braces, and angle brackets, as well as prefix or postfix sigils outside of brackets. I also don't think sigils are more readable — the current format description arose because the previous version used too many without clear meaning or purpose.

@vincentdephily
Copy link
Author

My initial suggestions needed an explicit [compound [ ]] but you're right : the latest one doesn't, cool. So to recap:

  • Add a [optional [abc...]] grammar item that maps to Optional(&Compound(&[a,b,c...]))
  • Add a [first [a] [bc] ...] grammar item that maps to First(&[Compound(&[a]),Compound(&[b,c])...])
  • Parser could optimize single-item Compound(&[a]) into just a
  • When recursing, whitespace is significant so that [ ] is a literal space

@jhpratt
Copy link
Member

jhpratt commented Jan 22, 2022

That is my understanding as well. Note that the output from the parser isn't guaranteed, only the behavior, so the optimization noted is not mandatory.

For the recursive part, I still need to think about escaping. The left bracket is simple — double it as is the case for top-level items. The right bracket does not have this capability right now, as the grammar was designed without thinking of nesting. I will need to investigate whether it is possible to permit escaping the right bracket by doubling it in a backwards-compatible manner. My immediate concern is that consecutive right brackets will be quite common, and forcing a space between them isn't ideal.

@jhpratt
Copy link
Member

jhpratt commented Nov 1, 2022

Somehow I never considered this before now, but in order to output these values from the runtime parser, owned format descriptions (#429) are necessary, as I can't emit a slice to a newly-created object. With that said, though, the implementation isn't too difficult now that I've rewritten things.

@jhpratt
Copy link
Member

jhpratt commented Nov 4, 2022

Mixed news on this front. I was able to implement option items fully with the aforementioned syntax, such that this test passes:

assert_eq!(
    format_description::parse_owned("[optional [:[year]]]"),
    Ok(OwnedFormatItem::Compound(Box::new([
        OwnedFormatItem::Optional(Box::new(OwnedFormatItem::Compound(Box::new([
            OwnedFormatItem::Literal(Box::new(*b":")),
            OwnedFormatItem::Component(Component::Year(Default::default()))
        ]))))
    ])))
);

Unfortunately, removing the colon results in something unexpected (at first glance). Because [[ is treated as an escape sequence, it gets lexed differently. This affects the "depth" of the lex, which is what determines if something is part of a component or a literal.

I might be able to work around this by carving out an edge case for nested format descriptions specifically ("optional" and "first"), but I will need to be careful in doing so. I still need to think about how to permit a literal ] without closing the sequence as well.

Edit: I've successfully worked around that issue. The only outstanding issue for parsing optional items is the escaping of ].

@jhpratt
Copy link
Member

jhpratt commented Nov 11, 2022

This has been implemented for the runtime parser on main. The escape character for a right bracket is a backslash. The only remaining step (implementation-wise) is to port it to the macro parser, which should be relatively simple. Then I need to actually document this…

It doesn't matter publicly, but the optimization of Compound(&[a]) to a was actually very straightforward to implement, so I've done so.

@vincentdephily Please give this a test run! If you run into any bugs, let me know.

@vincentdephily
Copy link
Author

It works, thanks. I only tested my specific format and [/] escaping, nothing fancy. Some observations:

  • Having two different escape chars for [ and ] feels inconsistent. I'd rather use a backslash everywhere but I understand this might be a breaking change.
  • Why are first/optional restricted to parse_owned() ? I would have thought that parse()'s Vec<FormatItem<'_>> could handle nested FormatItems just as well ?
  • Needs the compile-time part of the work, as you mentioned. format_description!() is described as compile-time parse() rather than parse_owned(), I hope it doesn't run into the same problem.

@jhpratt
Copy link
Member

jhpratt commented Nov 16, 2022

  • I probably could use ]] as an escape sequence, but it would be frustrating to deal with due to the otherwise unnecessary whitespace. This is quite evident, even in your test, as you would need to add spaces add the end. This might have semantic meaning too; I didn't think past the unnecessary whitespace issue.
    With regard to consistency, the only option here within time 0.3 is using the backslash to escape the opening bracket when in a nested format description. But then there's an inconsistency between the top-level description and nested format descriptions.
  • FormatItem stores a reference to a slice/item. Any references that exist would necessarily have to have existed before the call to parse. This is fine for the static case (including macro-generated), as the references are promoted to local consts by the compiler.
  • Absolutely! I haven't updated documentation fully for parse_owned. The new items will be supported by the macro, emitting the slices you're currently using.

@vincentdephily
Copy link
Author

vincentdephily commented Nov 16, 2022

It's not pretty, but we can probably live with those escapes for now. I find [[ and ]] hard to read and easy to mistype, especially with arbitrary nesting. A backward-compatible way to get \ everywhere would be to create new functions, but it's probably not worth it. Are you interested in opening "0.4 compat-break wishlist" issues ?

I'll give the updated docs and macro a look when they're ready.

@jhpratt
Copy link
Member

jhpratt commented Nov 17, 2022

The reason [[ was originally chose is because Rust uses {{ to perform similar escaping in format strings. At that time, I did not consider that nested descriptions could ever be a thing (Optional and First didn't even exist then).

You did give me a thought, though. As there is parse and parse_owned, a new parse_borrowed could be introduced that uses a backslash escape. parse could then maintain the current behavior, possible being deprecated. It's not the prettiest solution, but it would work.

It would be ideal if Rust supported default type parameters for functions, as I could then have marker structs to indicate the desired behavior (current, borrowed, or owned). This is not the case, however.

Right now I think the wishlist for 0.4 is very small, so an issue isn't necessary.

@jhpratt
Copy link
Member

jhpratt commented Nov 21, 2022

Alright, round two! I have pushed a change that introduces backslash for escapes everywhere. You should be able to do \\, \[, and \], resulting in a literal \, [, or ] respectively. This is the case regardless of if you're in a nested description.

For backwards compatibility, I have made this change by introducing versioning internally. parse uses version 1, while the new parse_borrowed and parse_owned use version 2. For the macro, I'm not sure whether I want to pass the version as an argument or introduce another macro. What are your thoughts?

@jhpratt
Copy link
Member

jhpratt commented Dec 7, 2022

I have just pushed a commit that permits "optional" and "first" items in the format_description! macro. To use the backslash escapes, you will need to call it like format_description!(v2, "foo"). When omitted, the default is v1 for backwards compatibility. Note that optional and first items can be created when using v1, but they will use the older [[ escape. For most users, this likely will not matter.

The commit also fixes a bug where [[ was an escape sequence even in parse_borrowed and parse_owned.

The only remaining implementation issue is being able to set the version when using time::serde::format_description!. As the parsing capability already exists, I'm hoping to have this done relatively quickly. Then it's just documentation and a release! If anyone wants to write documentation, a release will come quicker as a result 🙂

@jhpratt
Copy link
Member

jhpratt commented Dec 12, 2022

Alright. I've changed it from v1 to version = 1 and permitted it for the serde::format_description macro. While not strictly necessary, it avoids the need for significant (though not unbounded) lookahead.

Please give it a test! The only remaining issue is documentation.

@jhpratt
Copy link
Member

jhpratt commented Feb 16, 2023

This has been released as part of time 0.3.18`. Documentation is available along with the rest of the docs for format descriptions here.

@jhpratt jhpratt closed this as completed Feb 16, 2023
@flisky
Copy link

flisky commented Feb 16, 2023

It seems time-macros need a version bump to bring version = 2 in. Right? @jhpratt

@jhpratt
Copy link
Member

jhpratt commented Feb 17, 2023

Whoops! I neglected to publish time-macros. As versions are pinned, I'll need to push another version of time as well. It will be done tonight.

@jhpratt
Copy link
Member

jhpratt commented Feb 17, 2023

CI will do its thing, but I have just released it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-format-description Area: format description A-macros Area: macros C-enhancement Category: an enhancement with existing code
Projects
None yet
Development

No branches or pull requests

3 participants