Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic literals are not properly detected. #10

Open
olson-sean-k opened this issue Jan 23, 2022 · 2 comments
Open

Semantic literals are not properly detected. #10

olson-sean-k opened this issue Jan 23, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@olson-sean-k
Copy link
Owner

There is at least one case of false positive and numerous false negatives in Glob::has_semantic_literals and report::diagnostics. Semantic literals are detected via the token::literals function, but that function's behavior is a bit unusual and is insufficient in some cases. At the time of writing, it adapts a token sequence into a sequence of component and literal sequence pairs. To do this, it first uses the token::components function and, if a component is not itself a literal sequence, it recurses into any and all group tokens (alternatives and repetitions). Functions that detect semantic literals simply iterate over the sequence of pairs to find any literals of note, in particular . and ...

This causes false positives in glob expressions like **/a{b,..}c, because token::literals does not consider tokens adjacent to groups within a component and emits .. in its output. Note that in that example, it is not possible for .. to occur alone in the final component, because a and c are always present. Similarly, glob expressions like **/{a,.}{b,.} result in false negatives, because the possible match .. in the final component is not considered at all.

The token::literals function could be reworked, but I think its behavior is already a bit murky and that, instead, a bespoke detection function is needed. Perhaps the machinery in the rule module could be refactored and reused, as it already considers tokens that are adjacent to groups within a component and something similar is needed here.

@olson-sean-k olson-sean-k added the bug Something isn't working label Jan 23, 2022
@olson-sean-k
Copy link
Owner Author

Taking another glance at this, I think the solution is to simply examine the invariant string of each component. Using Component::to_invariant_string would prevent the false positive described above and token::literals could simply be removed. It's not clear if the false negative described above (**/{a,.}{b,.}) should be considered at all. While that expression could match a semantic literal, it does not strictly encode it. I think it's okay to ignore such occurrences.

@olson-sean-k
Copy link
Owner Author

olson-sean-k commented Apr 20, 2022

Looking at this yet again, I think token::literals isn't too far off and only falls short because there isn't a great model for components and component boundaries. token::components only considers top-level boundary tokens in the token tree and group tokens (alternatives and repetitions) are never considered boundary tokens.

I think the right general solution for this problem is a more sophisticated model for boundaries and boundary tokens. For example, the alternative in the expression foo{/bar,/baz} is effectively a boundary token, but in foo{bar,baz} it never is and in foo{bar,/baz} it is for some branches but not for others. Similarly, the repetition in the expression foo</bar:1,> is effectively a boundary token, but in foo<bar:1,> it never is and in foo</bar:0,> it only sometimes is.

Such "conditional boundaries" could be queried (or even considered and produced by token::components) to isolate literal sequences that exist as their own components in more complex expressions. These more sophisticated components could be used to raise warnings for more appropriate and interesting cases like foo{bar,/baz/..} while not doing so for cases like foo{bar,..}, where .. does not form its own component.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant