
JavaScript runtime: could the transition classes be exported? #4539

Open
seb314 opened this issue Feb 23, 2024 · 11 comments

Comments

@seb314

seb314 commented Feb 23, 2024

The classes from the transition/* subfolder of the JavaScript runtime are not exported in index.node.js and index.web.js.

Is this intentional, or could they be exported similar to e.g. the classes in atn/*?

Background: we rely on some of the classes in our autocomplete implementation. With antlr4 v4.9.3, we were able to import them via their path, but now that "exports" are configured in antlr's package.json, the imports by path no longer work.
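For illustration, a hedged sketch of the before/after (the deep-import path below is representative, not necessarily the exact one we used):

```ts
// Worked with antlr4 v4.9.3: a deep import straight into the package's
// file tree (path illustrative):
import RuleTransition from 'antlr4/src/antlr4/transition/RuleTransition.js';

// After an "exports" map was added to antlr4's package.json, only the
// declared entry points resolve; the deep import above is rejected by Node
// (ERR_PACKAGE_PATH_NOT_EXPORTED) and only the package root remains usable:
import antlr4 from 'antlr4';
```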

Happy to open a PR if this makes sense.

seb314 changed the title from "JavaScript runtime: transition classes not exported" to "JavaScript runtime: could the transition classes be exported?" on Feb 23, 2024
@ericvergnaud
Contributor

Mmm... interesting....
Why not simply use ATN.getExpectedTokens?
What more do you get from transitions for the cost of replicating a core ANTLR function?
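For reference, a minimal sketch of that built-in route (TypeScript; `parser` is assumed to be an instance of a generated antlr4 parser, and the two-argument ATN form is reached through the parser's interpreter):

```ts
declare const parser: any;         // a generated antlr4 parser (assumption)
declare const stateNumber: number; // ATN state computed for the caret
declare const ctx: any;            // rule context active at that state

// Convenience form: the token types that can follow the parser's current
// state, returned as an IntervalSet.
const expected = parser.getExpectedTokens();

// Underlying ATN form: an explicit (stateNumber, ctx) pair, which is what
// an autocomplete engine would compute for the caret position.
const expectedAt = parser._interp.atn.getExpectedTokens(stateNumber, ctx);
```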

@seb314
Author

seb314 commented Feb 27, 2024

My understanding of why we have a custom implementation is:

  1. for certain grammar rules, our desired autocomplete suggestions are not individual tokens, but a sequence of tokens that corresponds to the input consumed by the rule. To achieve this, our depth-first search on the ATN has a callback hook for RuleTransitions (see the sketch after this list). Example: a rule that describes a '/'-delimited file system path, where the suggestions are full paths of existing files, rather than individual directory names.
  2. semantic predicates are taken into account for the suggestions (getExpectedTokens doesn't seem to do this?)
  3. to find all (state, ctx) pairs that can be reached after consuming the input up to the cursor position, we do a depth first search on the ATN, where the transitions are the edges of the graph. I think this would be necessary in a getExpectedTokens-based approach too (in order to find the stateNumber and ctx parameter values), but maybe there is a higher-level method available that I'm not aware of.
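For concreteness, a minimal sketch of the kind of DFS described above (TypeScript; the interfaces are structural stand-ins for the ATNState/Transition classes this issue asks to have exported, the serialization-type constants match the ANTLR4 runtime, and the (state, ctx) and input-consumption bookkeeping from point 3 is omitted):

```ts
// Structural stand-ins for ANTLR's ATN types.
interface Transition { serializationType: number; target: ATNState }
interface ATNState { stateNumber: number; transitions: Transition[] }

// Serialization-type constants from the ANTLR4 runtime.
const RULE = 3;
const PREDICATE = 4;

function collectSuggestions(
  state: ATNState,
  visited: Set<number>,
  onRuleTransition: (t: Transition) => void, // hook from point 1
): void {
  if (visited.has(state.stateNumber)) return;
  visited.add(state.stateNumber);
  for (const t of state.transitions) {
    if (t.serializationType === RULE) {
      // point 1: let the caller substitute a whole-rule suggestion
      // (e.g. a complete file system path instead of single name tokens)
      onRuleTransition(t);
    } else if (t.serializationType === PREDICATE) {
      // point 2: evaluate the semantic predicate before following the edge
    }
    collectSuggestions(t.target, visited, onRuleTransition);
  }
}
```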

@ericvergnaud
Contributor

Thanks for the explanation. Is this source code accessible?
The reason I want to dig into this is that in antlr5, transitions will definitely not be accessible, so we need to understand the usage scenario well enough that it can still be supported (by other means).

@seb314
Author

seb314 commented Feb 28, 2024

It's not public, but I'm currently checking whether I can share it.

@seb314
Author

seb314 commented Mar 1, 2024

@ericvergnaud I can probably send you the relevant code via email.
(I'd prepare a self-contained set of files that can be built and run with only public dependencies)
Would this be of interest to you?

@ericvergnaud
Contributor

It would definitely.

@seb314
Author

seb314 commented Mar 2, 2024

@ericvergnaud I've sent you an email at the address from your git commits

@ericvergnaud
Contributor

ericvergnaud commented Mar 15, 2024

@seb314 I've started implementing a built-in ANTLR solution so that developers don't need to know about ANTLR internals to compute suggestions.
A not-so-obvious finding is that the set of expected tokens at a given caret position does not vary with the enclosing context. So at this point, the API I'm experimenting with is simpler than discussed, as follows (in Java):

Pair<RuleContext, IntervalSet> getExpectedTokensAt(RuleContext startRuleContext, int line, int column) { .../... }

The returned context is the most specific context that contains the token (or last token) at the caret position. From there, it's straightforward to walk the hierarchy of contexts upwards (using the parent field).
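A hedged sketch of that upward walk (TypeScript; `getExpectedTokensAt` here is a hypothetical JS-side mirror of the Java signature above, not a published antlr4 export, and `parentCtx` is the JS runtime's name for Java's `parent` field):

```ts
declare const startRuleContext: any;        // root context of the parse (assumption)
declare const line: number, column: number; // caret position

// Hypothetical mirror of the experimental Java API (Pair<RuleContext, IntervalSet>).
declare function getExpectedTokensAt(
  start: any, line: number, column: number,
): { context: any; expected: any };

const { context, expected } = getExpectedTokensAt(startRuleContext, line, column);
let ctx = context; // most specific context containing the (last) caret token
while (ctx) {
  // e.g. map ctx.ruleIndex to a rule name to decide how to render suggestions
  ctx = ctx.parentCtx; // `parentCtx` in the JS runtime, `parent` in Java
}
```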

The above assumes all semantic predicates default to true.
I would need to write some specific tests to support execution of semantic predicates, but that only makes sense if you stick to your current multi-dialect grammar? Have you looked into grammar imports?

See PR #4557

@ericvergnaud
Contributor

@seb314 Re 1) i.e. "our desired autocomplete suggestions are not individual tokens, but a sequence of tokens that corresponds to the input consumed by the rule", I have some questions re your example:

  • since a path parser rule can be defined by something like ('/' name)+ (which could very well be a series of divisions), am I correct in assuming that suggesting real paths is bound to the rule rather than to a token sequence?
  • if your lexer had a token for paths, would support for token sequence suggestions still be necessary?
    More generally, the scenario where only one specific token can follow is theoretically a good candidate for token sequence suggestions. However, when there is more than one, the number of sequences explodes more than exponentially. Not sure how manageable that is... And I'm a bit skeptical about the UX when suggesting more than one word (except for literals such as full paths). Can you provide more insights?

@seb314
Author

seb314 commented Mar 19, 2024

> @seb314 Re 1) i.e. "our desired autocomplete suggestions are not individual tokens, but a sequence of tokens that corresponds to the input consumed by the rule", I have some questions re your example:

> * since a path parser rule can be defined by something like `('/' name)+` (which could very well be a series of divisions), am I correct in assuming that suggesting real paths is bound to the rule rather than to a token sequence?

By "sequence of tokens" I meant: if we are in a path rule, then we want to suggest a complete path rather than an individual name element. So the actual use case is much simpler than arbitrary sequences.

> * if your lexer had a token for paths, would support for token _sequence_ suggestions still be necessary?
>   More generally, the scenario where only one specific token can follow is theoretically a good candidate for token _sequence_ suggestions. However, when there is more than one, the number of sequences explodes more than exponentially. Not sure how manageable that is... And I'm a bit skeptical about the UX when suggesting more than one word (except for literals such as full paths). Can you provide more insights?

Resolved by the previous point. (Note: we are not using tokens for full paths for some quite specific reasons, but we think those reasons are not relevant for the general case.)

Re your previous message: I'll first have to dig in a bit and will respond in more detail afterwards.
My first guess re semantic predicates is that assuming them to be true will probably be good enough for us (either via grammar imports or with some token filtering in post-processing).

@seb314
Author

seb314 commented Mar 19, 2024

@ericvergnaud I commented a test case in #4557 (comment)
