Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allow expressions in backticks only identifier #4479

Open
Neppord opened this issue Jun 25, 2023 · 16 comments
Open

Don't allow expressions in backticks only identifier #4479

Neppord opened this issue Jun 25, 2023 · 16 comments

Comments

@Neppord
Copy link

Neppord commented Jun 25, 2023

Summary

The hypothesis is that the feature of having whole expressions in backticks are not used that often. And that it overcomplicates the parser.

Motivation

The feature was added due to the ease of doing so when using a different parsing library. By removing this feature we hope to make both the language simpler to learn for beginner and the parser easier to reproduce.

also:

Full infix expressions add a lot of complexity to the grammar. You have to carve out a whole subsection of the grammar that can only be used within the ticks, and also makes the layout algorithm more complicated because you don’t know when you have an open or a close delimiter. In my opinion they were only added because it was easy to kludge in with parsec backtracking. I think if we were considering it today we wouldn’t do it.

Proposal

Proposal is to first add a warning that the feature will go a way. and then remove it after 6 months.

Examples

this is currently legal:

x = 2 `flip (/)` 4

and in the future this expression would need to be written like so:

x = let f = flip (/) in  2 `f` 4
@rhendric
Copy link
Member

What's the actual problem here?

it overcomplicates the parser

As far as I can tell, the cost of this feature in the parser is one straightforward production. There's no special lexer behavior or monadic action. It's hard to imagine a user-visible feature that is less complex as far as the parser is concerned.

hope to make both the language simpler to learn for beginner

Any evidence that this trips beginners up?

and the parser easier to reproduce

language-cst-parser matches the canonical Haskell parser on this feature without complication. Are you encountering difficulty with some other PureScript parser?

@Neppord
Copy link
Author

Neppord commented Jun 26, 2023

When the first response I read is What's the actual problem here? I feel invalidated. I would rather be asked what is you need?, if what you believe is that my reasoning is miss guided.

Back to the issue at hand:

The compiler, and all duplication of the tooling, has a cost to maintain and build that is proportional to the features it implements and the accidental complexity of how those features are implemented. This is like any software, the compiler is not special in this regard. Do you need references for this?

As with optimizations: You can squeeze as much performance out of every cpu cycle as you can but nothing beats not doing work. The same is for maintaining code. You can use DRY, the 7 steps of naming, law of demeter and or any other strategy to improve maintainability, but noting beats deleting code.

So asking the question, Is this feature worth the cost it has? and my hypothesis is no, it doesn't. There are other ways to write the same thing and i don't see backticks used that much in the wild and never seen backtick expression used. I also know that there are teams has formatters that removes backticks all together.

When it comes to making it easier for people new to Purescript, its built on the assumption that there already is a good way to express the same thing. When reading code you want to focus as much as possible on the code in front of you, that is why naming is so important. You don't want to go and lookup what a function does because you don't understand what it is doing or don't trust that its doing what it says. This is even worse with language features. A feature that is not used frequently makes it even more difficult for unexperienced developer to know and learn. In best case it can guess from experience, and in worst case they needs to stop what they are doing and go and learn new syntax, which can be harder then reading source of a function that has a bad name.

@i-am-the-slime
Copy link
Contributor

i-am-the-slime commented Jun 26, 2023

without complication

image

I don't think that's quite correct 😁

Here's some more motivation by Nate:

Full infix expressions add a lot of complexity to the grammar. You have to carve out a whole subsection of the grammar that can only be used within the ticks, and also makes the layout algorithm more complicated because you don’t know when you have an open or a close delimiter. In my opinion they were only added because it was easy to kludge in with parsec backtracking. I think if we were considering it today we wouldn’t do it.

I also think that this subsection of the grammar makes it harder to get a PureScript tree-sitter or lezer grammar, both of which don't currently exist as far as I'm aware and ChatGPT refuses to make them for me.

@Neppord
Copy link
Author

Neppord commented Jun 26, 2023

On a more technical note @rhendric

As far as I can tell, the cost of this feature in the parser is one straightforward production. There's no special lexer behavior or monadic action. It's hard to imagine a user-visible feature that is less complex as far as the parser is concerned.

I belive that the change would mean that ithe production could instead become a lex token making it much simpler and quicker to parse, but its a bit beside the point.

@JordanMartinez
Copy link
Contributor

When the first response I read is What's the actual problem here? I feel invalidated. I would rather be asked what is you need?, if what you believe is that my reasoning is miss guided.

Because I was wondering the same thing that Ryan said above, I'll ask that question instead. What is your need? Or perhaps stated differently, when and why did this become an issue for you, @Neppord? I feel like there's probably more context here than what has been stated.

@Neppord
Copy link
Author

Neppord commented Jun 27, 2023

First, thanks for asking! :)

I'm developing the intellij plugin for purescript. And it has partial support for backticks. So there is no need for them to go away for that parser to work, it's a mix between the compiler and the one written in purescript. So my need is not to be able to implement the thing, that is already done. But whenever I read that part of the grammar and think about how often I see it's being used I get an urge to remove it. It's very deeply ingrained in me that even the small improvements over time have a big payoff. It's the 1% improvement a week that leads to (1.01**52) ≈ 68% improvement over a year that is beckoning me. So any easy and simple improvement i can make to any system is a itch i want to scratch.

It turns out though that this improvement wasn't so easy to make, there seams to be much more opinions against it then I imagineed and I'm about to give up, on this improvement.

I can try to dig deeper into my needs if this wasn't sufficient.

@Neppord
Copy link
Author

Neppord commented Jun 27, 2023

What needs do you two whant to fulfill by keeping the implementation as is? @rhendric and @JordanMartinez? What have I missed?

@JordanMartinez
Copy link
Contributor

JordanMartinez commented Jun 27, 2023

Thanks for providing more context!

To restate what I think you said.... You essentially want to build something superb by making many small changes over time (i.e. the IntelliJ plugin for PureScript). Every time you look at the extended infix expressions (as I call them in my reference work), there's an itch that says, "This is a flaw in the language that should be fixed by removing it." Previously, you would just ignore it, though it would bother you. Now you are raising an issue for it.

Generally, when making changes to the compiler/language, we need justification for such a change. However, some changes don't really have an objective justification as much as they are subjective opinions. This case initially felt more like a subjective opinion (i.e. "I think extended infix expressions are a bad idea, so let's remove them") rather than an objective one. Hence, why I was initially confused. Thus far, you have described removing these kinds of expressions from the language as an improvement, but it wasn't obvious to me how they are an improvement and whom benefits if such a change was done. I'm guessing whatever "pushback" you feel that you've gotten has been more from this angle (i.e. "Is this an improvement?") than people disagreeing that this should be done. Or put differently, I wanted to make sure this requested change wasn't a symptom of some larger issue (e.g. having trouble writing a parser than can handle this case).

Getting back on point. Should this be removed? Here are the topics of conversation that come to my mind:

  • The reason for its initial existence. If this feature was added because the original implementer added it because it was easy to do, not because it was agreed upon by others to add it after some discussion, then the argument for keeping it on the grounds that it's already in the language is weakened.
  • Breaking changes. The concern here is whether breaking people's code is "worth it" in this case. I'm not sure how many people are using this feature, but I imagine it's small. A poll might give a better estimate, but it still wouldn't be an accurate number. Grepping a recent package set would give a better idea. Regardless, if this number is small, then this is less of a concern.
  • Readability of expressions. Readability is a subjective thing. Is list1 `combineUsing concat` list2 less readable than combineUsing concat list1 list2?
    • I think it's more readable if one is reading the code like English and not like code
    • I think it's less readable if one is trying to understand how the arguments are applied to the combineUsing function.
  • Necessity / workarounds. As noted in the opening comment, there is a workaround to this as shown below, and the workaround isn't burdensome IMO. Arguably, one does not need to know of this feature to be productive (e.g. pattern matching), nor do most need to use it to write idiomatic code (e.g. type classes). The argument of "but this prevents me from writing/doing X" is greatly weakened:
    expr1 = list1 `combineUsing concat` list2
    expr2 = do
      let combineUsingConcat a b = combineUsing concat a b
      list1 `combineUsingConcat` list2
  • Learnability I don't consider this a "hard" feature to understand. In my mind, understanding something used often like # /flip (i.e. arg # function == function arg) is harder to learn because it can be used 1) multiple times in a chain, and 2) alongside of $ / apply: a $ b # c # d $ e.
  • Parser speed. If this rule is removed, presumably, the compiler can parse source code faster. But if no one is using this kind of expression, is that cost ever paid?
  • Grammar/layout simplification. Presumably, this makes the grammar simpler and the layout algorithm easier to implement. I guess the main argument here is that future additions/changes to the language might be easier if this is one case that doesn't have to be considered.
  • Tree sitter. Presumably, removing this would simplify such a grammar, which would improve IDE experiences. However, I'm not sure how true this claim is.

As for me, I think it's worthwhile to remove this feature because it's not a "necessary" language feature and the workaround isn't burdensome to write.

I think the argument for removing it is further strengthened if the below things are true:

  • few people are using it
  • it makes the grammar/layout simpler
  • it was added trivially by the original implementer
  • it makes future grammar/parser/layout changes easier
  • it makes implementing a tree sitter easier

If someone goes through the effort of showing exactly how many times this feature is used in the latest package set and the usage count and library count are low, I think arguments for keeping it would be weakened. However, this is just my opinion.

@rhendric
Copy link
Member

So first of all, I'm biased: I like and use this feature. I consider it one of the small quality-of-life improvements that PureScript offers over Haskell, and I wish GHC supported it. I would miss it if PureScript dropped it.

That doesn't mean I would object to its removal if it caused more problems than benefits, on net. But I am currently utterly unconvinced of any problems it causes (no, I don't need the abstract concept of the maintenance cost of code reiterated; not all lines of code have equal maintenance cost and I need convincing that these specific lines cause maintenance problems, preferably from someone who actually maintains them), and would like to see a stronger case for these problems more than I would want to see a comprehensive survey of how much this feature is used—if it doesn't cause any actual problems for actual users or maintainers, then one happy user is enough to keep it in my opinion, and I am such a one.

All that said, I would be more sympathetic to a proposal to make this feature easier for external tools to handle by reducing its interaction with layout. I see no need to support infix expressions spanning multiple lines, for example, if that's difficult for some of these external systems.

As a point of information, this feature was added by Phil as discussed in #839. (A link in that issue points to this usage by @natefaubion of seven years ago, and I'm really curious to hear from him about why the apparent about-face in whatever message people are quoting in this thread.)

@rhendric
Copy link
Member

Oh, and just to be clear, @Neppord, since we miscommunicated before: when I use the word ‘actual’, I'm not intending to imply that problems you experience are invalid. I mean ‘actual’ as opposed to ‘hypothetical’—I want examples of difficulties you've actually had in your actual experience, not difficulties you hypothesize that you could have or that you hypothesize that PureScript maintainers have. Any problems that you've had in your actual experience are valid and I would like to hear about them, even if we end up disagreeing on the outcome.

@Neppord
Copy link
Author

Neppord commented Jun 27, 2023

Here is the discussion on discord that made me in the end create this PR, https://discord.com/channels/864614189094928394/872650895744176188/1122110393985286206

@natefaubion
Copy link
Contributor

As a point of information, this feature was added by Phil as discussed in #839. (A link in that issue points to this usage by @natefaubion of seven years ago, and I'm really curious to hear from him about why the apparent about-face in whatever message people are quoting in this thread.

That usage likely represents the only time I've ever used it, and it was only a sort of "see how it feels" usage. You can probably find an example of any kind of syntax at some point or another in code I've written. At that time I also noted "I could take it or leave it". Since then, I contributed the parser and layout algorithm. I think the layout formulation we have (as baroque as it appears to be) is crucial to the proliferation of (accurate) alternative parsers. It's written in a quirky style that's easy to translate verbatim to other languages, but is otherwise largely impossible to reason about given any mistakes. I feel pretty strongly that, given the current state of the parser and layout algorithm we would likely not add this feature today if it was proposed (especially given hindsight about it's usage), and I nominally support removing features that create additional complexities for layout. Another example (probably a more significant contributor to complexity) being unquoted layout keywords in record accessor position. In particular for backticks, since it isn't a symmetric delimiter, you don't know if any given backtick is an open or close, so you have to:

  • Add specific lexer state just for tracking backtick balancing (you can nest backticks 🙀).
  • Inspect the whole layout stack to determine which position a tick is in. In particular, we speculatively collapse the layout context whenever you see one (ie, assume any given tick is a closing tick) and see if it makes sense, otherwise backtrack as an open tick (potentially inefficient? though an unlikely significant contributor to performance woes as it's rare).

Ideally, I had wanted nested backticks to be fully prohibited, but this requires forking the expr grammar below the appropriate precedence level. So I opted to just supporting it in layout with the backtracking check. Another alternative could be to only support a single application spine with non-recursive terms (I think this would allow all the examples I've seen), which would allow us to only need to inspect the top item in the layout context like all the other rules.

Whether or not it should actually be removed (aside from my strong opinion on the syntax overall) is a different question. I have no intention of doing any of the work necessary to actually make it happen, so that means I'm still in the "take it or leave it" camp unless someone is actually motivated to see it through and just needs an approval.

@Neppord
Copy link
Author

Neppord commented Jun 27, 2023

I definitely can do the work, that is "the small effort" in my opinion. And it will also benefit me in learning the codebase better.

@natefaubion
Copy link
Contributor

I definitely can do the work, that is "the small effort" in my opinion. And it will also benefit me in learning the codebase better.

The work is more than the compiler though. I'd consider it to also be researching it's usage in the ecosystem, and potentially contributing fixes to make the transition easier if we determine it's worth it.

@Neppord
Copy link
Author

Neppord commented Jun 27, 2023

I have a project that uses the CST parser that index the package-set to enable some features in the plugin.

I ran it on psc-0.15.7 since it was the last one i had indexed and looked for infix expressions that contained more then an identifier, this was the output:

{
  "these": 4,
  "psa-utils": 3,
  "ps-cst": 1,
  "profunctor": 1,
  "optparse": 3,
  "halogen-hooks": 1,
  "graphql-client": 4,
  "datetime": 1
}

@Neppord
Copy link
Author

Neppord commented Jun 27, 2023

@natefaubion I agree, and as you can see I have already started to gather data. And since i plan for adding a warning first, people will be able to ask for help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants