Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(shell) recognize prompt that contain tilde(s) #2859

Merged

Conversation

ggrossetie
Copy link
Contributor

@ggrossetie ggrossetie commented Nov 13, 2020

resolves #2858

Changes

Add space (\s) and tilde (~) in the list of characters that a prompt can contain.

Checklist

  • Added markup tests
  • Updated the changelog at CHANGES.md
  • Added myself to AUTHORS.txt, under Contributors

Comment on lines 15 to 24
begin: '^\\s{0,3}[/~\\s\\w\\d\\[\\]()@-]*[>%$#]',
starts: {
end: '$', subLanguage: 'bash'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about false positives now though as this is getting very close to matching "almost anything" that has one of 4 characters in the middle... Feels like it's time to break this out into variants that handle come common prompt types...

Or else (at the very least) break this variant itself out and give it 0 relevance so it does not interfere with auto-detect of of other things.

Suggested change
begin: '^\\s{0,3}[/~\\s\\w\\d\\[\\]()@-]*[>%$#]',
starts: {
end: '$', subLanguage: 'bash'
}
variants: [
{
begin: '^\\s{0,3}[/~\\w\\d\\[\\]()@-]*[>%$#]',
},
{
// add space but now this matches almost anything, so less relevant
begin: '^\\s{0,3}[/~\\s\\w\\d\\[\\]()@-]*[>%$#]',
relevance: 0
}
],
starts: {
end: '$', subLanguage: 'bash'
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think there are a few common combos that would go really far it would be nice to break them out though I think...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good idea!
I wasn't aware of the variants and relevance. I will take a closer look but your suggestion looks fine 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly even without your change the first is probably too greedy (esp considered this is in common), but we can save that problem for another day as long as we aren't making it worse. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually adding a variant here to look ahead to find a typical command ([a-z]{3,} or something) and giving that relevance but not giving anything else relevance might be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look ahead to find a typical command ([a-z]{3,} or something)

[a-z]{3,} is probably too limiting, for instance nc is only two characters and I'm pretty sure some users are using one letter alias or command. You can also use path-prefixed command /bin/bash or ~/cmd or ./cmd or \\path\\to\\SomEtHiNg.exe.
My point is that I'm not entirely sure that it would be easier to find a typical command than finding a typical prompt... What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky...

echo /path/to/home> t.exe

We know that "echo" is a command but the regular expression does not, so echo /path/to/home> could be a prompt and t.exe could be the command...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two things here....

[a-z]{3,} is probably too limiting

I'm talking about relevancy. You're adding an additional variant (to reduce false positives). Most commands are more than 2 chars. It doesn't have to match ALL cases, just MOST... we'll still highlight both, but only one counts towards auto detect.

> v                       # 0 relevance, who knows what this is
> git commit -m'blah'     # 1 relevance

echo /path/to/home> t.exe

Indeed this is what I was warning about - it's too broad now, probably now you realize this is why we didn't have this in the first place. :-) Spaces are tricky to include - it can now match too much. I'm thinking right now this probably should stay simple (to match SIMPLE prompts) unless you want to start adding a few common variants - and we could see how far that gets us (or find out it's a terrible idea).

[root@localhost ~]# yum list --show-duplicates neo4j-enterprise
[x@y .*][>%$#]\s ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is what I was warning about - it's too broad now, probably now you realize this is why we didn't have this in the first place. :-) Spaces are tricky to include - it can now match too much.

I will remove \s from the regular expression and just keep ~.
I will also add a comment to explain why space is not included 😅

unless you want to start adding a few common variants - and we could see how far that gets us (or find out it's a terrible idea).

I think we should keep it simple. I would argue that using a single letter prompt (i.e. $) is probably the way to go when you want to make sure that your input will be properly highlighted. If you are using a "complex" prompt then there's no guarantee.

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

@joshgoebel
Copy link
Member

I think long-term the answer here might be the expand this and have a list of submodes and then try to match common things you'd have in a prompt (rather than matching the whole prompt):

begin: /.../ //, very broad idea of what a prompt is (look-ahead)
contains: [
  PATH,
  HOSTNAME,

But probably pointless without a LARGE lists of real prompts to work with to see if we were getting some kind of reasonable hit rate. Here is mine for example:

~/w/highlight.js(master+41|✚1…24↩18) %

CHANGES.md Outdated Show resolved Hide resolved
@ggrossetie ggrossetie changed the title recognize prompt that contain tilde(s) and/or space(s) recognize prompt that contain tilde(s) Nov 15, 2020
@joshgoebel joshgoebel changed the title recognize prompt that contain tilde(s) enh(shell) recognize prompt that contain tilde(s) Nov 15, 2020
@joshgoebel joshgoebel merged commit 4fed71d into highlightjs:master Nov 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(shell) Prompt that contain spaces or tildes are not recognized
2 participants