Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow styling of output lines by non-path patterns and styles.less #36

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

hg42
Copy link

@hg42 hg42 commented Feb 26, 2017

coming from scite and loving process-palette, I thought, I'll extend the functionality a bit.
This PR implements styling the output lines via pattern expressions and the user's styles.less.

snippet from process-palette.json:

...
  "patterns": {
    "dashes": {
      "expression": "^--"
    },
    "command": {
      "expression": "^>"
    },
    "exclamation": {
      "expression": "^ *!"
    },
...

snippet from styles.less:

atom-panel .process-palette-output-panel .dashes
{
  color: #009999 !important;
}

atom-panel .process-palette-output-panel .command
{
  color: #0000ff !important;
}

atom-panel .process-palette-output-panel .exclamation
{
  color: #009900 !important;
}

Patterns not using "(path)" will be used as line expression and if matching the line will be wrapped in a <span class="name-of-pattern">...</span>. The first matching pattern stops the matching, but path-expressions are handled separate from line expressions, so paths can be styled inside matching lines.

I am unsure if the class names should be extended by a prefix like
"line-"
or may be
"process-palette-line-".

When using many expressions, I would also like to have them default to enabled, but that's another topic, I guess.

(I'm new to atom, so forgive me if I am doing something wrong...)

@hg42
Copy link
Author

hg42 commented Feb 26, 2017

I used traditional kind of building the HTML for the styling, because I'm still not familiar with atom's way (and could not get it to work).
Feel free to change it to something more advanced.

@morassman
Copy link
Owner

Hi @hg42

This is a very innovative feature! I'll be happy to merge it in. There's just two thing I'll have to run by you first.

First, on line 189 you have

if not have_line
  line_class = "match"

, but the style match isn't defined anywhere. Do you need a style at all in this case or did you intend to still define it? If not, then it can just be an empty string.

Second, the following unfortunately introduces a bug:

line += ($$ ->
  @span =>
     @subview "#{@lineIndex}", pathView
   ).html();

The culprit is the .html() at the end.

Atom uses space-pen for constructing its UI. The original code was

pathView = new PathView(cwd, match);
  @outputPanel.append(match.pre);
  @outputPanel.append $$ ->
    @span =>
      @subview "#{@lineIndex}", pathView
  @outputPanel.append(match.post);

When you look at path-view.coffee you'll see that it extends View. You'll also see that PathView adds a click handler, which will open the file represented by the path that was matched. In the original code the $$ creates an instance of a View, which in turn contains an instance of PathView. An instance of a View object is therefore added to outputPanel (not HTML), which then retains the click handler that was implemented.

What the .html() does is it converts it to plain text. It therefore loses the click behaviour. The path is still shown with an underline, but clicking on it has no effect.

@hg42
Copy link
Author

hg42 commented Feb 27, 2017

    if not have_line
      line_class = "match"

this assigns a class to all lines matching a path pattern. I thought it might be useful to someone to be able to style such lines.
But I think the path views are shown strong enough and could be styled more emphasized if necessary.
I'll change it to a comment in case someone will need it.

@hg42
Copy link
Author

hg42 commented Feb 27, 2017

I managed to find an implementation using space-pen, but I have one remaining problem.

I used this to output a matching line wrapped with a <span>:

@outputPanel.append(
  $$ ->
    @span {class: line_class}, line
  );

This works, but if I have something like:

  line = 'a line with "quotes" or other <special chars> inside'

those special characters are converted to &quot; &lt; &gt; etc. which would work in a html page, but the output pane shows them like this:

! a line with &quot;quotes&quot; or other &lt;special chars&gt; inside

In Developer Tools they are shown (optically) as expected:

<span class="exclamation" is="space-pen-span">! a line with &quot;quotes&quot; or other &lt;special chars&gt; inside</span>

but copying the element gives:

<span class="exclamation" is="space-pen-span">! a line with &amp;quot;quotes&amp;quot; or other &amp;lt;special chars&amp;gt; inside</span>

so it seems all the & characters are replaced by &amp;

I will start digging into the documentation of space-pen to find a solution.
You probably already know the solution or you can give me a hint?

@hg42
Copy link
Author

hg42 commented Feb 27, 2017

ok, I guess @outputPanel.append(...) also does HTML-escapes?

I found a solution using @raw function:

@outputPanel.append(
  $$ ->
    @span {class: line_class}, => @raw(line)
  );

@morassman
Copy link
Owner

I actually don't have a solution nor a hint. Sorry...

I personally didn't find space-pen to be very intuitive nor its documentation very insightful. Even Google-ing for examples doesn't yield much.

I'm sure you'll get it right, but I'll also look into it a bit myself.

As for the special characters. I don't know if you saw it, but these are actually handled in the method called sanitizeOutput with the line output = escapeHTML(output); This method comes from underscore.string. There's also an unescapeHTML method that does the opposite. Just be careful with this. It was added to solve another issue. In fact, this was added by means of another pull request. You can take a look at it to see the issue that it addressed.

@hg42
Copy link
Author

hg42 commented Feb 27, 2017

Thanks for the quick answer, however I already found a solution...see above.

Thanks for the hint to the other PR, ansi sequences are useful for shell commands. I didn't notice that coloring is possible, because commands like ls detect if they are running in a tty and disable colors otherwise. ls --color works (but the TERM variable has to be correct, at least on unix).

@hg42
Copy link
Author

hg42 commented Feb 27, 2017

I added a fsp.isFileSync(match.path) to detect if the path is a file, so time strings like 13:45 and directories and other kinds of file names that cannot be opened don't match.
This is especially useful, when multiple files exist on the output line.

@hg42
Copy link
Author

hg42 commented Feb 27, 2017

re-thinking the whole matching:

  • a line can contain more than one path.
  • line matching is a special case (just a pattern that matches the whole line)

in fact each line pattern could be transformed:

    /error/ -> /^.*error.*$/

In general, patterns could match whatever they want and add a <span class="PATTERN-NAME"> to the matching text.
The matches could even be nested.

But:

  • patterns could overlap and it's not easy to detect or handle this case
  • path patterns should not be nested in other path patterns

I'm not sure if I should complete this PR now or better try a bit more to implement these ideas.
May be its kind of "shooting on sparrows with cannons" as we say in Germany (~ use a sledgehammer to crack nuts).

What do you think?

@hg42 hg42 changed the title allow style output lines by non-path patterns and styles.less allow styling of output lines by non-path patterns and styles.less Feb 27, 2017
@hg42
Copy link
Author

hg42 commented Mar 1, 2017

After thinking some time about generalization, I feel it would lead to much more complications and more complex configuration than being useful:

  • generalized regexps should allow multi-line matching to be really useful (many error messages are multiline) but then incrementally processing single lines would not be sufficient
  • generalized regexps would need flags and javascript does not allow to specify flags in groups like perl
  • nesting and overlapping results of regexps lead to very complex conditions, users would not be aware of

So, I would conclude, let's better keep it simple.

The current implementation should be sufficient for line based matching and works well for my needs. At least it's already better than the SciTE way with it's non-configurable expressions.
Currently, I am happy how it works now.
I use it to output colored lines while debugging or to emphasize lines containing errors or wanrings and similar output. I also put some less important lines a bit more to the background by coloring them more grayish, e.g. long stack traces or less informational lines of a build process.

What about documentation...where and how much should I add some?
What about defaults?

Btw., I also would like to change the default to enabled for all patterns, because it's uncomfortable to enable all these patterns for new commands, at least for line patterns.
Line-patterns currently used by me do not conflict, because they usually start with a special character at the beginning of the line. And they are of general interest. E.g. lines with a lot of dashes are always special enough to be colored.
So I have all expressions enabled for all commands.

The default Path RegEx could be optimized a bit, then the default could be sufficient for all commands. The Path RegEx seems to be more dependent on the operating system, than on the used command. I would like to have this as a global setting instead of or additionally to the per item setting.

However the Path Expression (using (path) and (line)) is often dependent on the command.
But I think, we could create a better default expression with multiple alternatives to fit most cases:

something like:

\b(at|file)\s+(path).*\bline\s+(line) |
(path):(line):(column) |
(path):(line) |
(path)

written multi-line with space in front of "|", to make it more clear. Perhaps it should be a multi-line setting for user convenience and lines joined by "|" afterwards.
This default setting could be improved over time, freeing users from defining them individually.

I also think, there is a mistake in the current default path regex itself and also in the function (without having tested). I will probably add an issue and may be a PR after some testing.
But first this PR should be finished.

@hg42
Copy link
Author

hg42 commented Mar 7, 2017

Just a note, what's going on...

I had some problems with detection of path patterns and rendering.

python generates messages like:
File "foo.py", line 123, some message

The quotes were "sanitized" by outputToPanel, so the regex did not match the &quot;.
I moved the sanitize calls to several places in appendLine, which seems to work.

Additionally, the local file name without directory wasn't matched.
Because I now check paths against existing files, I rewrote the default path regex to be simpler.

Apart from that, I had more than one path pattern in a single line.
So, for now I added a quick solution for that, invoking appendLine with match.post and a continuation parameter to suppress (whole-)line matching for this part (only path matching).
As I noticed, this is not a complete solution, because the continuation will only apply the (whole-)line style to the first part of the line. I am thinking about using the continuation parameter to solve that.

I also noticed something similar before:
when a line is output in more than one parts, it will be matched and styled for line pattern in the first part only.
In this special case this was very convenient, because the main part of the line was colored and the final part was an unstyled "ok" or "FAILED" which was output after some time.
I'm not sure in which cases a line will be output in more than one part. This seems to be controlled by timing.
Perhaps the styling should be delayed to the newline?
But then the part of the line already added to the output pane would have to be replaced later.

@hg42
Copy link
Author

hg42 commented Mar 7, 2017

to test multi-part output I wrote this shell script:

t=-----; echo -n "--test";           echo "($t)"
t=0.000; echo -n "--test"; sleep $t; echo "($t)"
t=0.001; echo -n "--test"; sleep $t; echo "($t)"
t=0.005; echo -n "--test"; sleep $t; echo "($t)"
t=0.006; echo -n "--test"; sleep $t; echo "($t)"
t=0.007; echo -n "--test"; sleep $t; echo "($t)"
t=0.008; echo -n "--test"; sleep $t; echo "($t)"
t=0.009; echo -n "--test"; sleep $t; echo "($t)"
t=0.010; echo -n "--test"; sleep $t; echo "($t)"
t=0.050; echo -n "--test"; sleep $t; echo "($t)"
t=0.100; echo -n "--test"; sleep $t; echo "($t)"

with my styling pattern of ^--, I get varying results.
Most of the time lower sleeps are output in one line, the break is below 0.010 in most cases.
I also had a single one case where all lines were output in two parts. When Atom is very busy it may also vary from line to line.

@morassman
Copy link
Owner

Hi @hg42

Sorry for being so quiet. I haven't had much free time lately. It looks like you've been quite busy! By now you seem to have a more comprehensive grasp of the styling than me in general.

I agree that we should try and keep things simple. When I originally started with the pattern matching I kept it simple by just trying to match one pattern per line. I suppose that is perhaps too simple, but it helped me to get something out that works for, I guess, most use cases.

When it comes to enabling all patterns by default, I think I'll add a setting so that one can choose if that should be the case and rather have it disabled by default. When I release updates I try to keep the default behaviour as-is so that existing users aren't affected, or surprised, too much.

The path expression can become a global setting. I think, though, that it should only serve as a default so that one can always override it on a per-command basis. I also agree that the path expression can be improved on like you suggested.

I'll try to look at the changes you've made in more detail soon, but I can at least say that I won't release another update until this PR has been merged.

@hg42
Copy link
Author

hg42 commented Mar 7, 2017

Hi @morassman,

I already assumed you were very busy....so I had more time to go forward.

Because of multiple path patterns in a single line and the way I am already matching the rest of the line recursively, I think it's more complicated than what I described before.
As a result, I am already on the way to implement some of my ideas, which makes the whole thing more simple (I hope).

I currently think it's better to separate whole-line expressions from part-of-line expressions.
I think I'll go this way:

  • path expressions are roughly the same as part-of-line expressions
  • path expressions will be converted to PathViews
  • other part-of-line regexs will be converted to span objects
  • converting will be done sequentially: match(line); while(remaining) match(remaining)
  • resulting objects will be collected together with strings between them
  • while converting, the matching parts will be replaced by place-holders (may be <PATH> and <pattern-name> or something) in the originally line
  • whole-line expressions can be detected by using ^ at the start
  • whole-line expressions will be matched against the line with the place-holders
  • the first whole-line expression will style via a wrap of the collected array

btw. can I redefine RegExPattern and PathPattern, so that RegExPattern is used for regex expressions and PathPattern is used for path expressions?

@morassman
Copy link
Owner

I agree with you that separating whole-line and part-of-line expressions would be better.

Most of your points make sense to me. I'm not 100% sure what the collected array is for as apposed to just writing directly to the output, but I'm sure it'll make sense when I see the implementation.

Don't worry about preserving RegExPattern and PathPattern. You can change them as you see fit.

@hg42
Copy link
Author

hg42 commented Mar 7, 2017

Thanks for all that.

The array is needed, because the line pattern is tested after converting subexpressions to objects and then wraps all this in a span.
It would be easier, if I could combine this as plain html-text, but I need to preserve the PathView objects.

@hg42
Copy link
Author

hg42 commented Mar 8, 2017

there are now four passes:

  • path expressions: collecting an array of PathViews and strings around them
  • inline expressions: matched against each of those collected strings (but not paths!)
  • rebuild the line with matched parts replaced by playe-holders
  • line expressions (starting with '^'): matched on the rebuilt line

Because of that strategy inline expressions can not match over paths.
And in general they cannot be nested in any way.
All inline expressions including paths can only match sequential.
But line expressions can match over all, but ignore paths and inline matches.
(eventually the inline matches shouldn't be ignored? seems to depend on usage)

I added a global setting defaultPatterns, with (pseudo-code):
pattern_names = command.patterns or global.defaultPatterns or ['default']

I had several problems with space-pen.
One minor issue remains: there is an additional span that could be left out:

<span is="space-pen-span">       <======== this
  <span class="process-palette-path-view" click="clicked">
    /usr/local/src/atom/packages/process-palette/lib/main.coffee
  </span>
</span>

But I wasn't able to drop it. This does not affect functionality, but it is sub-optimal.

I think this could be accepted now (may be after some tests by others).
I think I should add further improvements later in additional PRs.

@hg42
Copy link
Author

hg42 commented Mar 8, 2017

as an example, I currently use this in my process-palette.json:

  "patterns": {
    "colon": {
      "expression": "(path):(line)"
    },
    "file_line": {
      "expression": "(?:\\bfile +|\\bat +)?['\"]?(path)['\"]?.*\\bline +(line)"
    },
    "path": {
      "expression": "['\"]?(path)['\"]?"
    },
    "error": {
      "expression": "[-\\w]*error[-\\w]*"
    },
    "warning": {
      "expression": "warning"
    },
    "dashes": {
      "expression": "^\\s*--"
    },
    "command": {
      "expression": "^\\s*>"
    },
    "exclamation": {
      "expression": "^\\s*!"
    },
    "bracket": {
      "expression": "^\\s*[[{]"
    },
    "build": {
      "expression": "^\\s*build"
    },
    "at": {
      "expression": "^\\s*at\\b"
    }
  },
  "defaultPatterns": [
    "colon",
    "file_line",
    "path",
    "error",
    "warning",
    "exclamation",
    "dashes",
    "command",
    "bracket",
    "build",
    "at",
    "default"
  ]

and these in styles.less:

atom-panel .process-palette-output-panel {
  background: #ffffff !important;
  color: black !important;
}

atom-panel .process-palette-output-panel .process-palette-path-view {
  //background: #ffff99 !important;
  color: #0000cc !important;
}

atom-panel .process-palette-output-panel .dashes {
  color: #009999 !important;
}

atom-panel .process-palette-output-panel .bracket {
  color: #cc00cc !important;
}

atom-panel .process-palette-output-panel .command {
  color: #0000ff !important;
}

atom-panel .process-palette-output-panel .exclamation {
  color: #cc00cc !important;
}

atom-panel .process-palette-output-panel .build {
  color: #666666 !important;
}

atom-panel .process-palette-output-panel .error {
  color: #ff0000 !important;
  background-color: #ffffcc !important;
  //padding-left: 2px;
  //padding-right: 2px;
}

atom-panel .process-palette-output-panel .warning {
  color: #999900 !important;
}

atom-panel .process-palette-output-panel .at {
  color: #9999cc !important;
}

@morassman
Copy link
Owner

Hi @hg42

I started looking at your implementation and testing it. It's looking very good.

I think the concept of defaultPatterns isn't very obvious though. At the moment there isn't a place in the UI where default patterns are configured, so one needs to know about it and edit it directly in the process-palette.json file. (Unless I'm missing something?)

Tell me if you think this is a good idea. I'd like to add a column to the 'Edit Patterns' part of the config editor called 'Default'. This column will have a checkbox so that one can choose which of the patterns should be a default pattern. This will be a simple way of showing the concept of a default pattern to the user and making it editable from the UI at the same time.

@hg42
Copy link
Author

hg42 commented Mar 15, 2017

hi @morassman,

I'm on vacations now, so this is from memory only.

You wanted to keep it compatible, so I added defaultPatterns. I thought, a GUI wouldn't be necessary.

defaultPatterns defines the processing sequence of patterns for all commands where not configured individual (meaning no pattern enabled for the command).

With defaultPatterns, you don't need individual sequences for each command, if you always want the same.

If you want a GUI for that, it would be just like the individual setting (the one with the little arrows) but only a single global setting.

A more intuitive way would be to be able to sort patterns in "Edit Patterns" and may be a single check box for each command saying something like "choose sequence individually" or "use default sequence".

@morassman
Copy link
Owner

Styles
I'll remove the CSS from Atom's styles.less. Instead of manually copying styles I'll just wait for you to commit yours.

how do you use the patterns?
The ability to match paths and to click on them to open the file was a request by someone else, but I've never had a need for it myself.

look
I don't mind that you change them. I just find them a bit bulky. If you'd like you can see what it looks like and keep it if looks fine.

pattern selection in command editor
The reason that there is the ability to choose patterns per command is because not all commands produce the same patterns. In fact, one might not even want patterns to be applied at all to certain commands. This will especially be the case with your new feature. It might be annoying if patterns that applies to one command are applied to one that doesn't suit it.

Please don't remove the list. Whether there is a CSV text field or a list comes down to the same thing, but I would like to keep the list. I think a middle ground would be to keep the list, but only show the patterns that were actually chosen. The button you mentioned can then be added so that one can use it to choose the patterns that one wants for that command.

If you don't like the arrows then you're welcome to remove them and make them draggable like you did for the table. For the list, though, you won't need checkboxes to select them. Since the pattern names aren't editable in that context one can just click and drag from the pattern name.

When it comes to a command one should be able to either choose:

  • to use the default patterns
  • use certain selected patterns
  • not use any patterns at all

Migration
I really don't think that a notification will add any value. From the user's point of view it shouldn't matter that the format has changed in the background. The migration should be automatic in the background without them even knowing about it.

I've done migrations in my other package. What I did was to add a version field to the config. The first time a migration is done there wouldn't be a version field, so one knows that a migration is necessary. Thereafter, only if the user's config file has an older version than the code, then a migration is necessary.

The migration is straight forward, but I think leave it to me. I'd like to implement it in the same way that I did for my other package.

@hg42
Copy link
Author

hg42 commented Apr 1, 2017

I commited to my master branch, you can test it now

Styles
ok, that should work now, you shouldn't need styles for process-palette in your personal styles.less.
But beware, if you have changed some theme variables in styles.less, this might not be reflected in process-palette, because theme variables used in package styles are taken before styles.less is applied (or something like that, not sure how it is done).

look
I already styled the check boxes and I find they fit better in size and look than the simple browser look.
However they are flat, so not fitting really well to the buttons.
Btw. I also changed the command list in left column to have buttons instead of simple text to show that they are clickable (which I didn't know at first). I also wonder why only some command texts of the command palette at the bottom panel are clickable. What should they do? what's the rule which are clickable?

pattern selection in command editor

This will especially be the case with your new feature. It might be annoying if patterns that applies to one command are applied to one that doesn't suit it.

sure...I can imagine such cases. For this I would ideally prefer pattern groups. May be a command could use several groups. But ok, let's keep it as it is.

Please don't remove the list. Whether there is a CSV text field or a list comes down to the same thing, but I would like to keep the list. I think a middle ground would be to keep the list, but only show the patterns that were actually chosen. The button you mentioned can then be added so that one can use it to choose the patterns that one wants for that command.

my only concern is the size of the list. Perhaps it could be shown horizontally (floating checkbox items). Eventually I'll try at some time later on another PR.

If you don't like the arrows then you're welcome to remove them and make them draggable like you did for the table.

so you still think they are necessary?
I think the sequence should be maintained globally because this depends mainly on the construction of the patterns, which is done in the patterns editor. You cannot move patterns around at will. E.g. my inline_trace_result has to be before inline_error etc. because it has to look at the whole line.
My former attempt to solve this was to always use the whole line (eventually with placeholders for the already matched patterns). But this was difficult to handle and created conflicts (the problem is overlapping matches). I had some ideas how to solve this generally, but decided to go the simpler route. May be when I need this desperately I will give it another thought...

  • to use the default patterns
  • use certain selected patterns
  • not use any patterns at all

ok, so simple solution is:

  • checkbox "use default patterns"
  • if enabled the list could be hidden (or disabled)
  • no pattern would be "no default patterns" and no pattern selected (not very nice when having many patterns), so best would be a select box with three states or three radio buttons ("none", "all", "selected").

Migration

I really don't think that a notification will add any value...without them even knowing about it.

ok, less work :-)

I've done migrations in my other package. What I did was to add a version field to the config

ok

The migration is straight forward, but I think leave it to me. I'd like to implement it in the same way that I did for my other package.

ok, but this has influence to the other code, because defaultPatterns can be extracted from the patterns list (all or only checked if we add a checkbox for that, but IMHO this should simply be all patterns).

@hg42
Copy link
Author

hg42 commented Apr 1, 2017

I tend to fix things when they get in the way, at least in smaller projects.
Is it ok for you to have changes in the PR, that don't really belong to it? Or should I try to split these into different PRs? some of them would be very small.
Or should we merge this outside of a PR? I could also create a new clean PR with a broader topic when we finish discussing this (removing some local changes like build.zsh from my own generalized build scripting system).

@hg42
Copy link
Author

hg42 commented Apr 1, 2017

ok, I found it myself.
In development tools I see the clicking behaviour in the process palette is linked to it's (possible) output.
I guess nobody knows that :-) and it's no(t| more) necessary because you have buttons for each output.
Also it links to the first output...

@morassman
Copy link
Owner

Hi. My comments:

my only concern is the size of the list

I'll change it later myself so that there is an Add button so that one only choose the patterns they want for that command. That way the list will only be as long as those that are chosen.

so you still think they are necessary?

Once drag-and-drop behaviour has been added to the list the arrow buttons won't be necessary. Until then they will be needed.

I think the sequence should be maintained globally because this depends mainly on the construction of the patterns, which is done in the patterns editor.

This might be the case for how you plan on using the patterns, but I don't think we should assume it would be the case for others. The user should be able to choose what behaviour they want.

The order in the patterns editor should only apply to the default patterns. If that is the order that should be applied to a command then one should choose to use the default patterns. The per-command selection and order of patterns should remain so that one has the option to use a command specific list and order instead of the global list and order.

The per-command list of patterns should remain. Especially for backwards compatibility and those people that use it in that way.

I tend to fix things when they get in the way, at least in smaller projects.
Is it ok for you to have changes in the PR, that don't really belong to it? Or should I try to split these into different PRs? some of them would be very small.

I think rather focus on the things that are needed for this feature. If there is a small issue or change you think would be beneficial then rather log a separate issue for it.

I know it can be annoying when there are little things that one just wants to take care of, but I'd prefer to keep PRs focused on one thing.

@hg42
Copy link
Author

hg42 commented Apr 2, 2017

I looked at the code for pattern loading and usage regarding the conversion to a list.

Assuming that all patterns are used for defaultPatterns and because you want to keep the command specific pattern lists with their own sequences, I think it would indeed be better what you suggested before: keeping current structures and simply use defaultPatterns for it's sequence.

If defaultPatterns does not exist it will be created out of the "patterns" map (the current behavior is to use a for loop over the keys and values to create the table view, so this would keep the current sequence). Editing the patterns would additionally save defaultPatterns.

When the map and the list don't match:

  • patterns not in the map would be deleted from defaultPatterns.
  • patterns not in the list would be added at the end

But, if you want to allow disabling some patterns in defaultPatterns it becomes a bit more complicated.
Because the sequence should be maintained over all, I see two possibilities:

  • either using a list of maps instead of the map of maps
  • or we need an additional sequence list to keep the sequence apart from defaultPatterns

@morassman
Copy link
Owner

Hmmm... I don't fully understand.

I think that, as part of the migration, the map should be converted to an array. The order in the array would then imply the order of the default patterns. Each pattern can then get a string field for its name and a boolean to indicate if it is a default pattern. This way one only needs to maintain the array and not multiple maps and lists.

Would this work?

@hg42
Copy link
Author

hg42 commented Apr 2, 2017

I am trying to use the table view more generally with options selectable, deletable, draggable
will look to re-implement command patterns with table view if possible, so dragging would be for free

@morassman
Copy link
Owner

Ok. I didn't realise you went with the idea to kept the definition of the patterns (the map) separate from default patterns (array).

@hg42
Copy link
Author

hg42 commented Apr 2, 2017

I think that, as part of the migration, the map should be converted to an array

that was the plan...

but -- the container used in the package has to be a map anyways, because the names of the command patterns are looked up. defaultPatterns needs to remain a list internally, because it's handled just like the command patterns.
So, internally it's still a map of data and a list for the sequence.
The list of objects in the file would have to be converted into a map of objects and a list of names.
And converted back when saving.

So as a conclusion, I think it's not worth migrating to a structure not representing the real situation.

Btw. I got tricked by coffeescript in my last commit, not key in keys is really (not key) in keys.
Also something more is wrong, defaultPatterns is not changed in the file...
I think I'm a bit distracted by a lot of children having fun (and sometimes wining) outside.

@hg42
Copy link
Author

hg42 commented Apr 4, 2017

I fixed a remaining bug in pattern detection, just the same I fixed for inline patterns some time ago.
Without, a non existing path would hide another path with a later pattern in front it. The cure is essentially the same as for inline patterns (reordering the loop), but more complicated.

@hg42
Copy link
Author

hg42 commented Apr 4, 2017

Working on the fix I got into messing with regexps again.
I am still not very happy with the current regexp solution.

I tried several regexp libraries available for node.js, to solve the index problem.
It's very sad how this design problem of the javascript RegExp module hides itself to everyone and how everyone thinks the javascript RegExp API would be sufficient.

For the record and my motivation to look into other regexp libraries:

I am refering to my comment above:
#36 (comment)
the item starting with "...javascript design flaw..."

To summarize the problem in short:
The simple one would be to split a line like abbc at the expression b only if preceded by a.
Usually you would group only b and use the index of the matched group to split the line into parts.
The code for this should be something like this (kind of pseudocode):

re = new RegExp("a(b)");
text = "abbc";
matches = text.match(re);
start = matches[1].index;
len = matches[1].length;
[pre, match, post] = splitStringbyIndexAndLength(text, start, len);

In our case we need pre and post to find other patterns in those strings.

But, javascript does not provide the index of the group but only the index of the whole match, as you can see here. When switching the regexp engines, you see that javascript is the only one that has n/a for group 1.

Unfortunately any prefixing expression is part of the whole match (even non-capturing groups) and thus you don't have any chance to find the matched group itself. Searching matches[1] as a string in the whole matched part matches[0] doesn't work either, because the group's result could be present multiple times in the whole match and the rest of the expression can select a specific one of them, while with the string search you have to decide which one you use. E.g. an expression ab(b) would select the second b, but in xyzabbcdef the match result would be abb and you would search a b in it and find the first one.

Using a better regexp library with named groups would simplify life for the user:

(?:a.*)(?<x>b)

allowing complex expressions defining what should surrounding the match or be somewhere else in the line, and simply marking the right one with the name (here x).

Having extended regexps with insignificant whitespace and comments would allow even more clarity. E.g. Path expressions like this one:

(?:\bfile +|\bat +)?['\"]?(path)['\"]?.*\bline +(line)"

could be written like this:

(?: # allowed words preceding path
  \bfile\s+ |
  \bat\s+
  )
['\"]?  # optional quotes
(path)
['\"]?  # optional quotes
.*       # anything
\bline\s+ # word preceding line
(line)

Then e.g. (path) would be replaced by the real regexp (like currently) but named:

(?<path>[-\w./\\]+)

which would allow extracting the path from the regexp result simply by result.path without the need to find out which of the groups is the path group.
In a good library you could have multiple alternative expressions with the same name and the matching one would be assigned to the result automatically.
E.g. this would work automatically:

(?<path>...):(?<line>...)|line (?<line>...) in (?<path>...)

note that some libraries don't allow multiple sub-expressions with the same name.

Named expressions would also allow to remove the "Path Regex" column. The user could simply use a group named path in his pattern instead of (path).

A good library also allows to include flags in the regexp, so (?i) would make the expression ignore the case.

The libraries I found are split in two categories:

  • a precompiler for javascript RegExp
    • all have the above problem, because the info from javascript RegExp ist lost forever and cannot be regenerated
    • very good over all: XRegExp
      • has named expressions and a way to use them
  • a native library with bindings to javascript
    • this would work if they have APIs beyond the javascript RegExp compatibility layer
    • pcre / pcre2
      • would be my first choice as it's C(C++-Library is very mature and I proved it by myself for many years now
      • unfortunately the npm modules I found are not installable because of compile errors
      • the authors seem to have given up on this
    • re2
      • a google library which is astonishingly limited (for google)
      • has linear performance characteristics (this is why they dropped backtracking completely)
      • has named groups but only with the P-syntax (?P<name>...), while PCRE offers all variants used by other libraries (?<name>...), (?'name'...) and (?P<name>...)
      • the node.js binding exactly imitates the javascript RegExp API and therefore does not solve my problem

So, before fixing the interface by releasing the package, I want to try these:

  • look into pcre
    • eventually I can find another module/binding that work
    • I could try to create a package for pcre myself
  • try to solve the problem by some other way and use something like XRegExp for the rest
    • I hoped the split function could solve the problem, but this also splits by the whole match and it's not possible to use a group for that
    • the replace function could eventually be used to replace a single group by a kind of index code representing the object stored somewhere else, may be something like this {{{path_1}}} or better with a really weird syntax, so the probability it could exist in the text is minimized.

@hg42
Copy link
Author

hg42 commented Apr 4, 2017

this thread describes the misery:
Get index of each capture in a JavaScript regex - Stack Overflow

current state:

  • fiddling with native code is out, because work is too big for this problem and it makes the package less portable (as we can see with pcre)
  • xregexp is out, too, because it cannot handle multiple expressions with the same name. Other features are not really worth it. x-flag would, but this could be simulated by removing whitespace and comments (with some exceptions, probably manageable)
  • replace doesn't change anything, because it doesn't work with groups

but there is some silver lining on the horizon:

  • there are libraries to solve the group index problem by wrapping all parts not grouped with groups, which then allows finding the location of each in the line (just like I do it now explicitly)
  • there are libraries that implement named groups on top of javascript RegExp
    So generally it's possible to do what I want.

now the only problem is to combine them => not trivial...

They are not designed to be used as a chain or can be easily chained automatically.
This is because they work on RegExp API and this does not allow to use the extended functionality, so they all have their own extension API.
What is lacking is something like jstransform or babel for regexps.

I am currently eliminating the false candidates.
Then I will investigate each combination whether it's possible to create a combined library of the pair.

@hg42
Copy link
Author

hg42 commented Apr 5, 2017

status update:

I have found a really nice library called regexp-tree.
It can parse a regexp and creates an AST from it. You have a traverse function to work recursively on the syntax tree and you can generate a new javascript-RegExp from the AST.
In a test I already implemented named groups and converting non-grouped sections into groups.
Now I am at a different viewpoint as before.
I do not have to think any more:
"what am I able to implement with javascript RegExp?"
Instead I find myself on the other side and can now (and have to) think:
"what do I really want it to do?".
Feels good...

My current plan:

  • non-grouped sections and non capturing groups should be output as simple text
  • grouped sections should be styled with the rule name as class name (<rule>)
  • nested named groups should get a class name <rule>-<group>
  • no more different handling of line, inline and path expressions when parsing the line
  • differences are handled at RegExp creation and when generating output
  • (path) and (line) (and (column)?) should be handled as named groups and use class names <rule>-path and path and <rule>-line and line. When including a named group "path" (which is also generated by (path)) the whole expression should be converted to a capture group and have a class name <rule>.

I currently have each part of the match in an array of strings and objects for named and numbered groups. Nested groups come after the enclosing group.
So, I still have to find a way to handle those nested groups when splitting the line into parts.
Nesting patterns is to be solved, too.
But I'm sure I find solutions to these in the next days.

@morassman
Copy link
Owner

That sound promising. I also came across XRegExp when I added patterns, but it didn't do what I was looking for either.

You remember that quote I made of solving things with regular expressions? What made me think of it was when I realised that what you're bordering on is the need for a proper parser. Regular expressions are powerful, but has their limits.

I hope regexp-tree allows you to do what you need with regular expressions. We should try to avoid getting to the point where a language specification for each pattern would be better suited than a regex :)

@hg42
Copy link
Author

hg42 commented Apr 5, 2017

basically the same what I do now but without javascript problems :-)

regexp-tree is only used to transform an extended regexp into javascript regexp.
This makes patterns more as expected and thus more user friendly.
E.g. you can use the natural "(path)" pattern for a quoted path instead of (?:.*")(path)(?:".*) how it is currently necessary, because of the mentioned problem of javascript RegExp.

@hg42
Copy link
Author

hg42 commented Apr 15, 2017

I just want to say, I am not dead :-)
I contributed a bit to regexp-tree to solve some probs I had when using it.
Now I slowly get back on track.

@hg42
Copy link
Author

hg42 commented Apr 15, 2017

Note, the above example ("(path)") is very trivial, because it doesn't matter if the quotes are included in styling.
The original use-case was from my tracing outputs:

   \ func ( x=1 y=2 z=3 )
     ...
   / func ( x=1 y=2 z=3 ) = "some result of func"

where I wanted to style only the function result = "some result of func" but only in trace lines starting with \s*/\s+ (= end of function).
It should look something like this:

/ func ( x=1 y=2 z=3 ) = "some result of func"

This was impossible with javascript regexps, because I cannot get the character index (position in line) of the group.

I had to use a trick, which is visible to the user, using three groups instead of one and determining the position of the second group by looking at the length of the first group. This is not intuitive and the user can easily create a non-working configuration.

Now with regexp-tree, I can parse the regexp configured by the user, wrap all non-grouped parts in groups and calculate the positions in the match result by cumulating the lengths of each detected group result.
Example:
I want to detect this:

abc.*?(\d+).*xyz

which is a number preceded somewhere by "abc" and followed somewhere by a "xyz".
Only the number should be styled.
Say we have some string "foo abc test 123 test xyz bar".
The coffee-script:

  text = "foo abc test 123 test xyz bar"
  re = /abc.*?(\d+).*xyz/
  console.log text.match re

returns:

[ 'abc test 123 test xyz',
  '123',
  index: 4,
  input: 'foo abc test 123 test xyz bar' ]

So, you only get the position of "a" in the line (index: 4) and the whole matched string and the group ('123').

With this result, you have no way to determine the position of the number.
So you can only style this way:

foo abc test 123 test xyz bar

but it should look like this:

foo abc test 123 test xyz bar

With regexp-tree I can parse the regexp convert it to:

(abc.*?)(\d+)(.*xyz)

and remember the group numbers of the original groups.
With

  text = "foo abc test 123 test xyz bar"
  re = /(abc.*?)(\d+)(.*xyz)/
  console.log text.match re

I get:

[ 'abc test 123 test xyz',
  'abc test ',
  '123',
  ' test xyz',
  index: 4,
  input: 'foo abc test 123 test xyz bar' ]

So I can split the pre and post parts of the result as usual ("foo ", " bar") and I can use the length of the first group to determine the position of the number (pos_number = index + match[1].length).

Generally all grouped elements should be styled and the surrounding strings should not.
If they are nested, the inner groups are ignored. The same for non-capturing groups.
So:

abc.*(\d+).*klmn.*(<(\d+|\w)>).*xyz

should style the first number and the second number or word including < and >.
But "abc" or "klmn" or "xyz" or all other parts are left unstyled.

@morassman I also want to replace the regexp for paths etc. with named groups:

  getPathExpression: (hasLine) ->
    if os.platform == "win32"
      return "(?:[a-z]:\\\\|\\\\)?[\\w\\.\\-\\\\]+[.\\\\][\\w\\.\\-\\\\]+";
    return "(?:~\\/|\\/)?[\\w\\.\\-\\/]+[.\\/][\\w\\.\\-\\/]+";

would then look like this:

  getPathExpression: (hasLine) ->
    if os.platform == "win32"
      return "(?<path>(?:[a-z]:\\\\|\\\\)?[\\w\\.\\-\\\\]+[.\\\\][\\w\\.\\-\\\\]+)";
    return "(?<path>(?:~\\/|\\/)?[\\w\\.\\-\\/]+[.\\/][\\w\\.\\-\\/]+)";

advantages:

  • we can easily extract the path from the result by match.groups.path and the line by match.groups.line without storing an index for them.
  • the user can have his own path expression by simple using a (?<path>...) construct instead of (path) and this would work automatically. The same would work for line (but there's no advantage).
  • this also allows to remove the third column "Path RegEx".

Is this ok for you?

@hg42
Copy link
Author

hg42 commented Apr 15, 2017

btw. I am not sure how to handle my new regexp module (codename XRE).
I think, usually it would be published as npm module.
But regexp-tree is very much in flux now, and may be my library changes it's shape dramatically.
I'm also not very sure about the current interface (how the result is returned).
Because of this, I think it should stay hidden for now.

On the other hand, the code for process-palette can be left simple, while I would want the module to get some more features in future.

I am currently planning to include the XRE code in pattern/xre.js and require it by pattern/xre-pattern.coffee.

With XRE I have more possibilities to split the line into parts.
I have ideas how to redesign the pattern matching loops so it is simplified.
Handling of nested patterns could be a side effect of this.

@hg42
Copy link
Author

hg42 commented Apr 26, 2017

just a quick update...

I am near to releasing my regexp module (code name XRE above) to the public, because I think it is mature enough now.

But I am waiting for a pull request to be accepted for the underlying regexp parser to fix a bug.
Also, I still have to find a good name for it.

I have already integrated this regexp module into process-palette with x-flag and named groups for path and line as described above.

No algorithmic changes in the three processing loops for now.
May be later when I find a use-case (or someone else).

I'll push the new changes for testing, when the pull request has been accepted and I have released the module.

@hg42
Copy link
Author

hg42 commented Apr 27, 2017

I published the npm module: uxregexp

I merged all my local commits to process-palette into a single commit on the pull request and cleaned some differences in my working tree. I hope there's nothing missing.

There is an "interesting" conflict between the ansi colors and patterns:
If the pattern matches a part of the ansi sequence, the sequence is split and doesn't work any more. So you cannot use a pattern that for example styles a number, because numbers are part of the ansi sequence. Unfortunately we cannot convert the ansi sequence first, because then it's html result would be escaped afterwards.

Probably, the solution would be to move the ansi conversion directly in front of the pattern conversion.

I am currently using atom 1.17.0-beta4. However, almost all changes were implemented and tested with 1.16.0.
Only the work on publishing the npm module was done with the beta.
I hope that did not introduce any error (because of different node versions etc.).

@hg42
Copy link
Author

hg42 commented Apr 29, 2017

I had to fix an error in uxregexp...

@morassman
Copy link
Owner

I wouldn't be too worried about the conflict with ansi colours. If people use the patterns for styling the way your code intends then the pattern should determine the colour and not the ansi codes. I may be wrong, but from my experience not many command line tools produce ansi codes, or at least not the ones I've been using.

Also, it may even be that the average user won't use the patterns in the first place, so it might be a bit of a corner case if someone experiences a problem.

@hg42
Copy link
Author

hg42 commented Apr 30, 2017

yes, I don't worry much about that, because there are not many chances they conflict.

But I ran into it myself:
I thought, having numbers highlighted would be a nice thing (you see error counts etc. more quickly).
Because the ansi codes are split by the spans, you can get weird results depending on what remains active. In my case the test failed, the colors were absent and I didn't get a clue why this was happening until I inspected the lines in dev tools. Perhaps we should warn abut it in the docs.

@hg42
Copy link
Author

hg42 commented Apr 30, 2017

I wonder if we should provide default patterns, so the average user can take advantage of the feature without configuring anything.
I loved this, when I started to use SciTE many years ago and since then coloured all my outputs by starting lines with ">", "--" or "!" etc. I even tried new sequences until I found all hard-coded patterns.
Many patterns are natural. E.g. ">" is also used by process-palette.
Also, pre-installed patterns can show how they can be used and the user will start to try own patterns.
Learning by example.

@morassman
Copy link
Owner

Process palette has an example configuration file built in. It's very easy to get access to it. Any configuration examples such as these should go into that file.

@hg42 hg42 mentioned this pull request Jul 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants