Improve repl output window performance #2011

julienvincent · 2023-01-08T22:26:44Z

This PR introduces two new config entries, `replOutputThrottleRate` and `replOutputMaxLines`, both of which are used to improve the performance of the REPL output window/file.

`replOutputThrottleRate`

If set to a non-zero number, this throttles output from the REPL connection. If more output items than the throttle rate are received within a 500ms window, the excess items are dropped.
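For readers who want a concrete picture of the behaviour described above, here is a minimal sketch of a fixed-window throttle. It is not the code from this PR: the class name `OutputThrottle`, its methods, and the injectable clock are all hypothetical, and it only assumes the semantics stated above (at most `rate` items per 500ms window, `0` disables throttling).

```typescript
// Hypothetical sketch, not Calva's actual implementation.
class OutputThrottle {
  private readonly rate: number; // max items per window; 0 disables throttling
  private readonly windowMs: number;
  private readonly now: () => number;
  private windowStart = Number.NEGATIVE_INFINITY;
  private accepted = 0;
  private dropped = 0;

  constructor(rate: number, windowMs: number = 500, now: () => number = Date.now) {
    this.rate = rate;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the item should be written, false if it should be dropped.
  accept(): boolean {
    if (this.rate === 0) {
      return true;
    }
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      // The previous window has expired: start a new one.
      this.windowStart = t;
      this.accepted = 0;
    }
    if (this.accepted < this.rate) {
      this.accepted++;
      return true;
    }
    this.dropped++;
    return false;
  }

  // Returns how many items were dropped since the last call and resets the count,
  // so the caller can report the drops to the user.
  takeDropped(): number {
    const n = this.dropped;
    this.dropped = 0;
    return n;
  }
}
```

A caller would check `accept()` for each incoming output item and periodically use `takeDropped()` to print a notice about discarded items.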

`replOutputMaxLines`

If set to a non-zero number, this causes the output window to be truncated when it grows beyond this many lines.
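As an illustration only, line-based truncation could look roughly like the sketch below. The function name `truncateToMaxLines` is hypothetical, and the sketch works on a plain string rather than on the actual output document.

```typescript
// Hypothetical sketch: keep only the newest `maxLines` lines of the output text.
// A value of 0 disables truncation, matching the setting's description.
function truncateToMaxLines(text: string, maxLines: number): string {
  if (maxLines === 0) {
    return text;
  }
  const lines = text.split('\n');
  if (lines.length <= maxLines) {
    return text;
  }
  // Drop the oldest lines from the top and keep the tail.
  return lines.slice(lines.length - maxLines).join('\n');
}
```

Note that a trailing newline counts as an extra (empty) line in this sketch.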

This should fix #2010 and #804 and should help address #942 (or maybe fix it?)

When an excessive amount of output is produced by the REPL (for example, by a rogue loop that prints to stdout), Calva can sometimes hang while trying to write all the output to the output window/file. The only way to recover is to restart/reload the VS Code window.

This commit introduces a new config entry, `replOutputThrottleRate`, which when set to a non-zero number throttles output from the REPL connection. If more output items than the throttle rate are received within a 500ms window, the excess items are dropped.

Addresses BetterThanTomorrow#942
Fixes BetterThanTomorrow#2010

Calva's performance can be drastically affected as the size of the REPL output window grows. This commit adds a config entry, `replOutputMaxLines`, which if set to a non-zero number causes the output window to be truncated when it grows beyond this threshold.

Fixes BetterThanTomorrow#804

PEZ · 2023-01-10T07:26:33Z

package.json

+          "calva.replOutputMaxLines": {
+            "markdownDescription": "The maximum number of lines to retain in the repl output window. Having the repl output window grow too large can significantly affect performance. Setting this to 0 will disable truncating",
+            "type": "number",
+            "default": 1000
+          },


It's generally not the number of lines that is the problem, rather the number of tokens on the top level, or in a sexpr. A way to count tokens on the top level is to count words when printing output. That will introduce state that can be tricky to manage, though...

1000 lines is very low. It could be some 50 evaluations of not too big structures. This would be completely non-problematic for Calva to handle. While printing 1000 lines of output with 100 words each might start to be problematic (I actually don't know where the problems start.)
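To make the word-counting idea a bit more tangible, here is a rough sketch of what the extra state might look like. Everything in it is hypothetical (the class `OutputWordCounter` and its methods are not part of Calva), and whitespace-separated words are only a crude stand-in for lexer tokens.

```typescript
// Hypothetical sketch: keep a running count of words printed to the output.
// This is the kind of state that would need to be reset when the window is cleared.
class OutputWordCounter {
  private words = 0;

  // Adds the words in `output` to the running total and returns the new total.
  add(output: string): number {
    const trimmed = output.trim();
    if (trimmed.length > 0) {
      this.words += trimmed.split(/\s+/).length;
    }
    return this.words;
  }

  // Call when the output window is cleared.
  reset(): void {
    this.words = 0;
  }
}
```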

> It's generally not the number of lines that is the problem, rather the number of tokens on the top level, or in a sexpr

While this makes sense, I don't think the additional complexity needed to implement it is really worth it. My assumption is that counting lines is a close-enough proxy in the average case to be sufficient, especially if we make sure the threshold is high enough that the user doesn't care about the truncation.

> 1000 lines is very low. It could be some 50 evaluations of not too big structures

This value was chosen pretty arbitrarily; I'd be happy to increase it to a degree. While working on this I started to see noticeable performance degradation after about 10k lines of stack-trace content (I was printing Exceptions to stdout), so I don't think it should be increased too much. Maybe 5k lines?

I also think it's unlikely that a user would really be scrolling that far back in the repl history too often for us to need to worry about truncating after a few thousand lines.

Unlikely, maybe, but I know of a user who does this all the time. 😄 I search the file rather than scroll, but anyway. (I keep tons of browser tabs open as well.) To me it makes sense to disable this limit feature (by setting limit to 0, e.g.) and to have that as default.

Hey @julienvincent, thanks for taking time to look into this.

I think deleting at an arbitrary line length will cause unbalanced forms and break syntax highlighting and maybe also formatting, unless I'm missing something. If I'm not though, I don't think that's really an option.

> To me it makes sense to disable this limit feature (by setting limit to 0, e.g.) and to have that as default.

I'm happy with that too.

> I think deleting at an arbitrary line length will cause unbalanced forms and break syntax highlighting ...

It does break highlighting, but in a localised way (only the broken form will be colored incorrectly). The rest of the file maintains the correct highlighting. I think this is probably acceptable behaviour.

> and maybe also formatting

Which formatting do you mean? I haven't noticed anything obviously wrong.

Here you can see what I mean

The way the structural editor works, when there is broken balance in the document, all bets are off. It will sometimes seem to work (or even work), but in many cases the structural editor is broken and with that a lot of Calva functionality that depends on this.

If we limit on lines, we will have to do it such that the balance is never broken. There are some ways to do this; I can think of two right now:

1. Add the missing brackets. There is code for that in Calva, even if it might be a bit tricky to re-use (or, messy code, tbh).

2. Always print full results. The limit then needs to happen before the buffering, I think. (Not that I know how it is done now, but anyway.)
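As a rough illustration of what "never breaking the balance" could mean when dropping lines from the top, the sketch below only cuts at a line where the bracket depth is back to zero. This is a variation on the ideas above, not either option verbatim and not Calva code: the function `balancedDropCount` is hypothetical, and it deliberately ignores strings, comments, and character literals, which a real implementation would have to handle (for example by using the lexer's tokens).

```typescript
// Hypothetical sketch: given the output lines and the minimum number of lines we
// want to drop from the top, return how many lines to actually drop so that the
// cut lands between top-level forms. Brackets inside strings, comments, and
// character literals are NOT handled here.
function balancedDropCount(lines: string[], minDrop: number): number {
  const open = '([{';
  const close = ')]}';
  let depth = 0;
  let index = 0;
  for (const line of lines) {
    // `depth` is the bracket depth just before this line.
    if (index >= minDrop && depth === 0) {
      return index;
    }
    for (const ch of line) {
      if (open.indexOf(ch) >= 0) {
        depth++;
      } else if (close.indexOf(ch) >= 0) {
        depth = Math.max(0, depth - 1);
      }
    }
    index++;
  }
  // No balanced cut point found after `minDrop`: fall back to dropping everything.
  return lines.length;
}
```

The fallback of dropping everything is just one possible choice; refusing to truncate at all in that case would be another.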

That said, I think we could also consider popping up a warning when the line limit is reached, instead of enforcing the limit. The warning can have a button for clearing the window.

> It does break highlighting, but in a localised way (only the broken form will be colored incorrectly). The rest of the file maintains the correct highlighting. I think this is probably acceptable behaviour.

Yeah, I'd say that, in particular, is acceptable too.

By formatting I meant if someone is typing code in the repl window, the auto-formatting could/would be broken if there's an unbalanced form (which Peter mentioned above - structural editing). And as he said other functionality may be broken too in that case.

CHANGELOG.md

PEZ · 2023-01-10T07:54:45Z

Thanks for caring about this issue!

I'm not sure about the solution:

Generally we try to challenge the need for adding configuration. There might be some other ways to tackle this that don't introduce configuration, or that introduce configuration that is easier for the user to understand. I think I posted an issue at some point, suggesting that we move the stdout and stderr output to another place, like an output channel, where we don't need to support structural editing (which doesn't make much sense for print output anyway). That would leave mostly evaluation results in the REPL/output window, and there is already configuration that can limit the length of these.

Seems reasonable throttling the pace of printing. And we might need/want to do that even if we print stdout/stderr somewhere else. I have no clue about what would be a good default though. Throttling too eagerly will disrupt things too as missing output can be very confusing.

Printing together with the evaluation makes sense for many use cases, so we might want to have config for this if we go the print-somewhere-else route.

We might want to collect more information about the problem. What if we added counters to the documents about how many tokens they have and how long it takes to scan them and to iterate through them? We could use the token counter for limitation settings instead of line count. And we could store this information in an edn file or something under the `.calva` directory, supporting a Calva diagnostics command. At least for the output window, which is a bit special.

Co-authored-by: Peter Strömberg <pez@pezius.com>

julienvincent · 2023-01-10T10:28:25Z

> I think I posted an issue at some point, suggesting that we move the stdout and stderr output to another place, like an output channel

Personally I really like that all my output lives in one place and I also much prefer the user-experience of it just being a document/editor window over the vscode native output channels. If we wanted to add support for outputting to a vscode channel I would ask for making that configurable so a user can choose to keep it in the repl/output window.

> Seems reasonable throttling the pace of printing. And we might need/want to do that even if we print stdout/stderr somewhere else. I have no clue about what would be a good default though

My guess is that while outputting to a vscode channel might be more performant than the output window, we would probably still want to throttle yes.

> I have no clue about what would be a good default though

Setting this to something quite high, like 2k, as a default starting point is probably quite safe. I think it would be quite rare that people actually run into this - it's mostly intended as a safety guard against rogue code, which I think people would struggle to trigger under normal development.

> Throttling too eagerly will disrupt things too as missing output can be very confusing

It's worth highlighting that throttled/dropped items are reported in the output window which should help to reduce confusion. Perhaps when reporting we should also show a help message for how to increase/disable the throttling?

> We could use the token counter for limitation settings instead of line count

Do you mean as a way to drive the decision for setting a default or are you proposing some kind of mechanism for Calva to automatically decide on a value?

PEZ · 2023-01-10T11:44:09Z

> > We could use the token counter for limitation settings instead of line count
>
> Do you mean as a way to drive the decision for setting a default or are you proposing some kind of mechanism for Calva to automatically decide on a value?

I was thinking that if the document knows how many top-level forms it has, it will be as easy to limit the output on this as on line counts.

Then we could consider measuring two things on each document:

- Time it took to scan it (the lexer work)
- Time it took to traverse it (I think we go through the whole document with `forwardSexp()` at some points, and we could record the time)

If we store this info on the document as well, it could also be used for limiting the output window output. Then the mechanism could be that we inform the user when the output window is getting slow, with the instruction to clear the window, rather than having an automatic limit.
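A very rough sketch of the measuring-and-warning idea follows. All names in it (`DocumentTimings`, `timeIt`, `shouldWarnSlow`) and the threshold are made up for illustration; how and where Calva would actually record these timings is left open above.

```typescript
// Hypothetical sketch: record how long the scan and traversal took, and decide
// whether to tell the user that the output window is getting slow.
interface DocumentTimings {
  scanMs: number; // time spent in the lexer
  traverseMs: number; // time spent walking the document
}

// Runs `work` and reports how long it took in milliseconds.
function timeIt<T>(work: () => T): { result: T; elapsedMs: number } {
  const start = Date.now();
  const result = work();
  return { result, elapsedMs: Date.now() - start };
}

// True when the combined cost exceeds the (assumed) threshold, in which case the
// user could be prompted to clear the window instead of having output truncated.
function shouldWarnSlow(timings: DocumentTimings, thresholdMs: number): boolean {
  return timings.scanMs + timings.traverseMs > thresholdMs;
}
```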

bpringe · 2023-01-11T04:54:38Z

> If we wanted to add support for outputting to a vscode channel I would ask for making that configurable so a user can choose to keep it in the repl/output window.

I believe that was the suggestion/idea, were it to be implemented.

> My guess is that while outputting to a vscode channel might be more performant than the output window, we would probably still want to throttle yes.

I doubt throttling is necessary in output channels at all, but of course we lose the benefits of it going to an editor, and this is a bit of a tangent - I just wanted to comment.

I wonder if spending much more time and effort on the current output window is worth it. I've been thinking lately that Calva might be better off without the current output window. In its place we could use Portal or (to not add such a dependency) some read-only display provided by VS Code that supports syntax highlighting.

@PEZ I know the term "webview" might stir things inside you from past Calva days 😄, but I think a read-only webview would be drastically simpler than when we had another paredit implementation in a webview. In a webview we could use an existing JS solution for syntax highlighting. Maybe we could also use markdown instead of or within a webview? I don't know if that would be a good idea or not.

Anyway, I've just been thinking we can remove a lot of complexity and get rid of some of these issues by doing something like the above.

julienvincent changed the title ~~Feature/repl output performance~~ Improve repl output performance Jan 8, 2023

julienvincent changed the title ~~Improve repl output performance~~ Improve repl output window performance Jan 8, 2023

julienvincent added 2 commits January 8, 2023 22:44

julienvincent force-pushed the feature/repl-output-performance branch from 6f95b38 to 99b6e9d Compare January 8, 2023 22:44

PEZ reviewed Jan 10, 2023


Fix incorrect changelog entry title

eaa5493

Co-authored-by: Peter Strömberg <pez@pezius.com>
