Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pager: fix printing non-visible characters \001 & \002 #2207

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kyrylo
Copy link
Member

@kyrylo kyrylo commented Jul 3, 2021

Fixes #2185 (Non-Visible Characters Printed with Page)

The --raw-control-chars option (aka -r) does the following:

Causes "raw" control characters to be displayed.  The default is
to  display  control  characters  using  the caret notation; for
example, a control-A (octal 001) is displayed as "^A".  Warning:
when the -r option is used, less cannot keep track of the actual
appearance of the screen (since this depends on how  the  screen
responds to each type of control character).  Thus, various dis-
play problems may result, such as long lines being split in  the
wrong place.

The --RAW-CONTROL-CHARS option was always used with the pager, I just expanded
it from -R, so that it looks more descriptive. For reference, it does the
following:

Like  -r,  but  only ANSI "color" escape sequences are output in
"raw" form.  Unlike -r, the screen appearance is maintained cor-
rectly  in  most  cases.   ANSI  "color"  escape  sequences  are
sequences of the form:

     ESC [ ... m

where the "..." is zero or more color  specification  characters
For  the  purpose  of  keeping  track of screen appearance, ANSI
color escape sequences are assumed to not move the cursor.   You
can  make less think that characters other than "m" can end ANSI
color escape  sequences  by  setting  the  environment  variable
LESSANSIENDCHARS to the list of characters which can end a color
escape sequence.  And you can make less  think  that  characters
other  than the standard ones may appear between the ESC and the
m by setting the environment variable  LESSANSIMIDCHARS  to  the
list of characters which can appear.

@SilverPhoenix99
Copy link
Contributor

This would only work for people using less. There might be systems where it's not available, or someone might want to use an alternative pager, so I don't believe it's addressing the core issue.

I believe a better option would be to manually opt-out of using the escapes through a configuration.

@SilverPhoenix99
Copy link
Contributor

I've created PR #2209 on top of this one, that makes header escapes optional, that should help people who aren't using terminals and pagers that support those escapes.

@kyrylo
Copy link
Member Author

kyrylo commented Jul 3, 2021

When less is not available, then Pry uses its own pager (SimplePager). From what I understand the reported issue is relevant only to those who use less. What are other known pagers and which ones have the same issue?

@SilverPhoenix99
Copy link
Contributor

SilverPhoenix99 commented Jul 3, 2021

Ah, sorry. I think we're talking about separate issues caused by the same thing.

In both following cases, I'm using less in Windows as the pager.

When Pry uses the pager, because it's filling more than 1 page, I get this result:
image

When it doesn't (i.e., the output doesn't fill a whole page, so it doesn't use the pager), I get this:
image

Essentially, it's the same issue, but from different perspectives.

Granted, it's not the best option when using a combination of terminal and pager where only one of them needs the escape codes to work consistently, but in #2209 I was trying to give people a workaround for when their terminal and/or pager doesn't work well with the escape codes.

Oddly enough, I don't get this issue when printing values, so something's working differently there.
Since it's going through CodeRay, I don't get this issue when printing values.
In Msys:
image

In Powershell 7 (input is not highlighted):
image

@kyrylo
Copy link
Member Author

kyrylo commented Jul 3, 2021

Are your screenshots from the current release or this PR? Sorry, it's not clear.

@SilverPhoenix99
Copy link
Contributor

Those were taken from the current release (0.14.1).

If I use this PR, I only get those smiley faces instead of ^A/^B, like in the 2nd picture, regardless of using a pager or not.

@SilverPhoenix99
Copy link
Contributor

I was looking more into less, and it looks like --raw-control-chars and --RAW-CONTROL-CHARS are exclusive, and the last option "wins".

I've managed to test this with less v551 in 2 separate Ubuntu and tested these:

echo -e "\001\e[31m\002foo\001\e[0m\002" | less -R -r

image

echo -e "\001\e[31m\002foo\001\e[0m\002" | less -r -R

image

Reading the most recent less documentation, I couldn't find a way to remove/hide C0 control codes (ASCII <= 0x), only how the escape codes (\e only) present themselves, as either colored output or escape sequences (ESC[0m).

@SilverPhoenix99
Copy link
Contributor

I believe I understand the problem a bit better now.

The issue is that the \001/\002 codes are only used with readline, and they should only be added in the prompt, which is passed to readline. When used anywhere else, it'll cause those characters to be visible.

I think there are 3 options here, from thoughest to simplest:

  1. the escapes should be removed before writing to Pry.output;
  2. everything should go through readline, i.e., the output is sent back through the prompt;
  3. there are no escapes by default, and they're automatically added to prompts (provided a new config Pry.config.escape_prompt = true).

I've attempted option 2, and although it didn't display \001/\002, it was too slow, and there were some visual artifacts.
image

I assume option 1 would be overkill, since we're adding the controls everywhere and then every time we write to the output we have to strip them.

I went with option 3 because if this, since it was a simple case of adding the escapes in a single method (Pry::REPL#read).

I believe PR #2209 can be used in favour of this one, and this PR can be closed.

Fixes #2185 (Non-Visible Characters Printed with Page)

The `--raw-control-chars` option (aka `-r`) does the following:

```
Causes "raw" control characters to be displayed.  The default is
to  display  control  characters  using  the caret notation; for
example, a control-A (octal 001) is displayed as "^A".  Warning:
when the -r option is used, less cannot keep track of the actual
appearance of the screen (since this depends on how  the  screen
responds to each type of control character).  Thus, various dis-
play problems may result, such as long lines being split in  the
wrong place.
```

The `--RAW-CONTROL-CHARS` option was always used with the pager, I just expanded
it from `-R`, so that it looks more descriptive. For reference, it does the
following:

```
Like  -r,  but  only ANSI "color" escape sequences are output in
"raw" form.  Unlike -r, the screen appearance is maintained cor-
rectly  in  most  cases.   ANSI  "color"  escape  sequences  are
sequences of the form:

     ESC [ ... m

where the "..." is zero or more color  specification  characters
For  the  purpose  of  keeping  track of screen appearance, ANSI
color escape sequences are assumed to not move the cursor.   You
can  make less think that characters other than "m" can end ANSI
color escape  sequences  by  setting  the  environment  variable
LESSANSIENDCHARS to the list of characters which can end a color
escape sequence.  And you can make less  think  that  characters
other  than the standard ones may appear between the ESC and the
m by setting the environment variable  LESSANSIMIDCHARS  to  the
list of characters which can appear.
```
@smook1980
Copy link

I was digging into this tonight and it seems that either less needs to be called with -r to process the control codes \001 and \002 or those remove those controls codes. I'm not sure of the purpose of the start of heading and start of text control characters. From what I gather online start of heading was intended to mark the non-data portion of a stream for the purpose of flow control. @SilverPhoenix99 do you know the purpose of using \001 and \002 with readline? I wasn't able to find a refer to either in the dos. I'd be happy to contribute towards a fix if someone can guide me on which approach is acceptable here.

In the meantime I'm using this bash script to work around this issue. Maybe others will find it useful.

#! /usr/bin/env bash

if ps -p $PPID -o comm= | grep --quiet -i -E '(pry|ruby)'; then
  exec /usr/local/bin/less $@ -r;
else
  exec /usr/local/bin/less $@;
fi

@SilverPhoenix99
Copy link
Contributor

@smook1980 it's been a long while since I've looked into this, but if I recall correctly, \001 and \002 are just used by Readline as markers around invisible characters. I believe it's to exclude those invisible characters to calculate the correct length of prompt strings as they show in the terminal and position the cursor correctly.

I don't recall in detail when it happens, but reading my past comments, I think it's when strings are printed to the prompt when not using Readline nor a pager, which happens when doing something like ls Object, and it doesn't fill the whole terminal.

@iloveitaly
Copy link

This fixes pager output from my help command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-Visible Characters Printed with Pager
4 participants