-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tab width support #429
Tab width support #429
Conversation
/// | ||
/// assert_eq!(display_width_with_tab("Café \t Plain", 4), 15); | ||
/// ``` | ||
pub fn display_width_with_tab(text: &str, tab_width: u8) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one thing that I don't like too much, but I could not find a way around without changing the public API. Having two methods, one just calling the other with a default parameter, doesn't feel very good, but I can't think of how else to implement this functionality.
|
||
// We have to set the tab width for each of the words. So as to not | ||
// change the public API (fn find_words), we do it here. | ||
let words = options.word_separator.find_words(line).map(|mut w| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would definitely be more ideal if find_words could perhaps be implemented in Options or something like that, so that we wouldn't have to .map the result (and risk re-allocations? To be honest, I don't know exactly if this would cause more allocations or what the memory usage effect of this would be), but I'd like to hear thoughts on this and if this should be changed to achieve the same result through a different method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would definitely be more ideal if find_words could perhaps be implemented in
Options
or something like that
I would like to keep Options
a mostly dumb container of, well, options. It is supposed to be the single place where library users configure the behavior of the top-level functions such as wrap
and fill
.
Hey Ian, thanks a lot, I'm glad you got it working 😄 I've only had a quick look and I agree with you that it looks a bit strange that the Let me go through and see if I can add some comments. |
Perhaps I'm blind, but I don't see new tests for The new tests should probably be at both the |
That is fine, I guess you can pass in |
I tried running the wrapping benchmark on your branch, and it now takes ~20% longer to wrap the text:
This is a bit much for such a "small" feature. Do you see similar results on your machine? |
Word { | ||
word: trimmed, | ||
width: display_width(trimmed), | ||
width: display_width_with_tab(trimmed, tab_width), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be any '\t'
characters in trimmed
, right? So we should be able to measure with the normal display_width
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very fair, thanks for pointing that out - will fix.
@@ -271,27 +304,27 @@ impl<'a> Word<'a> { | |||
continue; | |||
} | |||
|
|||
if width > 0 && width + ch_width(ch) > line_width { | |||
if width > 0 && width + ch_width(ch, self.tab_width) > line_width { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here, we're stepping through the characters in self.word
, so we should not expect to find any '\t'
characters, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we shouldn't - but I think that, unless we want to add an additional, like, ch_width_without_tabs
method to calculate the width of a character while assuming it's not a tab, there's no reason to not pass in self.tab_width
, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're correct. It looks like it won't be necessary to know the tab width when constructing a Word
.
@@ -308,11 +341,11 @@ impl Fragment for Word<'_> { | |||
self.width as f64 | |||
} | |||
|
|||
// We assume the whitespace consist of ' ' only. This allows us to | |||
// compute the display width in constant time. | |||
// Whitespace no longer consists of only spaces, so we have to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment refers to the "past" and it makes sense when reading the diff in this PR. However, it won't make sense when reading the code in six months :-)
I am seeing some implementation flaws in this, and have some thoughts that I'd like to get your opinion on. So, I initially added the I saw what you said about not including a I think I'll start implementing this, push some changes, and let you know when to re-review it :) |
Yeah, I think that would simplify things. Perhaps it can be simplified even further by doing something like this for line in text.split('\n') {
let words = options.word_separator.find_words(line).map(|word| {
word.whitespace_width += word.whitespace.count('\t') * options.tab_width;
word
});
let split_words = word_splitters::split_words(words, &options.word_splitter); Hmm, it seems there isn't a In any case, I think we can find the width of the whitespace in an efficient way without having to re-measure things more than necessary. This brings me to another point which we should consider: how does this impact what a "word separator" is? Until now, "words" are anything between one or more space characters. The I just added a little example program in #430 which allows us to see in detail what words the library finds. It would be great if you could rebase your branch on top of Without this PR,
After this PR,
This seems inconsistent and I'm not actually sure what the new "rules" are 😄 The |
You make a good point there (that Should we perhaps make two Thanks for pointing out that words are now split differently, though; I'd like to keep that the same as before. I just think this needs more discussion before I try to rewrite it again. |
Hi Ian,
You mean because we need to refer to let words = options.word_separator.find_words(line);
let split_words = word_splitters::split_words(words, &options.word_splitter); we have the words broken down into syllables: so a word like So I think we would need to
This way, the handling of the
Having two structs for this is tricky in its own way: The |
Hello! Sorry to bug you @iandwelker, but are you still planning on working on this, or would you like someone else to take it from here? The current lack of tab support is blocking me from fixing helix-editor/helix#3622, so I (or maybe another helix contributor) might possibly be interested in continuing work on this. |
Hey @mtoohey31, I think it would be great if you would jump in and work on this! Looking over helix-editor/helix#3622, I think a first step would be to let modified src/lib.rs
@@ -653,7 +653,7 @@ where
pub fn unfill(text: &str) -> (String, Options<'_>) {
let line_ending_pat: &[_] = &['\r', '\n'];
let trimmed = text.trim_end_matches(line_ending_pat);
- let prefix_chars: &[_] = &[' ', '-', '+', '*', '>', '#', '/'];
+ let prefix_chars: &[_] = &[' ', '\t', '-', '+', '*', '>', '#', '/'];
let mut options = Options::new(0);
for (idx, line) in trimmed.lines().enumerate() {
@@ -1814,6 +1814,14 @@ mod tests {
assert_eq!(options.initial_indent, " ");
}
+ #[test]
+ fn unfill_tab_indent() {
+ let (text, options) = unfill("\t\tfoo\n\t\tbar\n\t\tbaz\n");
+ assert_eq!(text, "foo bar baz\n");
+ assert_eq!(options.width, 3);
+ assert_eq!(options.initial_indent, "\t\t");
+ }
+
#[test]
fn unfill_differing_indents() {
let (text, options) = unfill(" foo\n bar\n baz"); The If you only want to support tab-indentation (to start with), then you could work around this by counting If you want to wrap at 80 columns, then you would do something like this: let width = 80;
let tab_width = 4;
let (text, mut options) = unfill(paragraph);
options.width = width - tab_width * options.initial_indent.bytes().filter(|ch| *ch == b'\t').count();
let wrapped = fill(text, You would need to decide what you want to happen if A more correct fix would be to make |
Hey! Sorry I haven't been responding or working on this, my need for it has significantly decreased in the past months so it sort of got put on the backburner in my mind. If someone else would like to take a shot at it, feel free to work on top of this PR or make a whole new PR for it; I don't think I'll be putting any time into solving this issue anytime soon, so you won't be stepping on my toes. |
No worries @iandwelker, and thanks for the work you've done on this so far! I'll try and make progress on this as soon as I find time. And thanks @mgeisler, I'll start with
That sounds like it might be a good compromise; it would handle all the situations that I've found annyoing with helix. I'll probably still see what I can come up with in terms of a more complete solution, but if it doesn't work out, then I'll go with indentation only. |
I think it would be nice to make |
So I've spent some time reading through the discussion here, looking at the existing code, and doing some of my own tinkering. I'm less enthusiastic now about the option of only considering tab indentation, because when As discussed though, if we want to fix this properly, there's quite a few functions that we'll need to pass the tab width down through. Here are some of the potential options:
When you have time, could you let me know what you think @mgeisler, particularly about option 3? |
Hi @mtoohey31, thanks for pushing on this feature! In general, the public APIs of Thanks for writing up the three options, that makes it easier to discuss. I know the crate has a bunch of Cargo feature flags already, but I would actually prefer to have less of them overall. That is mostly because they become and extra layer to reason about: an I think the changes in this PR are pretty good, but there are some remaining questions:
I would suggest that a good next step could be to clean up the PR: rebase it and remove all unrelated changes. Make it simple and let's see what that costs in terms of performance. Does that sound reasonable? |
It's definitely fair that adding another feature will make things more complicated. Now that I've thought about it again, I agree that benchmarking the impact of the simpler approach should be the next step. Right now I'm just guessing about what the performance might be like, which is never a good idea. So yes, the next steps you've proposed sound good to me. I'll let you know when I've simplified the changes and have benchmarks for us to take a look at. |
Thanks a lot! As for how expensive this "should", it's hard to say... I have the impression that Textwrap is more than fast enough right now, as in nobody has ever complained about performance. So I would say that code clarity should be our first priority 🙂 This will be the first time that the runtime |
Ok, I finally got around to starting on this. I have a rebased version with the tests passing, and some things (though not everything) simplified here, and there's another branch that just contains the benchmark for comparison here. Benchmark ResultsThe differences seem to be within margin of error. At commit 4663a6e on
And on
So I think that's good news? Two test related notes/questions:
Outstanding Implementation Questions
|
Yay, thank you for this! I just looked at the changes in your branch and it looks good. Btw, I've just done some autumn-cleaning and moved the top-level functions out of
I've seen them too once in a while! Don't worry about them, Criterion is supposed to handle it for us. In my experience, the benchmark numbers can fluctuate up and down by 5-10%, even without any code changes. I'm getting more stable results on a desktop vs a laptop. I mostly use the numbers to try and see if a change suddenly and unexpectedly makes things much slower (and to verify that things are O(n) when we expect them to be).
Thanks for benchmarking it! You could perhaps change the benchmark to call the already public
Yeah, I'm happy to release this as part of 0.17 release.
Yeah, that sounds good. I just tested with https://mgeisler.github.io/textwrap/ and this shows that copy-pasting a TAB into the text creates a word boundary when using You could consider making this change as a follow-up: leave word separation as-is for now and only introduce the plumbing needed to pass down the
Cool. My thinking was simply that the value will be constant for all words and thus it seems unnecessary to store it more than once. |
No worries, it wasn't that much work!
Sounds good, I've made that change in a1ff94e.
That also sounds good. I believe I've covered all the places where The other update that I've made is to remove the one remaining TODO comment in 484c29b. I'm pretty sure that using spaces is consistent with the behaviour of Also, should I be opening a pull request from my branch now? I don't think there's any way for me to use this existing pull request, right? |
Yeah, I think that is true. You would need write permission on @iandwelker's clone, but it's much easier to create a new PR. That will give us a fresh empty state to discuss from. Thanks a lot to both @iandwelker for starting this and to you for finishing it!
Thanks, I'll love to have that as a PR. Please squash all changes into a single clean commit on top of the latest master. That way we can talk about the changes as a unit instead of looking at the evolution of the changes. |
Let's close this one now that we have #490. |
This should add basic support for specifying a tab width within the
Options
struct, as discussed in #419. As it turns out, I had to add atab_width
field within bothOptions
andWord
, though that was not what was planned.This PR should include all extra documentation needed to support this feature, and also includes a few more tests with the areas of this library that I understand well enough to add tests for (and passes all tests as well).
There are some functions in the library that I couldn't find a way to add support for tab widths within (specifically,
unfill
, since you have no way to pass in options, andrefill
, since you can't pass in the options to unfill so you can't be certain it was actually unfilled correctly).wrap
,wrap_columns
, andfill_inplace
all have tests, though, and should work just as expected.I am very open to criticism and suggestions for how differently this should be implemented if anyone feels that this is not the best way. Also please ask any questions you have, I would understand if my reasoning for parts of this is somewhat confusing.