-
Notifications
You must be signed in to change notification settings - Fork 22
add support to ignore ansi sequences when formatting usage display. Fixes #879 #880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Sorry, but I definitely don't have time right now to maintain args. :( |
Understandable - I'm excited for whatever you are working on 😄 |
int get lengthWithoutAnsi { | ||
if (!_AnsiUtils.hasAnsi(this)) return length; | ||
return _AnsiUtils.stripAnsi(this).length; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more expensive than it needs to be.
You know the length of the total string, and you have the matches for every Ansi escape sequence, so you could just subtract the lengths of that, without needing to actually create a new string.
int get lengthWithoutAnsi {
var length = this.length;
for (var escape in _AnsiUtils.ansiRE.allMatches(this)) {
length -= escape.end - escape.start;
}
return length;
}
(If we are a little lucky, the RegExp match won't eagerly allocate a match string if we don't ask for it.)
static final String ansiCodePattern = [ | ||
'[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)', | ||
'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-ntqry=><~]))' | ||
].join('|'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use raw strings for RegExp patterns. No need to use lists and join
, just have adjacent strings:
static const String ansiCodePattern =
r'[\x1b\x9b][[\]()#;?]*'
r'(?:' //
r'(?:' // -
r'(?:[a-zA-Z\d]*' //--
r'(?:;[\-a-zA-Z\d/#&.:=?%@~_]*)*'
r')?\x07' // -
r'|'
r'(?:' // --
r'(?:\d{1,4}(?:;\d{0,4})*[\dA-PR-TZcf-ntqry=><~]'
r')' //-
r')'; //
|
||
/// A utility class for finding and stripping ANSI codes from strings. | ||
class _AnsiUtils { | ||
static final String ansiCodePattern = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be very nice to have a textual description of what the pattern is supposed to match.
It's hard enough to debug RegExps even if you have a specification of what they're intended to do.
Without that, it's impossible to tell whether it's right or wrong.
Is this a correct description:
- One of U+001b or U+009b
- Followed by any number of the characters:
[
,]
,(
,)
,#
,;
,?
- Followed by either:
- zero or more letters or digits
- followed by any number of repeats of:
;
- followed by any number of digits, letters, or characters from
-/#^.:=?%@~_
- or:
- 1-4 digits
- followed by any number of repeats of:
;
- followed by 0-4 digits
- followed by on digit, letter other than QUVWXY or abdeopsuvwxz, or one of
=><~
.
I don't know what the official specification of this is.
I'd be tempted to just go with what Wikpedia says is a simple control sequence introduction:
r'\x1b[0-?]*[ -/]*[@-~]'
Using U+009B as well is dangerous. It means something else in both UTF-8 and Windows-1252 code pages. (But I guess we're working on Dart strings, so we're past encoding issues.)
/// Pads [source] to [length] by adding spaces at the end. | ||
String padRight(String source, int length) => | ||
source + ' ' * (length - source.length); | ||
source + ' ' * (length - source.lengthWithoutAnsi); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider adding .ansiLength
which only accumulates the length of the Ansi sequences,
so that it's just int get lengthWithoutAnsi => length - ansiLength;
.
Then this can be source.padRight(length + source.ansiLength)
. (Using padRight
is more efficient because it doesn't have to create the intermediate ' ' * ...
string. I guess a good optimizing compiler can avoid that too, but it's better to be safe.)
const _ansiMixedStyles = 'Normal, \x1B[31mRed\x1B[0m, \x1B[1mBold\x1B[0m, \x1B[4mUnderline\x1B[0m, \x1B[1;34mBold Blue\x1B[0m, Normal again.'; | ||
const _ansiLongSequence = 'Start \x1B[1;3;4;5;7;9;31;42;38;5;196;48;5;226m Beaucoup formatting! \x1B[0m End'; | ||
const _ansiCombined256 = '\x1B[1;38;5;27;48;5;220mBold Bright Blue FG (27) on Gold BG (220)\x1B[0m'; | ||
const _ansiCombinedTrueColor = '\x1B[4;48;2;50;50;50;38;2;150;250;150mUnderlined Light Green FG on Dark Grey BG\x1B[0m'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all of the form \x1b\[[\d;]*m
. That doesn't cover most of the complexity of the RegExp.
If this is all the functionality needs to do, the RegExp shouldn't be as complicated. (It probably shouldn't, even if recognizing more escapes than just the color ones.)
This PR adds logic to exclude hidden ANSI escape sequences when formatting the Usage display.
The ANSI escape sequences are excluded when calculating the lengths of help strings used in the Usage display by using a string extension
lengthWithoutAnsi
which removes any hidden ANSI escape sequences present before calculating the string length.It includes tests of the
lengthWithoutAnsi
getter to ensure that it returns correct length values when no ANSI escape sequences are present as well as when a variety of ANSI escape sequences are present.(The regex that i am using here is well exercised, as it is the same reg ex that I use within VS Code when parsing and formatting ANSI sequences within the debug console)