Skip to content

add support to ignore ansi sequences when formatting usage display. Fixes #879 #880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

timmaffett
Copy link

This PR adds logic to exclude hidden ANSI escape sequences when formatting the Usage display.
The ANSI escape sequences are excluded when calculating the lengths of help strings used in the Usage display by using a string extension lengthWithoutAnsi which removes any hidden ANSI escape sequences present before calculating the string length.

It includes tests of the lengthWithoutAnsi getter to ensure that it returns correct length values when no ANSI escape sequences are present as well as when a variety of ANSI escape sequences are present.

(The regex that i am using here is well exercised, as it is the same reg ex that I use within VS Code when parsing and formatting ANSI sequences within the debug console)


  • [X ] I’ve reviewed the contributor guide and applied the relevant portions to this PR.

@timmaffett timmaffett requested a review from munificent as a code owner April 16, 2025 21:38
@munificent munificent removed their request for review April 24, 2025 17:20
@munificent
Copy link
Member

Sorry, but I definitely don't have time right now to maintain args. :(

@timmaffett
Copy link
Author

Understandable - I'm excited for whatever you are working on 😄

int get lengthWithoutAnsi {
if (!_AnsiUtils.hasAnsi(this)) return length;
return _AnsiUtils.stripAnsi(this).length;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more expensive than it needs to be.
You know the length of the total string, and you have the matches for every Ansi escape sequence, so you could just subtract the lengths of that, without needing to actually create a new string.

  int get lengthWithoutAnsi {
    var length = this.length;
    for (var escape in _AnsiUtils.ansiRE.allMatches(this)) {
      length -= escape.end - escape.start;
    }
    return length;
  }

(If we are a little lucky, the RegExp match won't eagerly allocate a match string if we don't ask for it.)

static final String ansiCodePattern = [
'[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)',
'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-ntqry=><~]))'
].join('|');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use raw strings for RegExp patterns. No need to use lists and join, just have adjacent strings:

  static const String ansiCodePattern = 
     r'[\x1b\x9b][[\]()#;?]*'
     r'(?:'  //
     r'(?:'  // -
     r'(?:[a-zA-Z\d]*' //--
     r'(?:;[\-a-zA-Z\d/#&.:=?%@~_]*)*'
     r')?\x07' // -
     r'|'
     r'(?:' // --
     r'(?:\d{1,4}(?:;\d{0,4})*[\dA-PR-TZcf-ntqry=><~]'
     r')' //-
     r')'; //


/// A utility class for finding and stripping ANSI codes from strings.
class _AnsiUtils {
static final String ansiCodePattern = [
Copy link
Member

@lrhn lrhn Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be very nice to have a textual description of what the pattern is supposed to match.
It's hard enough to debug RegExps even if you have a specification of what they're intended to do.
Without that, it's impossible to tell whether it's right or wrong.

Is this a correct description:

  • One of U+001b or U+009b
  • Followed by any number of the characters: [, ], (, ), #, ;, ?
  • Followed by either:
    • zero or more letters or digits
    • followed by any number of repeats of:
      • ;
      • followed by any number of digits, letters, or characters from -/#^.:=?%@~_
  • or:
    • 1-4 digits
    • followed by any number of repeats of:
      • ;
      • followed by 0-4 digits
    • followed by on digit, letter other than QUVWXY or abdeopsuvwxz, or one of =><~.

I don't know what the official specification of this is.
I'd be tempted to just go with what Wikpedia says is a simple control sequence introduction:

r'\x1b[0-?]*[ -/]*[@-~]'

Using U+009B as well is dangerous. It means something else in both UTF-8 and Windows-1252 code pages. (But I guess we're working on Dart strings, so we're past encoding issues.)

/// Pads [source] to [length] by adding spaces at the end.
String padRight(String source, int length) =>
source + ' ' * (length - source.length);
source + ' ' * (length - source.lengthWithoutAnsi);
Copy link
Member

@lrhn lrhn Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider adding .ansiLength which only accumulates the length of the Ansi sequences,
so that it's just int get lengthWithoutAnsi => length - ansiLength;.
Then this can be source.padRight(length + source.ansiLength). (Using padRight is more efficient because it doesn't have to create the intermediate ' ' * ... string. I guess a good optimizing compiler can avoid that too, but it's better to be safe.)

const _ansiMixedStyles = 'Normal, \x1B[31mRed\x1B[0m, \x1B[1mBold\x1B[0m, \x1B[4mUnderline\x1B[0m, \x1B[1;34mBold Blue\x1B[0m, Normal again.';
const _ansiLongSequence = 'Start \x1B[1;3;4;5;7;9;31;42;38;5;196;48;5;226m Beaucoup formatting! \x1B[0m End';
const _ansiCombined256 = '\x1B[1;38;5;27;48;5;220mBold Bright Blue FG (27) on Gold BG (220)\x1B[0m';
const _ansiCombinedTrueColor = '\x1B[4;48;2;50;50;50;38;2;150;250;150mUnderlined Light Green FG on Dark Grey BG\x1B[0m';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all of the form \x1b\[[\d;]*m. That doesn't cover most of the complexity of the RegExp.

If this is all the functionality needs to do, the RegExp shouldn't be as complicated. (It probably shouldn't, even if recognizing more escapes than just the color ones.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants