Proposal for adding a RegExp.escape method to the ECMAScript standard: http://benjamingr.github.io/RexExp.escape/.
This proposal is a stage 0 (strawman) proposal and is awaiting specification, implementation and input.
See this issue.
We often want to build a regular expression out of a string without treating special characters from the string as special regular expression tokens. For example, if we want to replace all occurrences of the string Hello. which we got from the user, we might be tempted to do ourLongText.replace(new RegExp(text, "g")), but this would match . against any character rather than a literal dot.
This is a fairly common use in regular expressions and standardizing it would be useful.
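The pitfall described above can be seen in a short sketch. The sample text and the "Bye" replacement string are made up for illustration, and the second call escapes the dot by hand only to show the intended result:

```javascript
// Pretend this string came from the user.
const text = "Hello.";
// Illustrative input; not taken from the proposal.
const ourLongText = "Hello. Hello! Hello?";

// Unescaped: "." acts as a wildcard, so all three greetings match.
const naive = ourLongText.replace(new RegExp(text, "g"), "Bye");
console.log(naive); // "Bye Bye Bye"

// Escaped by hand: only the literal "Hello." matches.
const manual = ourLongText.replace(new RegExp("Hello\\.", "g"), "Bye");
console.log(manual); // "Bye Hello! Hello?"
```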
In other languages:
- Perl: quotemeta(str) - see the docs
- PHP: preg_quote(str) - see the docs
- Python: re.escape(str) - see the docs
- Ruby: Regexp.escape(str) - see the docs
- Java: Pattern.quote(str) - see the docs
- C#, VB.NET: Regex.Escape(str) - see the docs
Note that the languages differ in what they do (Perl does something different from C#), but they all have the same goal.
We propose the addition of a RegExp.escape function, such that strings can be escaped in order to be used inside regular expressions:
var str = prompt("Please enter a string");
str = RegExp.escape(str);
// Special regular expression tokens in str are now matched literally.
// The empty replacement string is only a placeholder for this example.
alert(ourLongText.replace(new RegExp(str, "g"), ""));
There is previous work here: https://gist.github.com/kangax/9698100, which includes valuable work we've used. Unlike that proposal, this one uses the spec's SyntaxCharacter list of characters, so updates stay in sync with the specification instead of specifying the escaped characters manually.
##Cross-Cutting Concerns
The list of escaped characters should be kept in sync with what the regular expression grammar considers to be syntax characters that need escaping. For this reason, instead of hard-coding the list of escaped characters, we escape the characters that the engine recognizes as SyntaxCharacters. For example, if regex comments are ever added to the specification (presumably under a flag), this ensures they are properly escaped.
##FAQ
##Semantics
When the escape function is called with an argument S the following steps are taken:
- Let str be ToString(S).
- ReturnIfAbrupt(str).
- Let cpList be a List containing in order the code points as defined in 6.1.4 of str, starting at the first element of str.
- Let cuList be a new List.
- For each code point c in cpList in List order, do:
  - If c is matched by SyntaxCharacter then do:
    - Append code unit 0x005C (REVERSE SOLIDUS) to cuList.
  - Append the elements of the UTF16Encoding (10.1.1) of c to cuList.
- Let L be a String whose elements are, in order, the elements of cuList.
- Return L.
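The steps above can be approximated in user-land JavaScript. This is a minimal sketch, not the proposal's reference implementation: the name escapeRegExp is hypothetical, and the character set is hard-coded from the ES2015 SyntaxCharacter production (^ $ \ . * + ? ( ) [ ] { } |), which is exactly what the proposal tries to avoid by deferring to the engine. Each matched character is prefixed with a backslash, as the usage examples show.

```javascript
// Assumption: a hard-coded copy of the ES2015 SyntaxCharacter production.
// A real implementation would consult the engine's own grammar instead.
const SYNTAX_CHARACTERS = new Set("^$\\.*+?()[]{}|");

// Hypothetical stand-in for RegExp.escape.
function escapeRegExp(S) {
  // Template literals apply ToString, which throws for Symbols.
  const str = `${S}`;
  let result = "";
  // for...of walks the string by code point, so surrogate pairs stay intact.
  for (const c of str) {
    if (SYNTAX_CHARACTERS.has(c)) {
      result += "\\"; // code unit 0x005C (REVERSE SOLIDUS)
    }
    result += c;
  }
  return result;
}

console.log(escapeRegExp("(*.*)")); // \(\*\.\*\)
```

The escaped result can then be passed to new RegExp, for example new RegExp(escapeRegExp("Hello."), "g"), to match the literal text only.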
##Usage Examples
RegExp.escape("The Quick Brown Fox"); // "The Quick Brown Fox"
RegExp.escape("Buy it. use it. break it. fix it."); // "Buy it\. use it\. break it\. fix it\."
RegExp.escape("(*.*)"); // "\(\*\.\*\)"
RegExp.escape("。^・ェ・^。"); // "。\^・ェ・\^。"
RegExp.escape("😊 *_* +_+ ... 👍"); // "😊 \*_\* \+_\+ \.\.\. 👍"