Inconsistency with is_match and Python's search in Matching Specific Regex Patterns #1193
Good find! Even linking to the RegEx standard showing that it works using the documented reference 👍
Hope we can get this fixed soon 🔜🤞
Your regex is kinda messed up. Specifically, this part (which is repeated):
Python regexes don't support nested character classes unlike the
And indeed, using that with the

```rust
use regex::Regex;

fn main() {
    let pattern = r"(?:private|group)[_\[\w\d]*\]?_abc1d2345678ef90ab3c4567890defab[_\[\w\d]*\]?";
    let compiled = Regex::new(pattern).unwrap();
    let test_haystacks = vec![
        "private_x9z45678abc12345d6e7890f123ghijk_abc1d2345678ef90ab3c4567890defab",
        "private_x9z45678abc12345d6e7890f123ghijk_abc1d2345678ef90ab3c4567890defab___[[[aaa111]",
        "private[_0f4f790_abc1d2345678ef90ab3c4567890defab",
    ];
    for test_haystack in &test_haystacks {
        match compiled.is_match(test_haystack) {
            true => println!("PASS: {}", test_haystack),
            false => eprintln!("FAIL: {}", test_haystack),
        }
    }
}
```

(I also switched to using raw strings via
This isn't a bug and there is no requirement that this crate matches Python's regex engine in all cases. There's also no regex standard at play here (governing either Python's or Rust's regex engine).
Hi @BurntSushi 👋 I don't consider this issue invalid. I'm not in a position to change the un-compiled regular expressions as they are provided by end users, and if they're compilable, which they are, they are expected to be searchable. Do you have any particular guidance toward a solution for compatibility?
I don't know what you mean by your assertion that they are "compatible." There is literally an unbounded number of ways in which Python regexes are different than Rust regexes. And this generally applies to all pairs of regex engines unless they very strictly follow a standard. (Of which, generally speaking, only two are prevalent: POSIX and ECMA. Neither Python's regex engine nor Rust's regex engine follow either one.)
I want to be clear here that this issue is definitively invalid within the scope of this project. That doesn't mean you don't have a problem. You might have a problem on your end where you have a pile of regexes that worked with one regex engine and need to use them, unchanged, with some other regex engine. But that isn't really a problem I can help with and is in general not a problem that can be easily solved for any two regex engines. (Unless your patterns happen to incidentally behave the same, or as I mentioned above, the regex engines strictly adhere to an existing standard.)
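Because no general translation between the two syntaxes exists, one narrow option for user-supplied patterns is to flag the specific construct behind this issue before compiling. The helper below, `nested_class_openers`, is hypothetical (not part of the `regex` crate or Python's `re`): it walks a pattern using Python's bracket rules and returns the byte offsets of unescaped `[` characters inside a class, which is where the `regex` crate would instead open a nested class. It is a standard-library-only heuristic sketch; it does not understand verbose-mode comments and ignores every other difference between the engines.

```rust
/// Hypothetical portability check (not part of the `regex` crate or Python's `re`).
/// Scans `pattern` with Python's bracket rules and returns the byte offsets of
/// unescaped `[` characters found inside a character class. Python treats such a
/// `[` as a literal; the Rust `regex` crate opens a nested class there instead.
fn nested_class_openers(pattern: &str) -> Vec<usize> {
    let bytes = pattern.as_bytes();
    let mut hits = Vec::new();
    // Byte offset of the `[` that opened the current class, if inside one.
    let mut open: Option<usize> = None;
    // True when the previous character was a backslash escaping this one.
    let mut escaped = false;

    for (i, c) in pattern.char_indices() {
        if escaped {
            escaped = false;
            continue;
        }
        match (c, open) {
            ('\\', _) => escaped = true,
            ('[', None) => open = Some(i),
            ('[', Some(_)) => hits.push(i),
            (']', Some(o)) => {
                // A `]` right after `[` or `[^` is a literal, not the class end.
                let leading = i == o + 1 || (i == o + 2 && bytes[o + 1] == b'^');
                if !leading {
                    open = None;
                }
            }
            _ => {}
        }
    }
    hits
}

fn main() {
    // Assumed original (unescaped) pattern, inferred from the discussion.
    let original = r"(?:private|group)[_[\w\d]*]?_abc1d2345678ef90ab3c4567890defab[_[\w\d]*]?";
    // Escaped pattern from the corrected program.
    let escaped = r"(?:private|group)[_\[\w\d]*\]?_abc1d2345678ef90ab3c4567890defab[_\[\w\d]*\]?";
    for p in [original, escaped] {
        let hits = nested_class_openers(p);
        if hits.is_empty() {
            println!("no nested-class hazard: {}", p);
        } else {
            println!("nested-class hazard at byte offsets {:?}: {}", hits, p);
        }
    }
}
```

For the assumed unescaped pattern this reports offsets 19 and 63 (the two inner `[` characters), and for the escaped pattern it reports nothing.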
Well... of course not. Because I don't really know the structure of the problem you're trying to solve. All that's been presented to me here is a regex that works one way in Python and a seeming request to have it work the same way in Rust. But that will definitively not happen. As far as solving your problem in a different way, I don't know because I don't know what problem you're trying to solve. If, for example, these regexes are provided by end users and you've promised that the regex syntax is equivalent to whatever Python supports, then you need to use a regex engine with the goal of compatibility with Python's regex engine. (Of which, I believe only one exists. The
What version of regex are you using?
I am using Rust 1.78.0 with the `regex` crate version 1.10.4.

Describe the bug at a high level.
There is a discrepancy in regex pattern matching between Python's `re` module and Rust's `regex` crate. When tested in Python, the regex pattern matches all of the intended strings. In Rust, however, the same pattern fails to match those strings.

What are the steps to reproduce the behavior?
Here is a complete Rust program that reproduces the behavior:
What is the actual behavior?
The actual output of the Rust program indicates failures where the regex pattern does not match the test strings. For example:
What is the expected behavior?
I expect the Rust program's output to match the behavior observed in Python, where all provided test strings successfully match the regex pattern.
Additional Context
Below is the corresponding Python code that behaves as expected with the same regex pattern:
This can also be seen working as expected on https://regexr.com/