Expose whether a regex_automata error was a size overflow or another error #1236


Closed
konstin opened this issue Nov 11, 2024 · 2 comments · Fixed by #1237

Comments

@konstin

konstin commented Nov 11, 2024

I'm building a DFA from user-provided expressions for a fast-path optimization, which I can skip when the DFA would be too large. Currently, there is no way to tell whether building the DFA failed because of a syntax error (which I want to raise to the user) or because of a size overflow (which is non-fatal). It would be great if regex_automata::dfa::dense::BuildError allowed inspecting whether it's a size error.

Motivating example:

let dfa_builder = dfa::dense::Builder::new()
    .configure(
        dfa::dense::Config::new()
            // DFA can grow exponentially, in which case we bail out
            .dfa_size_limit(Some(DFA_SIZE_LIMIT))
            .determinize_size_limit(Some(DFA_SIZE_LIMIT)),
    )
    .build_many(&regexes);
let dfa = match dfa_builder {
    Ok(dfa) => Some(dfa),
    Err(_) => {
        // TODO(konsti): `regex_automata::dfa::dense::BuildError` should allow asking whether
        // it is a size error
        warn!(
            "Glob expressions regex is larger than {DFA_SIZE_LIMIT} bytes, \
            falling back to full directory traversal!"
        );
        None
    }
};
@BurntSushi
Member

Yeah, I think adding a simple predicate like is_exceeded_size_limit, or something like that, would be appropriate here. There are multiple different ways to blow the size limit. There are the configured size limits of course, but there are also built-in size limits due to states and patterns using u32 as their identifier type (i.e., if you try to build a DFA with more than 2^32 - 1 states). So even if all configured size limits are disabled, you can still get a size limit error.

The other two classes of errors are "NFA failed to build" and "regex feature unsupported." The latter, I believe, can never happen if Unicode mode is disabled. The former is only relevant if you're using the convenience APIs that build a DFA from a pattern string (which you are here). But even that can be avoided by using Builder::build_from_nfa.

So if you use Builder::build_from_nfa and disable Unicode mode, then the only possible error remaining in BuildError is a size limit related error. This means you can work around this today, but I agree that adding a predicate here would make use cases like yours a little smoother.

@konstin Do you have ideas for what the predicate should be named?

@konstin
Author

konstin commented Nov 11, 2024

is_size_limit_exceeded sounds good

BurntSushi added a commit that referenced this issue Nov 11, 2024
This adds a new predicate that supports very minimal introspection
ability into why DFA construction failed.

Closes #1236