Skip to content

Add JIT support for PCRE2 #2791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 18, 2022

Conversation

wfjsw
Copy link
Contributor

@wfjsw wfjsw commented Aug 24, 2022

Added JIT to PCRE2.

I didn't thoroughly investigate the lifetime of the Regex class. The lifetime of the JIT result is attached to it.

Also it is worth noticing that PCRE2 is not following the pcre_malloc and pcre_free conventions, though I wonder what it will add if the allocation is scoped to requests.


m_pcje = pcre2_jit_compile(m_pc, PCRE2_JIT_COMPLETE);
m_pmc = pcre2_match_context_create(NULL);
m_pcjs = pcre2_jit_stack_create(32*1024, 512*1024, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2c question: why does it need the own stack handling?

I do not find the article now, where the author wrote unless there is a very compelling reason, you don't need to use pcre_jit_stack_create().

Few months ago I've started to play with PCRE2 JIT implementation - you can find it here. In performance, there isn't any difference between controlling own stack with huge memory like this, or without own stack.

I also checked Nginx, it also uses pcre2 JIT without own stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually a copy-pasta from https://www.pcre.org/current/doc/html/pcre2jit.html#SEC10
I don't know if there is any penalty doing so, and I'm fine removing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried without PCRE2 jit stack and the memory usage dropped by 50%. That's nice.

I do worry about situations like https://www.mail-archive.com/[email protected]/msg05763.html

It depends on the rules, though.

@martinhsv
Copy link
Contributor

The main thing I would be concerned about here is the jit stack limit issue.

I don't have an opinion on whether the default of 32KB on the machine stack is large enough for normal (non-attack) use cases. But I can almost guarantee that either an attacker or a researcher will be able to breach that limit without using particularly excessive inputs (I'm thinking of some of the more complex CRS rules ).

The problem is that we are not handling this error scenario. PCRE2_ERROR_JIT_STACKLIMIT can effectively be treated the same as a simple 'no match' result -- which makes for a bypass opportunity.

(This new need overlaps with proposed per-match PCRE limits in #2736 , which I'll have to review soon.)

@martinhsv
Copy link
Contributor

The other possible alternative if PCRE2_ERROR_JIT_STACKLIMIT in encountered, is to retry the match using regular (non-JIT) matching.

@FireBurn
Copy link

FireBurn commented Dec 7, 2022

I've just tried this PR locally and I get the following error:

utils/regex.cc: In member function 'int modsecurity::Utils::Regex::search(const string&, modsecurity::Utils::SMatch*) const':
utils/regex.cc:294:24: error: 'rc' was not declared in this scope
     if (m_pcje != 0 || rc == PCRE2_ERROR_JIT_STACKLIMIT) {
                        ^~
utils/regex.cc:294:24: note: suggested alternative: 'rcmd'
     if (m_pcje != 0 || rc == PCRE2_ERROR_JIT_STACKLIMIT) {
                        ^~
                        rcmd

Adding an "int rc;" fixes things

@wfjsw
Copy link
Contributor Author

wfjsw commented Dec 8, 2022

Ah my fault

@martinhsv martinhsv changed the title feat: PCRE2 JIT Add JIT support for PCRE2 Dec 9, 2022
@martinhsv
Copy link
Contributor

The current version looks ok. Thanks all for the various contributions. I'll plan to merge this.

@martinhsv
Copy link
Contributor

Hi @wfjsw ,

cppcheck is complaining that:

warning: src/operators/verify_cc.h,39,warning,uninitMemberVar,Member variable 'VerifyCC::m_pcje' is not initialized in the constructor.

The tool is correct. Unlike with the Regex class, the normal assignment to m_pcje occurs in the init() function rather than in the constructor.

Do you want to make that amendment to the VerifyCC class? (An initial value of PCRE2_ERROR_JIT_BADOPTION might be appropriate.)

@wfjsw wfjsw force-pushed the feature/pcre2-jit branch from 5652abd to 4d79f39 Compare December 10, 2022 01:28
@wfjsw wfjsw force-pushed the feature/pcre2-jit branch from 4d79f39 to 54ff1ea Compare December 10, 2022 03:43
@martinhsv
Copy link
Contributor

Unfortunately, it's still failing cppcheck ...

2022-12-11T21:07:44.1294490Z warning: src/utils/regex.cc,127,warning,uninitvar,Uninitialized variable: rc
2022-12-11T21:07:44.1295630Z warning: src/utils/regex.cc,175,warning,uninitvar,Uninitialized variable: rc
2022-12-11T21:07:44.1296350Z warning: src/utils/regex.cc,294,warning,uninitvar,Uninitialized variable: ret
2022-12-11T21:07:44.1296950Z warning: src/utils/regex.cc,327,warning,uninitvar,Uninitialized variable: rc

Don't worry about it. I can commit as-is and then fix it up.

@martinhsv martinhsv merged commit f037bd2 into owasp-modsecurity:v3/master Dec 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants