-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add JIT support for PCRE2 #2791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JIT support for PCRE2 #2791
Conversation
src/utils/regex.cc
Outdated
|
||
m_pcje = pcre2_jit_compile(m_pc, PCRE2_JIT_COMPLETE); | ||
m_pmc = pcre2_match_context_create(NULL); | ||
m_pcjs = pcre2_jit_stack_create(32*1024, 512*1024, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my 2c question: why does it need the own stack handling?
I do not find the article now, where the author wrote unless there is a very compelling reason, you don't need to use pcre_jit_stack_create()
.
Few months ago I've started to play with PCRE2 JIT implementation - you can find it here. In performance, there isn't any difference between controlling own stack with huge memory like this, or without own stack.
I also checked Nginx, it also uses pcre2 JIT without own stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually a copy-pasta from https://www.pcre.org/current/doc/html/pcre2jit.html#SEC10
I don't know if there is any penalty doing so, and I'm fine removing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried without PCRE2 jit stack and the memory usage dropped by 50%. That's nice.
I do worry about situations like https://www.mail-archive.com/[email protected]/msg05763.html
It depends on the rules, though.
The main thing I would be concerned about here is the jit stack limit issue. I don't have an opinion on whether the default of 32KB on the machine stack is large enough for normal (non-attack) use cases. But I can almost guarantee that either an attacker or a researcher will be able to breach that limit without using particularly excessive inputs (I'm thinking of some of the more complex CRS rules ). The problem is that we are not handling this error scenario. PCRE2_ERROR_JIT_STACKLIMIT can effectively be treated the same as a simple 'no match' result -- which makes for a bypass opportunity. (This new need overlaps with proposed per-match PCRE limits in #2736 , which I'll have to review soon.) |
The other possible alternative if PCRE2_ERROR_JIT_STACKLIMIT in encountered, is to retry the match using regular (non-JIT) matching. |
I've just tried this PR locally and I get the following error:
Adding an "int rc;" fixes things |
Ah my fault |
The current version looks ok. Thanks all for the various contributions. I'll plan to merge this. |
Hi @wfjsw , cppcheck is complaining that:
The tool is correct. Unlike with the Regex class, the normal assignment to m_pcje occurs in the init() function rather than in the constructor. Do you want to make that amendment to the VerifyCC class? (An initial value of PCRE2_ERROR_JIT_BADOPTION might be appropriate.) |
5652abd
to
4d79f39
Compare
4d79f39
to
54ff1ea
Compare
Unfortunately, it's still failing cppcheck ...
Don't worry about it. I can commit as-is and then fix it up. |
Added JIT to PCRE2.
I didn't thoroughly investigate the lifetime of the
Regex
class. The lifetime of the JIT result is attached to it.Also it is worth noticing that PCRE2 is not following the
pcre_malloc
andpcre_free
conventions, though I wonder what it will add if the allocation is scoped to requests.