Commit ec75f16

ext/pcre: update to PCRE2 v10.44

Previously: phpGH-13413.

This version also contains a fix for `\X` matching (as used in `preg_match()`), so that it correctly detects grapheme clusters (PCRE2Project/pcre2#410). This is useful to correctly [polyfill the new `grapheme_str_split` function](https://php.watch/versions/8.4/grapheme_str_split#polyfill).

Diff: pcre2lib [v10.43..v10.44](PCRE2Project/pcre2@pcre2-10.43...pcre2-10.44)
1 parent 51379d6 commit ec75f16

30 files changed: +4711 -2303 lines changed

UPGRADING (+4 -2)

@@ -90,7 +90,7 @@ PHP 8.4 UPGRADE NOTES
     of JIT startup initialization issues.
 
 - PCRE:
-  . The bundled pcre2lib has been updated to version 10.43.
+  . The bundled pcre2lib has been updated to version 10.44.
     As a consequence, this means {,3} is now recognized as a quantifier instead
     of as text. Furthermore, the meaning of some character classes in UCP mode
     has changed. Consult https://github.com/PCRE2Project/pcre2/blob/master/NEWS
@@ -243,10 +243,12 @@ PHP 8.4 UPGRADE NOTES
   . Added support for the unix timestamp extension for zip archives.
 
 - PCRE:
-  . The bundled pcre2lib has been updated to version 10.43.
+  . The bundled pcre2lib has been updated to version 10.44.
     As a consequence, LoongArch JIT support has been added, spaces
     are now allowed between braces in Perl-compatible items, and
     variable-length lookbehind assertions are now supported.
+  . With pcre2lib version 10.44, the maximum length of named capture groups
+    has changed from 32 to 128.
   . Added support for the "r" (PCRE2_EXTRA_CASELESS_RESTRICT) modifier, as well
     as the (?r) mode modifier. When enabled along with the case-insensitive
     modifier ("i"), the expression locks out mixing of ASCII and non-ASCII

ext/pcre/pcre2lib/pcre2_compile.c (+18 -9)

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2023 University of Cambridge
+          New API code Copyright (c) 2016-2024 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -808,7 +808,8 @@ enum { ERR0 = COMPILE_ERROR_BASE,
        ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
        ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
        ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-       ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97, ERR98, ERR99, ERR100 };
+       ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97, ERR98, ERR99, ERR100,
+       ERR101 };
 
 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@@ -7549,7 +7550,8 @@ for (;; pptr++)
           if (lengthptr != NULL)
             {
             PCRE2_SIZE delta;
-            if (PRIV(ckd_smul)(&delta, repeat_min - 1, length_prevgroup) ||
+            if (PRIV(ckd_smul)(&delta, repeat_min - 1,
+                               (int)length_prevgroup) ||
                 OFLOW_MAX - *lengthptr < delta)
               {
               *errorcodeptr = ERR20;
@@ -7599,7 +7601,7 @@ for (;; pptr++)
               {
               PCRE2_SIZE delta;
               if (PRIV(ckd_smul)(&delta, repeat_max,
-                    length_prevgroup + 1 + 2 + 2*LINK_SIZE) ||
+                    (int)length_prevgroup + 1 + 2 + 2*LINK_SIZE) ||
                   OFLOW_MAX + (2 + 2*LINK_SIZE) - *lengthptr < delta)
                 {
                 *errorcodeptr = ERR20;
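
The two hunks above both guard a length computation with an overflow-checked multiply followed by a headroom check against a maximum. The following is a minimal sketch of that pattern, not the PCRE2 implementation: `checked_smul` and `add_repeated_length` are hypothetical names, the helper rejects negative operands for simplicity, and the `limit` parameter stands in for `OFLOW_MAX`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a checked multiply. Returns true when the
   product cannot be represented as a non-negative int; otherwise stores
   a * b in *result and returns false. Negative operands are rejected,
   which is a simplification made for this sketch. */
bool checked_smul(size_t *result, int a, int b)
{
  assert(result != NULL);
  if (a < 0 || b < 0) return true;
  if (a != 0 && b > INT_MAX / a) return true;   /* a * b would exceed INT_MAX */
  *result = (size_t)a * (size_t)b;
  return false;
}

/* Grows *length by count * unit. Fails (returns false, *length untouched)
   if the multiply overflows or the sum would exceed limit. The caller
   must ensure *length <= limit. */
bool add_repeated_length(size_t *length, int count, size_t unit, size_t limit)
{
  size_t delta;
  assert(length != NULL && *length <= limit);
  if (unit > (size_t)INT_MAX) return false;     /* does not fit the int parameter */
  if (checked_smul(&delta, count, (int)unit) || limit - *length < delta)
    return false;
  *length += delta;
  return true;
}
```

The explicit narrowing of `unit` before the call plays the same role as the `(int)` casts added in the hunks: the operand type is made to match the helper's parameter type on purpose rather than by implicit conversion.
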
@@ -9908,7 +9910,7 @@ do
   *bptr |= branchlength;  /* branchlength never more than 65535 */
   bptr = *pptrptr;
   }
-while (*bptr == META_ALT);
+while (META_CODE(*bptr) == META_ALT);
 
 /* If any branch is of variable length, the whole lookbehind is of variable
 length. If the maximum length of any branch exceeds the maximum for variable
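
The hunk above stores a branch length in the low bits of a parsed item and changes the loop test to compare only the code part. A small sketch of why the two comparisons differ, assuming a layout where the top 16 bits hold the code and the low 16 bits hold data; the `SKETCH_` macros and their values are illustrative assumptions, not copied from PCRE2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch: code in the high 16 bits, data in the
   low 16 bits. The concrete value of SKETCH_META_ALT is illustrative. */
#define SKETCH_META_ALT      0x80010000u
#define SKETCH_META_CODE(x)  ((x) & 0xffff0000u)
#define SKETCH_META_DATA(x)  ((x) & 0x0000ffffu)

/* ORs a branch length (at most 65535) into the data bits of an item. */
uint32_t store_branch_length(uint32_t item, uint32_t branchlength)
{
  assert(branchlength <= 0xffffu);
  return item | branchlength;
}

/* Exact comparison: only true while the data bits are still zero. */
bool is_alt_exact(uint32_t item)
{
  return item == SKETCH_META_ALT;
}

/* Masked comparison: true regardless of what the data bits contain. */
bool is_alt_masked(uint32_t item)
{
  return SKETCH_META_CODE(item) == SKETCH_META_ALT;
}
```

Once an item carries a non-zero length, `is_alt_exact` no longer recognises it while `is_alt_masked` still does, which is the difference between the removed and added `while` conditions.
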
@@ -10601,14 +10603,21 @@ if (length > MAX_PATTERN_SIZE)
   goto HAD_CB_ERROR;
   }
 
-/* Compute the size of, and then get and initialize, the data block for storing
-the compiled pattern and names table. Integer overflow should no longer be
-possible because nowadays we limit the maximum value of cb.names_found and
-cb.name_entry_size. */
+/* Compute the size of, then, if not too large, get and initialize the data
+block for storing the compiled pattern and names table. Integer overflow should
+no longer be possible because nowadays we limit the maximum value of
+cb.names_found and cb.name_entry_size. */
 
 re_blocksize = sizeof(pcre2_real_code) +
   CU2BYTES(length +
   (PCRE2_SIZE)cb.names_found * (PCRE2_SIZE)cb.name_entry_size);
+
+if (re_blocksize > ccontext->max_pattern_compiled_length)
+  {
+  errorcode = ERR101;
+  goto HAD_CB_ERROR;
+  }
+
 re = (pcre2_real_code *)
   ccontext->memctl.malloc(re_blocksize, ccontext->memctl.memory_data);
 if (re == NULL)

ext/pcre/pcre2lib/pcre2_context.c (+9 -1)

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2023 University of Cambridge
+          New API code Copyright (c) 2016-2024 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -136,6 +136,7 @@ const pcre2_compile_context PRIV(default_compile_context) = {
   NULL,                                      /* Stack guard data */
   PRIV(default_tables),                      /* Character tables */
   PCRE2_UNSET,                               /* Max pattern length */
+  PCRE2_UNSET,                               /* Max pattern compiled length */
   BSR_DEFAULT,                               /* Backslash R default */
   NEWLINE_DEFAULT,                           /* Newline convention */
   PARENS_NEST_LIMIT,                         /* As it says */
@@ -352,6 +353,13 @@ ccontext->max_pattern_length = length;
 return 0;
 }
 
+PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
+pcre2_set_max_pattern_compiled_length(pcre2_compile_context *ccontext, PCRE2_SIZE length)
+{
+ccontext->max_pattern_compiled_length = length;
+return 0;
+}
+
 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_set_newline(pcre2_compile_context *ccontext, uint32_t newline)
 {
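
Taken together, the new context field, its setter, and the size check in pcre2_compile.c form a simple "limit defaults to unset, reject when exceeded" mechanism. The sketch below models that flow with plain C types so it builds without PCRE2; every `sketch_` name is hypothetical, and `SKETCH_UNSET` (all bits set) stands in for `PCRE2_UNSET`, so that the default never rejects anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNSET SIZE_MAX   /* stand-in for PCRE2_UNSET: no limit */

enum { SKETCH_OK = 0, SKETCH_ERR_COMPILED_TOO_LONG = 1 };

/* Hypothetical, cut-down compile context holding only the new limit. */
typedef struct {
  size_t max_pattern_compiled_length;
} sketch_compile_context;

/* Mirrors the shape of the new setter: store the limit, return 0. */
int sketch_set_max_compiled_length(sketch_compile_context *ctx, size_t length)
{
  assert(ctx != NULL);
  ctx->max_pattern_compiled_length = length;
  return 0;
}

/* Computes the block size as header + code + names table and compares it
   with the limit before any allocation would take place. */
int sketch_check_block_size(const sketch_compile_context *ctx,
                            size_t header_size, size_t code_bytes,
                            size_t names_found, size_t name_entry_size)
{
  size_t blocksize;
  assert(ctx != NULL);
  blocksize = header_size + code_bytes + names_found * name_entry_size;
  if (blocksize > ctx->max_pattern_compiled_length)
    return SKETCH_ERR_COMPILED_TOO_LONG;
  return SKETCH_OK;
}
```

Because the check runs before the allocation, an application that sets a limit gets an error code instead of a large `malloc` request.
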

ext/pcre/pcre2lib/pcre2_error.c (+2 -1)

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2023 University of Cambridge
+          New API code Copyright (c) 2016-2024 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -189,6 +189,7 @@ static const unsigned char compile_error_texts[] =
   "\\K is not allowed in lookarounds (but see PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)\0"
   /* 100 */
   "branch too long in variable-length lookbehind assertion\0"
+  "compiled pattern would be longer than the limit set by the application\0"
   ;
 
 /* Match-time and UTF error texts are in the same format. */
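
The error texts above are stored as one array of NUL-terminated strings placed back to back, so a new message is added by appending one more string. A minimal sketch of how such a table can be indexed follows; `sketch_texts` and `sketch_find_text` are hypothetical, and the lookup shown is one plausible way to walk the format rather than a copy of the library's own routine.

```c
#include <assert.h>
#include <stddef.h>

/* A tiny table in the same format: each entry ends with an explicit \0,
   and the string literal's implicit terminator leaves an empty string at
   the end that marks the end of the table. */
const char sketch_texts[] =
  "no error\0"
  "first message\0"
  "second message\0"
  ;

/* Returns the n-th entry (0-based), or NULL if n is past the last entry. */
const char *sketch_find_text(const char *table, int n)
{
  const char *p = table;
  assert(table != NULL && n >= 0);
  for (; n > 0; n--)
    {
    while (*p++ != '\0') {}        /* skip one entry and its terminator */
    if (*p == '\0') return NULL;   /* reached the empty end marker */
    }
  return p;
}
```

With this layout, message numbers are implied by position, which is why the enum of error codes and the text table have to grow in step.
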

ext/pcre/pcre2lib/pcre2_extuni.c (+21 -7)

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2021 University of Cambridge
+          New API code Copyright (c) 2016-2024 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -75,7 +75,11 @@ return NULL;
 *      Match an extended grapheme sequence       *
 *************************************************/
 
-/*
+/* NOTE: The logic contained in this function is replicated in three special-
+purpose functions in the pcre2_jit_compile.c module. If the logic below is
+changed, they must be kept in step so that the interpreter and the JIT have the
+same behaviour.
+
 Arguments:
   c              the first character
   eptr           pointer to next character
@@ -92,6 +96,7 @@ PCRE2_SPTR
 PRIV(extuni)(uint32_t c, PCRE2_SPTR eptr, PCRE2_SPTR start_subject,
   PCRE2_SPTR end_subject, BOOL utf, int *xcount)
 {
+BOOL was_ep_ZWJ = FALSE;
 int lgb = UCD_GRAPHBREAK(c);
 
 while (eptr < end_subject)
@@ -102,6 +107,12 @@ while (eptr < end_subject)
   rgb = UCD_GRAPHBREAK(c);
   if ((PRIV(ucp_gbtable)[lgb] & (1u << rgb)) == 0) break;
 
+  /* ZWJ followed by Extended Pictographic is allowed only if the ZWJ was
+  preceded by Extended Pictographic. */
+
+  if (lgb == ucp_gbZWJ && rgb == ucp_gbExtended_Pictographic && !was_ep_ZWJ)
+    break;
+
   /* Not breaking between Regional Indicators is allowed only if there
   are an even number of preceding RIs. */
 
@@ -129,12 +140,15 @@ while (eptr < end_subject)
     if ((ricount & 1) != 0) break;  /* Grapheme break required */
     }
 
-  /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
-  allows any number of them before a following Extended_Pictographic. */
+  /* Set a flag when ZWJ follows Extended Pictographic (with optional Extend in
+  between; see next statement). */
+
+  was_ep_ZWJ = (lgb == ucp_gbExtended_Pictographic && rgb == ucp_gbZWJ);
+
+  /* If Extend follows Extended_Pictographic, do not update lgb; this allows
+  any number of them before a following ZWJ. */
 
-  if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
-       lgb != ucp_gbExtended_Pictographic)
-    lgb = rgb;
+  if (rgb != ucp_gbExtend || lgb != ucp_gbExtended_Pictographic) lgb = rgb;
 
   eptr += len;
   if (xcount != NULL) *xcount += 1;
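
The `was_ep_ZWJ` flag added above can be exercised in isolation. The sketch below reproduces only the statements shown in these hunks over an array of grapheme-break properties; the four-value `gb_prop` enum, `table_allows` (a drastically reduced stand-in for the `ucp_gbtable` lookup), and `first_cluster_length` are assumptions made for illustration, and rules such as Regional Indicator pairing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced set of grapheme-break properties, for this sketch only. */
typedef enum { GB_OTHER, GB_EXTEND, GB_ZWJ, GB_EXT_PICT } gb_prop;

/* Stand-in for the break table: no break before Extend or ZWJ, and a
   tentative no-break between ZWJ and Extended_Pictographic. */
bool table_allows(gb_prop left, gb_prop right)
{
  return right == GB_EXTEND || right == GB_ZWJ ||
         (left == GB_ZWJ && right == GB_EXT_PICT);
}

/* Returns how many code points belong to the first grapheme cluster. */
size_t first_cluster_length(const gb_prop *props, size_t count)
{
  bool was_ep_zwj = false;
  gb_prop lgb;
  size_t i = 1;
  assert(props != NULL && count > 0);
  lgb = props[0];
  while (i < count)
    {
    gb_prop rgb = props[i];
    if (!table_allows(lgb, rgb)) break;

    /* ZWJ then Extended_Pictographic joins only if the ZWJ itself
       followed an Extended_Pictographic. */
    if (lgb == GB_ZWJ && rgb == GB_EXT_PICT && !was_ep_zwj) break;

    was_ep_zwj = (lgb == GB_EXT_PICT && rgb == GB_ZWJ);

    /* Extend after Extended_Pictographic leaves lgb unchanged, so any
       number of Extend may precede the ZWJ. */
    if (rgb != GB_EXTEND || lgb != GB_EXT_PICT) lgb = rgb;
    i++;
    }
  return i;
}
```

In this model an emoji, ZWJ, emoji sequence stays one cluster of three code points, while a non-pictographic character followed by ZWJ and an emoji splits after the ZWJ, which is the case the added check addresses.
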

ext/pcre/pcre2lib/pcre2_intmodedep.h (+2 -1)

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2023 University of Cambridge
+          New API code Copyright (c) 2016-2024 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -568,6 +568,7 @@ typedef struct pcre2_real_compile_context {
   void *stack_guard_data;
   const uint8_t *tables;
   PCRE2_SIZE max_pattern_length;
+  PCRE2_SIZE max_pattern_compiled_length;
   uint16_t bsr_convention;
   uint16_t newline_convention;
   uint32_t parens_nest_limit;
