Author: John Spaetzel (jspaetzel)
Committer: GitHub (web-flow)
Pusher: afilina
Date: 2025-10-15T10:03:25-04:00
Commit: https://github.com/php/doc-en/commit/7c4b5fb40ac3149a5b931f1e31b1050ab5eaab7e
Raw diff: https://github.com/php/doc-en/commit/7c4b5fb40ac3149a5b931f1e31b1050ab5eaab7e.diff
add details about php8.1 mb-detect-encoding unordered encodings (#2426)
* add details about php8.1 being unordered
* simplify message about exclusions
* add comment to example
Changed paths:
M reference/mbstring/functions/mb-detect-encoding.xml
Diff:
diff --git a/reference/mbstring/functions/mb-detect-encoding.xml b/reference/mbstring/functions/mb-detect-encoding.xml
index c6b1a1fb43d6..692ff51ef435 100644
--- a/reference/mbstring/functions/mb-detect-encoding.xml
+++ b/reference/mbstring/functions/mb-detect-encoding.xml
@@ -16,7 +16,11 @@
</methodsynopsis>
<para>
Detects the most likely character encoding for <type>string</type>
<parameter>string</parameter>
- from an ordered list of candidates.
+ from a list of candidates.
+ </para>
+ <para>
+ As of PHP 8.1 this function uses heuristics to detect which of the valid text encodings in the specified
+ list is most likely to be correct and may not be in order of <parameter>encodings</parameter> provided.
</para>
<para>
Automatic detection of the intended character encoding can never be entirely reliable;
@@ -27,7 +31,7 @@
<para>
This function is most useful with multibyte encodings, where not all sequences of
bytes form a valid string. If the input string contains such a sequence, that
- encoding will be rejected, and the next encoding checked.
+ encoding will be rejected.
</para>
<warning>
@@ -58,7 +62,7 @@
<term><parameter>encodings</parameter></term>
<listitem>
<para>
- A list of character encodings to try, in order. The list may be specified as
+ A list of character encodings to try. The list may be specified as
an array of strings, or a single string separated by commas.
</para>
<para>
@@ -223,8 +227,9 @@ string(10) "ISO-8859-1"
<?php
$str = "\xC4\xA2";
-// The string is valid in all three encodings, so the first one listed will be returned
-var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5']));
+// The string is valid in all three encodings, but the first one listed may not always be the one returned
+var_dump(mb_detect_encoding($str, ['UTF-8']));
+var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5'])); // as of php8.1 this returns ISO-8859-1 instead of UTF-8
var_dump(mb_detect_encoding($str, ['ISO-8859-1', 'ISO-8859-5', 'UTF-8']));
var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
?>
@@ -235,6 +240,7 @@ var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
<![CDATA[
string(5) "UTF-8"
string(10) "ISO-8859-1"
+string(10) "ISO-8859-1"
string(10) "ISO-8859-5"
]]>
</screen>
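
Note for readers who relied on the pre-8.1 "first valid candidate wins" behaviour: the change above documents that mb_detect_encoding() no longer guarantees this. One possible way to get an order-respecting result is to test each candidate with mb_check_encoding(), as in the minimal sketch below. The helper name first_valid_encoding() is hypothetical and is not part of mbstring or of this commit; it only checks byte-level validity and makes no attempt to guess which encoding is most plausible.

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): returns the first encoding
 * from $candidates, in the order given, in which $str is a valid byte
 * sequence, or null if none of them accepts it.
 */
function first_valid_encoding(string $str, array $candidates): ?string
{
    foreach ($candidates as $encoding) {
        // mb_check_encoding() only validates; it does not rank candidates.
        if (mb_check_encoding($str, $encoding)) {
            return $encoding;
        }
    }
    return null;
}

$str = "\xC4\xA2"; // same bytes as the manual example in the diff above

// Valid in all three encodings, so the first one listed is returned.
var_dump(first_valid_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5']));
var_dump(first_valid_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));

// "\xFF" is not valid UTF-8, so the next candidate is returned.
var_dump(first_valid_encoding("\xFF", ['UTF-8', 'ISO-8859-1']));
```

Because single-byte encodings such as ISO-8859-1 accept any byte sequence, placing them before UTF-8 in the candidate list means they will always win in this sketch; listing stricter multibyte encodings first is usually the more useful order.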