[doc-en] master: add details about php8.1 mb-detect-encoding unordered encodings (#2426)

Date: Wed, 15 Oct 2025 14:03:27 +0000
Subject: [doc-en] master: add details about php8.1 mb-detect-encoding unordered encodings (#2426)
Groups: php.doc.cvs 
Author: John Spaetzel (jspaetzel)
Committer: GitHub (web-flow)
Pusher: afilina
Date: 2025-10-15T10:03:25-04:00

Commit: https://github.com/php/doc-en/commit/7c4b5fb40ac3149a5b931f1e31b1050ab5eaab7e
Raw diff: https://github.com/php/doc-en/commit/7c4b5fb40ac3149a5b931f1e31b1050ab5eaab7e.diff

add details about php8.1 mb-detect-encoding unordered encodings (#2426)

* add details about php8.1 being unordered

* simplify message about exclusions

* add comment to example

Changed paths:
  M  reference/mbstring/functions/mb-detect-encoding.xml


Diff:

diff --git a/reference/mbstring/functions/mb-detect-encoding.xml b/reference/mbstring/functions/mb-detect-encoding.xml
index c6b1a1fb43d6..692ff51ef435 100644
--- a/reference/mbstring/functions/mb-detect-encoding.xml
+++ b/reference/mbstring/functions/mb-detect-encoding.xml
@@ -16,7 +16,11 @@
   </methodsynopsis>
   <para>
    Detects the most likely character encoding for <type>string</type> <parameter>string</parameter>
-   from an ordered list of candidates.
+   from a list of candidates.
+  </para>
+  <para>
+   As of PHP 8.1 this function uses heuristics to detect which of the valid text encodings in the specified
+   list is most likely to be correct and may not be in order of <parameter>encodings</parameter> provided.
   </para>
   <para>
    Automatic detection of the intended character encoding can never be entirely reliable;
@@ -27,7 +31,7 @@
   <para>
    This function is most useful with multibyte encodings, where not all sequences of
    bytes form a valid string. If the input string contains such a sequence, that
-   encoding will be rejected, and the next encoding checked.
+   encoding will be rejected.
   </para>
 
   <warning>
@@ -58,7 +62,7 @@
      <term><parameter>encodings</parameter></term>
      <listitem>
       <para>
-       A list of character encodings to try, in order. The list may be specified as
+       A list of character encodings to try. The list may be specified as
        an array of strings, or a single string separated by commas.
       </para>
       <para>
@@ -223,8 +227,9 @@ string(10) "ISO-8859-1"
 <?php
 $str = "\xC4\xA2";
 
-// The string is valid in all three encodings, so the first one listed will be returned
-var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5']));
+// The string is valid in all three encodings, but the first one listed may not always be the one returned
+var_dump(mb_detect_encoding($str, ['UTF-8']));
+var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5'])); // as of php8.1 this returns ISO-8859-1 instead of UTF-8
 var_dump(mb_detect_encoding($str, ['ISO-8859-1', 'ISO-8859-5', 'UTF-8']));
 var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
 ?>
@@ -235,6 +240,7 @@ var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
 <![CDATA[
 string(5) "UTF-8"
 string(10) "ISO-8859-1"
+string(10) "ISO-8859-1"
 string(10) "ISO-8859-5"
 ]]>
     </screen>

