
phpdoc at jeudi dot de
18 years ago
I'd like to share some code to convert Latin diacritics to their
traditional 7bit representation, for example:

- à,ç,é,î,... to a,c,e,i,...
- ß to ss
- ä,Ä,... to ae,Ae,...
- ë,... to e,...

(Converting to "7bit" with mb_convert_encoding() would simply delete any offending characters.)

I may have missed your country's typographic
conventions; correct me if so.
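<?php
/**
 * @param string $text     line of encoded text
 * @param string $from_enc encoding of $text, e.g. UTF-8, ISO-8859-1
 *
 * @return string 7bit representation
 */
function to7bit($text, $from_enc) {
    // Encode non-ASCII characters as HTML entities, e.g. ä -> &auml;, ß -> &szlig;
    $text = mb_convert_encoding($text, 'HTML-ENTITIES', $from_enc);
    $text = preg_replace(
        array('/&szlig;/',           // ß -> "ss" (must run before the ligature rule)
              '/&(..)lig;/',         // ligatures: &aelig; -> "ae", &oelig; -> "oe"
              '/&([aouAOU])uml;/',   // umlauts: &auml; -> "ae", &Ouml; -> "Oe"
              '/&(.)[^;]*;/'),       // anything else: keep the base letter
        array('ss', '$1', '${1}e', '$1'),
        $text);
    return $text;
}
?>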
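The regex stage can be tried on its own, without mbstring, by feeding it text that is already entity-encoded. This sketch copies the pattern list from the function above; the helper name `to7bit_entities` is hypothetical and exists only for illustration.

```php
<?php
// Hypothetical helper: applies only the regex stage of to7bit() to text
// that already contains named HTML entities (no mbstring needed).
function to7bit_entities(string $text): string
{
    return preg_replace(
        [
            '/&szlig;/',          // German sharp s -> "ss"
            '/&(..)lig;/',        // ligatures: &aelig; -> "ae", &oelig; -> "oe"
            '/&([aouAOU])uml;/',  // German umlauts: &auml; -> "ae"
            '/&(.)[^;]*;/',       // any other entity: keep its first letter
        ],
        ['ss', '$1', '${1}e', '$1'],
        $text
    );
}

echo to7bit_entities('Stra&szlig;e'), "\n";          // Strasse
echo to7bit_entities('&Ouml;l'), "\n";               // Oel
echo to7bit_entities('caf&eacute; &aelig;on'), "\n"; // cafe aeon
```

Note that &euml; has no special rule, so it falls through to the catch-all and becomes plain "e", matching the "ë to e" case in the list.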

Enjoy :-)
Johannes

==
[EDIT BY danbrown AT php DOT net: Author provided the following update on 27-FEB-2012.]
==

An addendum to my "to7bit" function posted above.
The function addresses the fact that some languages require a specific 7bit rendering of special (e.g. umlauted) characters, for sorting or other applications. For example, the German ß ligature is usually written "ss" in a 7bit context, and Dutch ÿ is typically rendered "ij" (not "y").
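After the HTML-ENTITIES conversion, ß arrives as `&szlig;` and ÿ as `&yuml;`. Because preg_replace() applies array patterns in sequence, these special cases have to come before the generic rules: `&szlig;` also matches the ligature pattern `&(..)lig;` (giving "sz"), and `&yuml;` would otherwise be reduced to "y" by the catch-all. A small sketch comparing the two orders; `apply_rules` and the rule arrays are hypothetical names.

```php
<?php
// Hypothetical demo: the same rules applied in two different orders.
function apply_rules(string $text, array $rules): string
{
    // preg_replace() with array arguments runs each pattern in turn
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

$specialFirst = [
    '/&szlig;/'    => 'ss',
    '/&yuml;/'     => 'ij',
    '/&(..)lig;/'  => '$1',
    '/&(.)[^;]*;/' => '$1',
];
$genericFirst = [
    '/&(..)lig;/'  => '$1',
    '/&(.)[^;]*;/' => '$1',
    '/&szlig;/'    => 'ss',
    '/&yuml;/'     => 'ij',
];

echo apply_rules('Stra&szlig;e, &yuml;', $specialFirst), "\n"; // Strasse, ij
echo apply_rules('Stra&szlig;e, &yuml;', $genericFirst), "\n"; // Strasze, y
```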

The original function works well with entities for word (letter) characters, and I've seen it used in many places. Non-word entities, however, give odd results, because the catch-all pattern keeps only the first letter of the entity name:
e.g., "&copy;" is rendered as "c", "&shy;" as "s" and "&rquo;" as "r".
The following version fixes this by first converting non-alphanumeric characters (and runs of them, including spaces) to '_'.
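Both the problem and the fix can be reproduced without mbstring. In this sketch (variable names are illustrative), the first call shows the catch-all pattern reducing an already-encoded `&copy;` to "c". The second shows the `\W+` pre-pass, written here with the `u` modifier on the assumption that the input is UTF-8, collapsing spaces, punctuation and © to `_` while keeping letters such as ü and ß for the later entity step.

```php
<?php
// The original catch-all keeps the first character of every entity,
// including entities that do not stand for letters.
$before = preg_replace('/&(.)[^;]*;/', '$1', '&copy; 2012');

// Pre-pass, assuming UTF-8 input: with /u, \w is Unicode-aware, so runs of
// non-word characters (spaces, punctuation, ©) become '_' while letters stay.
$cleaned = preg_replace('/\W+/u', '_', 'Süß © 2012!');

echo $before, "\n";  // c 2012
echo $cleaned, "\n"; // Süß_2012_
```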

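<?php
/**
 * @param string $text     line of encoded text
 * @param string $from_enc encoding of $text, e.g. UTF-8, ISO-8859-1
 *
 * @return string 7bit representation
 */
function to7bit($text, $from_enc) {
    // Work in UTF-8 so that the Unicode-aware \W below sees whole characters
    $text = mb_convert_encoding($text, 'UTF-8', $from_enc);
    // Replace runs of non-word characters (spaces, punctuation, ©, ...) with '_'
    $text = preg_replace('/\W+/u', '_', $text);
    // Encode the remaining non-ASCII letters as HTML entities
    $text = mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8');
    $text = preg_replace(
        array('/&szlig;/', '/&(..)lig;/',
              '/&([aouAOU])uml;/', '/&yuml;/', '/&(.)[^;]*;/'),
        array('ss', '$1', '${1}e', 'ij', '$1'),
        $text);
    return $text;
}
?>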
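The 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding() is deprecated as of PHP 8.2. A minimal sketch of the same pipeline that swaps in htmlentities() for that step, assuming UTF-8 input; the name `to7bit_utf8` is hypothetical. One caveat of this substitution: htmlentities() with ENT_HTML401 only produces named entities, so letters without an HTML 4.01 name (e.g. ő) pass through unchanged rather than being reduced to 7bit.

```php
<?php
// Hypothetical PHP 8.2+ variant of to7bit(): assumes UTF-8 input and uses
// htmlentities() instead of the deprecated 'HTML-ENTITIES' mb encoding.
function to7bit_utf8(string $text): string
{
    // Collapse runs of non-word characters to '_' (Unicode-aware via /u)
    $text = preg_replace('/\W+/u', '_', $text);
    // Turn the remaining non-ASCII letters into named HTML 4.01 entities
    $text = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return preg_replace(
        ['/&szlig;/', '/&(..)lig;/', '/&([aouAOU])uml;/', '/&yuml;/', '/&(.)[^;]*;/'],
        ['ss', '$1', '${1}e', 'ij', '$1'],
        $text
    );
}

echo to7bit_utf8('Grüße, Åse!'), "\n"; // Gruesse_Ase_
echo to7bit_utf8('Æsop'), "\n";        // AEsop
```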

Enjoy again,
Johannes
