ge********@gmail.com wrote:
Put simply, I have a text box, and people commonly cut and paste
information into this text box from Microsoft Word. The problem is that
Word has all types of funky characters (smart quotes, em-dashes) that
the system (PHP-based) doesn't understand. Does anyone know of a way to
filter out these Microsoft-specific characters? Does PHP have a special
function for this? Thanks a lot!
Hooray, I can actually be of use to this group for once. Yes, if you look
in the user notes on php.net for the htmlentities function, you will see
an entry from mail at britlinks dot com (19-May-2004 05:27). I've listed
it below for reference. Mind you, I'm sure the hardcore programmers on
this group will be able to formulate a one-line regexp for this, and we
look forward to seeing it.
In the meantime, I hope this helps.
<?php
// Strips slashes and converts special characters in the string $var
// to their HTML equivalents; optionally converts newlines to <br> tags.
function htmlfriendly($var, $nl2br = false) {
    $chars = array(
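        // Windows-1252 byte values 128-159 => numeric HTML entities
        128 => '&#8364;',
        130 => '&#8218;',
        131 => '&#402;',
        132 => '&#8222;',
        133 => '&#8230;',
        134 => '&#8224;',
        135 => '&#8225;',
        136 => '&#710;',
        137 => '&#8240;',
        138 => '&#352;',
        139 => '&#8249;',
        140 => '&#338;',
        142 => '&#381;',
        145 => '&#8216;',
        146 => '&#8217;',
        147 => '&#8220;',
        148 => '&#8221;',
        149 => '&#8226;',
        150 => '&#8211;',
        151 => '&#8212;',
        152 => '&#732;',
        153 => '&#8482;',
        154 => '&#353;',
        155 => '&#8250;',
        156 => '&#339;',
        158 => '&#382;',
        159 => '&#376;');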
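    // Encode with an explicit single-byte charset: since PHP 5.4,
    // htmlentities() defaults to UTF-8, which treats raw bytes 128-159
    // as invalid input.
    $var = str_replace(
        array_map('chr', array_keys($chars)),
        $chars,
        htmlentities(stripslashes($var), ENT_COMPAT, 'ISO-8859-1')
    );
    if ($nl2br) {
        return nl2br($var);
    } else {
        return $var;
    }
}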
?>
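
If you would rather filter the Word characters down to plain ASCII than turn them into HTML entities, something along these lines might do. This is only a rough sketch and not part of the php.net note: the function name word_to_ascii and the choice of replacements are my own assumptions, and it assumes the input arrives as single-byte Windows-1252 text. If your form submits UTF-8, these byte values can occur inside multi-byte characters and this would corrupt them.

```php
<?php
// Hypothetical helper (not from the php.net note): replaces a handful of
// common Windows-1252 "Word" bytes with plain ASCII stand-ins.
// Assumes $text is single-byte Windows-1252, NOT UTF-8.
function word_to_ascii($text) {
    $map = array(
        "\x85" => '...', // 133: horizontal ellipsis
        "\x91" => "'",   // 145: left single quote
        "\x92" => "'",   // 146: right single quote / apostrophe
        "\x93" => '"',   // 147: left double quote
        "\x94" => '"',   // 148: right double quote
        "\x96" => '-',   // 150: en dash
        "\x97" => '--',  // 151: em dash
    );
    // strtr() with an array swaps every key for its value in a single pass
    return strtr($text, $map);
}

echo word_to_ascii("\x93Hello\x94 \x96 it\x92s done\x85"), "\n";
// prints: "Hello" - it's done...
```

The byte values are the same ones used as array keys in the function above (145 is 0x91, and so on), so it should be easy to extend the map if other characters turn up.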