Problem trying to extract words from a string in PHP
<p>I'm trying to extract all words from a string into an array, but I am having some problems with spaces (<code>&amp;nbsp;</code>).</p>
<p>This is what I do:</p>
<pre><code>//Clean data to text only
$data = strip_tags($data);
$data = htmlentities($data, ENT_QUOTES, 'UTF-8');
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
$data = htmlspecialchars_decode($data);
$data = mb_strtolower($data, 'UTF-8');

//Clean up text from special chars I don't want as words
$data = str_replace(',', '', $data);
$data = str_replace('.', '', $data);
$data = str_replace(':', '', $data);
$data = str_replace(';', '', $data);
$data = str_replace('*', '', $data);
$data = str_replace('?', '', $data);
$data = str_replace('!', '', $data);
$data = str_replace('-', ' ', $data);
$data = str_replace("\n", ' ', $data);
$data = str_replace("\r", ' ', $data);
$data = str_replace("\t", ' ', $data);
$data = str_replace("\0", ' ', $data);
$data = str_replace("\x0B", ' ', $data);
$data = str_replace("&amp;nbsp;", ' ', $data);

//Clean up duplicated spaces
do {
    $data = str_replace('  ', ' ', $data);
} while(strpos($data, '  ') !== false);

//Make array
$clean_data = explode(' ', $data);

echo "&lt;pre&gt;";
var_dump($clean_data);
echo "&lt;/pre&gt;";
</code></pre>
<p><strong>This outputs:</strong></p>
<pre><code>array(58) {
  [0]=&gt; string(5) " "
  [1]=&gt; string(5) " "
  [2]=&gt; string(11) "anläggning"
  [3]=&gt; string(3) "med"
  [4]=&gt; string(3) "den"
  [5]=&gt; string(10) "erfarenhet"
  [6]=&gt; string(3) "som"
}
</code></pre>
<p>If I check the source of the output, I see that the first 2 array values are <code>&amp;nbsp;</code>.<br> No matter how I try, I can't remove this from the string. Any ideas?</p>
<p><strong>UPDATE:</strong><br> After some tweaking of the code, I managed to get the following output:</p>
<pre><code>array(56) {
  [0]=&gt; string(1) "�" //Notice the change. Instead of string length 5 it now says 1. But it's still garbage.
  [1]=&gt; string(1) "�"
  [2]=&gt; string(11) "anläggning"
  [3]=&gt; string(3) "med"
  [4]=&gt; string(3) "den"
  [5]=&gt; string(10) "erfarenhet"
  [6]=&gt; string(3) "som"
  [7]=&gt; string(5) "finns"
  [8]=&gt; string(4) "inom"
</code></pre>
<p>Thanks!</p>
<p><strong>ANSWER (for lazy people):</strong></p>
<p>Even though this is a slightly different approach to the problem, and it never really answers why I had the problems I had above (like leftover <code>&amp;nbsp;</code> and other extra weird spaces), I like it and it is a lot better than my original code.</p>
<p>Thanks to all who contributed to this!</p>
<pre><code>//Clean data to text only
$data = strip_tags($data);
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
$data = htmlspecialchars_decode($data);
$data = mb_strtolower($data, 'UTF-8');

//Clean up text from special chars
$data = str_replace(array("-"), ' ', $data);

$clean_data = str_word_count($data, 1, 'äöå');

echo "&lt;pre&gt;";
var_dump($clean_data);
echo "&lt;/pre&gt;";
</code></pre>
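<p>One possible explanation for the leftover entries, offered as a guess rather than a confirmed diagnosis: once <code>html_entity_decode()</code> has run with UTF-8 as the target charset, <code>&amp;nbsp;</code> is no longer the literal entity text but the character U+00A0 (bytes <code>0xC2 0xA0</code>), so neither the replace for <code>' '</code> nor the one for <code>"&amp;nbsp;"</code> matches it. The sketch below is an alternative to the code in the post, not the method from it: it splits on anything that is not a Unicode letter or digit, which covers non-breaking spaces and punctuation in one step. The function name <code>extract_words</code> and the sample string are hypothetical, and the <code>mbstring</code> extension is assumed to be available, as it is in the original code.</p>

```php
<?php
// Hypothetical helper (not from the original post): return the lowercase
// words of an HTML fragment, treating every character that is not a
// letter or digit as a separator.
function extract_words(string $html): array
{
    // Strip tags, then decode entities. With 'UTF-8' as the charset,
    // "&nbsp;" becomes U+00A0, which is not an ASCII space.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    $text = mb_strtolower($text, 'UTF-8');

    // \p{L} matches any Unicode letter and \p{N} any Unicode number; the
    // /u modifier makes PCRE read the subject as UTF-8.
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing separators would otherwise produce.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (for example, invalid UTF-8).
    return $words === false ? [] : $words;
}

// The decoded non-breaking space is two bytes in UTF-8.
echo bin2hex(html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8')), "\n"; // c2a0

// Made-up sample input built from the words shown in the post.
$sample = '<p>&nbsp;&nbsp;Anläggning med den&nbsp;erfarenhet, som finns inom.</p>';
$words = extract_words($sample);
var_dump($words);
```

<p>Because the pattern works on Unicode properties, letters such as <code>ä</code>, <code>ö</code> and <code>å</code> need no special list, unlike the third argument passed to <code>str_word_count()</code> above, which works on single bytes.</p>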