<p>The problem can be broken down into several steps:</p>
<ol>
<li>Populate a list with English words.</li>
<li>Split the sentence into terms delimited by whitespace.</li>
<li>Treat terms starting with '#' as hashtags.</li>
<li>For each hashtag, find words by longest match, checking whether each candidate prefix exists in the list of words.</li>
</ol>
<p>Here is one solution using this approach:</p>
<pre><code># Returns a list of common English terms (words)
def initialize_words():
    # A file containing common English words, one per line.
    # Raw string so the backslash is not treated as an escape.
    with open(r'C:\wordlist.txt') as f:
        content = f.readlines()
    return [word.rstrip('\n') for word in content]

def parse_sentence(sentence, wordlist):
    new_sentence = ""  # output
    terms = sentence.split(' ')
    for term in terms:
        # startswith() is safe for empty terms (e.g. from double spaces)
        if term.startswith('#'):  # this is a hashtag, parse it
            new_sentence += parse_tag(term, wordlist)
        else:  # just append the word
            new_sentence += term
        new_sentence += " "
    return new_sentence

def parse_tag(term, wordlist):
    words = []
    # Remove the '#', then split by dash
    tags = term[1:].split('-')
    for tag in tags:
        word = find_word(tag, wordlist)
        while word is not None and len(tag) &gt; 0:
            words.append(word)
            if len(tag) == len(word):  # special case: the word consumes the rest of the tag
                break
            tag = tag[len(word):]
            word = find_word(tag, wordlist)
    return " ".join(words)

# Returns the longest prefix of token that is in wordlist, or None
def find_word(token, wordlist):
    i = len(token) + 1
    while i &gt; 1:
        i -= 1
        if token[:i] in wordlist:
            return token[:i]
    return None

wordlist = initialize_words()
sentence = "big #awesome-dayofmylife because #iamgreat"
parse_sentence(sentence, wordlist)
</code></pre>
<p>In an interactive session, it prints:</p>
<pre><code>'big awe some day of my life because i am great '
</code></pre>
<p>You will have to remove the trailing space, but that's easy. :)</p>
<p>I got the wordlist from <a href="http://www-personal.umich.edu/~jlawler/wordlist" rel="nofollow noreferrer">http://www-personal.umich.edu/~jlawler/wordlist</a>.</p>
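<p>For experimenting without the word file, here is a minimal self-contained sketch of the same greedy longest-match idea. The small <code>WORDS</code> set and the function names <code>longest_prefix</code>, <code>split_tag</code> and <code>expand_hashtags</code> are assumptions made for illustration and are not part of the answer above. Collecting terms in a list and joining them once also avoids the trailing space.</p>

```python
# Hypothetical in-memory word set (an assumption for illustration);
# the answer above loads its words from a file instead.
WORDS = {"big", "awesome", "day", "of", "my", "life",
         "because", "i", "am", "great"}


def longest_prefix(token, words):
    """Return the longest prefix of token that is in words, or None."""
    for end in range(len(token), 0, -1):
        if token[:end] in words:
            return token[:end]
    return None


def split_tag(tag, words):
    """Greedily split a hashtag body (without the '#') into known words."""
    found = []
    for part in tag.split("-"):
        while part:
            word = longest_prefix(part, words)
            if word is None:
                break  # an unmatched remainder is dropped
            found.append(word)
            part = part[len(word):]
    return found


def expand_hashtags(sentence, words):
    """Replace each '#hashtag' term with its space-separated words."""
    out = []
    for term in sentence.split():
        if term.startswith("#"):
            out.extend(split_tag(term[1:], words))
        else:
            out.append(term)
    # Joining once at the end leaves no trailing space.
    return " ".join(out)


print(expand_hashtags("big #awesome-dayofmylife because #iamgreat", WORDS))
```

<p>With this particular set, "awesome" is matched as a single word because it is in the set; a different word list can produce a different split. Greedy longest match never backtracks, so a long first match can leave a remainder that matches nothing, and that remainder is silently dropped.</p>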