
First of all - try to drop regular expressions, they are heavy. My original advice was crappy - it would not have worked. Maybe this will be more efficient:

```python
trans_table = string.maketrans(string.punctuation, ' ' * len(string.punctuation))
words = movie_plot.translate(trans_table).lower().split()
```

(An afterthought) I cannot test it, but I think that if you store the result of this call in a variable

```python
stops = stopwords.words('english')
```

or, probably better, convert it into a set first (if the function does not return one)

```python
stops = set(stopwords.words('english'))
```

you'll get some improvement too.

(To answer your question in the comment) Every function call consumes time; if you build a large block of data that you don't keep around for reuse, the waste of time may be huge. As for set vs. list - compare the results:

```python
In [49]: my_list = range(100)

In [50]: %timeit 10 in my_list
1000000 loops, best of 3: 193 ns per loop

In [51]: %timeit 101 in my_list
1000000 loops, best of 3: 1.49 us per loop

In [52]: my_set = set(my_list)

In [53]: %timeit 101 in my_set
10000000 loops, best of 3: 45.2 ns per loop

In [54]: %timeit 10 in my_set
10000000 loops, best of 3: 47.2 ns per loop
```

While we are at the greasy details - here are measurements for split vs. RE:

```python
In [30]: %timeit words = 'This is a long; and meaningless - sentence'.split(split_let)
1000000 loops, best of 3: 271 ns per loop

In [31]: %timeit words = re.findall(r'\w+', 'This is a long; and meaningless - sentence', flags = re.UNICODE | re.LOCALE)
100000 loops, best of 3: 3.08 us per loop
```
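If it helps, here is a minimal sketch tying the pieces together on Python 3, where `str.maketrans` replaces the Python 2 `string.maketrans` used above. It assumes NLTK is installed with its stopwords corpus downloaded; the `tokenize` function and the constant names are mine, not from your code:

```python
import string

from nltk.corpus import stopwords  # assumes nltk.download('stopwords') has been run

# Build the translation table once, outside any per-document loop.
# str.maketrans with two equal-length strings maps each punctuation char to a space.
TRANS_TABLE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

# Convert the stopword list to a set once, so membership tests are O(1).
STOPS = set(stopwords.words('english'))


def tokenize(movie_plot):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    words = movie_plot.translate(TRANS_TABLE).lower().split()
    return [w for w in words if w not in STOPS]


if __name__ == '__main__':
    print(tokenize("This is a long; and meaningless - sentence"))
    # -> ['long', 'meaningless', 'sentence']
```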
 
