<p>Your use of <code>scan</code> creates an array, counts its size, then throws it away. If you have a lot of occurrences of the substring inside a big file, you will create a big array temporarily, potentially burning up CPU time with memory management, but that should still run pretty quickly, even with 300MB.</p> <p>Because <code>Word</code> is an ActiveRecord class, it is dependent on the schema and any indexes in your database, plus any issues your database server might be having. If the database is not optimized, is responding slowly, or the query used to retrieve the data is inefficient, the iteration will be slow. You might find it a lot faster to grab groups of <code>Word</code> records so they are in RAM, then iterate over them. </p> <p>And, if the database and your code are running on the same machine, you could be suffering from resource constraints like having only one drive, not enough RAM, etc.</p> <p>Without knowing more about your environment and hardware, it's hard to say.</p> <hr> <p>EDIT: </p> <blockquote> <p>I can grab the substrings into an array/hash first, then add the count results to the array or hash, and write the results back to database after all the counting is done. You think it be faster, right?</p> </blockquote> <p>No, I doubt that will help a lot. Without knowing where the problem lies, all you might do is make the problem worse, because you'll have to load 10,000 records as objects from the database, then build a 10,000-element hash or array which will also be in memory along with the DB records, then write them out.</p> <p>Ruby will only use a single core, currently, but you can gain speed by using Ruby 1.9+. I'd recommend <a href="http://rvm.beginrescueend.com/rvm/install" rel="nofollow noreferrer">installing RVM</a> and letting it manage your Ruby. 
Be sure to read the instructions on that page, then run <code>rvm notes</code> and follow those directions.</p> <p>What do your <code>Word</code> model and the underlying schema and indexes look like? Is the database on the same machine? </p> <hr> <p>EDIT: From looking at your table schema, you have no indexes except for <code>id</code>, which really won't help much for normal look-ups. I'd recommend presenting your schema on Stack Overflow's sibling site <a href="https://dba.stackexchange.com/">https://dba.stackexchange.com/</a> and explaining what you want to do. At a minimum, I'd add a key to the text fields to help avoid full table scans for any searches you do.</p> <p>What might help more is to read: <a href="http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects" rel="nofollow noreferrer">Retrieving Multiple Objects in Batches</a> from "Active Record Query Interface".</p> <p>Also, look at the SQL being emitted when your <code>Word.each</code> is running. Is it something like <code>"select * from word"</code>? If so, Rails is pulling in 10,000 records to iterate over them one by one. If it is something like <code>"select * from word where id=1"</code>, then for every record you have a database read followed by a write when you update the count. That is the scenario that the "Retrieving Multiple Objects in Batches" link will help fix.</p> <p>Also, I am guessing that <code>content</code> is the text you are searching for, but I can't tell for sure. Is it possible you have duplicated text values causing you to do scans more than once for the same text? If so, select your records using a <code>unique</code> condition on that field and then update your counts for all matching records at one time.</p> <p>Have you profiled your code to see if Ruby itself can help you pinpoint the problem? Modify your code a little to process 100 or 1000 records. Start the app with the <code>-r profile</code> flag. 
When the app exits, the profiler will output a table showing where time was spent.</p> <p>What version of Rails are you running?</p>
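<p>To make two of the suggestions above concrete, here is a minimal sketch in plain Ruby. It counts substring occurrences without building the temporary array that <code>scan</code> returns, and it scans only once per distinct text value, then applies the result to every record sharing that text. The names <code>count_occurrences</code>, <code>update_counts</code>, and <code>WordRow</code> are hypothetical stand-ins I made up for illustration; <code>WordRow</code> is a simple Struct, not your ActiveRecord model, and I am assuming <code>content</code> holds the search text.</p>

```ruby
# Count non-overlapping occurrences of `needle` in `haystack` by walking
# the string with String#index, so no temporary array is allocated.
def count_occurrences(haystack, needle)
  return 0 if needle.empty?

  count = 0
  pos = 0
  while (found = haystack.index(needle, pos))
    count += 1
    pos = found + needle.length # skip past this match (non-overlapping)
  end
  count
end

# Hypothetical stand-in for a Word record; not the real model.
WordRow = Struct.new(:id, :content, :occurrences)

# Scan the text once per distinct `content` value and assign the result
# to every row that shares that value.
def update_counts(rows, text)
  rows.group_by(&:content).each do |content, same_text_rows|
    total = count_occurrences(text, content)
    same_text_rows.each { |row| row.occurrences = total }
  end
  rows
end

text = "the cat and the hat and the bat"
rows = [
  WordRow.new(1, "the", 0),
  WordRow.new(2, "and", 0),
  WordRow.new(3, "the", 0) # duplicate text: reuses the count from id 1
]
update_counts(rows, text)
```

<p>In a real app the assignment step would be a database write, so whether this helps depends on where the time is actually going, which is why profiling first is still the better starting point.</p>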
 
