
Java-mysql highload application crash
    <p>I have a problem with my HTML scraper. It is a multithreaded application written in Java using HtmlUnit; by default it runs with 128 threads. In short, it works as follows: it takes a site URL from a large text file, pings the URL and, if the site is accessible, parses it, finds specific HTML blocks, saves all URL and block info (including the HTML code) into the corresponding database tables, and moves on to the next site. The database is MySQL 5.1 with 4 InnoDB tables and 4 views. The tables have numeric indexes on the fields used in joins. I also have a web interface for browsing and searching the parsed data (for searching I use Sphinx with delta indexes), written in CodeIgniter.</p>
    <p>Server configuration:</p>
    <pre><code>CPU: Type Xeon Quad Core X3440 2.53GHz
RAM: 4 GB
HDD: 1TB SATA
OS: Ubuntu Server 10.04
</code></pre>
    <p>Some MySQL config:</p>
    <pre><code>key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 128
max_connections = 400
table_cache = 64
query_cache_limit = 2M
query_cache_size = 128M
</code></pre>
    <p>The JVM runs with default parameters except for the following options: <pre>-Xms1024m -Xmx1536m -XX:-UseGCOverheadLimit -XX:NewSize=500m -XX:MaxNewSize=500m -XX:SurvivorRatio=6 -XX:PermSize=128M -XX:MaxPermSize=128m -XX:ErrorFile=/var/log/java/hs_err_pid_%p.log </pre></p>
    <p>When the database was empty, the scraper processed 18 URLs per second and was stable enough. But after 2 weeks, when the urls table contained 384929 records (~25% of all processed URLs) and took 8.2 GB, the Java application began to run very slowly and to crash every 1-2 minutes. I guess the reason is MySQL, which cannot handle the growing load (the parser, which performs <code>2+4*BLOCK_NUMBER</code> queries for every processed URL; Sphinx, which updates its delta indexes every 10 minutes; I don't count the web interface, because it is used by only one person). Maybe it rebuilds indexes very slowly? But the MySQL and scraper logs (which also contain all uncaught exceptions) are empty. What do you think about it?</p>
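    <p>To put a rough number on the query load described above, here is a minimal sketch. It only applies the <code>2+4*BLOCK_NUMBER</code> formula and the 18 URLs per second rate from the post. The class name <code>QueryLoadEstimate</code>, its methods, and the value of 5 blocks per page are assumptions made for illustration; they are not part of the actual scraper.</p>

```java
// Hypothetical helper for a back-of-the-envelope estimate; not part of the scraper itself.
class QueryLoadEstimate {

    // Queries issued for one processed URL: 2 + 4 * BLOCK_NUMBER (formula taken from the post).
    static long queriesPerUrl(int blockCount) {
        if (blockCount < 0) {
            throw new IllegalArgumentException("blockCount must be non-negative");
        }
        return 2L + 4L * blockCount;
    }

    // Total queries per second sent to MySQL at a given scraping rate.
    static long queriesPerSecond(int urlsPerSecond, int blockCount) {
        if (urlsPerSecond < 0) {
            throw new IllegalArgumentException("urlsPerSecond must be non-negative");
        }
        return urlsPerSecond * queriesPerUrl(blockCount);
    }

    public static void main(String[] args) {
        int urlsPerSecond = 18; // rate reported in the post while the database was empty
        int assumedBlocks = 5;  // ASSUMPTION: the post does not say how many blocks a page has

        System.out.println("queries per URL:    " + queriesPerUrl(assumedBlocks));
        System.out.println("queries per second: " + queriesPerSecond(urlsPerSecond, assumedBlocks));
    }
}
```

    <p>Under that assumed block count, the estimate comes to 22 queries per URL and 396 queries per second from the parser alone. The real figure depends on how many blocks each page actually contains.</p>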