How to fix java OutOfMemoryError: Java heap space from DataImportHandler?
<p>I am trying to import a large dataset (41 million records) into a new Solr index. I have set up the core and it works; I inserted some test docs and they work too. I have set up data-config.xml as shown below and then started a full-import. After about 12 hours, the import fails.</p>
<p>The documents can get quite large. Could the error be caused by a single large document (or field), or by the volume of data going into the DataImportHandler?</p>
<p>How can I get this frustrating import task working?</p>
<p>I have included the Tomcat error log below.</p>
<p>Let me know if there is any info I have missed!</p>
<p>logs:</p>
<pre><code>Jun 1, 2011 5:47:55 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity results with URL: jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor
Jun 1, 2011 5:47:56 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1185
Jun 1, 2011 5:48:02 PM org.apache.solr.core.SolrCore execute
INFO: [results] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
...
Jun 2, 2011 5:16:32 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:664)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
    at java.lang.StringCoding.decode(Unknown Source)
    at java.lang.String.&lt;init&gt;(Unknown Source)
    at java.lang.String.&lt;init&gt;(Unknown Source)
    at com.microsoft.sqlserver.jdbc.DDC.convertStreamToObject(DDC.java:419)
    at com.microsoft.sqlserver.jdbc.ServerDTVImpl.getValue(dtv.java:1974)
    at com.microsoft.sqlserver.jdbc.DTV.getValue(dtv.java:175)
    at com.microsoft.sqlserver.jdbc.Column.getValue(Column.java:113)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:1982)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:1967)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject(SQLServerResultSet.java:2256)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject(SQLServerResultSet.java:2265)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:286)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:228)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:266)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:260)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:78)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    ... 5 more
Jun 2, 2011 5:16:32 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 2, 2011 5:16:44 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
</code></pre>
<p>data-config.xml:</p>
<pre><code>&lt;dataConfig&gt;
  &lt;dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
              user="sa"
              password="password"/&gt;
  &lt;document&gt;
    &lt;entity name="results" query="SELECT fielda, fieldb, fieldc FROM mydb.[dbo].mytable WITH (NOLOCK)"&gt;
      &lt;field column="fielda" name="fielda"/&gt;
      &lt;field column="fieldb" name="fieldb"/&gt;
      &lt;field column="fieldc" name="fieldc"/&gt;
    &lt;/entity&gt;
  &lt;/document&gt;
&lt;/dataConfig&gt;
</code></pre>
<p>solrconfig.xml snippet:</p>
<pre><code>&lt;indexDefaults&gt;
  &lt;useCompoundFile&gt;false&lt;/useCompoundFile&gt;
  &lt;mergeFactor&gt;25&lt;/mergeFactor&gt;
  &lt;ramBufferSizeMB&gt;128&lt;/ramBufferSizeMB&gt;
  &lt;maxFieldLength&gt;100000&lt;/maxFieldLength&gt;
  &lt;writeLockTimeout&gt;10000&lt;/writeLockTimeout&gt;
  &lt;commitLockTimeout&gt;10000&lt;/commitLockTimeout&gt;
&lt;/indexDefaults&gt;
&lt;mainIndex&gt;
  &lt;useCompoundFile&gt;false&lt;/useCompoundFile&gt;
  &lt;ramBufferSizeMB&gt;128&lt;/ramBufferSizeMB&gt;
  &lt;mergeFactor&gt;25&lt;/mergeFactor&gt;
  &lt;infoStream file="INFOSTREAM.txt"&gt;true&lt;/infoStream&gt;
&lt;/mainIndex&gt;
</code></pre>
<p>Java config settings: initial memory 128 MB, max 512 MB</p>
<p>Environment: Solr 3.1, Tomcat 7.0.12, Windows Server 2008, Java 6 update 25 (build 1.6.0_25-b06); data coming from SQL Server 2008 R2.</p>
<pre><code>/admin/stats.jsp - DataImportHandler
Status : IDLE
Documents Processed : 2503083
Requests made to DataSource : 1
Rows Fetched : 2503083
Documents Deleted : 0
Documents Skipped : 0
Total Documents Processed : 0
Total Requests made to DataSource : 0
Total Rows Fetched : 0
Total Documents Deleted : 0
Total Documents Skipped : 0
handlerStart : 1306759913518
requests : 9
errors : 0
</code></pre>
<p>EDIT: I am currently running a SQL query to find the length of the largest field in any single record, as I think this is probably the cause of the exception. I am also running the import again with jconsole attached to monitor heap usage.</p>
<p>EDIT: I read the <a href="http://wiki.apache.org/solr/SolrPerformanceFactors#Factors_affecting_memory_usage" rel="nofollow">Solr performance factors page</a>. I am changing maxFieldLength to 1000000 and ramBufferSizeMB to 256. Now for another import run (yay...)</p>
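To reason about the "single large field vs. overall volume" question once the largest field length is known, a rough back-of-the-envelope check might look like the sketch below. It is not part of Solr or the JDBC driver: the class `HeapEstimate`, its methods, and the 25% safety fraction are hypothetical names and values chosen for illustration. It assumes a `String` costs about 2 bytes per character (a UTF-16 `char[]`, as on Java 6) and ignores the temporary buffers the driver allocates while decoding, so the real requirement is likely higher.

```java
// Hypothetical helper: estimates whether one field, materialized as a
// java.lang.String, fits comfortably in the JVM heap.
final class HeapEstimate {

    // Assumption: 2 bytes per char (UTF-16 char[] backing array, as on Java 6).
    // Object headers and decoder scratch buffers are ignored.
    static long estimateStringHeapBytes(long charCount) {
        return charCount * 2L;
    }

    // True if the estimated String size is within the given fraction of the heap.
    // safetyFraction is an illustrative knob, not a Solr or JVM setting.
    static boolean fitsInHeap(long charCount, long maxHeapBytes, double safetyFraction) {
        return estimateStringHeapBytes(charCount) <= (long) (maxHeapBytes * safetyFraction);
    }

    public static void main(String[] args) {
        // Max heap of the running JVM (what -Xmx resolved to).
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Largest field length in characters, e.g. taken from a SQL length query.
        long chars = args.length > 0 ? Long.parseLong(args[0]) : 1000000L;

        System.out.println("Max heap (bytes): " + maxHeap);
        System.out.println("Estimated String size (bytes): " + estimateStringHeapBytes(chars));
        System.out.println("Fits within 25% of heap: " + fitsInHeap(chars, maxHeap, 0.25));
    }
}
```

Running it inside a JVM started with the same heap settings as Tomcat and passing the largest field length as the argument gives a first hint: if even the largest field is a small fraction of the heap, the failure is more likely due to memory accumulating over the run than to one oversized record.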