Reading large file in Java -- Java heap space
<p>I'm reading a large TSV file (~40 GB) and trying to prune it by reading it line by line and printing only certain lines to a new file. However, I keep getting the following exception:</p>
<pre><code>java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2894)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:532)
    at java.lang.StringBuffer.append(StringBuffer.java:323)
    at java.io.BufferedReader.readLine(BufferedReader.java:362)
    at java.io.BufferedReader.readLine(BufferedReader.java:379)
</code></pre>
<p>Below is the main part of the code. I specified the buffer size to be 8192 just in case. Doesn't Java clear the buffer once the buffer size limit is reached? I don't see what could cause such large memory usage here. I tried increasing the heap size, but it made no difference (the machine has 4 GB of RAM). I also tried flushing the output file every X lines, but that didn't help either. I'm thinking I might need to call the GC explicitly, but that doesn't sound right.</p>
<p>Any thoughts? Thanks a lot. BTW, I know I should call trim() only once, store the result, and then reuse it.</p>
<pre><code>Set&lt;String&gt; set = new HashSet&lt;String&gt;();
set.add("A-B");
...
...
static public void main(String[] args) throws Exception {
    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile),"UTF-8"), 8192);
    PrintStream output = new PrintStream(outputFile, "UTF-8");
    String line = reader.readLine();
    while(line!=null){
        String[] fields = line.split("\t");
        if( set.contains(fields[0].trim()+"-"+fields[1].trim()) )
            output.println((fields[0].trim()+"-"+fields[1].trim()));
        line = reader.readLine();
    }
    output.close();
}
</code></pre>
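<p>The stack trace shows the error being raised while readLine() is still growing its internal buffer, so one possible explanation (an assumption, not something the post confirms) is that the file contains a "line" far longer than expected, for example because its line terminators are missing. The 8192 passed to BufferedReader sets the size of the input buffer; it does not limit how long a line readLine() will accumulate. The following is a minimal sketch of the same pruning loop with a cap on line length, so that an oversized line fails fast instead of exhausting the heap. The names PruneTsv, readBoundedLine, prune, and MAX_LINE_CHARS are hypothetical and not from the original code.</p>

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

class PruneTsv {
    // Hypothetical cap on line length (in chars); choose a value that fits the data.
    static final int MAX_LINE_CHARS = 1 << 20;

    // Reads up to the next '\n' and returns the line without its terminator
    // (a trailing '\r' is dropped as well). Returns null at end of input.
    // Throws IOException if the line would exceed maxChars characters.
    // Note: a lone '\r' is NOT treated as a terminator in this sketch.
    static String readBoundedLine(Reader in, int maxChars) throws IOException {
        int c = in.read();
        if (c == -1) {
            return null; // end of input
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && c != '\n') {
            if (sb.length() >= maxChars) {
                throw new IOException("line longer than " + maxChars + " chars");
            }
            sb.append((char) c);
            c = in.read();
        }
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    // Copies the key "field0-field1" of every matching line to sink and
    // returns the number of lines written. Closes both streams when done.
    static long prune(Reader source, Writer sink, Set<String> keep) throws IOException {
        long written = 0;
        try (BufferedReader in = new BufferedReader(source);
             BufferedWriter out = new BufferedWriter(sink)) {
            String line;
            while ((line = readBoundedLine(in, MAX_LINE_CHARS)) != null) {
                String[] fields = line.split("\t");
                if (fields.length < 2) {
                    continue; // skip lines without two fields
                }
                // Trim once and reuse the result.
                String key = fields[0].trim() + "-" + fields[1].trim();
                if (keep.contains(key)) {
                    out.write(key);
                    out.write('\n');
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Set<String> keep = new HashSet<>();
        keep.add("A-B");
        // In-memory stand-in for the real input and output files.
        StringWriter result = new StringWriter();
        long n = prune(new StringReader("A\tB\t1\nC\tD\t2\n"), result, keep);
        System.out.println(n + " line(s) kept:");
        System.out.print(result);
    }
}
```

<p>With real files, the Reader and Writer arguments could be built the same way as in the question, for instance an InputStreamReader over a FileInputStream for inputFile with the UTF-8 charset. If the cap is hit, the exception indicates that the input is not line-oriented in the way the loop assumes, which is worth checking before tuning heap size or the GC.</p>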
 
