Use superCSV to read a large text file of 80GB

I want to read a huge CSV file. We are using superCSV to parse through our files in general, but in this particular scenario the file is huge, and we always run out of memory for obvious reasons.

The initial idea is to read the file in chunks, but I am not sure this would work with superCSV: when I split the file, only the first chunk has the header row and can be loaded into the CSV bean, while the other chunks have no header row, and I feel that this might throw an exception. So:

a) I was wondering if my thought process is right.
b) Are there any other ways to approach this problem?

My main question is: does superCSV have the capability to handle large CSV files? I see that superCSV reads the document through a BufferedReader, but I don't know what the size of that buffer is, or whether we can change it to suit our requirements.

@Gilbert Le Blanc: I have tried splitting the file into smaller chunks as per your suggestion, but it is taking a long time to break the huge file down. Here is the code that I have written to do it:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;

public class TestFileSplit {

    public static void main(String[] args) {
        LineNumberReader lnr = null;
        try {
            File file = new File("C:\\Blah\\largetextfile.txt");
            lnr = new LineNumberReader(new FileReader(file), 1024);
            String line;
            String header = null;
            int noOfLines = 100000;
            int i = 1;
            boolean chunkedFiles = new File("C:\\Blah\\chunks").mkdir();
            if (chunkedFiles) {
                while ((line = lnr.readLine()) != null) {
                    if (lnr.getLineNumber() == 1) {
                        // remember the header row so it can be copied into every chunk
                        header = line;
                        continue;
                    }
                    // a new chunk file is started every 100000 records
                    if ((lnr.getLineNumber() % noOfLines) == 0) {
                        i = i + 1;
                    }
                    File chunkedFile = new File("C:\\Blah\\chunks\\"
                            + file.getName().substring(0, file.getName().indexOf("."))
                            + "_" + i + ".txt");
                    // if the chunk file does not exist yet, create it and write the header first
                    if (!chunkedFile.exists()) {
                        chunkedFile.createNewFile();
                        BufferedWriter bw = new BufferedWriter(
                                new FileWriter(chunkedFile.getAbsoluteFile(), true));
                        bw.write(header);
                        bw.newLine();
                        bw.close();
                    }
                    // a writer is opened and closed again for every single line
                    BufferedWriter bw = new BufferedWriter(
                            new FileWriter(chunkedFile.getAbsoluteFile(), true));
                    bw.write(line);
                    bw.newLine();
                    bw.close();
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (lnr != null) {
                try {
                    lnr.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
```
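For reference, this is roughly how we read a file with superCSV today. The RecordBean class here is a simplified placeholder for our real bean, which has many more columns:

```java
import java.io.FileReader;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class SuperCsvStreamingRead {

    // minimal placeholder bean; the real bean has many more columns
    public static class RecordBean {
        private String id;
        private String name;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        ICsvBeanReader beanReader = null;
        try {
            beanReader = new CsvBeanReader(
                    new FileReader("C:\\Blah\\largetextfile.txt"),
                    CsvPreference.STANDARD_PREFERENCE);
            // the header row supplies the bean property names
            String[] header = beanReader.getHeader(true);
            RecordBean record;
            // read() returns one row at a time, so only the current row is in memory
            while ((record = beanReader.read(RecordBean.class, header)) != null) {
                // per-row processing goes here
                System.out.println(record.getName());
            }
        } finally {
            if (beanReader != null) {
                beanReader.close();
            }
        }
    }
}
```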
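On the buffer question: I assume I could at least add my own buffering layer by handing superCSV an already-buffered reader, something like the sketch below, but I don't know whether that actually replaces the buffer superCSV uses internally (the 1 MB size is just a guess on my part):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class CustomBufferSize {
    public static void main(String[] args) throws Exception {
        // wrap the FileReader in a BufferedReader with an explicit 1 MB buffer
        ICsvBeanReader beanReader = new CsvBeanReader(
                new BufferedReader(
                        new FileReader("C:\\Blah\\largetextfile.txt"), 1024 * 1024),
                CsvPreference.STANDARD_PREFERENCE);
        System.out.println(beanReader.getHeader(true).length + " columns");
        beanReader.close();
    }
}
```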
 
