Java - Parsing a large text file
I had a quick question. I'm working on a school project and I need to parse an extremely large text file. It's for a database class, so I need to get unique actor names from the file because actors will be a primary key in the MySQL database. I've already written the parser and it works great, but at the time I forgot to remove the duplicates. So I decided the easiest way would be to create an actors ArrayList (using the ArrayList ADT), then use the contains() method to check whether the actor name is already in the ArrayList before I print it to a new text file. If it is, I do nothing; if it isn't, I add it to the ArrayList and print it to the file. Now the program is running extremely slowly. Before the ArrayList, it took about 5 minutes. The old actor file was 180k without duplicates removed. Now it has been running for 30 minutes and is at 12k so far. (I'm expecting 100k-150k total this time.)

I left the size of the ArrayList blank because I don't know how many actors are in the file, but there are at least 1-2 million. I was thinking of just putting 5 million in for its size and checking afterwards to see if it got them all. (Simply check the last ArrayList index; if it is empty, it didn't run out of space.) Would this reduce the time, because the ArrayList isn't constantly doubling and copying everything over? Is there another method that would be faster than this? I'm also concerned my computer might run out of memory before it completes. Any advice would be great.

(I also tried running the 'unique' command on the text file, without success. The actor names print out one per line, in one column. I was thinking maybe the command was wrong. How would you remove duplicates from a text file column in a Windows or Linux command prompt?) Thank you, and sorry for the long post. I have a midterm tomorrow and I'm starting to get stressed.
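
For illustration only, here is a minimal sketch of one common alternative to the ArrayList check described above. `ArrayList.contains()` scans the list from the start on every call, so each lookup gets slower as the list grows; presizing the list would not change that. A `HashSet` answers the same "have I seen this name?" question in expected constant time. The class name `UniqueActors`, the method `writeUnique`, the one-name-per-line input layout, and the UTF-8 encoding are all assumptions made for the sketch, not details from the original parser.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the name is not from the original post.
class UniqueActors {

    // Copies each distinct non-blank line of `input` to `output`,
    // keeping the order in which names are first seen.
    // Returns the number of distinct names written.
    static int writeUnique(Path input, Path output) throws IOException {
        Set<String> seen = new HashSet<>();
        // Assumption: the file is UTF-8; pass a different Charset if it is not.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String name = line.trim();
                // Set.add returns false when the name is already present,
                // so a single call both checks for and records the name.
                if (!name.isEmpty() && seen.add(name)) {
                    out.write(name);
                    out.newLine();
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UniqueActors <input file> <output file>
        int count = writeUnique(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(count + " unique names written");
    }
}
```

The set still holds every distinct name in memory, so the memory concern remains, although a few hundred thousand short strings usually fit in a default JVM heap. On a Linux command line, `sort -u` on the file removes duplicate lines; `uniq` alone only drops duplicates that are adjacent, which is why it is normally run on sorted input.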
 
