> As I expect the file to be large, I did not use a HashMap to store the lines but opted instead for a String array.

I don't agree with your reasoning; the larger the file, the greater the performance benefit you'll get from hashing. In your code you're comparing each line with *all* succeeding lines, which requires O(n²) computational complexity for the whole file.

On the other hand, if you were to use an efficient hashing algorithm, each hash lookup would complete in O(1), and the computational complexity of processing your entire file becomes O(n).

Try using a `HashSet<string>` and observe the difference in processing time:

```csharp
public static void RemoveDuplicateEntriesinFile(string filepath)
{
    if (filepath == null)
        throw new ArgumentException("Please provide a valid FilePath");

    // The HashSet discards duplicate lines as the file is read.
    HashSet<string> hashSet = new HashSet<string>(File.ReadLines(filepath));
    File.WriteAllLines(filepath, hashSet);
}
```

**Edit**: Could you try the following version of the algorithm and check how long it takes? It's optimized to minimize memory consumption: instead of keeping every distinct line in memory, it stores only a SHA-256 hash per line and streams the de-duplicated output to a temporary file.

```csharp
HashAlgorithm hashAlgorithm = new SHA256Managed();
HashSet<string> hashSet = new HashSet<string>();
string tempFilePath = filepath + ".tmp";

using (var fs = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write))
using (var sw = new StreamWriter(fs))
{
    foreach (string line in File.ReadLines(filepath))
    {
        // Only the fixed-size hash of each line is kept in memory.
        byte[] lineBytes = Encoding.UTF8.GetBytes(line);
        byte[] hashBytes = hashAlgorithm.ComputeHash(lineBytes);
        string hash = Convert.ToBase64String(hashBytes);

        // Add returns false if this hash was already seen, i.e. a duplicate line.
        if (hashSet.Add(hash))
            sw.WriteLine(line);
    }
}

// Replace the original file with the de-duplicated version.
File.Delete(filepath);
File.Move(tempFilePath, filepath);
```
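For contrast with the `HashSet<string>` approach above: the question's own code is not included in this excerpt, so the sketch below is only a hypothetical reconstruction of the pairwise-comparison pattern the review describes (each line checked against all succeeding lines), which is what makes that approach O(n²). The class and method names are illustrative, not the asker's.

```csharp
using System.Collections.Generic;
using System.IO;

public static class NaiveDeduplication
{
    // Hypothetical sketch, not the asker's actual code: each line is compared
    // with all succeeding lines, so n lines cost on the order of n^2 comparisons.
    public static void RemoveDuplicatesPairwise(string filepath)
    {
        string[] lines = File.ReadAllLines(filepath);
        bool[] isDuplicate = new bool[lines.Length];

        for (int i = 0; i < lines.Length; i++)
        {
            if (isDuplicate[i])
                continue;

            // Compare line i with every line after it.
            for (int j = i + 1; j < lines.Length; j++)
            {
                if (!isDuplicate[j] && lines[j] == lines[i])
                    isDuplicate[j] = true; // mark later copies of line i
            }
        }

        // Keep only the first occurrence of each line.
        var survivors = new List<string>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (!isDuplicate[i])
                survivors.Add(lines[i]);
        }
        File.WriteAllLines(filepath, survivors);
    }
}
```

With the hash-based versions in the answer, the per-line lookup is amortized O(1), so the whole pass over the file is O(n) rather than O(n²).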