Comparing HUGE ASCII Files
<p>I work for a company that does ETL work on various databases. I have been tasked with creating a patch between two full historical data sets on the client machine, which would then be sent over to our servers. The patch creation needs to be programmatic so that it can be called from our software.</p>
<p>The data sets are simple text files. We have extraction software running on our client's systems to perform the extraction. Extraction files range in size, up to 3 GB or more. I have implemented a solution using Microsoft's FC.exe, but it has limitations.</p>
<p>I'm using FC to produce the comparison file, then parsing it in Perl on our side to extract the records that have been removed or updated, and those that have been added.</p>
<p>FC works fine for me as long as no line of text exceeds 128 characters. When one does, the output is wrapped onto the next line of the comparison file, and so appears as an added/deleted record. I know that I could probably pre-process the files, but this would add a tremendous amount of time, probably defeating the purpose.</p>
<p>I have tried using diffutils, but it complains about large files.</p>
<p>I also toyed with some C# code to implement the patch process myself. This worked fine for small files, but was horribly inefficient when dealing with the big ones (I tested it on a 2.8 GB extract).</p>
<p>Are there any good command-line utilities or C# libraries that I can use to create this patch file? Barring that, is there an algorithm that I can use to implement this myself? Keep in mind that records may be updated, added, and deleted. (I know, it irks me too that the clients DELETE records rather than marking them inactive. This is out of my control.)</p>
<p><strong>Edit for clarity:</strong></p>
<p>I need to compare two separate database extracts taken at two different times. Usually these will be about one day apart.</p>
<p>Given the files below (the real ones will obviously be much longer and much wider):</p>
<hr>
<p><strong>Old.txt</strong></p>
<pre><code>a
b
c
d
e
1
f
2
5
</code></pre>
<p><strong>New.txt</strong></p>
<pre><code>a
3
b
c
4
d
e
1
f
g
</code></pre>
<p>The expected output would be:</p>
<pre><code>3 added
4 added
2 removed
g added
5 removed
</code></pre>
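<p>To make the expected output above concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes that each line is a complete record and that only the set of added and removed records matters, not their position. It is a hashed multiset comparison rather than a true diff, so additions and removals come out as two separate groups instead of interleaved as in the listing above, and an updated record shows up as one removal plus one addition. The names <code>line_key</code>, <code>FileRecords</code> and <code>diff_records</code> are hypothetical, chosen for this illustration only.</p>

```python
import hashlib
from collections import Counter


def line_key(line: bytes) -> bytes:
    # A 16-byte digest stands in for the full line so that wide records
    # do not have to be kept in memory. Assumption: digest collisions are
    # rare enough to ignore for this purpose.
    return hashlib.blake2b(line, digest_size=16).digest()


class FileRecords:
    """Re-iterable view of a text file, one record (line) at a time."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The file is reopened on every iteration, so it can be scanned twice.
        with open(self.path, "rb") as fh:
            for line in fh:
                yield line.rstrip(b"\r\n")


def diff_records(old, new):
    """Return (added, removed) for two re-iterable collections of byte lines."""
    # Pass 1 over old: count how often each record occurs.
    counts = Counter(line_key(line) for line in old)

    # Pass over new: cancel out matching records; anything left over is new.
    added = []
    for line in new:
        key = line_key(line)
        if counts[key] > 0:
            counts[key] -= 1
        else:
            added.append(line)

    # Pass 2 over old: records whose count never reached zero were removed.
    removed = []
    for line in old:
        key = line_key(line)
        if counts[key] > 0:
            counts[key] -= 1
            removed.append(line)

    return added, removed
```

<p>With the example files, <code>diff_records(FileRecords("Old.txt"), FileRecords("New.txt"))</code> would report <code>3</code>, <code>4</code> and <code>g</code> as added and <code>2</code> and <code>5</code> as removed. Memory use grows with the number of distinct records in the old file, so for extracts of several gigabytes this may still be too heavy. Sorting both files externally and comparing them in a single merge pass is a common alternative in that case.</p>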