How to speed up comparing MD5 hashes in a database
    <p>I have a database full of classified adverts for computers for sale, collected from many different sites. The database is populated from XML files sent by the individual advertising sites; each file is parsed and its adverts are placed in a central table in the database.</p>
    <p>The XML files have the following attributes for each computer: Make, Model, HD Size, RAM Size, Processor Speed, Price, Location, etc.</p>
    <p>The central database table has the same columns, plus an extra one at the start, which is an ID field for our own use.</p>
    <p>Because the adverts are created by the public, the same advert is often placed on more than one of our partner sites. This creates duplicate adverts for the same computer, and I need to identify the adverts that are duplicated in the database.</p>
    <p>The problem with identifying the duplicates is that there is no unique field (unlike, say, a car with a Reg Number).</p>
    <p>An idea I've had is to add an extra column to the table that holds an MD5 hash of the contents of the other columns. When the XML is parsed, an MD5 hash is created for each advert and added as a final column.</p>
    <p>Once the records (100k upwards) have been added to the table, a query is run to identify any matching MD5 hashes, but this takes too long and the query often times out, even when the timeout has been extended.</p>
    <p>My question then is: Is the MD5 hash route the best one? If so, how would I speed the querying up? If not, what would be the best way of identifying the duplicate adverts?</p>
    <p>Thanks,</p>
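    <p>To make the setup described above concrete, here is a minimal sketch of the hash-column approach, assuming SQLite and Python purely for illustration. The table name <code>adverts</code>, the column names, and the helper functions are hypothetical and do not come from the actual system. The sketch normalises each field before hashing, puts an index on the hash column, and finds duplicates with a single <code>GROUP BY</code> rather than comparing rows against each other. An index of this kind is usually the first thing to check when a duplicate-hash query is slow, though whether it removes the timeout here is untested.</p>

```python
import hashlib
import sqlite3

# Hypothetical column set, based on the attributes listed in the question.
FIELDS = ("make", "model", "hd_size", "ram_size", "cpu_speed", "price", "location")


def advert_hash(advert: dict) -> str:
    """Return an MD5 hex digest of the advert's normalised field values."""
    # Normalise case and surrounding whitespace so trivially different copies
    # hash alike; the \x1f separator keeps adjacent fields from running together.
    joined = "\x1f".join(str(advert[f]).strip().lower() for f in FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def load_adverts(conn: sqlite3.Connection, adverts: list) -> None:
    """Create the (assumed) table and index, then insert adverts with hashes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS adverts ("
        "id INTEGER PRIMARY KEY, make TEXT, model TEXT, hd_size TEXT, "
        "ram_size TEXT, cpu_speed TEXT, price TEXT, location TEXT, md5 TEXT)"
    )
    # Index the hash column so equal hashes can be grouped via the index.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_adverts_md5 ON adverts (md5)")
    rows = [tuple(a[f] for f in FIELDS) + (advert_hash(a),) for a in adverts]
    conn.executemany(
        "INSERT INTO adverts "
        "(make, model, hd_size, ram_size, cpu_speed, price, location, md5) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def find_duplicates(conn: sqlite3.Connection) -> dict:
    """Return {md5: [ids]} for every hash shared by more than one advert."""
    cur = conn.execute(
        "SELECT md5, GROUP_CONCAT(id) FROM adverts "
        "GROUP BY md5 HAVING COUNT(*) > 1"
    )
    return {md5: sorted(int(i) for i in ids.split(",")) for md5, ids in cur}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    base = {"make": "Dell", "model": "XPS 13", "hd_size": "512GB",
            "ram_size": "16GB", "cpu_speed": "2.4GHz", "price": "500",
            "location": "Leeds"}
    # The third advert differs from the first only in case and whitespace.
    load_adverts(conn, [base, {**base, "model": "XPS 15"}, {**base, "make": " dell "}])
    print(find_duplicates(conn))
```

    <p>Note that an exact hash only matches adverts whose normalised fields are identical; the same computer listed with a different price or a reworded location would still hash differently, so the choice of fields to include in the hash matters as much as the query speed.</p>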