Note that there are some explanatory texts on larger screens.

plurals
  1. POSelect all similar rows that are possible duplicates using LIKE?
    text
    copied!<p>After importing information about songs into my SQLite database, I want to use a SELECT statement to find all possible duplicate songs using this criteria:</p> <p>songName in one row similar or equal to the songName in any other row in the same table (Songs) AND artistID is the same in both rows. This should work without knowing the contents of songName. If I wanted to compare a known song name to all others in the database that can be done using "songName LIKE '%known name%'", but how do I find all duplicates without this?</p> <p><strong>Example Songs Table:</strong></p> <pre><code>id songName artistID duration -------------------------------------------- 0 This is a song 5 3:43 1 Another song 3 3:23 2 01-This is a song 5 3:42 3 song 4 4:01 4 song 4 6:33 5 Another record 2 2:45 </code></pre> <p><strong>Expected Results:</strong></p> <pre><code>id songName artistID duration -------------------------------------------- 0 This is a song 5 3:43 2 01-This is a song 5 3:42 3 song 4 4:01 4 song 4 6:33 </code></pre> <p><strong>EDIT:</strong></p> <p>Since the idea of creating a hash and comparing them has been suggested, I'm thinking about using this psuedo-function to create a hash for each song name:</p> <pre><code>Public Function createHash(ByVal phrase As String) As String 'convert to lower case phrase = LCase(phrase) 'split the phrase into words Dim words() As String = phrase.Replace("_", " ").Split(" ") Dim hash As String = "" For w = 0 To words.Count - 1 'remove noise words (a, an, the, etc.) words(w) = removeNoiseWords(words(w)) 'convert 1 or 2-digit numbers to corresponding words words(w) = number2word(words(w)) Next 'rebuild using replaced words and remove spaces hash = String.Join("", words) 'convert upper ascii into alphabetic (ie. ñ = n, Ö = O, etc.) hash = removeUnsupChars(hash, True) 'strip away all remaining non-alphanumeric characters hash = REGEX_Replace(hash, "[^A-Za-z0-9]", "") Return hash End Function </code></pre> <p>Once the hash is computed, I'll store that with each record and then select duplicates using a count(hash)>1. I'll then use .NET code to see if the artistID is the same for the returned records.</p> <p>This solution seems to be working fine so far. Here is the SQLite statement I use to find the duplicate songs:</p> <pre><code>SELECT count(*),hash from Songs GROUP BY hash HAVING count(hash) &gt; 1 ORDER BY hash; </code></pre> <p>This gives me a list of all the hashes that occur more than once. I store these results in an array, then cycle through the array and simply use this statement to get the detailed info:</p> <pre><code> For i = 0 To dupeHashes.Count - 1 SQLconnect.Open() SQLcommand = SQLconnect.CreateCommand SQLcommand.CommandText = "SELECT * from Songs WHERE hash = '" &amp; dupeHashes(i) &amp; "';" SQLreader = SQLcommand.ExecuteReader() While SQLreader.Read() 'get whatever data needed for each duplicate song End While SQLcommand.Dispose() SQLconnect.Close() Next </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload