<p>Please update the question: do you have a list of CompanyNames available to you? I ask because you may be able to use the Levenshtein algorithm to find a relationship between your list of CompanyNames and LocationNames.</p>
<hr>
<p><strong>Update</strong></p>
<blockquote> <p>There is not a list of Company Names, I will have to generate the company name from the most descriptive or best Location Name that represents the multiple locations.</p> </blockquote>
<p>Okay... try this:</p>
<ol>
<li>Build a list of candidate CompanyNames by finding LocationNames made up of mostly or all alphabetic characters. You can use <a href="http://us2.php.net/manual/en/book.pcre.php" rel="nofollow noreferrer">regular expressions</a> for this. Store this list in a separate table.</li>
<li>Sort that list alphabetically and (manually) determine which entries should be CompanyNames.</li>
<li>Compare each CompanyName to each LocationName and come up with a match score (use <a href="http://us2.php.net/manual/en/function.levenshtein.php" rel="nofollow noreferrer">Levenshtein</a> or some other string matching algorithm). Store the result in a separate table.</li>
<li>Set a threshold score such that any MatchScore &lt; Threshold will not be considered a match for a given CompanyName. This assumes a higher score means a closer match; raw Levenshtein distance runs the other way (lower is closer), so either convert it to a similarity or flip the comparison.</li>
<li>Manually vet the results as CompanyName | LocationName | MatchScore rows and figure out which ones actually match. Ordering by MatchScore should make the process less painful.</li>
</ol>
<p>The whole purpose of the above actions is to automate parts of the work and limit the scope of your problem. It's far from perfect, but will hopefully save you the trouble of going through 18K records by hand.</p>
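<p>As a rough illustration of steps 1, 3, 4 and 5, here is a minimal Python sketch. Everything in it is an assumption made for the example: the sample names, the 0.9 "mostly alphabetic" ratio, the 0.5 threshold, and the choice to normalise edit distance into a 0 to 1 similarity so that a higher MatchScore means a closer match. Step 2 (manual curation) is skipped, and in practice the data would come from your tables instead of a hard-coded list.</p>

```python
import re

# Hypothetical sample data; real LocationNames would come from the database.
LOCATION_NAMES = [
    "Acme Corp",
    "Acme Corp #12",
    "ACME Corp - Store 7",
    "Globex",
    "Globex 0042 West",
    "12345",
]


def is_mostly_alphabetic(name, min_ratio=0.9):
    """Step 1: True when at least min_ratio of non-space characters are letters."""
    chars = re.sub(r"\s+", "", name)
    if not chars:
        return False
    letters = re.findall(r"[A-Za-z]", chars)
    return len(letters) / len(chars) >= min_ratio


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_score(company, location):
    """Step 3: similarity in [0, 1]; 1.0 means identical ignoring case."""
    a, b = company.lower(), location.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def score_table(companies, locations, threshold=0.5):
    """Steps 4-5: drop pairs below threshold, list best matches first."""
    rows = [(c, l, match_score(c, l)) for c in companies for l in locations]
    rows = [r for r in rows if r[2] >= threshold]
    return sorted(rows, key=lambda r: (r[0], -r[2]))


# Candidate CompanyNames (step 1), then the table to vet by hand (step 5).
candidates = sorted(n for n in LOCATION_NAMES if is_mostly_alphabetic(n))
for company, location, score in score_table(candidates, LOCATION_NAMES):
    print(f"{company} | {location} | {score:.2f}")
```

<p>Plain edit distance penalises long suffixes such as store numbers quite heavily, so the threshold will need tuning against real data, and a prefix or token-based comparison may turn out to work better for some names.</p>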
 
