<p>@yura, this isn't what you're looking for, but I don't think any clever algorithm will be able to consistently disambiguate whether queries like "soma ca" refer to Soma in San Francisco or Soma Lake in Canada. The problem is not that your algorithm is insufficiently sophisticated; the problem is that there is simply not enough information in the query "soma ca".</p>
<p>I don't know how to express it clearly, but there is an information-theoretic limit at work here. It's like the way that random data can't be compressed losslessly: there's not enough information in the input to compute the desired output.</p>
<p>Even if a human were to interpret your queries manually, they would not necessarily understand that "soma ca" is supposed to mean Soma in SF. Maybe to you a 2-letter abbreviation like "ca" "naturally" refers to a US state rather than a foreign country, but there is nothing fundamentally "correct" about that choice, and it cannot be derived using pure logic. It's an arbitrary, domain-specific, ad-hoc rule, just like the ad-hoc <code>log(population)</code> heuristic which you referred to.</p>
<p>Some possible "solutions" (aside from designing a telepathic computer which can read users' minds):</p>
<ol>
<li>Provide users a list of possible matches for each query. Keep track of the ones they choose, and when other users later type the same query, order the results by popularity.</li>
<li>OR, once you gather lots of data on the popularity of query results, you may even be able to mine the data with machine-learning algorithms, and derive better heuristics from it.</li>
<li>Or, before putting the application into production use, you could first compile a body of fake queries, along with the results which you think your algorithm should yield for each such query. Then use your machine-learning algorithms on that.</li>
<li>Compile a body of fake queries and desired responses, OR get the data from the choices of real users, and use that data to benchmark the accuracy of your manually designed and coded ranking heuristics. Keep inventing new heuristics until you find one which achieves high accuracy on your test data set.</li>
</ol>
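<p>To make options 1 and 4 a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the <code>ClickLog</code> class, the <code>accuracy</code> function, and the place labels are hypothetical names, not part of any existing geocoder or library. It only shows the general shape of "remember what users picked" and "score a heuristic against labelled queries".</p>

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


class ClickLog:
    """Hypothetical store of which result users chose for each query (option 1)."""

    def __init__(self) -> None:
        self._clicks: Dict[str, Counter] = {}

    @staticmethod
    def _normalise(query: str) -> str:
        # Assumption: queries differing only in case or outer whitespace are the same.
        return query.strip().lower()

    def record(self, query: str, chosen: str) -> None:
        """Remember that a user picked `chosen` after typing `query`."""
        self._clicks.setdefault(self._normalise(query), Counter())[chosen] += 1

    def rank(self, query: str, candidates: List[str]) -> List[str]:
        """Order candidates by how often earlier users chose them for this query."""
        counts = self._clicks.get(self._normalise(query), Counter())
        # sorted() is stable, so candidates with equal counts keep their input order.
        return sorted(candidates, key=lambda name: -counts[name])


def accuracy(heuristic: Callable[[str], str],
             labelled: List[Tuple[str, str]]) -> float:
    """Fraction of labelled queries whose top result matches the expected one (option 4)."""
    if not labelled:
        return 0.0
    hits = sum(1 for query, expected in labelled if heuristic(query) == expected)
    return hits / len(labelled)


# Illustrative usage with made-up labels.
log = ClickLog()
for choice in ["Soma, San Francisco", "Soma, San Francisco", "Soma Lake, Canada"]:
    log.record("soma ca", choice)

ranked = log.rank("Soma CA ", ["Soma Lake, Canada", "Soma, San Francisco"])

# Benchmark a deliberately naive heuristic that always answers San Francisco.
test_set = [("soma ca", "Soma, San Francisco"), ("soma lake", "Soma Lake, Canada")]
score = accuracy(lambda query: "Soma, San Francisco", test_set)
```

<p>The same <code>accuracy</code> function can score any candidate heuristic, including one that simply returns the first entry of <code>log.rank(...)</code>, so click data and hand-written rules can be compared on the same test set.</p>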
 
