StackOverflow2013


plurals

Fastest way to find relevant results in array from an input array
primarykey
Id
4841019
data
AcceptedAnswerId
4841073
AnswerCount
4
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2011-01-30T03:45:09.547
FavoriteCount
1
LastActivityDate
2016-09-17T16:32:02.343
LastEditDate
2016-09-17T16:32:02.343
LastEditorUserId
6599590
OwnerUserId
77011
ParentId
0
PostTypeId
1
Score
1
ViewCount
2681
LastEditorDisplayName
text
Body
As a mostly front-end developer, I don't often delve into this realm of computer science, but here's my scenario: I've got an input string, split on spaces, say <code>"pinto beans"</code>, and an array of results to search that contains entries like <code>["beans, mung","beans, pinto","beans, yellow","beans, fava"]</code>. What might be the quickest way (preferably in JavaScript or PHP) to find the most "relevant" results, i.e. the ones with the most matches? In the case above, I would like to sort the returned array so that <code>"beans, pinto"</code> is at the top, the rest come below it, and any other results go below those. My first attempt would be to match each result item against each input item, increment a match count on each hit, and then sort from most matches to least. That approach would require me to iterate through the entire result array a ton of times, though, and I feel that my lack of CS knowledge is leaving me without the best solution here. /* EDIT: Here's how I ended up dealing with the problem: */ Based on crazedfred's suggestion and the blog post he mentioned (which was VERY helpful), I wrote some PHP that basically combines the trie method and the Boyer-Moore method, except that it searches from the beginning of each word (as I don't want to match "bean" in "superbean"). I chose PHP for the ranking because I'm using JS libraries: benchmarks taken with their convenience functions and library overhead in the way wouldn't produce the testable results I'm after, and I can't guarantee the code won't explode in one browser or another. 
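For comparison, the first-attempt approach (count how many input words each result contains, then sort) only needs one pass over the results if each result's words go into a Set, so every search term becomes a lookup instead of another scan. A minimal JavaScript sketch, assuming ASCII data and whole-word matches; rankByWordMatches is a hypothetical name, not from any library:

```javascript
// Naive baseline: score each result by how many search words it contains as whole
// words (case-insensitive), then sort from most to fewest matches.
// Assumes ASCII text; anything other than a-z/0-9 is treated as a word separator.
function rankByWordMatches(query, results) {
  // "pinto beans" -> ["pinto", "beans"]
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const scored = results.map((result, index) => {
    // "beans, pinto" -> Set {"beans", "pinto"}; lookups are O(1) on average.
    const words = new Set(result.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
    const score = terms.filter((term) => words.has(term)).length;
    return { result, score, index };
  });
  // Highest score first; the index tiebreaker keeps equal scores in input order.
  scored.sort((a, b) => b.score - a.score || a.index - b.index);
  return scored.map((entry) => entry.result);
}

const ranked = rankByWordMatches("pinto beans", [
  "beans, mung",
  "beans, pinto",
  "beans, yellow",
  "beans, fava",
]);
console.log(ranked);
```

Here "beans, pinto" scores 2 and the other three score 1, so it moves to the top while the rest keep their original order. Partial words such as "bean" for "beans" don't count in this version, which is what the prefix matching in the edit addresses.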
Here's the test data: Search string: <code>lima beans</code> Result array (from the db): <code>["Beans, kidney","Beans, lima","Beans, navy","Beans, pinto","Beans, shellie","Beans, snap","Beans, mung","Beans, fava","Beans, adzuki","Beans, baked","Beans, black","Beans, black turtle soup","Beans, cranberry (roman)","Beans, french","Beans, great northern","Beans, pink","Beans, small white","Beans, yellow","Beans, white","Beans, chili","Beans, liquid from stewed kidney beans","Stew, pinto bean and hominy"]</code> First, I drop both the search string and the result array into PHP variables, after <code>explode()</code>ing the string on spaces. Then I precompile the patterns to compare the results against: <pre><code>$max = max(array_map('strlen', $input));
$reg = array();
for ($m = 0; $m < $max; $m++) {
    $reg[$m] = "";
    for ($ia = 0; $ia < count($input); $ia++) {
        // Shorter words have no character at position $m; skip them.
        if ($m < strlen($input[$ia])) {
            $reg[$m] .= $input[$ia][$m];
        }
    }
}
</code></pre> This gives me one string of candidate characters per position, e.g. <code>["lb","ie","ma","an","s"]</code> for <code>lima beans</code>. Then I take each result string (split on spaces) and match each character of each word against a case-insensitive character class built from the string for that position. As soon as a character fails to match, I skip the rest of that word. This means that any word that doesn't start with "l" or "b" costs only a single comparison, which is really fast. Basically, I'm taking the part of the trie approach that merges the search terms together, plus the constant speedup of the Boyer-Moore stuff. 
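The precompilation step and the per-word check can be ported to JavaScript roughly like this. It's a sketch, not the PHP above line for line: buildPositionSets and prefixScore are hypothetical names, both sides are lowercased up front instead of using a case-insensitive regex, and includes() on the short per-position string stands in for preg_match.

```javascript
// Precompile one string of allowed characters per position: position m collects the
// m-th character of every search word that is long enough to have one.
function buildPositionSets(words) {
  const lower = words.map((w) => w.toLowerCase());
  const maxLen = Math.max(0, ...lower.map((w) => w.length));
  const sets = [];
  for (let m = 0; m < maxLen; m++) {
    let chars = "";
    for (const w of lower) {
      if (m < w.length) chars += w[m]; // shorter words contribute nothing here
    }
    sets.push(chars);
  }
  return sets;
}

// Count the leading characters of `word` that appear in the set for their position,
// stopping at the first miss, so a word with a non-matching first letter costs one check.
function prefixScore(word, sets) {
  const w = word.toLowerCase();
  let score = 0;
  for (let p = 0; p < w.length && p < sets.length; p++) {
    if (!sets[p].includes(w[p])) break;
    score++;
  }
  return score;
}

const positionSets = buildPositionSets("lima beans".split(" "));
console.log(positionSets); // [ 'lb', 'ie', 'ma', 'an', 's' ]
console.log(prefixScore("Beans,", positionSets)); // 5
console.log(prefixScore("kidney", positionSets)); // 0
```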
Here's the PHP. I tried <code>while</code> loops, but got SIGNIFICANTLY better results with <code>foreach</code> loops: <pre><code>$sort = array();
foreach ($results as $result) {
    $matches = 0;
    $resultStrs = explode(' ', $result);
    foreach ($resultStrs as $r) {
        $strlen = strlen($r);
        // Stop at the end of the word or past the longest search word.
        for ($p = 0; $p < $strlen && isset($reg[$p]); $p++) {
            if (preg_match('/^[' . preg_quote($reg[$p], '/') . ']/i', $r[$p])) {
                $matches++;
            } else {
                break; // first miss: skip the rest of this word only
            }
        }
    }
    $sort[$result] = $matches;
}
</code></pre> That outputs an array with the results on the keys and the total number of character matches on the values. I keyed it that way to avoid key collisions that would ruin my data and, more importantly, so I can do a quick <code>asort</code> and get my results in order. <code>asort</code> sorts ascending by match count and the results are on the keys, so after the above code block I run: <pre><code>asort($sort);
$sort = array_reverse(array_keys($sort));
</code></pre> That gives me a properly indexed array of results, sorted from most to least relevant, which I can drop straight into my autocomplete box. Because speed is the whole point of this experiment, here are my results (obviously, they depend partly on my computer): 2 input words, 40 results: ~5 ms; 2 input words (one a single character, one a whole word), 126 results: ~9 ms. There are too many variables at stake for these numbers to mean much to YOU, but as an example, I think they're pretty impressive. If anyone sees something wrong with the above example, or can think of a better way, I'd love to hear about it. The only thing I can think of that might cause problems right now is that searching for <code>lean bimas</code> would give the same scores as <code>lima beans</code>, because each position's pattern accepts characters from every search word instead of depending on which word the earlier characters matched. Because the results I'm looking for and the input strings I expect shouldn't make this happen very often, I've decided to leave it as is for now, to avoid adding any overhead to this quick little script. 
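Putting it together, the scoring loop and the most-to-least sort might look like this in JavaScript. Assumptions: the function names are hypothetical, a miss skips only the current word (as the write-up describes), and the sort is descending in one step, the counterpart of PHP's arsort. Array.prototype.sort is required to be stable since ES2019, so equal scores keep their original order.

```javascript
// Hypothetical JavaScript port of the ranking: merged per-position character sets,
// per-word prefix scoring, then a most-to-least sort.
function buildPositionSets(words) {
  const lower = words.map((w) => w.toLowerCase());
  const maxLen = Math.max(0, ...lower.map((w) => w.length));
  const sets = [];
  for (let m = 0; m < maxLen; m++) {
    // Characters at position m from every search word long enough to have one.
    sets.push(lower.filter((w) => m < w.length).map((w) => w[m]).join(""));
  }
  return sets;
}

// Total character matches across all words of one result; a miss skips only that word.
function scoreResult(result, sets) {
  let matches = 0;
  for (const word of result.toLowerCase().split(" ")) {
    for (let p = 0; p < word.length && p < sets.length; p++) {
      if (!sets[p].includes(word[p])) break;
      matches++;
    }
  }
  return matches;
}

// Sort descending by score; the stable sort keeps ties in their original order.
function rankResults(query, results) {
  const sets = buildPositionSets(query.split(/\s+/).filter(Boolean));
  return results
    .map((result) => ({ result, score: scoreResult(result, sets) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.result);
}

const beans = ["Beans, kidney", "Beans, lima", "Beans, navy", "Stew, pinto bean and hominy"];
console.log(rankResults("lima beans", beans));
```

With "lima beans", "Beans, lima" scores 9 (5 for "Beans," plus 4 for "lima"), "Beans, kidney" and "Beans, navy" score 5, and "Stew, pinto bean and hominy" scores 4 from "bean". Running the same list with "lean bimas" produces identical scores, which is the limitation mentioned above.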
However, if I end up feeling like my results are being skewed by it, I'll come back here and post about how I sorted that part.
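If the lean bimas case ever does skew things, one way to make the pattern depend on earlier matches is to compare each result word against one search term at a time and keep the longest shared prefix. A minimal JavaScript sketch with hypothetical function names; the trade-off is that it gives up the merged per-position classes, so each word is checked against every search term instead of a single merged pattern.

```javascript
// Length of the prefix two strings share, e.g. ("beans,", "bimas") -> 1.
function commonPrefixLength(a, b) {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

// Score each result word against one search term at a time and keep the best,
// so every matched character in a word has to come from the same term.
function scoreByTerm(result, terms) {
  let score = 0;
  for (const word of result.toLowerCase().split(" ")) {
    let best = 0;
    for (const term of terms) best = Math.max(best, commonPrefixLength(word, term));
    score += best;
  }
  return score;
}

// Lowercase the query terms once, then sort results by descending score.
function rankByTerm(query, results) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return results
    .map((result) => ({ result, score: scoreByTerm(result, terms) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.result);
}

console.log(scoreByTerm("Beans, lima", ["lima", "beans"])); // 9
console.log(scoreByTerm("Beans, lima", ["lean", "bimas"])); // 2
```

"Beans, lima" still scores 9 against "lima beans" but only 2 against "lean bimas", since "Beans," shares just "b" with "bimas" and "lima" shares just "l" with "lean".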
Tags
<php><javascript><arrays><algorithm><search>
Title
Fastest way to find relevant results in array from an input array
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USJesse
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POFastest way to find relevant results in array from an input array
 UserUserId
 USJesse
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
