Searching a normal query in an inverted index
    <p>I have a full inverted index in the form of a nested Python dictionary. Its structure is:</p>
    <p><strong>{word : { doc_name : [location_list] } }</strong></p>
    <p>For example, let the dictionary be called index; then for the word "spam", the entry would look like:</p>
    <p>{ "spam" : { "doc1.txt" : [102, 300, 399], "doc5.txt" : [200, 587] } }</p>
    <p>so that the documents containing any word are given by <em>index[word].keys()</em>, and the frequency in a document by <em>len(index[word][document])</em>.</p>
    <p>Now my question is: how do I implement a normal query search in this index? That is, given a query containing, let's say, 4 words, find the documents containing all four words (ranked by total frequency of occurrence), then the documents containing 3 of them, and so on.</p>
    <blockquote> <p>Added this code, using S. Lott's answer. This is the code I have written. It's working exactly as I want (just some formatting of the output is needed), but I know it could be improved.</p> </blockquote>
    <pre><code>from collections import defaultdict
from operator import itemgetter

# `index` is the inverted index described above

# Take input
query = input(" Enter the query : ")

# Some preprocessing
query = query.lower()
query = query.strip()

# Now the real work
wordlist = query.split()
search_words = [x for x in wordlist if x in index]  # words that are present in index

print("\nsearching for words ... : ", search_words, "\n")

doc_has_word = [(index[word].keys(), word) for word in search_words]
doc_words = defaultdict(list)
for d, w in doc_has_word:
    for p in d:
        doc_words[p].append(w)  # create a dictionary identifying matches for each document

result_set = {}
for i in doc_words.keys():
    count = 0
    matches = len(doc_words[i])  # number of matches
    for w in doc_words[i]:
        count += len(index[w][i])  # count total occurrences
    result_set[i] = (matches, count)

# Now print in sorted order
print(" Document \t\t Words matched \t\t Total Frequency ")
print('-' * 40)
for doc, (matches, count) in sorted(result_set.items(), key=itemgetter(1), reverse=True):
    print(doc, "\t", doc_words[doc], "\t", count)
</code></pre>
    <p>Please comment. Thanks.</p>
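    <p>For comparison, here is a minimal sketch of the same ranking wrapped in a function, so it can be tried without typing a query. The name <em>rank_documents</em> and the sample index are hypothetical, made up for illustration. It assumes the {word : { doc_name : [location_list] } } layout described above and, unlike the code above, ignores repeated query words.</p>

```python
from collections import defaultdict


def rank_documents(index, query):
    """Rank documents by (query words matched, total occurrences).

    Hypothetical helper: `index` is assumed to have the
    {word: {doc_name: [locations]}} layout from the question.
    """
    # Normalise the query as in the code above. Dropping repeated words
    # is an extra assumption, so a duplicate word is not counted twice.
    words = [w for w in dict.fromkeys(query.lower().split()) if w in index]

    # doc -> list of query words found in that document
    doc_words = defaultdict(list)
    for word in words:
        for doc in index[word]:
            doc_words[doc].append(word)

    # doc -> (number of words matched, total occurrences of those words)
    scores = {
        doc: (len(matched), sum(len(index[w][doc]) for w in matched))
        for doc, matched in doc_words.items()
    }

    # Most matched words first; ties are broken by total frequency.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(doc, doc_words[doc], count) for doc, (_, count) in ranked]


# Made-up sample data, only to exercise the function.
index = {
    "spam": {"doc1.txt": [102, 300, 399], "doc5.txt": [200, 587]},
    "eggs": {"doc1.txt": [7], "doc2.txt": [15, 90]},
}

for doc, matched, count in rank_documents(index, "Spam eggs ham"):
    print(doc, matched, count)
```

    <p>Sorting on the (matches, count) tuple is what gives the "all four words first, then three, and so on" ordering: tuples compare element by element, so the total frequency only matters between documents that matched the same number of words.</p>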