    <p>I am not familiar with Amazon S3, but the general way to search remote files is <a href="http://en.wikipedia.org/wiki/Full_text_search#Indexing" rel="nofollow">indexing</a>, with the index itself stored on the remote server. Each search then uses the index to narrow the candidates down to a relatively small number of files, and only those are scanned directly to verify whether they actually match. Depending on your search terms and the complexity of the pattern, it might even be possible to avoid the direct file scan altogether.</p>
    <p>That said, I do not know whether Amazon S3 provides an indexing engine or whether there are supplemental libraries that do this for you, but the concept is simple enough that you should be able to get something working yourself without too much effort.</p>
    <p><strong>EDIT:</strong></p>
    <p>Generally, the tokens that occur in each file are what gets indexed. For example, if you search for <em>"foo bar"</em>, the index tells you which files contain <em>"foo"</em> and which contain <em>"bar"</em>. The intersection of these two result sets is the set of files that contain both <em>"foo"</em> and <em>"bar"</em>. You then have to scan those files directly to select the ones (if any) where <em>"foo"</em> and <em>"bar"</em> appear right next to each other, in that order.</p>
    <p>In any case, the amount of data downloaded to the client would be far less than downloading and scanning everything, although that also depends on how your files are structured and what your search patterns look like.</p>
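    <p>The token-index idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not an S3-specific implementation: the names <code>tokenize</code>, <code>build_index</code>, <code>candidates</code> and <code>search_phrase</code> are hypothetical, the "remote files" are simulated with an in-memory dictionary, and <code>fetch</code> stands in for whatever call would actually download a file.</p>

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Map each token to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def candidates(index, phrase):
    """Return the files that contain every token of the phrase (in any order)."""
    tokens = tokenize(phrase)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


def search_phrase(index, phrase, fetch):
    """Fetch only the candidate files and keep those with the exact phrase."""
    words = tokenize(phrase)
    n = len(words)
    hits = []
    for name in sorted(candidates(index, phrase)):
        tokens = tokenize(fetch(name))  # the only step that downloads a file
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(name)
    return hits


# Hypothetical stand-in for the remote files (assumption: small text files).
files = {
    "a.txt": "foo bar baz",
    "b.txt": "bar then foo",
    "c.txt": "only foo here",
}
index = build_index(files)

fetched = []  # records which files had to be "downloaded"


def fetch(name):
    fetched.append(name)
    return files[name]


result = search_phrase(index, "foo bar", fetch)
```

    <p>In this sketch the index narrows the search to the two files that contain both tokens, so the third file is never fetched, and the direct scan then keeps only the file where the tokens are adjacent and in order. A real index would also need to be stored next to the files and updated whenever they change.</p>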