StackOverflow2013

<p>I do the same thing for a project I work on. My suggestion would be that loading an entire book as a single field isn't a great idea unless you're only ever going to work with one book rather than many. Here's how I do it.</p>
<ol>
<li>The book is stored in a MySQL database one page at a time.</li>
<li>Run Sphinx across the database. Mine holds several million pages of text and it works very fast, returning every page that contains the text you are looking for (or, depending on the number of pages in the DB, just the first 30 or whatever).</li>
<li>Use the Excerpt Builder to get an excerpt from a page and highlight the search phrase.</li>
<li>If Python doesn't have access to the excerpt builder (it may be PHP only), you could do the same job without too much difficulty using regular expressions: find your search phrase, use one regex to grab so much text either side of it, and another regex to add highlighting.</li>
</ol>
<p>You could write a Python script (I use a PHP script run from the bash shell) to extract your text one page at a time, sanitize it, and add it to the database.</p>
<p>You'd need a database with at least two tables, something like:</p>
<p><code>books (fields could be called id, name, author)</code></p>
<p><code>pages (fields would be id, book_id, page_text)</code></p>
<p>Sphinx returns a page id; you then get the page from MySQL using a simple query:</p>
<p><code>SELECT page_text FROM pages WHERE id = $idreturnedbysphinx;</code></p>
<p>You then send that returned text to the text excerpter/text highlighter.</p>
<p>Sphinx can search for either exact words or stemmed words (and much, much more), but you need to set this up in your sphinx.conf file.</p>
<p>You need at least two index definitions:</p>
<pre><code>index indexname1
{
    # source database connection and SQL query
    source = src1
    path = /var/data/indexname1
    [... other settings ...]
    # make sure stemming is switched off
    morphology = none
}

# child index inherits the above and adds stemming
index indexname1stemmed : indexname1
{
    path = /var/data/indexname1stemmed
    morphology = stem_en
    index_exact_words = 1
}
</code></pre>
<p>You then also need to specify in your Sphinx search the match mode you want to use. I don't know the Python syntax, but the Sphinx manual sets it out better than I can: <a href="http://sphinxsearch.com/docs/current.html#matching-modes" rel="nofollow">http://sphinxsearch.com/docs/current.html#matching-modes</a></p>
<p>You could do all this without a SQL database and keep it in text files, but I'd probably go for one text file per page as a more manageable way to work; otherwise you'll be back to returning the entire ebook as your search result.</p>
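<p>For the regular-expression fallback mentioned in step 4, here is a rough Python sketch of what I have in mind. It is an illustration only, not Sphinx's own excerpt builder: the function name <code>build_excerpt</code>, the <code>context</code> parameter (characters kept either side of the first hit) and the <code>&lt;b&gt;</code> highlight tag are all made up for the example, and it uses a string slice rather than a second regex to cut out the surrounding text.</p>

```python
import html
import re


def build_excerpt(page_text, phrase, context=40):
    """Return an HTML-escaped excerpt around the first hit, or None.

    Hypothetical helper: `context` is the number of characters kept on
    each side of the first match; hits are wrapped in <b> tags.
    """
    if not phrase:
        return None

    # Escape the phrase so punctuation in it is matched literally.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    first = pattern.search(page_text)
    if first is None:
        return None

    # Cut a window of text around the first hit.
    start = max(0, first.start() - context)
    end = min(len(page_text), first.end() + context)
    window = page_text[start:end]

    # Escape the plain text and wrap every hit inside the window.
    pieces = []
    pos = 0
    for hit in pattern.finditer(window):
        pieces.append(html.escape(window[pos:hit.start()]))
        pieces.append("<b>" + html.escape(hit.group(0)) + "</b>")
        pos = hit.end()
    pieces.append(html.escape(window[pos:]))

    # Mark the sides where the page text was cut off.
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(page_text) else ""
    return prefix + "".join(pieces) + suffix
```

<p>You would call it with the <code>page_text</code> value fetched from MySQL and the phrase the user searched for. It only matches the literal phrase, so a page that Sphinx found through the stemmed index may come back as <code>None</code> and need a fallback, such as showing the start of the page.</p>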
