Searching attachments from a Rails app (Word, PDF, Excel etc)

<p>My first post to Stack Overflow so be gentle please! I am about to start a new Ruby on Rails (3.1) project for a client. One of their requirements is that there is a search engine, which will be indexing roughly 2,000 documents which are a mixture of PDF, Word, Excel and HTML.</p>
<p>I had hoped to use either thinking-sphinx or Texticle (most popular at <a href="https://www.ruby-toolbox.com/categories/rails_search.html" rel="nofollow">https://www.ruby-toolbox.com/categories/rails_search.html</a>) but as I understand it:</p>
<ul>
<li>Texticle requires PostgreSQL. I'm on MySQL.</li>
<li>thinking-sphinx doesn't index files on the file system.</li>
<li>even if I saved my attachments into the database, thinking-sphinx still wouldn't work as it requires plain text (according to <a href="http://groups.google.com/group/thinking-sphinx/browse_thread/thread/69cdc1c8e1c096ff" rel="nofollow">http://groups.google.com/group/thinking-sphinx/browse_thread/thread/69cdc1c8e1c096ff</a>)</li>
</ul>
<p>So I'm left with two options:</p>
<ol>
<li>Pick a different search tool</li>
<li>Try to extract plain-text versions of the attachments into the database for thinking-sphinx to read</li>
</ol>
<p><strong>Which approach do you recommend?</strong></p>
<p>If it's a different search tool, which one? My requirements are pretty basic so I'd really like one that's very easy to set up and has lots of documentation, examples and tutorials!</p>
<p>If it's extracting, can you recommend extractors for common file types such as PDF, Word, Excel and HTML?</p>
<p>Thanks everyone. Really appreciate your help.</p>
 
