Full text indexer with line-level results, substring searches, and incremental update support?
<p>I'm looking for a full-text indexing package that is actively maintained (i.e. not an end-of-life package) and that would ideally support:</p>
<ul> <li>substring matches</li> <li>incremental updates</li> <li>line-level results</li> </ul>
<p>Also ideal would be support for:</p>
<ul> <li>boolean matches</li> <li>adjacency searches ("stringX found near stringY")</li> </ul>
<p>A little more detail about the situation: I currently have a 'grep on steroids' that searches through system log files stored in a central location, split by host and day, and updated continuously.</p>
<ul> <li>approximately 40-80 GB of mixed compressed and raw files</li> <li>raw uncompressed data size: 350-500 GB</li> <li>20,000+ files</li> </ul>
<p>A solution like <a href="http://www.splunk.com" rel="nofollow noreferrer">Splunk</a> would be ideal, but pricing for our data change rate (2-4 GB/day), even with educational organization pricing, is outrageously high.</p>
<p>I have used <a href="http://www.is.informatik.uni-duisburg.de/projects/freeWAIS-sf/" rel="nofollow noreferrer">freeWAIS-sf</a> in the past, and am currently using <a href="http://www.namazu.org" rel="nofollow noreferrer">namazu</a> for limited indexing of a small document set elsewhere.</p>
<p>I don't <em>require</em> spidering support; I can feed it a list of files to index, and they will all be on local disk.</p>
<p>The problem is that freeWAIS-sf appears to be essentially abandoned, and namazu only returns results by file, not by line.</p>
<p>Any suggestions for products to use? One option I did consider was to use something like namazu, but to split the files into chunks before indexing and post-process the search results to reassemble them. That seems very hackish.</p>
<p><strong>EDIT</strong></p>
<p>I'm also open to building multiple indexes as a way of doing incremental updates, even though I'd have to aggregate the multiple search results.</p>
<p>I can also live with a delay on indexing for today's results; indexing doesn't have to be real-time.</p>
<p><strong>EDIT</strong></p>
<p>Solr appears to be quite useful as a tool; however, it looks to have the same issue as namazu and the others: if I want the positions of results within each file, I basically have to do it myself externally, or pre-split the file into chunks as I generate the XML to load into the index server. While this does provide a very structured way of doing it, if I have to do all that myself, I'm back at the starting point.</p>
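<p>For reference, the pre-splitting workaround described above could look roughly like the sketch below. It is only an illustration under assumptions: the function names (<code>open_log</code>, <code>line_documents</code>, <code>group_hits</code>) and the document fields (<code>id</code>, <code>file</code>, <code>line</code>, <code>text</code>) are hypothetical, compressed files are assumed to be gzip with a <code>.gz</code> suffix, and the step that actually loads the documents into an indexer is left out because it depends on the product chosen.</p>

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def open_log(path):
    """Open a raw or gzip-compressed log file as text (assumes a .gz suffix)."""
    if Path(path).suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def line_documents(path) -> Iterator[dict]:
    """Yield one small document per non-empty line, tagged with file and line number."""
    with open_log(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            text = line.rstrip("\n")
            if text:
                yield {
                    "id": f"{path}:{lineno}",  # hypothetical unique key
                    "file": str(path),
                    "line": lineno,
                    "text": text,
                }


def group_hits(hits: Iterable[dict]) -> Dict[str, List[int]]:
    """Post-process per-line hits back into sorted line numbers per file."""
    grouped: Dict[str, List[int]] = {}
    for hit in hits:
        grouped.setdefault(hit["file"], []).append(hit["line"])
    return {name: sorted(lines) for name, lines in grouped.items()}
```

<p>Each yielded dictionary would be submitted to the index as its own document, and the hits returned by a search would be passed through <code>group_hits</code> to recover line-level positions per file. This is exactly the external bookkeeping I was hoping a package would handle for me.</p>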