StackOverflow2013


plurals

POFull text indexer with line level results, substring searches, and incremental update support?
primarykey
Id
420436
data
AcceptedAnswerId
0
AnswerCount
3
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2009-01-07T14:15:43.880
FavoriteCount
1
LastActivityDate
2009-01-14T20:53:03.223
LastEditDate
2009-01-09T14:10:03.553
LastEditorUserId
33531
OwnerUserId
33531
ParentId
0
PostTypeId
1
Score
2
ViewCount
347
LastEditorDisplayName
Nathan Neulinger
text
Body
I'm looking for a full text indexing package that is being maintained (i.e. not an end-of-life dead package) and would ideally have support for: <ul> <li>substring matches</li> <li>incremental updates</li> <li>line level results</li> </ul> Also ideal would be support for <ul> <li>boolean matches</li> <li>adjacency searches "stringX found near stringY"</li> </ul> A little more detail about the situation - I currently have a 'grep on steroids' that searches through system log files stored in a central location, split by host and day, updated continuously. <ul> <li>approximately 40-80 GB of mixed compressed and raw files</li> <li>raw uncompressed data size - 350-500 GB</li> <li>20,000+ files</li> </ul> A solution like <a href="http://www.splunk.com" rel="nofollow noreferrer">Splunk</a> would be ideal, but pricing for our data change rate (2-4 GB/day) - even with educational organization pricing - is outrageously high. I have used <a href="http://www.is.informatik.uni-duisburg.de/projects/freeWAIS-sf/" rel="nofollow noreferrer">freeWAIS-sf</a> in the past, and am currently using <a href="http://www.namazu.org" rel="nofollow noreferrer">namazu</a> for limited indexing of a small document set elsewhere. I don't require spidering support; I can feed it a list of files to index and they will all be on local disk. Problem is - freeWAIS-sf appears to essentially be abandoned, and namazu doesn't have any line-level results - only by-file. Any suggestions for products to use? One option I did consider was to use something like namazu, splitting the files into chunks before indexing and post-processing search results to reassemble them, but that seems very hackish. EDIT I'm open to building multiple indexes as a way of doing incremental updates - even though I'd have to aggregate the multiple search results. I can also live with a delay on indexing for today's results; indexing doesn't have to be real-time.
EDIT Solr appears to be quite useful as a tool; however, it looks to have the same issue as namazu and the others - if I want the positions of results within a file, I basically have to work them out myself externally, or pre-split the file into chunks as I generate the XML to load into the index server. While this does provide a very structured way of doing it, if I have to do all that myself, I'm back at the starting point.
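For illustration only, the pre-splitting idea mentioned above might look roughly like the following sketch. It is not tied to namazu, Solr, or any other indexer: the names `Chunk`, `split_into_chunks`, and `line_hits` are hypothetical, the chunk size is arbitrary, and a plain substring test stands in for whatever the real index would return. The point is only that recording each chunk's starting line number is enough to map a chunk-level hit back to a line in the original file.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Chunk:
    """A slice of a log file that would be indexed as one document (hypothetical)."""
    path: str          # original file the chunk came from
    start_line: int    # 1-based line number of the chunk's first line
    lines: List[str]   # chunk contents, newline-stripped


def split_into_chunks(path: str, lines: Iterable[str],
                      chunk_size: int = 100) -> Iterator[Chunk]:
    """Yield fixed-size chunks, remembering where each one starts."""
    buf: List[str] = []
    start = 1
    for lineno, line in enumerate(lines, start=1):
        if not buf:
            start = lineno
        buf.append(line.rstrip("\n"))
        if len(buf) == chunk_size:
            yield Chunk(path, start, buf)
            buf = []
    if buf:  # trailing partial chunk
        yield Chunk(path, start, buf)


def line_hits(chunk: Chunk, needle: str) -> List[Tuple[str, int, str]]:
    """Post-process a chunk-level hit into (path, line number, text) tuples."""
    return [(chunk.path, chunk.start_line + i, text)
            for i, text in enumerate(chunk.lines)
            if needle in text]
```

In this sketch the indexer would only ever report "chunk N matched"; `line_hits` then rescans that one chunk, which keeps the post-processing cost proportional to the chunk size rather than to the whole file.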
Tags
<logging><data-structures><indexing><full-text-search><text-files>
Title
Full text indexer with line level results, substring searches, and incremental update support?
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USNathan Neulinger
UserOwnerUserId
1. USNathan Neulinger
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POFull text indexer with line level results, substring searches, and incremental update support?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POFull text indexer with line level results, substring searches, and incremental update support?
 UserUserId
 USoffby1
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. This table or related slice is empty.