StackOverflow2013

plurals

POSolr(Lucene) is indexing only the first document after adding a custom TokenFilter
primarykey
Id
7618513
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2011-10-01T06:08:34.217
FavoriteCount
1
LastActivityDate
2013-09-05T07:55:38.210
LastEditDate
2011-10-01T07:12:30.450
LastEditorUserId
256400
OwnerUserId
256400
ParentId
0
PostTypeId
1
Score
5
ViewCount
477
LastEditorDisplayName
text
Body
I created a custom token filter that concatenates all the tokens in the stream. This is my <code>incrementToken()</code> function:
<pre><code>public boolean incrementToken() throws IOException {
    if (finished) {
        logger.debug("Finished");
        return false;
    }
    logger.debug("Starting");
    StringBuilder buffer = new StringBuilder();
    int length = 0;
    while (input.incrementToken()) {
        if (0 == length) {
            buffer.append(termAtt);
            length += termAtt.length();
        } else {
            buffer.append(" ").append(termAtt);
            length += termAtt.length() + 1;
        }
    }
    termAtt.setEmpty().append(buffer);
    //offsetAtt.setOffset(0, length);
    finished = true;
    return true;
}
</code></pre>
I added the new filter to the end of the index and query analysis chains for a field. Testing it from <a href="http://localhost:8983/solr/admin/analysis.jsp" rel="nofollow">http://localhost:8983/solr/admin/analysis.jsp</a> suggests it works: the filter concatenates the tokens in the stream. But on re-indexing the documents, only my first document gets indexed. This is what my filter chain looks like:
<pre><code><analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopWordFilterFactory" ignoreCase="true" words="words.txt" />
    <filter class="org.custom.solr.analysis.ConcatFilterFactory" />
</analyzer>
<analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopWordFilterFactory" ignoreCase="true" words="words.txt" />
    <filter class="org.custom.solr.analysis.ConcatFilterFactory" />
</analyzer>
</code></pre>
Without the <code>ConcatFilterFactory</code> all words are indexed properly, but with <code>ConcatFilterFactory</code> only the first document is indexed. What am I doing wrong? Kindly help me understand the problem.

UPDATE: Finally figured out the issue.
<pre><code>if (finished) {
    logger.debug("Finished");
    finished = false;
    return false;
}
</code></pre>
It looks like the same filter instance is being reused across documents, so the <code>finished</code> flag was still set when the second document arrived. Makes sense.
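To illustrate the reuse issue described in the update, here is a minimal sketch that does not depend on Lucene or Solr. Every name in it (<code>ConcatReuseDemo</code>, <code>ConcatFilter</code>, <code>setInput</code>, <code>index</code>) is hypothetical and only simulates a single filter instance being fed several documents; it is not the actual Solr indexing code. In Lucene itself, per-stream state is conventionally cleared in an overridden <code>reset()</code> that also calls <code>super.reset()</code>, but check that against the API of the version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Lucene-free simulation of a reused filter; all names here are hypothetical.
public class ConcatReuseDemo {

    /** Stand-in for a reusable filter that joins every upstream token into one. */
    static class ConcatFilter {
        private Iterator<String> input = Collections.emptyIterator();
        private boolean finished = false;
        private String term = "";

        /** Points the same instance at a new document, as a reusing analyzer would. */
        void setInput(List<String> tokens) {
            input = tokens.iterator();
        }

        /** Clears per-document state; skipping this call reproduces the bug. */
        void reset() {
            finished = false;
        }

        /** Emits one concatenated token per document, then reports end of stream. */
        boolean incrementToken() {
            if (finished) {
                return false;
            }
            StringBuilder buffer = new StringBuilder();
            while (input.hasNext()) {
                if (buffer.length() > 0) {
                    buffer.append(' ');
                }
                buffer.append(input.next());
            }
            term = buffer.toString();
            finished = true;
            return true;
        }

        String term() {
            return term;
        }
    }

    /** Runs every document through ONE filter instance and collects the emitted tokens. */
    public static List<String> index(List<List<String>> docs, boolean callReset) {
        ConcatFilter filter = new ConcatFilter();
        List<String> emitted = new ArrayList<>();
        for (List<String> doc : docs) {
            filter.setInput(doc);
            if (callReset) {
                filter.reset();
            }
            while (filter.incrementToken()) {
                emitted.add(filter.term());
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("solr", "lucene"),
            Arrays.asList("custom", "filter"));
        System.out.println(index(docs, false)); // stale flag: only the first document yields a token
        System.out.println(index(docs, true));  // flag cleared per document: both yield a token
    }
}
```

With the flag left stale, the second document produces no token at all, which matches the symptom of only the first document being indexed. Clearing the flag inside <code>incrementToken()</code>, as in the update, has the same effect as long as the consumer always reads the stream until it returns <code>false</code>.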
Tags
<search><lucene><solr><tokenize>
Title
Solr(Lucene) is indexing only the first document after adding a custom TokenFilter
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USJithin
UserOwnerUserId
1. USJithin
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POSolr(Lucene) is indexing only the first document after adding a custom TokenFilter
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POSolr(Lucene) is indexing only the first document after adding a custom TokenFilter
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POSolr(Lucene) is indexing only the first document after adding a custom TokenFilter
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COYou should post your own answer and mark it as accepted. This question still shows up as one of the top unanswered questions for Lucene.
 singulars
 PostPostId
 POSolr(Lucene) is indexing only the first document after adding a custom TokenFilter
 UserUserId
 USMartin Blech
2. COI worked on this years back and now I can't remember what exactly I did for the fix. :(
 singulars
 PostPostId
 POSolr(Lucene) is indexing only the first document after adding a custom TokenFilter
 UserUserId
 USJithin
