How does one instruct the ExtractingRequestHandler to parse only the body of a document?
<p>How can I instruct the extracting request handler to ignore metadata/headers etc. when it constructs the "content" of the document I send to it?</p>
<p>For example, I created an MS Word document containing just the word "SEARCHWORD" and nothing else. However, when I ship this doc to my solr index, its contents are mapped to my "body" field as follows:</p>
<pre><code>&lt;str name="body"&gt;
Last-Printed 2009-02-05T15:02:00Z Revision-Number 22 Comments stream_source_info myfile Last-Author Inigo Montoya Template Normal.dotm Page-Count 1 subject Application-Name Microsoft Macintosh Word Author Jesus Baggins Word-Count 2 xmpTPg:NPages 1 Edit-Time 108600000000 Creation-Date 2008-11-05T20:19:00Z stream_content_type application/octet-stream Character Count 14 stream_size 31232 stream_name /Applications/MAMP/tmp/php/phpHCIg7y Some Company Content-Type application/msword Keywords Last-Save-Date 2012-05-01T18:55:00Z SEARCHWORD
&lt;/str&gt;
</code></pre>
<p>All I want is the body of the document, in this case the word "SEARCHWORD."</p>
<p>For further reference, here's my extraction handler:</p>
<pre><code>&lt;requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. --&gt;
    &lt;str name="fmap.content"&gt;body&lt;/str&gt;
    &lt;str name="lowernames"&gt;true&lt;/str&gt;
    &lt;str name="uprefix"&gt;ignored_&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;
</code></pre>
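<p>To make the setup easier to reproduce, here is a minimal sketch of the request URL that corresponds to the handler defaults above. The core URL, the helper name <code>build_extract_url</code>, and the assumption that the unique key field is called <code>id</code> are all hypothetical and not part of my configuration. Setting <code>extractOnly=true</code> should make Solr return what Tika extracted without indexing it, which may help show whether the metadata is already mixed into the content at extraction time or only ends up in "body" later.</p>

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, extract_only=False):
    """Build a URL for Solr's /update/extract handler.

    base_url is a hypothetical core URL such as
    http://localhost:8983/solr/mycore (an assumption, not from the post).
    """
    params = [
        # Assumes the schema's unique key field is named "id".
        ("literal.id", doc_id),
        # The same three settings as the handler defaults in solrconfig.xml.
        ("fmap.content", "body"),
        ("lowernames", "true"),
        ("uprefix", "ignored_"),
    ]
    if extract_only:
        # Ask Solr to return the extracted content instead of indexing it.
        params.append(("extractOnly", "true"))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("http://localhost:8983/solr/mycore", "doc1",
                            extract_only=True))
```

<p>The document itself would then be POSTed to that URL as the request body, for example with any HTTP client that supports file uploads.</p>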
 
