    <p>I can only speak to MS Office documents here. There are several ways to do this:</p>
    <ul> <li>Using COM automation</li> <li>Using converters that output the document in a more accessible format</li> <li>Using third-party libraries</li> <li>Using Microsoft's OpenXML SDK</li> </ul>
    <p>COM automation has the disadvantage of not always being reliable, mainly because the Office applications tend to hang on modal popup dialogs.</p>
    <p>Converters are available for Word. You could check out the Text Converter SDK available from Microsoft, which lets you use the document converters that ship with Word in a stand-alone application. This requires some C coding, but since you are using the same conversion engines as Office, you will get high-fidelity results. The SDK can be obtained from <a href="http://support.microsoft.com/kb/111716" rel="nofollow noreferrer">http://support.microsoft.com/kb/111716</a>.</p>
    <p>For the third option, third-party libraries, you might want to have a look at Apache POI or the <a href="http://b2xtranslator.sourceforge.net/" rel="nofollow noreferrer">b2xtranslator project</a> on SourceForge. The latter provides a C# library that extracts the text from binary Word documents. Its PowerPoint support is still at an early stage, but text extraction should already work.</p>
    <p>The last option would be to use Microsoft's OpenXML SDK. This might be the preferred and easiest way, and samples are easy to find with a web search. The SDK works on the Open XML formats (.docx, .xlsx, .pptx); binary documents (.doc, .xls, .ppt) can be handled by first converting them with the Office Compatibility Pack (download and install from Microsoft):</p>
    <p>Word:</p>
    <pre><code>"C:\Program Files\Microsoft Office\Office12\wordconv.exe" -oice -nme &lt;input file&gt; &lt;output file&gt; </code></pre>
    <p>Excel:</p>
    <pre><code>"C:\Program Files\Microsoft Office\Office12\excelcnv.exe" -oice &lt;input file&gt; &lt;output file&gt; </code></pre>
    <p>PowerPoint:</p>
    <pre><code>"C:\Program Files\Microsoft Office\Office12\ppcnvcom.exe" -oice &lt;input file&gt; &lt;output file&gt; </code></pre>
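    <p>To make the convert-then-extract route concrete, here is a minimal sketch in Python. It does not use the OpenXML SDK; as a stand-in it reads the converted .docx directly with the standard library, since an Open XML file is a ZIP package of XML parts. The function names <code>build_wordconv_command</code> and <code>extract_docx_text</code> are made up for illustration. The sketch assumes the main document part lives at <code>word/document.xml</code> (the usual location, though strictly it is resolved through the package relationships), and it ignores headers, footers, footnotes, and paragraphs nested in text boxes.</p>

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Install path taken from the commands above; adjust for your machine.
WORDCONV = r"C:\Program Files\Microsoft Office\Office12\wordconv.exe"


def build_wordconv_command(src, dst, exe=WORDCONV):
    """Build the Compatibility Pack command line shown above for Word.

    Hypothetical helper: it only assembles the argument list and does not
    run the converter.
    """
    return [exe, "-oice", "-nme", src, dst]


def extract_docx_text(path_or_file):
    """Return the body text of a .docx file, one line per paragraph.

    Assumption: the main part is stored as word/document.xml.
    """
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # w:t elements carry the text of the runs inside a paragraph
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

    <p>On a machine with the Compatibility Pack installed, one might run <code>subprocess.run(build_wordconv_command("in.doc", "out.docx"), check=True)</code> and then call <code>extract_docx_text("out.docx")</code>. For anything beyond plain body text, a real Open XML library is likely the safer choice.</p>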