<p>For remote access to Tika, there are two main methods available. One is the <a href="https://wiki.apache.org/tika/TikaJAXRS" rel="nofollow">Tika JAXRS Server</a>, which provides a full RESTful interface. The other is the simple <a href="https://issues.apache.org/jira/browse/TIKA-593" rel="nofollow">Tika-App --server mode</a>, which works purely at the network-pipe level.</p>

<p>For production use, you'll probably want the Tika JAXRS server, as it's more fully featured. For simple testing and getting started, the Tika App in server mode should be fine.</p>

<p>For the latter, just connect to the port the Tika App is listening on, stream it your document data, and read the HTML back. For example, in one terminal run</p>

<pre><code>$ java -jar tika-app-1.3.jar --server --port 1234
</code></pre>

<p>Then, in another, run</p>

<pre><code>$ nc 127.0.0.1 1234 &lt; test.pdf
</code></pre>

<p>You'll then see the HTML for your test PDF.</p>

<p>From Python, you just need a simple socket connection, much like the one netcat is making there: send over the binary data, then read back your result. For example (Python 3), try something like:</p>

<pre><code>#!/usr/bin/env python3
import socket
import sys

# Where to connect
host = '127.0.0.1'
port = 1234

if len(sys.argv) &lt; 2:
    print("Must give filename", file=sys.stderr)
    sys.exit(1)

filename = sys.argv[1]
# Status goes to stderr so stdout carries only Tika's HTML
print("Sending %s to Tika on port %d" % (filename, port), file=sys.stderr)

# Connect to Tika
with socket.create_connection((host, port)) as s:
    # Stream the file to Tika
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(65536)
            if not chunk:
                # EOF
                break
            s.sendall(chunk)

    # Tell Tika we have sent everything
    s.shutdown(socket.SHUT_WR)

    # Get the response, writing the raw bytes straight to stdout
    while True:
        chunk = s.recv(65536)
        if not chunk:
            # EOF
            break
        sys.stdout.buffer.write(chunk)
</code></pre>
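<p>For the Tika JAXRS server recommended above for production, the Python side becomes a plain HTTP request instead of a raw socket. The following is a minimal sketch using only the standard library. It assumes the server is running on its default port 9998 and exposes the <code>/tika</code> resource, which takes the document as the body of a <code>PUT</code> and returns the extracted content in the format named by the <code>Accept</code> header. The function name <code>tika_jaxrs_extract</code> is hypothetical, and the endpoints available may differ between server versions, so check the TikaJAXRS page for yours.</p>

<pre><code>import urllib.request


def tika_jaxrs_extract(data, url="http://localhost:9998/tika", accept="text/html"):
    """PUT raw document bytes to a Tika JAXRS server and return the decoded reply.

    Hypothetical helper: assumes the default tika-server port (9998) and the
    /tika resource; adjust url for your own deployment.
    """
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Ask for HTML back (or e.g. "text/plain" for plain text)
            "Accept": accept,
            # Generic type so the server is left to detect the format itself (assumption)
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request) as response:
        # Use the charset the server declares, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
</code></pre>

<p>For example, reading <code>test.pdf</code> in binary mode and passing its bytes to <code>tika_jaxrs_extract</code> should give you the same kind of HTML as the <code>nc</code> test above, but over HTTP, so it can sit behind ordinary web infrastructure.</p>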