<p>I have a Java application where I ended up deciding to use <a href="http://code.google.com/p/tesseract-ocr/" rel="nofollow">Tesseract OCR</a> and simply call out to it using <code>Runtime.exec()</code>. Perhaps not quite the answer you need, but just in case you hadn't considered it.</p>
<hr>
<h3>Edit + code added in response to comment reply</h3>
<ul>
<li>On a Windows installation, I think I was able to use an installer or unzip a ready-made binary.</li>
<li><p>On a Linux server, I needed to compile Tesseract myself, but it's not too hard if you're used to that kind of thing (gcc); the only gotcha is that there's a dependency on <a href="http://www.leptonica.com/" rel="nofollow">Leptonica</a>, which also needs to be compiled.</p>
<pre><code>// Tesseract can only handle .tif format, so we have to convert it
ImageIO.write(ImageIO.read(new java.io.File(file.getPath())), "tif", tmpFile[0]);

// Tesseract appends ".txt" to the output base name itself, so strip it here
String[] tesseractCmd = new String[]{
        "tesseract",
        tmpFile[0].getAbsolutePath(),
        StringUtils.removeEnd(tmpFile[1].getAbsolutePath(), ".txt")};

final Process process = Runtime.getRuntime().exec(tesseractCmd);
try {
    // A zero exit code means Tesseract wrote its text output successfully
    int exitValue = process.waitFor();
    if (exitValue == 0) {
        final String extractedText = SearchableTextExtractionUtils.extractPlainText(new FileReader(tmpFile[1]));
        return extractedText;
    }
    throw new SearchableTextExtractionException(exitValue, Arrays.toString(tesseractCmd));
} catch (InterruptedException e) {
    throw new SearchableTextExtractionException(e);
} finally {
    process.destroy();
}
</code></pre></li>
</ul>
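The snippet above depends on helpers from my own project (SearchableTextExtractionUtils, SearchableTextExtractionException) and on Commons Lang's StringUtils, so it will not compile on its own. A minimal self-contained sketch of the same idea, assuming Java 11+ and a tesseract binary on the PATH, might look like the following. The class and method names (TesseractRunner, outputBase, buildCommand, extractText) are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Tesseract or any library.
final class TesseractRunner {

    /** Tesseract appends ".txt" to the output base itself, so strip it from the target path. */
    static String outputBase(Path textFile) {
        String p = textFile.toString();
        return p.endsWith(".txt") ? p.substring(0, p.length() - ".txt".length()) : p;
    }

    /** Builds the argument list: tesseract <image> <output base>. */
    static List<String> buildCommand(Path image, Path textFile) {
        return List.of("tesseract", image.toString(), outputBase(textFile));
    }

    /** Runs tesseract on the image and returns the recognised text. */
    static String extractText(Path image) throws IOException, InterruptedException {
        Path textFile = Files.createTempFile("ocr", ".txt");
        try {
            // Discard console output so the child process cannot block on a full pipe.
            Process process = new ProcessBuilder(buildCommand(image, textFile))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exitValue = process.waitFor();
            if (exitValue != 0) {
                throw new IOException("tesseract exited with code " + exitValue);
            }
            // Assumes the output file is UTF-8 encoded.
            return Files.readString(textFile, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(textFile);
        }
    }
}
```

Calling TesseractRunner.extractText(Path.of("scan.tif")) would then return the recognised text, or throw if tesseract fails. Using ProcessBuilder rather than Runtime.exec() is a design choice here, not a requirement: it makes it easy to discard the process output, which avoids a hang if tesseract prints more than the pipe buffer can hold.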