  1. iText Java PDF to text creation
    <p>I use iText to convert a PDF to a text file. It mostly works well, but for some words the spaces are lost: for example, the PDF contains the phrase "present the main ideas", but iText produces "presentthemainideas". Is there any way to correct this behaviour?</p> <pre><code>String pdf = "/home/can/Downloads/NLP/textSummarization/A New Approach for Multi-Document Update Summarization.pdf";
String txt = "/home/can/myWorkSpace/PDFConverterProject/outputs/bb.txt";
StringBuffer text = new StringBuffer();
String resultText = "";
PdfReader reader;
try {
    reader = new PdfReader(pdf);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    PrintWriter out = new PrintWriter(new FileOutputStream(txt));
    TextExtractionStrategy strategy;
    // Extract the text of every page (page numbers are 1-based).
    for (int i = 1; i &lt;= reader.getNumberOfPages(); i++) {
        strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
        text.append(strategy.getResultantText());
    }
    resultText = text.toString();
    // Join words that were hyphenated across a line break.
    resultText = resultText.replaceAll("-\n", "");
    out.println("--&gt;" + resultText);
    // Write each extracted line to a second file.
    StringTokenizer stringTokenizer = new StringTokenizer(resultText, "\n");
    PrintWriter lineWriter = new PrintWriter(new FileOutputStream("/home/can/myWorkSpace/PDFConverterProject/outputs/line.txt"));
    while (stringTokenizer.hasMoreTokens()) {
        String curToken = stringTokenizer.nextToken();
        lineWriter.println("line--&gt;" + curToken);
    }
    lineWriter.flush();
    lineWriter.close();
    out.flush();
    out.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
</code></pre>
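The post-processing in the question (joining hyphenated line breaks, then splitting the result into lines) can be tried without iText on a plain string. The sketch below is only an illustration: the class name `ExtractedTextCleanup`, its method names, and the sample string are hypothetical and not part of the original code. It does not address the missing-space problem itself, and it may also join real compound words that happen to break at a hyphen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper that mirrors the question's post-processing on a plain string.
public class ExtractedTextCleanup {

    // Removes every "-\n" sequence, as the question's replaceAll("-\n", "") does.
    public static String joinHyphenatedLines(String text) {
        return text.replace("-\n", "");
    }

    // Splits on newlines with StringTokenizer, which skips empty lines.
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, "\n");
        while (tokenizer.hasMoreTokens()) {
            lines.add(tokenizer.nextToken());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for extracted PDF text.
        String extracted = "present the main ide-\nas of each docu-\nment\n\nsecond line";
        for (String line : splitLines(joinHyphenatedLines(extracted))) {
            System.out.println("line-->" + line);
        }
    }
}
```

Because `StringTokenizer` drops empty tokens, blank lines in the extracted text do not appear in the per-line output.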
 
