<p>Here is what I have found out so far:</p>
<p>PDFBox uses a resource file to bind PDF operators (content-stream instructions) to the classes that process them.</p>
<p>If we take a look at the <strong><em>PDFTextStripper.properties</em></strong> resource file under:</p>
<blockquote> <p>pdfbox\src\main\resources\org\apache\pdfbox\resources\</p> </blockquote>
<p>we can see that, for instance, the BT operator is bound to the <strong>org.apache.pdfbox.util.operator.BeginText</strong> class, and so on.</p>
<p>The <strong>PDFTextStripper</strong> class under</p>
<blockquote> <p>pdfbox\src\main\java\org\apache\pdfbox\util\</p> </blockquote>
<p>takes this mapping into account and processes the PDF using these classes.</p>
<p><strong>BUT all graphical objects are ignored, so there is no information about underlines or table structure!</strong></p>
<p>Now, if we take a look at the <em>PageDrawer.properties</em> resource file, we can see that it binds almost all available operators. It is used by the <strong>PageDrawer</strong> class under</p>
<blockquote> <p>pdfbox\src\main\java\org\apache\pdfbox\pdfviewer\</p> </blockquote>
<p>The "trick" now is to find out which graphical operators represent underlines and tables, and to use them in combination with <strong>PDFTextStripper</strong>.</p>
<p>This would mean reading the PDF file specification, which is currently way too much work.</p>
<p>If someone knows which operators are responsible for drawing underlines and table lines, please let me know.</p>
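<p>The binding mechanism described above can be illustrated with a small, self-contained sketch. It is not PDFBox code: it only mimics the idea of a properties resource that maps operators to handlers, using <code>java.util.Properties</code> and made-up handler names (<code>beginText</code>, <code>strokePath</code>, and so on) in place of real PDFBox classes. It assumes, as the observation above suggests, that an operator without a binding is simply not processed, which is how a text-only binding ends up skipping path-drawing operators such as <code>m</code>, <code>l</code> and <code>S</code>.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class OperatorDispatchSketch {

    // Hypothetical bindings in .properties syntax (operator=handler). The handler names are
    // invented for this sketch; the real PDFBox resource files name Java classes instead.
    static final String TEXT_ONLY =
            "BT=beginText\n"
            + "ET=endText\n"
            + "Tj=showText\n";

    // A broader binding that also covers path construction (m, l, re) and painting (S, f).
    static final String TEXT_AND_GRAPHICS = TEXT_ONLY
            + "m=moveTo\n"
            + "l=lineTo\n"
            + "re=appendRectangle\n"
            + "S=strokePath\n"
            + "f=fillPath\n";

    // Parses the bindings the same way a .properties resource would be read.
    static Properties load(String text) {
        Properties bindings = new Properties();
        try {
            bindings.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bindings;
    }

    // Returns the handlers that would run for the given operator sequence.
    // Operators without a binding are skipped (an assumption of this sketch).
    static List<String> dispatch(Properties bindings, List<String> operators) {
        List<String> handled = new ArrayList<>();
        for (String op : operators) {
            String handler = bindings.getProperty(op);
            if (handler != null) {
                handled.add(handler);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> ops = List.of("BT", "Tj", "ET", "m", "l", "S");
        System.out.println(dispatch(load(TEXT_ONLY), ops));         // [beginText, showText, endText]
        System.out.println(dispatch(load(TEXT_AND_GRAPHICS), ops)); // [beginText, showText, endText, moveTo, lineTo, strokePath]
    }
}
```

<p>With the text-only binding the line drawn by <code>m</code>, <code>l</code> and <code>S</code> never reaches any handler, which matches the behaviour observed with <strong>PDFTextStripper</strong>.</p>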
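<p>As for which operators draw such lines: the PDF format has no dedicated underline or table operator. Underlines and table borders are ordinary vector paths, built with path construction operators such as <code>m</code> (move to), <code>l</code> (line to) and <code>re</code> (rectangle: x y width height), and made visible only by a painting operator such as <code>S</code> (stroke) or <code>f</code> (fill); <code>n</code> ends a path without painting it. They are typically drawn either as stroked lines or as thin filled rectangles. The following minimal sketch collects thin horizontal rules from a decoded content stream. It assumes already-decompressed, whitespace-separated tokens, ignores coordinate transforms (<code>cm</code>, <code>q</code>/<code>Q</code>), line widths, curves and string literals containing spaces, and uses a hypothetical <code>maxThickness</code> threshold to decide what counts as a thin line.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RuleExtractorSketch {

    // A horizontal rule in user space, with x1 <= x2.
    record HLine(double x1, double x2, double y) {}

    // Returns horizontal rules that are actually painted: stroked lines (x0 y0 m x1 y1 l S)
    // and thin rectangles (x y w h re f). Assumes well-formed operands and no transforms.
    static List<HLine> extract(String content, double maxThickness) {
        List<HLine> rules = new ArrayList<>();
        Deque<Double> operands = new ArrayDeque<>();
        List<HLine> pendingLines = new ArrayList<>(); // built by l: visible only if stroked
        List<HLine> pendingRects = new ArrayList<>(); // built by re: visible if stroked or filled
        double curX = 0, curY = 0;

        for (String tok : content.trim().split("\\s+")) {
            Double number = parseNumber(tok);
            if (number != null) {
                operands.push(number); // operands precede their operator
                continue;
            }
            switch (tok) {
                case "m" -> { curY = operands.pop(); curX = operands.pop(); }
                case "l" -> {
                    double y = operands.pop(), x = operands.pop();
                    // Segments whose ends differ vertically by at most maxThickness count as horizontal.
                    if (Math.abs(y - curY) <= maxThickness) {
                        pendingLines.add(new HLine(Math.min(curX, x), Math.max(curX, x), (curY + y) / 2));
                    }
                    curX = x;
                    curY = y;
                }
                case "re" -> {
                    double h = operands.pop(), w = operands.pop(), y = operands.pop(), x = operands.pop();
                    if (Math.abs(h) <= maxThickness) {
                        pendingRects.add(new HLine(Math.min(x, x + w), Math.max(x, x + w), y + h / 2));
                    }
                }
                case "S", "s", "B", "B*", "b", "b*" -> { // stroking: lines and rectangle outlines show
                    rules.addAll(pendingLines);
                    rules.addAll(pendingRects);
                    pendingLines.clear();
                    pendingRects.clear();
                }
                case "f", "F", "f*" -> {                  // fill only: a bare line has no area
                    rules.addAll(pendingRects);
                    pendingLines.clear();
                    pendingRects.clear();
                }
                case "n" -> {                             // path discarded without painting
                    pendingLines.clear();
                    pendingRects.clear();
                }
                default -> { }                            // text and other operators are ignored
            }
            operands.clear();
        }
        return rules;
    }

    static Double parseNumber(String tok) {
        try {
            return Double.parseDouble(tok);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String content = "BT /F1 12 Tf 72 700 Td (Hello) Tj ET "
                + "72 698 m 110 698 l S "     // stroked line just below the text
                + "72 600 200 0.5 re f "      // thin filled rectangle, e.g. a table rule
                + "72 500 100 40 re f";       // tall filled box: not a rule
        extract(content, 1.0).forEach(System.out::println);
    }
}
```

<p>Working on raw tokens keeps the sketch dependency-free; hooking into the graphics operators that <em>PageDrawer.properties</em> binds would avoid re-parsing the stream and would account for the transformations this sketch ignores.</p>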
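<p>Combining such rules with the text positions that <strong>PDFTextStripper</strong> extracts could then come down to a geometric test: a rule lying just below a text run's baseline and spanning most of its width is probably an underline. This sketch assumes both are expressed in the same PDF user-space coordinates (y grows upward), so text positions reported in another coordinate system would need converting first. The <code>TextRun</code> and <code>HLine</code> types and the 3-unit gap and 80% coverage thresholds are hypothetical choices, not values taken from PDFBox.</p>

```java
import java.util.ArrayList;
import java.util.List;

class UnderlineMatcherSketch {

    record HLine(double x1, double x2, double y) {}                      // horizontal rule
    record TextRun(String text, double x1, double x2, double baseline) {} // positioned text

    // True if the rule lies at most maxGap units below the baseline (smaller y in PDF
    // user space) and covers at least minCoverage of the run's width.
    static boolean isUnderline(TextRun run, HLine rule, double maxGap, double minCoverage) {
        double gap = run.baseline() - rule.y();
        if (gap < 0 || gap > maxGap) {
            return false;
        }
        double overlap = Math.min(run.x2(), rule.x2()) - Math.max(run.x1(), rule.x1());
        double width = run.x2() - run.x1();
        return width > 0 && overlap / width >= minCoverage;
    }

    // Returns the texts of all runs that have at least one matching rule.
    static List<String> underlinedTexts(List<TextRun> runs, List<HLine> rules) {
        List<String> result = new ArrayList<>();
        for (TextRun run : runs) {
            for (HLine rule : rules) {
                if (isUnderline(run, rule, 3.0, 0.8)) { // hypothetical thresholds
                    result.add(run.text());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TextRun> runs = List.of(
                new TextRun("Important", 72, 130, 700),
                new TextRun("plain", 140, 170, 700));
        List<HLine> rules = List.of(new HLine(72, 130, 698.5));
        System.out.println(underlinedTexts(runs, rules)); // prints [Important]
    }
}
```

<p>Table borders could be approached in a similar way by also collecting vertical segments and grouping intersecting horizontal and vertical rules into a grid; that step is not shown here.</p>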