<p>Do not underestimate the scale of this task. The text matrix part is pretty simple and straightforward. The difficult part is the text itself.</p> <p>Let's start with your question: why does each group of four hex digits have a leading 00?</p> <p>Well, PDF doesn't have a single standard text encoding; it has lots and lots of them. You need to know what the encoding is for the font before you can decode the text.</p> <p>So in your example:</p> <pre><code>BT
/F1 8.88 Tf
0 0 0 rg
0.9998 0 0 1 401.52 448.08 Tm
[&lt;0014&gt;-11&lt;0015&gt;-11&lt;0013&gt;-11&lt;000F&gt;-19&lt;0014&gt;-11&lt;0019&gt;] TJ
ET
</code></pre> <p>The font is the /F1 part. This is a name in the Font resources of the page (or of one of its parent nodes, since resources can be inherited) that refers to a font object. You need to look up that font and find out what its encoding is.</p> <p>Given the content in your example, I suspect that the encoding is an identity one (such as Identity-H) and that the four-digit hex numbers are glyph IDs within the font. If this is the case, the font should have a ToUnicode entry, which will allow you to look up each glyph ID and get back a Unicode character.</p> <p>Other fonts may or may not have ToUnicode entries, and when they don't there are a variety of other ways you can extract the Unicode text. Different methods may give different results, which is why the PDF specification has an entire section entitled "Extraction of Text Content" detailing the order in which these should be attempted.</p> <p>Hopefully your PoDoFo library has methods to do this kind of conversion. If not, the task will be quite hard and I think you should consider some other options. I wrote the text extraction code for our ABCpdf .NET library, and it took some months to code, followed by some years of tweaking.</p>
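As a rough illustration of the identity/ToUnicode case described above, here is a minimal Python sketch. It assumes the font uses fixed two-byte codes (as with Identity-H) and that you have already extracted the decompressed ToUnicode stream from the file; locating and inflating that stream is the parser's job (PoDoFo or similar) and is not shown. The names `parse_tounicode`, `decode_tj` and `SAMPLE_CMAP` are hypothetical, and the sample CMap is made up for illustration rather than taken from the questioner's font, so the printed result only demonstrates how the lookup works.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"


def utf16(hex_digits: str) -> str:
    """ToUnicode destinations are UTF-16BE byte strings written in hex."""
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_tounicode(cmap_text: str) -> dict[int, str]:
    """Build a {character code: Unicode text} map from a ToUnicode CMap."""
    mapping: dict[int, str] = {}
    # bfchar lines: <srcCode> <dstString>
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = utf16(dst)
    # bfrange lines: <lo> <hi> <dst>   or   <lo> <hi> [<dst1> <dst2> ...]
    range_re = HEX + r"\s*" + HEX + r"\s*(<[0-9A-Fa-f]+>|\[[^\]]*\])"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(range_re, block):
            codes = range(int(lo, 16), int(hi, 16) + 1)
            if dst.startswith("["):
                # Array form: one destination string per code in the range.
                for code, item in zip(codes, re.findall(HEX, dst)):
                    mapping[code] = utf16(item)
            else:
                # Single destination, incremented for each successive code.
                # Simplification: treats it as one integer, which is fine
                # for single BMP characters but not for multi-char targets.
                digits = dst.strip("<>")
                for offset, code in enumerate(codes):
                    value = int(digits, 16) + offset
                    mapping[code] = value.to_bytes(len(digits) // 2, "big").decode("utf-16-be")
    return mapping


def decode_tj(tj_array: str, to_unicode: dict[int, str], code_len: int = 2) -> str:
    """Decode the hex strings in a TJ array using fixed-width codes.

    Numeric kerning adjustments are skipped, and literal (...) strings
    are not handled in this sketch.
    """
    out = []
    for digits in re.findall(r"<([0-9A-Fa-f\s]*)>", tj_array):
        digits = re.sub(r"\s", "", digits)  # whitespace is allowed in hex strings
        if len(digits) % 2:
            digits += "0"  # an odd final hex digit is treated as if followed by 0
        data = bytes.fromhex(digits)
        for i in range(0, len(data) - code_len + 1, code_len):
            code = int.from_bytes(data[i:i + code_len], "big")
            out.append(to_unicode.get(code, "\ufffd"))  # U+FFFD if unmapped
    return "".join(out)


# Hypothetical CMap for illustration only, not read from the questioner's file.
SAMPLE_CMAP = """
1 beginbfchar
<000F> <002C>
endbfchar
1 beginbfrange
<0013> <001C> <0030>
endbfrange
"""

if __name__ == "__main__":
    tj = "[<0014>-11<0015>-11<0013>-11<000F>-19<0014>-11<0019>]"
    print(decode_tj(tj, parse_tounicode(SAMPLE_CMAP)))  # 120,16
```

This covers only the easy case. Fonts without a ToUnicode entry, simple fonts with single-byte codes and /Differences arrays, and inferring word spaces from large kerning values all need the other methods the specification lists.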