StackOverflow2013

Get the number of pages in a PDF document
<h3><em>This question is for referencing and comparing. The solution is <a href="https://stackoverflow.com/a/14644354/1062281">the accepted answer below</a>.</em></h3>
<p>I have spent many hours searching for a fast and easy, but above all <em>accurate</em>, way to get the number of pages in a PDF document. Since I work for a graphic printing and reproduction company that works a lot with PDFs, the number of pages in a document must be precisely known before it is processed. PDF documents come from many different clients, so they aren't generated with the same application and/or don't use the same compression method.</p>
<p>Here are some of the answers I found <strong><em>insufficient</em></strong> or simply <strong><em>NOT working</em></strong>:</p>
<h2>Using <a href="http://php.net/manual/en/book.imagick.php" rel="noreferrer">Imagick</a> (a PHP extension)</h2>
<p>Imagick requires a lot of installation work, Apache needs to be restarted, and when I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned <code>1</code> page for every document (I haven't seen a working copy of Imagick so far), so I threw it away. That was with both the <code>getNumberImages()</code> and <code>identifyImage()</code> methods.</p>
<h2>Using <a href="http://www.setasign.de/products/pdf-php-solutions/fpdi/about/" rel="noreferrer">FPDI</a> (a PHP library)</h2>
<p>FPDI is easy to use and install (just extract files and call a PHP script), <strong>BUT</strong> many of the compression techniques are not supported by FPDI.
It then returns an error:</p>
<blockquote> <p>FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.</p> </blockquote>
<h2>Opening a stream and searching with a regular expression:</h2>
<p>This opens the PDF file as a stream and searches for some kind of string containing the page count or something similar.</p>
<pre><code>$f = "test1.pdf";
$stream = fopen($f, "r");
if (!$stream) return 0;
$content = fread($stream, filesize($f));
fclose($stream);
if (!$content) return 0;

$count = 0;
// Regular Expressions found by Googling (all linked to SO answers):
$regex  = "/\/Count\s+(\d+)/";
$regex2 = "/\/Page\W*(\d+)/";
$regex3 = "/\/N\s+(\d+)/";

// $matches[1] holds the captured numbers; max() on $matches itself would compare arrays
if (preg_match_all($regex, $content, $matches))
    $count = max(array_map('intval', $matches[1]));

return $count;
</code></pre>
<ul>
<li><code>/\/Count\s+(\d+)/</code> (looks for <code>/Count &lt;number&gt;</code>) doesn't work because only a few documents have the parameter <code>/Count</code> inside, so most of the time it doesn't return anything. <a href="https://stackoverflow.com/a/2314086/1062281">Source.</a></li>
<li><code>/\/Page\W*(\d+)/</code> (looks for <code>/Page&lt;number&gt;</code>) doesn't return the number of pages; the match mostly contains some other data. <a href="https://stackoverflow.com/a/1536494/1062281">Source.</a></li>
<li><code>/\/N\s+(\d+)/</code> (looks for <code>/N &lt;number&gt;</code>) doesn't work either, as the documents can contain multiple values of <code>/N</code>, most, if not all, of which do <strong>not</strong> contain the page count. <a href="https://stackoverflow.com/a/7462664/1062281">Source.</a></li>
</ul>
<hr>
<blockquote> <h3>So, what does work reliably and accurately?</h3> <p><a href="https://stackoverflow.com/a/14644354/1062281">See the answer below</a></p> </blockquote>
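<p>To make the regular-expression pitfalls listed above concrete, here is a minimal PHP sketch. It is not taken from any of the linked answers: the helper name <code>countPagesByRegex</code> and both sample strings are made up for illustration, and the samples are hand-written fragments, not real PDF files. The first sample assumes a document whose outline (bookmark) dictionary carries its own <code>/Count</code> entry; the second is a stand-in for a document whose page tree is stored inside a compressed object stream.</p>

```php
<?php
// Hypothetical helper (illustration only): scan raw PDF bytes for every
// "/Count <number>" entry and return the largest number, or 0 if none is found.
function countPagesByRegex(string $content): int
{
    // preg_match_all() returns 0 when nothing matches and false on error.
    if (!preg_match_all('/\/Count\s+(\d+)/', $content, $matches)) {
        return 0;
    }
    // $matches[1] holds the captured digit strings from group 1.
    return max(array_map('intval', $matches[1]));
}

// Assumed sample 1: a two-page tree plus an outline dictionary with 40 entries.
$outlineSample =
    "1 0 obj << /Type /Outlines /First 4 0 R /Last 9 0 R /Count 40 >> endobj\n" .
    "2 0 obj << /Type /Pages /Kids [5 0 R 6 0 R] /Count 2 >> endobj\n";

// Assumed sample 2: the page tree is hidden inside a compressed object stream,
// so no plain-text /Count is visible (the stream body is just a placeholder).
$compressedSample =
    "7 0 obj << /Type /ObjStm /N 12 /First 90 /Filter /FlateDecode /Length 20 >>\n" .
    "stream\n(compressed bytes omitted)\nendstream endobj\n";

echo countPagesByRegex($outlineSample), "\n";    // prints 40, not 2
echo countPagesByRegex($compressedSample), "\n"; // prints 0
```

<p>The first call reports 40 for a two-page tree because the largest <code>/Count</code> belongs to the outline, and the second reports 0 because the stand-in sample has no uncompressed <code>/Count</code> at all. The <code>/N 12</code> entry in the second sample is the number of objects in the object stream, which is one way a <code>/N</code> match can be unrelated to the page count.</p>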
