Why is my image distorted when decoding as FlateDecode using iTextSharp?
<p>When decoding an image within a PDF as <code>FlateDecode</code> via iTextSharp, the image is distorted and I can't figure out why.</p>
<p>The recognized bpp is <code>Format1bppIndexed</code>. If I change the <code>PixelFormat</code> to <code>Format4bppIndexed</code>, the image is recognizable to some degree (shrunk, coloring is off but readable) and is duplicated 4 times horizontally. If I change the pixel format to <code>Format8bppIndexed</code>, it is also recognizable to some degree and is duplicated 8 times horizontally.</p>
<p>The image below is the result of using the <code>Format1bppIndexed</code> pixel format. Unfortunately I am unable to show the others due to security constraints.</p>
<p><img src="https://i.stack.imgur.com/xtUss.png" alt="distorted image"></p>
<p>The code below is essentially the only solution I have come across, repeated across both SO and the web.</p>
<pre><code>int xrefIdx = ((PRIndirectReference)obj).Number;
PdfObject pdfObj = doc.GetPdfObject(xrefIdx);
PdfStream str = (PdfStream)(pdfObj);
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);

string filter = ((PdfArray)tg.Get(PdfName.FILTER))[0].ToString();
string width = tg.Get(PdfName.WIDTH).ToString();
string height = tg.Get(PdfName.HEIGHT).ToString();
string bpp = tg.Get(PdfName.BITSPERCOMPONENT).ToString();

if (filter == "/FlateDecode")
{
    bytes = PdfReader.FlateDecode(bytes, true);
    System.Drawing.Imaging.PixelFormat pixelFormat;
    switch (int.Parse(bpp))
    {
        case 1:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
            break;
        case 8:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
            break;
        case 24:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
            break;
        default:
            throw new Exception("Unknown pixel format " + bpp);
    }
    var bmp = new System.Drawing.Bitmap(Int32.Parse(width), Int32.Parse(height), pixelFormat);
    System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(
        new System.Drawing.Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)),
        System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
    Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length);
    bmp.UnlockBits(bmd);
    bmp.Save(@"C:\temp\my_flate_picture-" + DateTime.Now.Ticks.ToString() + ".png", ImageFormat.Png);
}
</code></pre>
<p>What do I need to do so that my image extraction works as desired when dealing with <code>FlateDecode</code>?</p>
<p><strong>NOTE</strong>: I do not want to use another library to extract the images. I am looking for a solution leveraging <em>ONLY</em> iTextSharp and the .NET FW. If a solution exists via Java (iText) and is easily portable to .NET FW bits, that would suffice as well.</p>
<p><strong>UPDATE</strong>: The <code>ImageMask</code> property is set to true, which would imply that there is no color space and the image is therefore implicitly black and white. With the bpp coming in at 1, the <code>PixelFormat</code> should be <code>Format1bppIndexed</code>, which, as mentioned earlier, produces the embedded image seen above.</p>
<p><strong>UPDATE</strong>: To check the image size, I extracted the image using Acrobat X Pro, which listed the size for this particular example as 2403x3005. When extracting via iTextSharp, the size was listed as 2544x3300. I modified the image size within the debugger to mirror 2403x3005; however, upon calling <code>Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length);</code> I get an exception raised.</p>
<blockquote> <p>Attempted to read or write protected memory. This is often an indication that other memory is corrupt.</p> </blockquote>
<p>My assumption is that this is due to the modified size no longer corresponding to the byte data that is being used.</p>
<p><strong>UPDATE</strong>: Per Jimmy's recommendation, I checked whether calling <code>PdfReader.GetStreamBytes</code> returns a byte[] length equal to width*height/8, since <code>GetStreamBytes</code> should be calling <code>FlateDecode</code>. Manually calling <code>FlateDecode</code> and calling <code>PdfReader.GetStreamBytes</code> both produced a byte[] length of 1049401, while width*height/8 is 2544*3300/8, or 1049400, so there is a difference of 1. I am not sure whether this off-by-one is the root cause, nor how to resolve it if that is indeed the case.</p>
<p><strong>UPDATE</strong>: In trying the approach mentioned by kuujinbo, I am met with an <code>IndexOutOfRangeException</code> when I attempt to call <code>renderInfo.GetImage();</code> within the <code>RenderImage</code> listener. The fact that width*height/8, as stated earlier, is off by 1 in comparison to the byte[] length when calling <code>FlateDecode</code> makes me think these are all one and the same issue; however, a solution still eludes me.</p>
<pre><code>   at System.util.zlib.Adler32.adler32(Int64 adler, Byte[] buf, Int32 index, Int32 len)
   at System.util.zlib.ZStream.read_buf(Byte[] buf, Int32 start, Int32 size)
   at System.util.zlib.Deflate.fill_window()
   at System.util.zlib.Deflate.deflate_slow(Int32 flush)
   at System.util.zlib.Deflate.deflate(ZStream strm, Int32 flush)
   at System.util.zlib.ZStream.deflate(Int32 flush)
   at System.util.zlib.ZDeflaterOutputStream.Write(Byte[] b, Int32 off, Int32 len)
   at iTextSharp.text.pdf.codec.PngWriter.WriteData(Byte[] data, Int32 stride)
   at iTextSharp.text.pdf.parser.PdfImageObject.DecodeImageBytes()
   at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PdfDictionary dictionary, Byte[] samples)
   at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PRStream stream)
   at iTextSharp.text.pdf.parser.ImageRenderInfo.PrepareImageObject()
   at iTextSharp.text.pdf.parser.ImageRenderInfo.GetImage()
   at cyos.infrastructure.Core.MyImageRenderListener.RenderImage(ImageRenderInfo renderInfo)
</code></pre>
<p><strong>UPDATE</strong>: Trying the various methods listed here, both my original solution and the one posed by kuujinbo, with a different page in the PDF produces imagery; however, the issue always surfaces when the filter type is <code>/FlateDecode</code>, and no image is produced in that case.</p>
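<p>The byte-count arithmetic in the updates can be checked with a small sketch. It also illustrates one possible line of investigation, which is an assumption and not something confirmed in the post: PDF sample rows are padded only to a whole byte, whereas GDI+ scan lines are rounded up to a 4-byte boundary, so a single <code>Marshal.Copy</code> of the whole buffer may shift each row relative to the previous one. The class <code>RowPadding</code> and its methods are hypothetical helpers written for illustration; they are not part of iTextSharp or the .NET Framework.</p>

```csharp
using System;

// Hypothetical helper for illustration only; not an iTextSharp or .NET API.
static class RowPadding
{
    // Bytes per row in decoded PDF sample data: each row is padded to a whole byte.
    public static int PdfRowBytes(int width, int bitsPerPixel) =>
        (width * bitsPerPixel + 7) / 8;

    // Bytes per row in a GDI+ bitmap: each scan line is rounded up to a 4-byte boundary.
    public static int GdiStride(int width, int bitsPerPixel) =>
        ((width * bitsPerPixel + 31) / 32) * 4;

    // Repack byte-aligned rows into rows of GdiStride bytes, leaving the padding zeroed.
    // Any bytes in 'samples' beyond rowBytes * height (such as one trailing byte) are ignored.
    public static byte[] PadRows(byte[] samples, int width, int height, int bitsPerPixel)
    {
        int rowBytes = PdfRowBytes(width, bitsPerPixel);
        int stride = GdiStride(width, bitsPerPixel);
        if (samples.Length < rowBytes * height)
            throw new ArgumentException("Not enough sample data for the stated dimensions.");

        var padded = new byte[stride * height];
        for (int y = 0; y < height; y++)
            Buffer.BlockCopy(samples, y * rowBytes, padded, y * stride, rowBytes);
        return padded;
    }

    static void Main()
    {
        // Dimensions reported by iTextSharp in the question: 2544x3300 at 1 bit per pixel.
        Console.WriteLine(PdfRowBytes(2544, 1));        // 318
        Console.WriteLine(PdfRowBytes(2544, 1) * 3300); // 1049400
        Console.WriteLine(GdiStride(2544, 1));          // 320
        Console.WriteLine(GdiStride(2544, 1) * 3300);   // 1056000
    }
}
```

<p>If this assumption holds for the image in question, the padded buffer (or a row-by-row copy that advances the destination by <code>bmd.Stride</code>) would be what gets written to <code>bmd.Scan0</code>, and the one extra byte in the 1049401-byte array would simply go unused. Whether that resolves the distortion shown above is untested here.</p>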