StackOverflow2013


plurals

PO
primarykey
Id
9558395
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
7
CommunityOwnedDate
CreationDate
2012-03-04T20:10:55.880
FavoriteCount
0
LastActivityDate
2012-03-04T20:10:55.880
LastEditDate
LastEditorUserId
0
OwnerUserId
246383
ParentId
9408679
PostTypeId
2
Score
0
ViewCount
0
LastEditorDisplayName
text
Body
I do not know the library you use to access the inner structures of a PDF file, but the problem at hand has three distinct subproblems: <ol> <li>Find all images in the PDF file</li> <li>Decode the images to their components</li> <li>Convert the decoded image to a DIB</li> </ol>
Find all images: Images can occur inside content streams (inline images) or in streams attached to dictionaries. To find all images in content streams, you need to find all content streams in Pages, XObjects or Patterns. Each of those can have a Resources -> XObject dictionary that references all XObjects (and an XObject can be an Image). If you ignore inline images, you can simply scan the PDF file and decode each dictionary that is of type XObject, subtype Image.
Decode: All image streams, whether inline in content streams or in separate objects in the PDF file, are encoded and might need post-processing using Decode arrays. There are several filters that you need to be able to apply for decoding. Flate decode (ZLIB), JPEG and CCITT (fax G3/G4) are probably the most used for images. Hopefully the PDF library you use knows how to decode the streams. Next there are Decode arrays (a bit rare), where each color component is scaled from an input value to an output value. This is a linear transformation.
To DIB: Next in line is the conversion of the decoded image to a DIB. This means you need to convert the color components to something Windows can 'get' (e.g., palette, grayscale (a special palette), or RGB). PDF supports a very large variety of color spaces, and converting them to RGB is no sinecure. Your best hope here is that the PDFs you need to process only use a select subset (like RGB and palette). A DIB can then be created simply by creating the bitmap header (BITMAPINFO), filling in all data, calling the DIB creation function CreateDIBSection, and then processing the DIB the way your application needs.
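The linear transformation performed by a Decode array can be sketched as follows. This is only an illustration, assuming integer samples of a known bit depth and a Decode array stored as (min, max) pairs per color component; the names applyDecode and decodeSamples are made up for this example and do not come from any PDF library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps one raw sample in [0, 2^bits - 1]
// linearly onto the interval [dMin, dMax] given by a Decode pair.
double applyDecode(unsigned sample, unsigned bitsPerComponent,
                   double dMin, double dMax) {
    assert(bitsPerComponent >= 1 && bitsPerComponent <= 16);
    const double maxSample =
        static_cast<double>((1u << bitsPerComponent) - 1u);
    return dMin + static_cast<double>(sample) * (dMax - dMin) / maxSample;
}

// Hypothetical helper: applies a Decode array to interleaved samples.
// 'decode' holds one (min, max) pair per color component, so an RGB
// image would pass six values.
std::vector<double> decodeSamples(const std::vector<unsigned>& samples,
                                  unsigned bitsPerComponent,
                                  const std::vector<double>& decode) {
    assert(!decode.empty() && decode.size() % 2 == 0);
    const std::size_t components = decode.size() / 2;
    std::vector<double> out;
    out.reserve(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const std::size_t c = i % components;  // component of this sample
        out.push_back(applyDecode(samples[i], bitsPerComponent,
                                  decode[2 * c], decode[2 * c + 1]));
    }
    return out;
}
```

A Decode pair of (1, 0) inverts a component, which is a common reason for the array to be present at all; a pair of (0, 1) leaves the normalized value unchanged.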
Epilogue: All in all, being able to process all PDF files and find all images is quite a daunting task. If you control the source of the PDFs and you know the images are always in DeviceRGB format, always JPEG, etc., and never inlined into the content stream, it is doable.
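For the simple DeviceRGB case, the pixel shuffling needed before the data can go into a DIB might look like the sketch below. It assumes the image has already been decoded to tightly packed, top-down, 8-bit RGB samples; the function names dibStride and rgbToDibPixels are invented for this example. It relies on three properties of a 24-bit uncompressed DIB: rows are padded to a multiple of 4 bytes, pixels are stored as blue, green, red, and rows are stored bottom-up when the header height is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes per DIB scanline: each row is padded to a 4-byte boundary.
std::size_t dibStride(std::size_t width, std::size_t bitsPerPixel) {
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// Hypothetical helper: converts packed top-down RGB samples into the
// bottom-up, BGR, row-padded layout of a 24-bit uncompressed DIB.
std::vector<std::uint8_t> rgbToDibPixels(const std::vector<std::uint8_t>& rgb,
                                         std::size_t width,
                                         std::size_t height) {
    assert(rgb.size() == width * height * 3);
    const std::size_t stride = dibStride(width, 24);
    std::vector<std::uint8_t> dib(stride * height, 0);  // padding stays 0
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* src = rgb.data() + y * width * 3;
        // The first source row becomes the last row in DIB memory.
        std::uint8_t* dst = dib.data() + (height - 1 - y) * stride;
        for (std::size_t x = 0; x < width; ++x) {
            dst[x * 3 + 0] = src[x * 3 + 2];  // blue
            dst[x * 3 + 1] = src[x * 3 + 1];  // green
            dst[x * 3 + 2] = src[x * 3 + 0];  // red
        }
    }
    return dib;
}
```

On Windows, the resulting buffer could then be copied into the memory that CreateDIBSection hands back through its ppvBits parameter, with a BITMAPINFOHEADER describing 24 bits per pixel, BI_RGB compression, and a positive height. That part is left out here so the sketch stays independent of windows.h.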
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POConvert image data (PDEImage) in PDF to DIB using C++?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USRitsaert Hornstra
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
