StackOverflow2013

plurals

POImage Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
primarykey
Id
10168686
data
AcceptedAnswerId
10169025
AnswerCount
23
ClosedDate
CommentCount
26
CommunityOwnedDate
CreationDate
2012-04-16T04:23:16.380
FavoriteCount
950
LastActivityDate
2018-07-22T12:19:59.727
LastEditDate
2016-03-16T17:39:27.940
LastEditorUserId
4684058
OwnerUserId
1332690
ParentId
0
PostTypeId
1
Score
1404
ViewCount
145830
LastEditorDisplayName
text
Body
One of the most interesting projects I've worked on in the past couple of years was about <a href="https://en.wikipedia.org/wiki/Image_processing" rel="noreferrer">image processing</a>. The goal was to develop a system able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans'; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation. <img src="https://i.stack.imgur.com/irQtR.png" alt="Template matching"> Some constraints on the project: <ul> <li>The background could be very noisy.</li> <li>The can could have any scale or rotation or even orientation (within reasonable limits).</li> <li>The image could have some degree of fuzziness (contours might not be entirely straight).</li> <li>There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!</li> <li>The brightness of the image could vary a lot (so you can't rely "too much" on color detection).</li> <li>The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.</li> <li>There could be no can at all in the image, in which case you had to find nothing and write a message saying so.</li> </ul> So you could end up with tricky things like this (which in this case made my algorithm fail completely): <img src="https://i.stack.imgur.com/Byw82.png" alt="Total fail"> I did this project a while ago, had a lot of fun doing it, and ended up with a decent implementation. Here are some details about my implementation: Language: Done in C++ using the <a href="http://opencv.org" rel="noreferrer">OpenCV</a> library. Pre-processing: For the image pre-processing, i.e.
transforming the image into a more raw form to give to the algorithm, I used three steps: <ol> <li>Changing the color domain from RGB to <a href="http://en.wikipedia.org/wiki/HSL_and_HSV" rel="noreferrer">HSV</a> and filtering on "red" hue, keeping saturation above a certain threshold to avoid orange-like colors and discarding low values to avoid dark tones. The end result was a binary black and white image, where the white pixels are those that pass these thresholds. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with. <img src="https://i.stack.imgur.com/ktdAB.png" alt="Binarized image"> </li> <li>Noise filtering using median filtering (replacing each pixel with the median value of its neighbors) to reduce noise.</li> <li>Using the <a href="http://en.wikipedia.org/wiki/Canny_edge_detector" rel="noreferrer">Canny Edge Detection Filter</a> to get the contours of all items after the two preceding steps. <img src="https://i.stack.imgur.com/F9319.png" alt="Contour detection"></li> </ol> Algorithm: The algorithm I chose for this task comes from <a href="http://rads.stackoverflow.com/amzn/click/0123725380" rel="noreferrer">this</a> awesome book on feature extraction and is called the <a href="http://en.wikipedia.org/wiki/Generalised_Hough_transform" rel="noreferrer">Generalized Hough Transform</a> (pretty different from the regular Hough Transform).
It basically says a few things: <ul> <li>You can describe an object in space without knowing its analytical equation (which is the case here).</li> <li>It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.</li> <li>It uses a base model (a template) that the algorithm will "learn".</li> <li>Each pixel remaining in the contour image votes for another pixel that is supposedly the center (of gravity) of your object, based on what it learned from the model.</li> </ul> In the end, you end up with a heat map of the votes. For example, here all the pixels of the contour of the can vote for its center of gravity, so you get a lot of votes in the pixel corresponding to the center, and see a peak in the heat map as below: <img src="https://i.stack.imgur.com/wxrT1.png" alt="GHT"> Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (the final scale and rotation factors will obviously be relative to your original template). In theory at least... Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas: <ul> <li>It is extremely slow! I can't stress this enough.
Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.</li> <li>It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).</li> <li>Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, thus ending with a very noisy heat map.</li> <li>Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera lens wasn't recognized.</li> </ul> Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned? I hope some people will also learn something from it; after all, I think not only people who ask questions should learn. :)
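The voting step described in the body can be illustrated with a minimal sketch in plain C++. It is not the author's implementation and does not use OpenCV: the types `EdgePoint`, `Offset`, `Peak`, `RTable` and the functions `buildRTable`, `castVotes`, `findPeak` are hypothetical names introduced only for this illustration. It also assumes a single fixed scale and rotation, so it covers translation only; a full Generalized Hough Transform would repeat the vote for every scale and rotation combination, which is where the long running time reported above comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch (not part of OpenCV).
struct EdgePoint { int x; int y; int angleBin; };  // contour pixel + quantized gradient direction
struct Offset { int dx; int dy; };                 // displacement from an edge pixel to the reference point
struct Peak { int x; int y; int votes; };          // strongest accumulator cell

// R-table: for each gradient bin, the offsets seen in the template.
using RTable = std::vector<std::vector<Offset>>;

// "Learning" step: for every template contour pixel, remember where the
// reference point (e.g. the center of the can) lies relative to it.
RTable buildRTable(const std::vector<EdgePoint>& templ, int refX, int refY, int numBins) {
    assert(numBins > 0);
    RTable table(static_cast<std::size_t>(numBins));
    for (const EdgePoint& p : templ) {
        assert(p.angleBin >= 0 && p.angleBin < numBins);
        table[static_cast<std::size_t>(p.angleBin)].push_back({refX - p.x, refY - p.y});
    }
    return table;
}

// Voting step (one fixed scale and rotation, an assumption of this sketch):
// each scene contour pixel votes for every center position that the
// R-table considers possible for its gradient bin.
std::vector<int> castVotes(const std::vector<EdgePoint>& edges, const RTable& table,
                           int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<int> acc(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0);
    const int numBins = static_cast<int>(table.size());
    for (const EdgePoint& p : edges) {
        if (p.angleBin < 0 || p.angleBin >= numBins) continue;  // ignore unknown bins
        for (const Offset& o : table[static_cast<std::size_t>(p.angleBin)]) {
            const int cx = p.x + o.dx;
            const int cy = p.y + o.dy;
            if (cx >= 0 && cx < width && cy >= 0 && cy < height) {
                ++acc[static_cast<std::size_t>(cy) * width + cx];
            }
        }
    }
    return acc;
}

// Peak search: the accumulator cell with the most votes is the candidate center.
// Returns {-1, -1, 0} when no cell received a vote.
Peak findPeak(const std::vector<int>& acc, int width) {
    assert(width > 0);
    Peak best{-1, -1, 0};
    for (std::size_t i = 0; i < acc.size(); ++i) {
        if (acc[i] > best.votes) {
            best = {static_cast<int>(i % width), static_cast<int>(i / width), acc[i]};
        }
    }
    return best;
}
```

In this sketch, a template is learned once with `buildRTable`, the contour pixels of a scene are passed to `castVotes`, and `findPeak` reads the candidate center off the accumulator. Because every contour pixel votes, a larger object with the same local edge directions can collect more votes than the target, which is consistent with the bottle-versus-can problem described in the results.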
Tags
<c++><algorithm><image-processing><opencv>
Title
Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USCharles Menguy
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
3. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POImage Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POImage Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
 UserUserId
 USbtown
 VoteTypeVoteTypeId
 VTFavorite
3. VO
 singulars
 PostPostId
 POImage Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
 UserUserId
 USzengr
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
