StackOverflow2013

<h2>Practical issues</h2>
<p>Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes"), SIFT (as mentioned in <a href="https://stackoverflow.com/questions/2074956/">Logo recognition in images</a>, thanks overrider!) is a good first choice: it's very popular these days and is worth a try. You can find some code to download <a href="http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT" rel="nofollow noreferrer">here</a>. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.</p>
<h2>General description and lingo</h2>
<p>This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".</p>
<p>So first you go and find <em>features</em> in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of <em>invariances</em>: whatever reasonable processing you do to your image (scaling, rotation, brightness change, adding a bit of noise, etc.), it will not change the fact that there is a corner at a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.</p>
<p>Then you do some clean-up, like removing features that are not strong enough.</p>
<p>Then you go to your <em>database</em>. That's something you've built in advance, usually by taking several nice, clean images of whatever you are trying to find, running your feature detection on them, cleaning things up, and arranging the results in some data structure for your next stage:</p>
<p><em>Look-up</em>. You have to take a bunch of features from your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, and the number of subsets is exponential. So there are all kinds of smart voting and hashing techniques to do it, like the <em>Hough transform</em> and <em>geometric hashing</em>.</p>
<p>Now you should do some verification. You have found some places in the image which are suspect: it's probable that they contain your object. Usually, you know the presumed size, orientation, and position of your object, and you can use something simple (like a <em>convolution</em>) to check if it's really there.</p>
<p>You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some <em>outlier detection</em>. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.</p>
<p>That's it, you are done!</p>
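<p>To make the look-up step a bit more concrete, here is a minimal sketch of Hough-style voting in plain Python. It is only an illustration under simplifying assumptions: features are already matched by a made-up identifier, the object is only shifted (no scaling or rotation), and the names <code>vote_for_translation</code>, <code>model</code>, <code>matches</code> and the <code>cell</code> size are hypothetical. Each matched feature votes for the shift that would put the reference object there, and the most popular shift wins, so you never enumerate subsets of features.</p>

```python
from collections import Counter


def vote_for_translation(model_points, scene_matches, cell=10):
    """Hough-style voting for a pure translation (illustrative only).

    model_points:  {feature_id: (x, y)} positions in the clean reference image
    scene_matches: [(feature_id, (x, y)), ...] features found in the scene,
                   already labelled with the reference feature they resemble
    cell:          size in pixels of one accumulator bin (assumed value)

    Returns ((dx_bin, dy_bin), votes) for the most popular bin,
    or (None, 0) if nothing could vote.
    """
    accumulator = Counter()
    for feature_id, (sx, sy) in scene_matches:
        if feature_id not in model_points:
            continue  # no counterpart in the database, cannot vote
        mx, my = model_points[feature_id]
        # The shift that would move the reference feature onto the scene feature.
        dx, dy = sx - mx, sy - my
        # Quantise so that slightly noisy matches land in the same bin.
        accumulator[(round(dx / cell), round(dy / cell))] += 1
    if not accumulator:
        return None, 0
    return accumulator.most_common(1)[0]


# Reference object with three features (hypothetical coordinates).
model = {"a": (0, 0), "b": (20, 0), "c": (0, 30)}

# Three consistent matches near a shift of (100, 50), plus two false matches.
matches = [
    ("a", (100, 50)),
    ("b", (121, 49)),
    ("c", (99, 81)),
    ("a", (300, 200)),  # false match
    ("b", (5, 400)),    # false match
]

best_cell, n_votes = vote_for_translation(model, matches)
print(best_cell, n_votes)
```

<p>The false matches each end up alone in their own bin, while the three consistent ones pile up in one place; that bin is a suspect location to pass on to verification. A real detector would typically vote over scale and orientation as well, which this sketch leaves out.</p>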
