Recognizing similar shapes at random scale and translation
Playing around with finding stuff on a graphical screen, I'm currently at a loss about how to find a given shape within an image. The shape in the image could have a different scale and will be at some unknown x,y offset, of course.

Aside from pixel artifacts resulting from different scales, there is also a little noise in both images, so I need a somewhat tolerant search.

Here's the image I am looking for:

[Image: Farmerama frame] https://i.stack.imgur.com/jrIB2.png

It should show up somewhere in a screen dump of my (dual) screen buffer, roughly 3300 x 1200 pixels in size. I'd of course expect to find it in a browser window, but that information shouldn't be necessary.

The object of this exercise (so far) is to come up with a result that says:

- Yes, the wooden frame (of that approximate color and that, possibly slightly truncated, shape) was found on my screen (or not); and
- the game's client area (the black area inside the frame) occupies the rectangle from (x1,y1) to (x2,y2).

I would like to be robust against scaling and the noise that's likely to be introduced by dithering. On the other hand, I can rule out some of the usual CV challenges, such as rotation or non-rigidity. That frame shape is dead easy for the human brain to discern; how hard can it be for a dedicated piece of software? This is an Adobe Flash application, and until recently I had thought that perceiving the images from a game GUI would be easy as pie.

I'm looking for an algorithm that finds the x,y translation at which the greatest overlap between the needle and the haystack occurs, if possible without having to iterate through a series of candidate scale factors. Ideally, the algorithm could abstract out the "shape-ness" of the images in a way that's independent of scale.

I've read some interesting things about Fourier transforms for accomplishing something similar: given a target image at the same scale, an FFT and some matrix math yield the points in the bigger image that correspond to the search pattern. But I don't have the theoretical background to put this into practice, nor do I know whether this approach will gracefully handle the scale problem. Help would be appreciated!

Technology: I'm programming in Clojure/Java but could adapt algorithms from other languages. I think I should be able to interface with libraries that follow C calling conventions, but I would prefer a pure Java solution.

---

You may be able to understand why I've shied away from presenting the actual image. It's just a silly game, but the task of screen-reading it is proving much more challenging than I had thought.

I'm obviously able to do an exhaustive search of my screen buffer for the very pixels (excluding the black) that make up my image, and that even runs in under a minute. But my ambition was to find that wooden frame using a technique that matches the shape regardless of differences that might arise from scaling and dithering.

Dithering, in fact, is one of many frustrations I'm having with this project. I've been working on extracting some useful vectors by edge extraction, but edges are woefully elusive because the pixels of any given area have widely inconsistent colors, so it's hard to tell real edges from local dithering artifacts. I had no idea that such a simple-looking game would produce graphics that are so hard for software to perceive.

Should I start off by locally averaging pixels before I look for features? Should I reduce color depth by throwing out the least significant bits of the pixel color values?

I'm trying for a pure Java solution (actually programming in a Clojure/Java mix), so I'm not wild about OpenCV (which installs .DLLs or .so files with C code). Please don't worry about my choice of language; the learning experience is much more interesting to me than performance.
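To make those last two questions concrete, here is the sort of preprocessing I have in mind, as a rough pure-Java sketch. The 4x4-style block size and the number of discarded bits are arbitrary placeholder parameters, and the screen dump is assumed to be available as a BufferedImage (e.g. from java.awt.Robot.createScreenCapture); nothing here is meant as a finished implementation.

```java
import java.awt.image.BufferedImage;

/** Rough preprocessing sketch: block-average the screen dump and drop the
 *  low-order bits of each color channel to tame dithering noise.
 *  Block size and discarded-bit count are illustrative choices only. */
public class Preprocess {

    /** Downscale by averaging each block-by-block tile of pixels
     *  (trailing pixels that don't fill a full tile are dropped). */
    public static BufferedImage blockAverage(BufferedImage src, int block) {
        int w = src.getWidth() / block, h = src.getHeight() / block;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long r = 0, g = 0, b = 0;
                for (int dy = 0; dy < block; dy++) {
                    for (int dx = 0; dx < block; dx++) {
                        int rgb = src.getRGB(x * block + dx, y * block + dy);
                        r += (rgb >> 16) & 0xFF;
                        g += (rgb >> 8) & 0xFF;
                        b += rgb & 0xFF;
                    }
                }
                int n = block * block;
                out.setRGB(x, y, (int) ((r / n) << 16 | (g / n) << 8 | (b / n)));
            }
        }
        return out;
    }

    /** Zero out the lowest 'bits' bits of every color channel. */
    public static BufferedImage reduceDepth(BufferedImage src, int bits) {
        int mask = (0xFF << bits) & 0xFF;
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                                              BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = ((rgb >> 16) & 0xFF) & mask;
                int g = ((rgb >> 8) & 0xFF) & mask;
                int b = (rgb & 0xFF) & mask;
                out.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return out;
    }
}
```

Both passes use only java.awt.image.BufferedImage from the JDK, so this would stay pure Java; whether it actually makes the edges easier to extract is exactly what I'm unsure about.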
Comments:

1. It is not clear what kind of frequency-domain approach you are referring to. My best guess, given the problem, is comparison by Fourier descriptors. These can easily be made rotation-, translation-, and scale-invariant, which makes them helpful for your problem. You begin by extracting each contour of the connected components in your binary image, then sample each one and compute its Fourier descriptors. The same is done for the "needle" image. Then you can try matching shapes using these descriptors (a rough sketch of this idea appears after these comments). But there are many other methods for this task, depending on other hidden (or forgotten) requirements.
2. Also check out SIFT and SURF if these algorithms aren't familiar to you; Gary Bradski's book Learning OpenCV can provide some guidance. Several commercial vision libraries ($$) have implementations of "robust shape matching" that simplify setup. http://en.wikipedia.org/wiki/SURF
3. Carl, could you post some of the original sample images (and/or a link to an archive of sample images)? Are you looking for a robust solution, an easy solution, a fun/complex solution just to test, or the "optimal" solution (for some problem domain/market)? There are statistical descriptors, Fourier descriptors, etc., but there are also techniques that may be a little easier to get your head around and may work well enough for your purpose. (I also retagged your question to add "opencv" and "image-processing" so that it gets a bit more attention.)
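As a rough illustration of the Fourier-descriptor idea from the first comment (a sketch, not a drop-in implementation): the contour is assumed to have already been extracted as arrays of x and y coordinates by some other step; a naive O(n^2) DFT is used for clarity; the DC term is skipped for translation invariance, magnitudes are normalized by the first harmonic for scale invariance, and only magnitudes are kept so that rotation and the choice of starting point don't matter.

```java
/** Sketch: invariant Fourier descriptors for a closed contour given as
 *  (x, y) point arrays. Naive O(n^2) DFT for clarity; the contour
 *  extraction that produces the points is assumed to happen elsewhere. */
public class FourierDescriptors {

    /** xs/ys: contour coordinates; k: number of descriptors to keep. */
    public static double[] describe(double[] xs, double[] ys, int k) {
        int n = xs.length;
        double[] desc = new double[k];
        double scale = 0;
        for (int f = 1; f <= k; f++) {            // skip f = 0: translation invariance
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = -2 * Math.PI * f * t / n;
                // accumulate (xs[t] + i*ys[t]) * e^{i*ang}
                re += xs[t] * Math.cos(ang) - ys[t] * Math.sin(ang);
                im += xs[t] * Math.sin(ang) + ys[t] * Math.cos(ang);
            }
            double mag = Math.hypot(re, im);      // magnitude: rotation/start-point invariant
            if (f == 1) scale = mag;              // normalize by first harmonic: scale invariant
            desc[f - 1] = mag / scale;            // desc[0] is always 1 by construction
        }
        return desc;
    }

    /** Compare two descriptor vectors with plain Euclidean distance. */
    public static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

A small distance between the descriptors of a screen contour and those of the frame's contour would then suggest a match, independent of where on the screen (and at what size) the frame appears.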
 
