<p>I did something very similar to this a few years ago. <strong>The algorithm you want to look into was proposed by David Nistér and Henrik Stewénius in the paper "Scalable Recognition with a Vocabulary Tree". It is pretty much an exact solution to your problem, and it can scale to millions of images.</strong></p>
<p>Here is a link to the abstract; you can find a download link by searching for the title. <a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1641018" rel="noreferrer">http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1641018</a></p>
<p>The basic idea is to build a tree with a hierarchical k-means algorithm to model the features, and then leverage the sparse distribution of features in that tree to quickly find your nearest neighbors... or something like that; it's been a few years since I worked on it. You can find a PowerPoint presentation on the author's webpage here: <a href="http://www.vis.uky.edu/~dnister/Publications/publications.html" rel="noreferrer">http://www.vis.uky.edu/~dnister/Publications/publications.html</a></p>
<p>A few other notes:</p>
<ul>
<li><p>I wouldn't bother with the pyramid match kernel. It's really more for improving object recognition than for duplicate/transformed image detection.</p></li>
<li><p>I would not store any of the raw features in an SQL database. Depending on your application, it is <em>sometimes</em> more effective to compute your features on the fly, since their size can exceed the original image size when computed densely. Histograms of features, or pointers to nodes in a vocabulary tree, are much more efficient representations.</p></li>
<li><p>SQL databases are not designed for doing massive floating-point vector calculations. <strong>You can store things in your database, but don't use it as a tool for computation.</strong> I tried this once with SQLite and it ended very badly.</p></li>
<li><p>If you decide to implement this, read the paper in detail and keep a copy handy while implementing it, as there are many minor details that are very important to making the algorithm work efficiently.</p></li>
</ul>
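<p>To make the idea above a bit more concrete, here is a rough sketch of a vocabulary tree in plain Python. It is not the paper's implementation: all names (<code>build_tree</code>, <code>signature</code>, <code>similarity</code>) are made up for illustration, descriptors are assumed to be tuples of floats, and the scoring is plain cosine similarity on leaf counts as a stand-in for the weighting scheme the paper describes.</p>

```python
import itertools
import random
from collections import Counter
from math import sqrt


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        clusters[i].append(p)
    return clusters


def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means with randomly sampled initial centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centers)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.centers = []   # one center per child
        self.children = []  # empty for leaves


def build_tree(points, k, depth, rng, ids=None):
    """Hierarchical k-means: cluster, then recurse into each cluster."""
    ids = ids if ids is not None else itertools.count()
    node = Node(next(ids))
    if depth == 0 or len(points) < k:
        return node
    node.centers, clusters = kmeans(points, k, rng)
    node.children = [build_tree(c, k, depth - 1, rng, ids) for c in clusters]
    return node


def leaf_id(root, descriptor):
    """Walk down the tree, following the nearest center at each level."""
    node = root
    while node.children:
        i = min(range(len(node.centers)),
                key=lambda j: sq_dist(descriptor, node.centers[j]))
        node = node.children[i]
    return node.node_id


def signature(root, descriptors):
    """Sparse histogram of leaf ids: the compact thing to keep per image."""
    return Counter(leaf_id(root, d) for d in descriptors)


def similarity(sig_a, sig_b):
    """Cosine similarity between two sparse histograms."""
    dot = sum(count * sig_b.get(leaf, 0) for leaf, count in sig_a.items())
    norm_a = sqrt(sum(c * c for c in sig_a.values()))
    norm_b = sqrt(sum(c * c for c in sig_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

<p>You would call <code>build_tree</code> once on a training sample of descriptors, then keep only <code>signature(root, descriptors)</code> for each image. That small histogram is the sort of thing that is fine to put in a database, while the comparison itself happens in application code rather than in SQL.</p>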
 
