Viola-Jones' face detection claims 180k features
    <p>I've been implementing an adaptation of <a href="http://scholar.google.com/scholar?cluster=6119571473300502765" rel="noreferrer">Viola-Jones' face detection algorithm</a>. The technique relies upon placing a subframe of 24x24 pixels within an image, and subsequently placing rectangular features inside it in every possible position and at every possible size.</p> <p>These features can consist of two, three or four rectangles. The paper presents the following examples.</p> <p><img src="https://i.stack.imgur.com/5MKl7.png" alt="Rectangle features"></p> <p>They claim the exhaustive set contains more than 180k features (section 2):</p> <blockquote> <p>Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete.</p> </blockquote> <p>The following points are not stated explicitly in the paper, so they are assumptions on my part:</p> <ol> <li>There are only 2 two-rectangle features, 2 three-rectangle features and 1 four-rectangle feature. The logic behind this is that we are observing the <em>difference</em> between the highlighted rectangles, not explicitly the color or luminance or anything of that sort.</li> <li>We cannot define feature type A as a 1x1 pixel block; it must be at least 1x2 pixels. Likewise, type D must be at least 2x2 pixels, and the same rule applies correspondingly to the other features.</li> <li>We cannot define feature type A as a 1x3 pixel block, as the middle pixel cannot be partitioned, and subtracting it from itself is identical to a 1x2 pixel block; this feature type is only defined for even widths. Likewise, the width of feature type C must be divisible by 3, and the same rule applies correspondingly to the other features.</li> <li>We cannot define a feature with a width and/or height of 0. 
Therefore, we iterate <em>x</em> and <em>y</em> to 24 minus the size of the feature.</li> </ol>
<p>Based upon these assumptions, I've counted the exhaustive set:</p>
<pre><code>const int frameSize = 24;
const int features = 5;
// All five feature types:
const int feature[features][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};

int count = 0;
// Each feature:
for (int i = 0; i &lt; features; i++) {
    int sizeX = feature[i][0];
    int sizeY = feature[i][1];
    // Each position:
    for (int x = 0; x &lt;= frameSize-sizeX; x++) {
        for (int y = 0; y &lt;= frameSize-sizeY; y++) {
            // Each size fitting within the frameSize:
            for (int width = sizeX; width &lt;= frameSize-x; width+=sizeX) {
                for (int height = sizeY; height &lt;= frameSize-y; height+=sizeY) {
                    count++;
                }
            }
        }
    }
}
</code></pre>
<p>The result is <strong>162,336</strong>.</p>
<p>The only way I found to approximate the "over 180,000" that Viola &amp; Jones speak of is to drop assumption #4 and introduce bugs in the code. This involves changing the two innermost loop headers to:</p>
<pre><code>for (int width = 0; width &lt; frameSize-x; width+=sizeX)
for (int height = 0; height &lt; frameSize-y; height+=sizeY)
</code></pre>
<p>The result is then <strong>180,625</strong>. (Note that this effectively prevents the features from ever touching the right and/or bottom edge of the subframe.)</p>
<p>So my question is: have they made a mistake in their implementation? Does it make any sense to consider features with a surface of zero? Or am I looking at this the wrong way?</p>
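<p>As a cross-check on both totals, here is a minimal sketch that relies on one observation: in the nested loops, the choice of <em>x</em> and width is independent of the choice of <em>y</em> and height, so the count for each feature type should factor into a per-axis product. The helper names <code>placementsAlongAxis</code> and <code>countFeatures</code> are hypothetical and appear neither in the paper nor in the code above. The <code>zeroSized</code> flag mimics the altered loop bounds (sizes start at 0 and stay strictly below the remaining space) and is only an assumption about how such a variant could be modeled.</p>

```cpp
// Hypothetical helper: number of (offset, size) pairs along one axis for a
// feature whose base extent along that axis is `base` pixels, inside a
// frame of `frame` pixels.
// zeroSized == false: sizes are base, 2*base, ... and must fit in the frame.
// zeroSized == true:  sizes are 0, base, 2*base, ... and must stay strictly
//                     below the remaining space (the altered loop bounds).
constexpr int placementsAlongAxis(int frame, int base, bool zeroSized) {
    int total = 0;
    for (int offset = 0; offset <= frame - base; ++offset) {
        if (zeroSized) {
            for (int size = 0; size < frame - offset; size += base) ++total;
        } else {
            for (int size = base; size <= frame - offset; size += base) ++total;
        }
    }
    return total;
}

// Hypothetical helper: total over the five base shapes assumed in the
// question, using the per-axis product instead of four nested loops.
constexpr int countFeatures(int frame, bool zeroSized) {
    const int shapes[5][2] = {{2, 1}, {1, 2}, {3, 1}, {1, 3}, {2, 2}};
    int count = 0;
    for (const auto& s : shapes) {
        count += placementsAlongAxis(frame, s[0], zeroSized) *
                 placementsAlongAxis(frame, s[1], zeroSized);
    }
    return count;
}
```

<p>Under these assumptions, <code>countFeatures(24, false)</code> evaluates to 162,336 and <code>countFeatures(24, true)</code> to 180,625. The per-axis counts for the strict variant are 300, 144 and 92 for base extents of 1, 2 and 3 pixels, which gives 2·(144·300) + 2·(92·300) + 144·144 = 162,336.</p>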