
<p>There are several ways to get classification results that take into account multiple features. What you have suggested is one possibility: instead of combining features, you train multiple classifiers and, through some protocol, arrive at a consensus among them. This falls under the field of <strong>ensemble methods</strong>. Try googling boosting and random forests for more details on how to combine classifiers.</p> <p>However, it is not true that your feature vectors cannot be concatenated because they have different dimensions. You can still concatenate the features together into one large vector. E.g., joining your SIFT and HIST features together will give you a vector of 384 dimensions. Depending on the classifier you use, you will likely have to normalize the entries of the vector so that no one feature dominates simply because, by construction, it has larger values.</p> <p><strong>EDIT</strong> in response to your comment: It appears that your histogram is a feature vector describing a characteristic of the entire object (e.g. color), whereas your SIFT descriptors are extracted at local interest keypoints of that object. Since the number of SIFT descriptors may vary from image to image, you cannot pass them directly to a typical classifier, as classifiers usually take in one feature vector per sample you wish to classify. In such cases, you will have to build a <strong>codebook</strong> (also called a visual dictionary) using the SIFT descriptors you have extracted from many images. You will then use this codebook to derive a SINGLE feature vector from the many SIFT descriptors you extract from each image. This is what is known as a "<strong>bag of visual words (BOW)</strong>" model. Now that you have a single vector that "summarizes" the SIFT descriptors, you can concatenate it with your histogram to form a bigger vector. This single vector now summarizes the ENTIRE image (or the object in the image).
</p> <p>For details on how to build the bag-of-words codebook and how to use it to derive a single feature vector from the many SIFT descriptors extracted from each image, look at this book (free for download from the author's website) <a href="http://programmingcomputervision.com/" rel="noreferrer">http://programmingcomputervision.com/</a> under the chapter "Searching Images". It is actually a lot simpler than it sounds.</p> <p>Roughly, just run KMeans to cluster the SIFT descriptors from many images and take the centroids (each centroid is a vector called a "visual word") as the codebook. E.g., for K = 1000 you have a codebook of 1000 visual words. Then, for each image, create a result vector of the same size as K (in this case 1000). Each element of this vector corresponds to a visual word. Then, for each SIFT descriptor extracted from an image, find its closest matching vector in the codebook and increment the count in the corresponding cell of the result vector. When you are done, this result vector essentially counts how often the different visual words appear in the image. Similar images will have similar counts for the same visual words, so this vector effectively represents your images. You will also need to "normalize" this vector to make sure that images with different numbers of SIFT descriptors (and hence different total counts) are comparable. This can be as simple as dividing each entry by the total count in the vector, or a more sophisticated measure such as tf-idf, as described in the book.</p> <p>I believe the author also provides Python code on his website to accompany the book. Take a look and experiment with it if you are unsure.</p> <p>A more sophisticated method for combining features is Multiple Kernel Learning (MKL). In this case, you compute different kernel matrices, each using one feature. You then find the optimal weights to combine the kernel matrices and use the combined kernel matrix to train an SVM.
You can find the code for this in the Shogun Machine Learning Library.</p>
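The BOW pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the random arrays stand in for real SIFT descriptors (which you would get from e.g. OpenCV), the hand-rolled KMeans replaces a proper library clusterer, and the codebook size K = 50 is chosen small just for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for SIFT descriptors (128-dim) pooled from many training images.
train_desc = rng.random((500, 128))

K = 50  # codebook size (number of "visual words"); toy value for illustration

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means; the final centroids form the visual-word codebook."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

codebook = kmeans(train_desc, K)

def bow_vector(descriptors, codebook):
    """Count how often each visual word is the nearest match, then normalize
    by the total count so images with different numbers of descriptors
    remain comparable."""
    dists = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()

# One image: 34 SIFT descriptors plus a 256-bin color histogram.
image_desc = rng.random((34, 128))
color_hist = rng.random(256)
color_hist /= color_hist.sum()

bow = bow_vector(image_desc, codebook)
combined = np.concatenate([bow, color_hist])  # single fixed-length vector
print(combined.shape)  # (306,)
```

The key point is the last line: however many SIFT descriptors an image produces, the BOW step always yields a K-dimensional vector, so it can be concatenated with the histogram and fed to any standard classifier.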
    1. Thanks for the great answer to my question. The MKL method you mentioned looks interesting, as I was not aware of it. Regarding the second method, where we combine the features: I am using OpenCV, and for one image SIFT gives a [128 x 34] feature vector while the histogram is [256 x 1]. I tried to combine them but couldn't, which is why I felt they could not be combined. Did I do something wrong? Once again, thank you so much.
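The MKL idea mentioned above can be illustrated with a small NumPy sketch. This only shows the combination step: in real MKL the weights are learned by the solver, whereas the fixed weights here are placeholders, and the random arrays stand in for per-image BOW and histogram feature vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-image feature vectors for 10 training images:
# a BOW vector (from SIFT) and a color histogram for each image.
bow_feats = rng.random((10, 50))
hist_feats = rng.random((10, 256))

def rbf_kernel(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None] - X[None]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

# One kernel matrix per feature type.
K_bow = rbf_kernel(bow_feats, gamma=1.0)
K_hist = rbf_kernel(hist_feats, gamma=0.1)

# MKL would learn these weights; fixed values here just show the combination.
w = (0.6, 0.4)
K_combined = w[0] * K_bow + w[1] * K_hist  # still a valid kernel matrix

print(K_combined.shape)  # (10, 10)
```

A weighted sum of valid kernel matrices (with nonnegative weights) is itself a valid kernel matrix, which is why the combined matrix can be handed directly to a kernel SVM.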
    2. I added details to my answer in response to your reply. By the way, even with MKL you would likely need to reduce your many SIFT descriptors to a single vector. Part of the reason is that different images will give you different numbers of SIFT descriptors. Unless you are matching the SIFT descriptors directly, a classifier typically cannot handle a different number of feature vectors for each sample (i.e. image).
    3. @lightalchemist Thanks a lot, that was a clear description and it took me just a few hours to learn and implement it. Thank you so much. I like the idea of MKL, but since I have no experience with it, I will try it out after I learn the idea and see how it goes. I am using an SVM for classification after creating the BOW model, and it seems to work perfectly. You mentioned boosting and random forests, but I rather like libSVM, so I used that. Is it OK to use an SVM with BOW? I think it is fine for classifying. Sorry to ask so many questions, but your explanations are so good that I can't help asking. :P