1. I understand you don't have any background in machine learning. Unfortunately, you aren't overlooking anything: the problem you describe is hard. Unless you have serious time to spend learning about text analysis and machine learning, I believe the easiest approach is to use a list of words (compiled manually or retrieved from the web) for each topic you want to detect. Then use a simple voting scheme to predict the "correct" topic based on the frequencies of those words.
2. AUC is not designed "specifically" for imbalanced datasets. It is about postponing the decision on the precision/recall trade-off (until a domain expert tells you the relative cost of false positives versus false negatives). If you know the required levels of precision and recall, you don't need AUC for model selection. Having an imbalanced dataset just requires monitoring two quantities instead of one (precision/recall, sensitivity/specificity, etc.). Summarising them into a single quantity such as AUC or the F-score can easily mislead you. The problem in question is totally different.
3. An easy-to-read book on SVMs is N. Cristianini and J. Shawe-Taylor, "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods". For text, read Joachims' papers. A linear kernel is just K(x, y) = <phi(x), phi(y)> = x·y, i.e. the mapping function is the identity, phi(x) = x.
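
The word-list voting scheme from the first comment can be sketched as follows. This is a minimal sketch, assuming each topic is described by a hand-picked keyword set and every occurrence of a keyword counts as one vote; the topic names, the keyword lists in `TOPIC_KEYWORDS`, and the helper names `tokenize` and `predict_topic` are hypothetical placeholders, not taken from the comment.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical keyword lists: in practice, compile these by hand or from the web.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "team", "score", "league"},
    "finance": {"stock", "market", "bank", "price", "investor"},
    "tech": {"software", "computer", "code", "internet", "device"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def predict_topic(text: str, keywords: Dict[str, Set[str]] = TOPIC_KEYWORDS) -> Optional[str]:
    """Return the topic whose keywords occur most often in `text`, or None if none occur."""
    counts = Counter(tokenize(text))
    # Every occurrence of a topic keyword is one vote for that topic.
    votes = {topic: sum(counts[word] for word in words) for topic, words in keywords.items()}
    best_topic, best_votes = max(votes.items(), key=lambda item: item[1])
    return best_topic if best_votes > 0 else None


print(predict_topic("The team scored a late goal to win the match"))  # prints: sports
```

Only exact word matches vote here ("scored" does not match "score"), and ties go to the topic listed first in the dictionary, because `max` returns the first maximal item.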
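
To illustrate the second comment's warning about single-number summaries, the sketch below computes precision and recall for two hypothetical classifiers on a small, made-up imbalanced dataset (4 positives out of 20). It uses the F-score rather than AUC as the single summary, since AUC needs ranked scores rather than hard predictions; the labels and the function names `precision_recall` and `f1` are illustrative assumptions.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels, with 1 marking the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-score)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Made-up imbalanced data: 4 positives followed by 16 negatives.
y_true = [1] * 4 + [0] * 16
# Classifier A flags all 4 positives plus 4 negatives.
y_pred_a = [1] * 4 + [1] * 4 + [0] * 12
# Classifier B flags only 2 of the positives and no negatives.
y_pred_b = [1] * 2 + [0] * 2 + [0] * 16

p_a, r_a = precision_recall(y_true, y_pred_a)
p_b, r_b = precision_recall(y_true, y_pred_b)
print(f"A: precision={p_a}, recall={r_a}, F1={f1(p_a, r_a):.3f}")
print(f"B: precision={p_b}, recall={r_b}, F1={f1(p_b, r_b):.3f}")
```

Classifier A finds every positive but half of its alarms are false (precision 0.5, recall 1.0); classifier B raises no false alarms but misses half the positives (precision 1.0, recall 0.5). Both score F1 = 2/3, so the single number hides exactly the trade-off a domain expert would need to weigh.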
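
The linear-kernel identity in the third comment can be checked numerically. This is a plain-Python sketch with no SVM library; the names `phi`, `kernel_via_feature_map`, `linear_kernel`, and `gram_matrix` and the sample points are hypothetical. It computes K(x, y) both through the identity feature map and directly as a dot product, then builds the kernel (Gram) matrix, which is how a kernel SVM sees the training data.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def phi(x: Vector) -> List[float]:
    """Feature map of the linear kernel: the identity, phi(x) = x."""
    return list(x)


def kernel_via_feature_map(x: Vector, y: Vector) -> float:
    """K(x, y) = <phi(x), phi(y)>: map both inputs, then take the inner product."""
    return dot(phi(x), phi(y))


def linear_kernel(x: Vector, y: Vector) -> float:
    """The same kernel computed directly as x . y, with no explicit mapping."""
    return dot(x, y)


def gram_matrix(points: Sequence[Vector], kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Kernel (Gram) matrix G[i][j] = K(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


points = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
G = gram_matrix(points, linear_kernel)
# Because phi is the identity, both routes give identical values.
assert all(kernel_via_feature_map(p, q) == linear_kernel(p, q) for p in points for q in points)
print(G)  # prints [[5.0, -2.0, 4.0], [-2.0, 1.0, -0.5], [4.0, -0.5, 9.25]]
```

For a nonlinear kernel, `phi` would map inputs into a different (often higher-dimensional) space; with the linear kernel that step is a no-op, which is why it is the natural baseline for text features.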