  1. Filling data.frames correctly in R
    <p>I am practicing with <code>SVM-light</code> on the <code>Iris dataset</code> in <code>R</code>, and I am trying to set up a <code>data.frame</code> to stay organized, rather than do everything with separate arrays.</p>
    <p>What I want to do is see how different values of the regularization parameter affect the error rate of the SVM, by training it on different sets of 50 rows with 100 regularization values, and testing it on the remaining data. Since I have to call the SVM interface with a scalar value anyway, I figure it's okay to use a for loop to change the regularization parameter each time.</p>
    <p>My problem is keeping the <code>data.frame</code> organized. Do I set the <code>row.names</code> to <code>1:150</code> (150 irises) and keep two variables, <code>reg_param</code> and <code>error t/f</code>? I've tried a couple of things and I keep producing something disorganized with implicit variable names in the frame.</p>
    <p><strong>Edit:</strong> Maybe this is better on <code>Stack Overflow</code>. I'm not exactly sure where this question belongs.</p>
    <pre><code>require("klaR") # SVMlight interface
data(iris)      # Iris data already present in R

error_first50 &lt;- data.frame(row.names = c("reg_param","error"))
error_next50 &lt;- data.frame(row.names = c("reg_param","error"))
error_last50 &lt;- data.frame(row.names = c("reg_param","error"))

for (regcount in (1:100)) {
  iris_svm_first50 &lt;- svmlight(Species ~ ., data = iris[sample(1:150,50),],
                               svm.options = paste("-c",regcount))
  predict_first50 &lt;- predict(iris_svm_first50,iris)
  for (trial in 1:NROW(predict_first50$class)) {
    error_first50[regcount] &lt;- predict_first50$class[trial] == iris$Species[trial]
  }
}
</code></pre>
    1. I can get something that looks all right by labeling the columns manually after the loops end: `colnames(error_first50) <- paste("reg_param =",1:150)`. But this is clumsy and unsatisfying, and sacrifices some flexibility. One of the things I like about R is the ability to refer to a field/column by name, while at the same time being able to index it by number. This doesn't really fit into either of those.
    2. I'm a little unsure what you are asking about: is it simply the naming issue? Does the code work OK? `svmlight` and `predict` return lists, not data frames (though a data frame is a special case of a list). So there aren't really colnames and rownames for lists. Instead, things are named as `tag = value` or accessed by indexing in a somewhat different way than for data frames. See `?list` and `?"["`. If this isn't at all what you are concerned about, please give additional info.
    3. The code runs and gives meaningful data, but the resulting data frame has rows ranging from 1:150 (comparisons between predicted and dataset values) and columns ranging from 1:100 (columns labeled V1-V100 implicitly representing reg_param). This is MATLAB/C-like behavior that I believe is caused by the way I'm trying to index and organize the data. I'm trying to figure out how to start by designing the data frame in a sensible manner and then populate it while sticking to that design. What I have now isn't doing it.
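
One possible way to get the layout described in the comments is a "long" data frame with one row per regularization value and explicitly named columns, built from small per-iteration data frames and bound together once at the end. The following is only a sketch under stated assumptions: `fit_predict`, `reg_params`, `per_param`, `results`, and the column names `n_test`, `n_errors`, and `error_rate` are hypothetical names, the seed is added for reproducibility, and the classifier is a base-R stand-in (nearest class centroid) that ignores `reg_param` so the example runs without SVMlight installed. With `klaR`, the body of `fit_predict` would instead use the `svmlight(...)` and `predict(...)$class` calls from the question.

```r
data(iris)
set.seed(42)  # assumption: fixed seed so the random training subsets are reproducible

# Hypothetical stand-in classifier (base R only): nearest class centroid.
# It ignores reg_param. With klaR, replace the body with
#   fit <- svmlight(Species ~ ., data = train, svm.options = paste("-c", reg_param))
#   predict(fit, test)$class
fit_predict <- function(train, test, reg_param) {
  features  <- setdiff(names(train), "Species")
  centroids <- aggregate(train[features],
                         by = list(Species = train$Species), FUN = mean)
  x   <- as.matrix(test[features])
  ctr <- as.matrix(centroids[features])
  # Squared Euclidean distance from every test row to every class centroid
  d <- sapply(seq_len(nrow(ctr)),
              function(k) rowSums(sweep(x, 2, ctr[k, ])^2))
  nearest <- max.col(-d, ties.method = "first")  # column index of the smallest distance
  factor(centroids$Species[nearest], levels = levels(test$Species))
}

reg_params <- 1:100

# One small, fully named data frame per regularization value
per_param <- lapply(reg_params, function(rp) {
  train_idx <- sample(nrow(iris), 50)   # 50 training rows
  test      <- iris[-train_idx, ]       # the remaining 100 rows
  pred      <- fit_predict(iris[train_idx, ], test, rp)
  data.frame(reg_param = rp,
             n_test    = nrow(test),
             n_errors  = sum(pred != test$Species))
})

# Bind once at the end instead of growing the frame column by column
results <- do.call(rbind, per_param)
results$error_rate <- results$n_errors / results$n_test

head(results)
results[results$reg_param == 10, "error_rate"]  # look up by column name and value
```

Because every column is named when the row is created, nothing ends up as an implicit `V1`-`V100`, and the frame can still be indexed by position (`results[10, 4]`) or by name (`results$error_rate`).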