  1. POR Example - ddply, ave, and merge
    <p>I have written some code and would appreciate suggestions for a better way to do what I am trying to do. The data frame dt looks like this:</p>
<pre><code>   SIC FYEAR AU       AT
1    1  2003  6  212.748
2    1  2003  5 3987.884
3    1  2003  4  100.835
4    1  2003  4 1706.719
5    1  2003  5    9.159
6    1  2003  7   60.069
7    1  2003  5  100.696
8    1  2003  4  113.865
9    1  2003  6  431.552
10   1  2003  7  309.109
...
</code></pre>
    <p>My task is to create a new column. For each SIC and FYEAR, the AU with the highest share of total AT gets the value 1 if its share exceeds the second-highest share by more than 10 percentage points; every other AU gets 0. Here is my attempt:</p>
<pre><code>a &lt;- ddply(dt, .(SIC, FYEAR), function(x) {ddply(x, .(AU), function(x) sum(x$AT))});

   SIC FYEAR AU        V1
1    1  2003  4  3412.619
2    1  2003  5 13626.241
3    1  2003  6   644.300
4    1  2003  7  1478.633
5    1  2003  9     0.003
6    1  2004  4  3976.242
7    1  2004  5  9383.516
8    1  2004  6   457.023
9    1  2004  7   456.167
10   1  2004  9   238.282
</code></pre>
    <p>where V1 represents the sum of AT over all rows for a given AU within a given SIC and FYEAR. Next I do:</p>
<pre><code>a$V1 &lt;- ave(a$V1, a$SIC, a$FYEAR, FUN = function(x) x/sum(x));

   SIC FYEAR AU           V1
1    1  2003  4 1.780949e-01
2    1  2003  5 7.111150e-01
3    1  2003  6 3.362420e-02
4    1  2003  7 7.716568e-02
5    1  2003  9 1.565615e-07
6    1  2004  4 2.740114e-01
7    1  2004  5 6.466382e-01
8    1  2004  6 3.149444e-02
9    1  2004  7 3.143545e-02
10   1  2004  9 1.642052e-02
</code></pre>
    <p>The column V1 now holds each AU's share of total AT (as a fraction between 0 and 1) within a given SIC and FYEAR. Next,</p>
<pre><code>a$V2 &lt;- ave(a$V1, a$SIC, a$FYEAR, FUN = function(x) {t &lt;- ((sort(x, TRUE))[2]); ifelse((x - t) &gt; 0.1, 1, 0)});

   SIC FYEAR AU           V1 V2
1    1  2003  4 1.780949e-01  0
2    1  2003  5 7.111150e-01  1
3    1  2003  6 3.362420e-02  0
4    1  2003  7 7.716568e-02  0
5    1  2003  9 1.565615e-07  0
6    1  2004  4 2.740114e-01  0
7    1  2004  5 6.466382e-01  1
8    1  2004  6 3.149444e-02  0
9    1  2004  7 3.143545e-02  0
10   1  2004  9 1.642052e-02  0
</code></pre>
    <p>Within a given SIC and FYEAR, the AU with the highest share of AT gets 1 if its share exceeds the second-highest share by more than 10 percentage points (0.1); all other AUs get 0.</p>
    <p>Then I merge the result with the original data dt.</p>
<pre><code>dt &lt;- merge(dt, a, by = c("SIC", "FYEAR", "AU"));

   SIC FYEAR AU       AT           V1 V2
1    1  2003  4 1706.719 1.780949e-01  0
2    1  2003  4  100.835 1.780949e-01  0
3    1  2003  4  113.865 1.780949e-01  0
4    1  2003  4 1491.200 1.780949e-01  0
5    1  2003  5 3987.884 7.111150e-01  1
6    1  2003  5  100.696 7.111150e-01  1
7    1  2003  5   67.502 7.111150e-01  1
8    1  2003  5 9461.000 7.111150e-01  1
9    1  2003  5    9.159 7.111150e-01  1
10   1  2003  6  212.748 3.362420e-02  0
</code></pre>
    <p>This approach feels very cumbersome. Is there a better way to do the same thing? Thanks.</p>
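    The steps described in the question (sum AT per AU, convert the sums to shares within each SIC and FYEAR, flag the leading AU, merge back) can be sketched more compactly with base R alone. This is only a sketch under stated assumptions, not a definitive answer: the sample data are just the ten rows printed at the top of the question, the names `result` and `second` are introduced here for illustration, and a group containing a single AU yields NA for V2, the same as in the original attempt.

```r
# Sample data: only the ten rows shown in the question (SIC 1, FYEAR 2003).
dt <- data.frame(
  SIC   = 1,
  FYEAR = 2003,
  AU    = c(6, 5, 4, 4, 5, 7, 5, 4, 6, 7),
  AT    = c(212.748, 3987.884, 100.835, 1706.719, 9.159,
            60.069, 100.696, 113.865, 431.552, 309.109)
)

# Step 1: total AT per (SIC, FYEAR, AU) in a single aggregate() call.
a <- aggregate(AT ~ SIC + FYEAR + AU, data = dt, FUN = sum)

# Step 2: each AU's share of total AT within its (SIC, FYEAR) group.
a$V1 <- ave(a$AT, a$SIC, a$FYEAR, FUN = function(x) x / sum(x))

# Step 3: flag an AU whose share exceeds the second-highest share by > 0.1.
# Only the top AU can satisfy this. A group with one AU gives NA (assumption:
# this matches the behaviour of the code in the question).
a$V2 <- ave(a$V1, a$SIC, a$FYEAR, FUN = function(x) {
  second <- sort(x, decreasing = TRUE)[2]
  as.numeric(x - second > 0.1)
})

# Step 4: drop the per-AU total and merge the flags back onto the rows.
a$AT <- NULL
result <- merge(dt, a, by = c("SIC", "FYEAR", "AU"))
print(result)
```

    On this ten-row sample, AU 5 holds the largest share and is the only AU flagged. Collapsing the nested ddply calls into one grouped sum and naming the merge columns explicitly with `by` are the two main simplifications; the logic is otherwise unchanged.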