Note that there are some explanatory texts on larger screens.

plurals
  1. POR glm standard error estimate differences to SAS PROC GENMOD
    primarykey
    data
    text
    <p>I am converting a SAS PROC GENMOD example into R, using glm in R. The SAS code was:</p> <pre><code>proc genmod data=data0 namelen=30; model boxcoxy=boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ/dist=normal; FREQ REPLICATE_VAR; run; </code></pre> <p>My R code is:</p> <pre><code>parmsg2 &lt;- glm(boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ , data=data0, family=gaussian, weights = REPLICATE_VAR) </code></pre> <p>When I use <code>summary(parmsg2)</code> I get the same coefficient estimates as in SAS, but my standard errors are wildly different.</p> <p>The summary output from SAS is:</p> <pre><code>Name df Estimate StdErr LowerWaldCL UpperWaldCL ChiSq ProbChiSq Intercept 1 6.5007436 .00078884 6.4991975 6.5022897 67911982 0 agegrp4 1 .64607262 .00105425 .64400633 .64813891 375556.79 0 agegrp5 1 .4191395 .00089722 .41738099 .42089802 218233.76 0 agegrp6 1 -.22518765 .00083118 -.22681672 -.22355857 73401.113 0 agegrp7 1 -1.7445189 .00087569 -1.7462352 -1.7428026 3968762.2 0 agegrp8 1 -2.2908855 .00109766 -2.2930369 -2.2887342 4355849.4 0 race1 1 -.13454883 .00080672 -.13612997 -.13296769 27817.29 0 race3 1 -.20607036 .00070966 -.20746127 -.20467944 84319.131 0 weekend 1 .0327884 .00044731 .0319117 .03366511 5373.1931 0 seq2 1 -.47509583 .00047337 -.47602363 -.47416804 1007291.3 0 Scale 1 2.9328613 .00015586 2.9325559 2.9331668 -127 </code></pre> <p>The summary output from R is:</p> <pre><code>Coefficients: Estimate Std. Error t value Pr(&gt;|t|) (Intercept) 6.50074 0.10354 62.785 &lt; 2e-16 AGEGRP4 0.64607 0.13838 4.669 3.07e-06 AGEGRP5 0.41914 0.11776 3.559 0.000374 AGEGRP6 -0.22519 0.10910 -2.064 0.039031 AGEGRP7 -1.74452 0.11494 -15.178 &lt; 2e-16 AGEGRP8 -2.29089 0.14407 -15.901 &lt; 2e-16 RACE1 -0.13455 0.10589 -1.271 0.203865 RACE3 -0.20607 0.09315 -2.212 0.026967 WEEKEND 0.03279 0.05871 0.558 0.576535 SEQ -0.47510 0.06213 -7.646 2.25e-14 </code></pre> <p>The importance of the difference in the standard errors is that the SAS coefficients are all statistically significant, but the <code>RACE1</code> and <code>WEEKEND</code> coefficients in the R output are not. I have found a formula to calculate the Wald confidence intervals in R, but this is pointless given the difference in the standard errors, as I will not get the same results.</p> <p>Apparently SAS uses a ridge-stabilized Newton-Raphson algorithm for its estimates, which are ML. The information I read about the <code>glm</code> function in R is that the results should be equivalent to ML. What can I do to change my estimation procedure in R so that I get the equivalent coefficents and standard error estimates that were produced in SAS?</p> <p>To update, thanks to Spacedman's answer, I used weights because the data are from individuals in a dietary survey, and <code>REPLICATE_VAR</code> is a balanced repeated replication weight, that is an integer (and quite large, in the order of 1000s or 10000s). The website that describes the weight is <a href="http://www.cdc.gov/nchs/tutorials/dietary/Advanced/ModelUsualIntake/Info4.htm" rel="nofollow">here</a>. I don't know why the <code>FREQ</code> rather than the <code>WEIGHT</code> command was used in SAS. I will now test by expanding the number of observations using REPLICATE_VAR and rerunning the analysis.</p> <p>Thanks to Ben's answer below, the code I am using now is:</p> <pre><code>parmsg2 &lt;- coef(summary(glm(boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ , data=data0, family=gaussian, weights = REPLICATE_VAR))) #clean up the standard errors parmsg2[,"Std. Error"] &lt;- parmsg2[,"Std. Error"]/sqrt(mean(data0$REPLICATE_VAR)) parmsg2[,"t value"] &lt;- parmsg2[,"Estimate"]/parmsg2[,"Std. Error"] #note: using the t-distribution for p-values, correct the t-values allsummary &lt;- summary.glm(glm(boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ , data=data0, family=gaussian, weights = REPLICATE_VAR)) parmsg2[,"Pr(&gt;|t|)"] &lt;- 2*pt(-abs(parmsg2[,"t value"]),df=allsummary$df.resid) </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload