StackOverflow2013


plurals

R glm standard error estimate differences to SAS PROC GENMOD
primarykey
Id
8289612
data
AcceptedAnswerId
8290686
AnswerCount
2
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2011-11-27T22:25:55.537
FavoriteCount
2
LastActivityDate
2014-04-01T22:55:49.950
LastEditDate
2011-11-29T20:56:45
LastEditorUserId
1030648
OwnerUserId
1030648
ParentId
0
PostTypeId
1
Score
10
ViewCount
2786
LastEditorDisplayName
text
Body
I am converting a SAS PROC GENMOD example into R, using glm in R. The SAS code was:
<pre><code>proc genmod data=data0 namelen=30;
model boxcoxy=boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ/dist=normal;
FREQ REPLICATE_VAR;
run;
</code></pre>
My R code is:
<pre><code>parmsg2 <- glm(boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ,
               data=data0, family=gaussian, weights = REPLICATE_VAR)
</code></pre>
When I use <code>summary(parmsg2)</code> I get the same coefficient estimates as in SAS, but my standard errors are wildly different. The summary output from SAS is:
<pre><code>Name       df  Estimate    StdErr     LowerWaldCL  UpperWaldCL  ChiSq      ProbChiSq
Intercept  1   6.5007436   .00078884  6.4991975    6.5022897    67911982   0
agegrp4    1   .64607262   .00105425  .64400633    .64813891    375556.79  0
agegrp5    1   .4191395    .00089722  .41738099    .42089802    218233.76  0
agegrp6    1   -.22518765  .00083118  -.22681672   -.22355857   73401.113  0
agegrp7    1   -1.7445189  .00087569  -1.7462352   -1.7428026   3968762.2  0
agegrp8    1   -2.2908855  .00109766  -2.2930369   -2.2887342   4355849.4  0
race1      1   -.13454883  .00080672  -.13612997   -.13296769   27817.29   0
race3      1   -.20607036  .00070966  -.20746127   -.20467944   84319.131  0
weekend    1   .0327884    .00044731  .0319117     .03366511    5373.1931  0
seq2       1   -.47509583  .00047337  -.47602363   -.47416804   1007291.3  0
Scale      1   2.9328613   .00015586  2.9325559    2.9331668    -127
</code></pre>
The summary output from R is:
<pre><code>Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.50074    0.10354  62.785  < 2e-16
AGEGRP4      0.64607    0.13838   4.669 3.07e-06
AGEGRP5      0.41914    0.11776   3.559 0.000374
AGEGRP6     -0.22519    0.10910  -2.064 0.039031
AGEGRP7     -1.74452    0.11494 -15.178  < 2e-16
AGEGRP8     -2.29089    0.14407 -15.901  < 2e-16
RACE1       -0.13455    0.10589  -1.271 0.203865
RACE3       -0.20607    0.09315  -2.212 0.026967
WEEKEND      0.03279    0.05871   0.558 0.576535
SEQ         -0.47510    0.06213  -7.646 2.25e-14
</code></pre>
The difference in the standard errors matters because the SAS coefficients are all statistically significant, but the <code>RACE1</code> and <code>WEEKEND</code> coefficients in the R output are not. I have found a formula to calculate the Wald confidence intervals in R, but this is pointless given the difference in the standard errors, as I will not get the same results.
Apparently SAS uses a ridge-stabilized Newton-Raphson algorithm for its estimates, which are ML. The information I read about the <code>glm</code> function in R is that the results should be equivalent to ML. What can I do to change my estimation procedure in R so that I get the same coefficient and standard error estimates that were produced in SAS?
To update, thanks to Spacedman's answer: I used weights because the data are from individuals in a dietary survey, and <code>REPLICATE_VAR</code> is a balanced repeated replication weight, which is an integer (and quite large, on the order of 1000s or 10000s). The website that describes the weight is <a href="http://www.cdc.gov/nchs/tutorials/dietary/Advanced/ModelUsualIntake/Info4.htm" rel="nofollow">here</a>. I don't know why the <code>FREQ</code> rather than the <code>WEIGHT</code> command was used in SAS. I will now test by expanding the number of observations using <code>REPLICATE_VAR</code> and rerunning the analysis.
Thanks to Ben's answer below, the code I am using now is:
<pre><code># fit once and reuse the summary for both the coefficient table and the residual df
allsummary <- summary(glm(boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ,
                          data=data0, family=gaussian, weights = REPLICATE_VAR))
parmsg2 <- coef(allsummary)
# clean up the standard errors
parmsg2[,"Std. Error"] <- parmsg2[,"Std. Error"]/sqrt(mean(data0$REPLICATE_VAR))
# correct the t-values
parmsg2[,"t value"] <- parmsg2[,"Estimate"]/parmsg2[,"Std. Error"]
# note: using the t-distribution for p-values
parmsg2[,"Pr(>|t|)"] <- 2*pt(-abs(parmsg2[,"t value"]), df=allsummary$df.residual)
</code></pre>
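The rescaling by <code>sqrt(mean(data0$REPLICATE_VAR))</code> above can be sanity-checked on toy data. The sketch below is only an illustration under stated assumptions: the data are simulated (the names <code>d</code>, <code>x</code>, <code>y</code>, <code>w</code> are hypothetical and not from the survey), and the weights are treated as pure frequency counts. It compares a weighted <code>glm</code> fit with a fit on the row-expanded data. Under those assumptions the standard errors differ by exactly sqrt((N - p)/(n - p)), where n is the number of rows, N the sum of the weights, and p the number of coefficients, so <code>sqrt(mean(w))</code> = sqrt(N/n) is a close approximation when N and n are both large relative to p.

```r
# Toy check (simulated data; all names here are hypothetical)
set.seed(1)
n <- 50
d <- data.frame(x = rnorm(n), w = sample(1:20, n, replace = TRUE))
d$y <- 1 + 0.5 * d$x + rnorm(n)

# Fit 1: integer counts passed as weights, as in the question's R code
fit_w <- glm(y ~ x, data = d, family = gaussian, weights = w)

# Fit 2: each row repeated w times, i.e. the counts treated as frequencies
expanded <- d[rep(seq_len(n), times = d$w), ]
fit_e <- glm(y ~ x, data = expanded, family = gaussian)

se_w <- coef(summary(fit_w))[, "Std. Error"]
se_e <- coef(summary(fit_e))[, "Std. Error"]

N <- sum(d$w)               # number of rows after expansion
p <- length(coef(fit_w))    # number of estimated coefficients
ratio <- sqrt((N - p) / (n - p))

# Coefficients agree; standard errors differ by the factor `ratio`
print(all.equal(unname(coef(fit_w)), unname(coef(fit_e))))
print(all.equal(unname(se_w / se_e), rep(ratio, p)))
print(c(exact = ratio, approx = sqrt(mean(d$w))))
```

Expanding the rows is only practical for small examples; with weights in the thousands, dividing the weighted-fit standard errors by the factor is the cheaper route.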
Tags
<r><sas><glm>
Title
R glm standard error estimate differences to SAS PROC GENMOD
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USMichelle
UserOwnerUserId
1. USMichelle
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POR glm standard error estimate differences to SAS PROC GENMOD
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POR glm standard error estimate differences to SAS PROC GENMOD
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POR glm standard error estimate differences to SAS PROC GENMOD
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. Your R standard errors are all about 131.25 times the SAS standard errors, if that's at all enlightening. For more enlightenment I'd simplify the problem to one variable and see what happens.
 singulars
 PostPostId
 POR glm standard error estimate differences to SAS PROC GENMOD
 UserUserId
 USSpacedman
