krotcharge.blogg.se

#STATA GLM HOW TO#

The Stata negative binomial fit ends with a likelihood-ratio test of the overdispersion parameter:
LR test of alpha=0: chibar2(01) = 38.33   Prob >= chibar2 = 0.000

emmeans

First, let's use emmeans, a very popular package for getting estimated marginal means, to get the predicted counts for each age group.

Stata's margins output for the same model (columns: margin, delta-method standard error, z, p-value, 95% confidence interval):
Expression   : Predicted number of events, predict()
>35 | 141.0984   7.941774   17.77   0.000   125.5328   156.6639

These values, while consistent in pattern, are much different from the emmeans output, so what is going on? In this model we have only the age covariate and the offset, so there really isn't much to focus on besides the offset. To replicate the Stata output in R, we use all values of the offset for every level of age, and then average the predictions within each age group.

R by hand

First, we create a data frame for prediction that crosses each observation's id, and hence its offset value, with every age level, get the predictions for all of those rows, then take the mean prediction per group.

These are very different from our previous results for Stata, so what's happening here? While similar to the previous approach, here only the observed values of the offset for each group are used. This can be duplicated with the predict function as follows.
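The two averaging schemes described above can be sketched in base R. This is a minimal sketch, not the post's own code: it assumes the same preparation as the model-fitting step (lowercased names, unordered age, ln_holders = log(holders)), the names pred_grid, all_offsets and observed_only are hypothetical, and emmeans is left out so the block needs only MASS. Approach 1 is the scheme the post uses to reproduce Stata's margins; approach 2 averages only the offsets observed in each group.

```r
library(MASS)  # provides the Insurance data and glm.nb()

# Assumed preparation, mirroring the model-fitting step
insurance <- MASS::Insurance
names(insurance) <- tolower(names(insurance))
insurance$age <- factor(insurance$age, ordered = FALSE)
insurance$ln_holders <- log(insurance$holders)
nb_glm_offset <- glm.nb(claims ~ age + offset(ln_holders), data = insurance)

# Approach 1 (Stata-style): pair every observed offset with every age level,
# predict counts on the response scale, then average within each age level
pred_grid <- expand.grid(
  id  = seq_len(nrow(insurance)),
  age = levels(insurance$age)
)
pred_grid$ln_holders <- insurance$ln_holders[pred_grid$id]
pred_grid$pred <- predict(nb_glm_offset, newdata = pred_grid, type = "response")
all_offsets <- aggregate(pred ~ age, data = pred_grid, FUN = mean)

# Approach 2: keep only the offsets actually observed in each age group,
# i.e. average the fitted values within each group
observed_only <- aggregate(
  pred ~ age,
  data = data.frame(
    age  = insurance$age,
    pred = predict(nb_glm_offset, type = "response")
  ),
  FUN = mean
)

print(all_offsets)
print(observed_only)
```

Because the model is log-linear, approach 1 works out to exp(intercept + age effect) times the overall mean of holders, while approach 2 uses the mean of holders within each age group, which is why the two can be far apart.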

The end of the R summary:
(Dispersion parameter for Negative Binomial(28.4012) family taken to be 1)
Null deviance: 86.761 on 63 degrees of freedom
Residual deviance: 67.602 on 60 degrees of freedom

The same model in Stata:
nbreg claims i.age, offset(ln_holders) nolog
Negative binomial regression   Number of obs = 64

We get the same result, so this should mean we can't get different predictions if we do the same thing in R and Stata [2].
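The R output reports the negative binomial parameter as θ (the 28.4012 in the family name; the "dispersion parameter ... taken to be 1" refers to the GLM scale, not to θ), whereas Stata's nbreg reports α, the parameter in its "LR test of alpha=0". Assuming nbreg's default mean-dispersion parameterization, the two are reciprocals, so the R fit implies:

```latex
\text{MASS::glm.nb: } \operatorname{Var}(y_i) = \mu_i + \frac{\mu_i^2}{\theta},
\qquad
\text{Stata nbreg: } \operatorname{Var}(y_i) = \mu_i\,(1 + \alpha\,\mu_i)
\;\;\Longrightarrow\;\;
\hat\alpha = \frac{1}{\hat\theta} = \frac{1}{28.4012} \approx 0.0352
```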

Yet even when the models are identical, marginal estimates might differ between R and Stata.

We will use the Insurance data from the MASS package, which most R users will have access to. The data frame Insurance consists of the numbers of policyholders of an insurance company who were exposed to risk, and the numbers of car insurance claims made by those policyholders:
District: district of residence of policyholder (1 to 4): 4 is major cities.
Group: group of car with levels <1 litre, 1-1.5 litre, 1.5-2 litre, >2 litre.
Age: the age of the insured in 4 groups labelled <25, 25-29, 30-35, >35.
Holders: numbers of policyholders.
Claims: numbers of claims.

We do a bit of minor processing, and I save the data as a Stata file in case anyone wants to play with it in that realm. The model is a negative binomial regression of claims on age, with the log number of holders as the offset:
nb_glm_offset = MASS::glm.nb(claims ~ age + offset(ln_holders), data = insurance)
Call:
MASS::glm.nb(formula = claims ~ age + offset(ln_holders), data = insurance, ...
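A minimal sketch of the data preparation and model fit. The post does not spell out its "minor processing", so this assumes it amounts to lowercasing the column names, converting the ordered age factor to an unordered one, and adding ln_holders = log(holders); writing the Stata copy with foreign::write.dta is likewise an assumed choice, not necessarily what the post used.

```r
library(MASS)  # recommended package shipped with R; provides Insurance and glm.nb()

# Assumed processing (not spelled out in the post):
# lowercase names, unordered age factor, log number of holders as the offset
insurance <- MASS::Insurance
names(insurance) <- tolower(names(insurance))
insurance$age <- factor(insurance$age, ordered = FALSE)
insurance$ln_holders <- log(insurance$holders)

# Optional Stata copy; foreign::write.dta is an assumed choice of writer
if (requireNamespace("foreign", quietly = TRUE)) {
  foreign::write.dta(insurance, file.path(tempdir(), "insurance.dta"))
}

# Negative binomial regression of claims on age with log(holders) as offset
nb_glm_offset <- MASS::glm.nb(claims ~ age + offset(ln_holders), data = insurance)
print(summary(nb_glm_offset))
```

If the preparation matches, the end of the summary should show the Negative Binomial(28.4012) family and the deviances quoted in the post. Unordering age only swaps polynomial contrasts for treatment contrasts; the fitted values are the same either way.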

The Stata documentation for the margins command offers no specific details of how the offset/exposure is treated, and some R packages appear not to know what to do with it, or offer few options to deal with it.
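Since the documentation does not say how the offset enters, it helps to write out the candidates. This is a sketch of the algebra for the age-only model used here, not a description of any package's internals. Let \hat\eta_a = \hat\beta_0 + \hat\beta_a for age level a (with \hat\beta_a = 0 for the reference level), h_i the number of holders, n the number of observations and n_a the number in level a. Averaging over every observed offset, and averaging only over the offsets observed within level a, give:

```latex
\hat\mu_a^{\text{all}} = \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(\hat\eta_a + \log h_i\right) = e^{\hat\eta_a}\,\bar h,
\qquad
\hat\mu_a^{\text{obs}} = \frac{1}{n_a}\sum_{i:\,\text{age}_i = a} \exp\!\left(\hat\eta_a + \log h_i\right) = e^{\hat\eta_a}\,\bar h_a
```

The two differ by the factor \bar h_a / \bar h, so they agree only when every age group has the same mean number of holders; plugging in a single fixed offset value c instead gives e^{\hat\eta_a + c}, which differs again.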

Some in the R world catch a whiff of Stata's margins and want something similar, but may not be sure where to turn. A little digging will reveal several packages that provide the same sort of thing, and there are numerous resources for both R and Stata for getting marginal results (i.e. predictions). Here, however, we note the issues that arise when models include an offset. Offsets are commonly used to model rates when the target variable is a count, but they are used in other contexts as well.
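As a reminder of what the offset does, here is a log-link count model written with its offset, assuming y_i is claims, h_i is holders and x_i holds the age indicators, as in the insurance example; the coefficient on log h_i is fixed at 1 rather than estimated:

```latex
\log \mathrm{E}[y_i] = \mathbf{x}_i^\top \boldsymbol\beta + \log h_i
\quad\Longleftrightarrow\quad
\frac{\mathrm{E}[y_i]}{h_i} = \exp\!\left(\mathbf{x}_i^\top \boldsymbol\beta\right)
```

The right-hand form is why an offset models a rate: exp(x_i^T beta) is the expected number of claims per policyholder, and any predicted count still depends on which h_i is plugged in.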


Getting predictions in R is, and always has been, pretty easy for the vast majority of packages providing modeling functions, as they also provide a predict method for their model objects. Those in the Stata world typically use margins for this, but when they come to R there is no obvious way to do it in the same manner [1].