Saturday, September 14, 2013

Mixed models; Random Coefficients, part 2

Continuing from random coefficients part 1, it is time for the second part. To quote the SAS/STAT manual, this is 'a random coefficients model with error terms that follow a nested structure'. The additional random variable is Monthc, a factor containing the months and nested under batch. Hence there is one additional statement in generating the data:
rc2$Monthc <- factor(rc2$Month)
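The rc2 data themselves come from part 1 and are not regenerated here. Purely as an illustration of the structure, below is a hypothetical sketch of how such data could be simulated; the replication, seed and parameter values are my own assumptions, chosen to roughly match the estimates further down, and this is not the code from part 1.
# hypothetical sketch only, not the simulation code from part 1:
# three batches measured at months 0, 1, 3, 6, 9 and 12, with a random
# slope per batch and a nested random intercept per batch:month cell
set.seed(42)
simdat <- expand.grid(Batch = factor(1:3),
                      Month = c(0, 1, 3, 6, 9, 12),
                      Rep   = 1:4)
slope <- rnorm(3, 0, 0.1)                      # random Month slope per batch
cell  <- rnorm(18, 0, 2)                       # nested Batch:Monthc deviations
idx   <- as.integer(interaction(simdat$Batch, simdat$Month))
simdat$Y <- 102.5 + (-0.5 + slope[simdat$Batch]) * simdat$Month +
            cell[idx] + rnorm(nrow(simdat), 0, 0.9)
simdat$Monthc <- factor(simdat$Month)          # the extra nested factor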

lme4

Running the analysis in lme4 is not difficult. However, the random Month slope within Batch turns out to give an estimation problem: between the fixed Month effect and the Batch:Monthc random effect, lme4 has no variance left for it and estimates it as essentially zero. Nevertheless, the term is still needed later on. In the simulation of the full against the restricted model, the restricted model lacks the fixed Month effect, so there this random slope is not zero, and hence the term does influence the significance of the fixed term.
lmer1 <- lmer(Y ~ Month +(Month-1|Batch) + (1|Batch:Monthc),data=rc2)
summary(lmer1)
Linear mixed model fit by REML 
Formula: Y ~ Month + (Month - 1 | Batch) + (1 | Batch:Monthc) 
   Data: rc2 
   AIC   BIC logLik deviance REMLdev
 287.3 299.4 -138.6    275.2   277.3
Random effects:
 Groups       Name        Variance   Std.Dev.  
 Batch:Monthc (Intercept) 4.1669e+00 2.0413e+00
 Batch        Month       1.1006e-11 3.3175e-06
 Residual                 7.9670e-01 8.9258e-01
Number of obs: 84, groups: Batch:Monthc, 18; Batch, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept) 102.5558     0.7674  133.64
Month        -0.5002     0.1140   -4.39

Correlation of Fixed Effects:
      (Intr)
Month -0.768

Detail on random effects

It is almost the same solution as PROC MIXED once the continuous Month effect is accounted for. For instance, the last number (0.885355535) is close to 12*0.07600 + 0.002293 = 0.914293 from PROC MIXED, which combines the batch 3 slope with its nested month 12 effect.

ranef(lmer1)

$`Batch:Monthc`

      (Intercept)
1:0   0.220538869
1:1  -2.582118495
1:3  -2.319238635
1:6   1.880673690
1:9  -1.244284881
1:12  0.772296027
2:0  -0.005588532
2:1  -2.295803984
2:3   2.905996019
2:6   1.642080584
2:9  -2.103224341
2:12 -3.330296971
3:0   1.964950249
3:1  -0.816514780
3:3   0.520042537
3:6   1.236464739
3:9   2.668672370
3:12  0.885355535

$Batch
          Month
1 -2.779928e-11
2 -6.342200e-09
3  6.369999e-09

Fixed effects

Fixed effects are easy after all these exercises. Again we see a discrepancy between anova() and simulation, with the simulation (p=0.033) making the Month effect just significant and anova() making it highly significant (p=0.005).
lmer1b <- lmer(Y ~ 1 +(Month-1|Batch)+ (1|Batch:Monthc) ,data=rc2)
anova(lmer1,lmer1b)
Data: rc2
Models:
lmer1b: Y ~ 1 + (Month - 1 | Batch) + (1 | Batch:Monthc)
lmer1: Y ~ Month + (Month - 1 | Batch) + (1 | Batch:Monthc)
       Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)   
lmer1b  4 290.91 300.63 -141.45                            
lmer1   5 285.18 297.33 -137.59 7.7251      1   0.005446 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pboot <- function(m0,m1) {
  # simulate a data set under the restricted model m0, refit both models on it
  # and return the likelihood ratio statistic for that simulated data set
  s <- simulate(m0)
  L0 <- logLik(refit(m0,s))
  L1 <- logLik(refit(m1,s))
  2*(L1-L0)
}
obsdev <- 2*( logLik(lmer1)-logLik(lmer1b)) # observed likelihood ratio statistic
#
set.seed(1001)
reps <- replicate(1000,pboot(lmer1b,lmer1)) # 1000 statistics simulated under H0
mean(reps>obsdev)                           # simulated p-value
[1] 0.033
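The 0.033 is itself based on only 1000 simulations and so carries Monte Carlo error. Not part of the original analysis, but a quick binomial confidence interval shows how precise that simulated p-value really is:
# 95% binomial confidence interval for the simulated p-value
binom.test(sum(reps > obsdev), length(reps))$conf.int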

nlme

It is not obvious how to run this model in nlme. It appears (from an r-help post) that one can make a groupedData object; the grouping defined in that object is used by the model even though it does not appear explicitly in the model call. Then there is some trickery with pdBlocked. However, the output is very straightforward. The month effect is highly significant.
rc3 <- groupedData(formula=Y ~ Month | Batch,
    data=rc2)
lme1 <- lme(Y ~ Month,
    random= pdBlocked(list(pdIdent(~ Monthc - 1),pdIdent(~ Month - 1))),
    data=rc3)
summary(lme1)
Linear mixed-effects model fit by REML
 Data: rc3 
      AIC      BIC    logLik
  286.901 298.9346 -138.4505

Random effects:
 Composite Structure: Blocked

 Block 1: Monthc0, Monthc1, Monthc3, Monthc6, Monthc9, Monthc12
 Formula: ~Monthc - 1 | Batch
 Structure: Multiple of an Identity
         Monthc0  Monthc1  Monthc3  Monthc6  Monthc9 Monthc12
StdDev: 1.934191 1.934191 1.934191 1.934191 1.934191 1.934191

 Block 2: Month
 Formula: ~Month - 1 | Batch
           Month  Residual
StdDev: 0.111472 0.8926711

Fixed effects: Y ~ Month 
                Value Std.Error DF   t-value p-value
(Intercept) 102.55643 0.7287180 80 140.73542   0e+00
Month        -0.50034 0.1259348 80  -3.97302   2e-04
 Correlation: 
      (Intr)
Month -0.66 

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.0466072 -0.6347078  0.0557648  0.4304314  2.4608117 

Number of Observations: 84
Number of Groups: 3 
Conversion of standard deviations to variances for checking:
c(1.934191,0.111472, 0.8926711)^2
[1] 3.74109482 0.01242601 0.79686169

Details on Random effects

Exactly the same as in PROC MIXED.
ranef(lme1)
       Monthc0    Monthc1    Monthc3   Monthc6   Monthc9     Monthc12
1  0.219126119 -2.5690021 -2.3067185 1.8726057 -1.235035  0.773610734
2 -0.006207775 -2.2125547  3.1063194 2.0649346 -1.444999 -2.440480024
3  1.957416153 -0.8849589  0.3005989 0.7971893  2.005861  0.002293605
          Month
1 -0.0002840204
2 -0.0757124467
3  0.0759964671
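To connect this back to lme4: lme4 folded the batch slope into its Batch:Monthc intercepts, while nlme (like PROC MIXED) keeps the slope separate. Under that reading, slope times month plus the Monthc deviation from lme1 should be close to the corresponding Batch:Monthc value from lmer1; a small check for batch 3 at month 12:
# reconstruct the batch 3, month 12 effect from the nlme parameterization
re.lme <- ranef(lme1)
12 * re.lme["3", "Month"] + re.lme["3", "Monthc12"]   # about 0.914
# compare ranef(lmer1)$`Batch:Monthc`["3:12", ]       # 0.885 from lme4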

MCMCglmm

For once this seems the easiest. Notice that Monthc is not used in the model: leaving out us() around Month is sufficient for MCMCglmm to interpret it as levels rather than as a continuous variable. By now I am not surprised by a mismatch in random effects between PROC MIXED and MCMCglmm. The Month effect is significant.
library(MCMCglmm)
MCMCglmm1 <- MCMCglmm(Y ~ Month, 
    random= ~ us(Month):Batch + Month:Batch, # two terms
    pr=TRUE,
    data=rc2,family='gaussian',
    nitt=1000000,thin=20,burnin=100000,
    verbose=FALSE)
summary(MCMCglmm1)

 Iterations = 100001:999981
 Thinning interval  = 20
 Sample size  = 45000 

 DIC: 238.8215 

 G-structure:  ~us(Month):Batch

                  post.mean  l-95% CI u-95% CI eff.samp
Month:Month.Batch   0.02221 1.386e-17  0.03053    45000

               ~Month:Batch

            post.mean l-95% CI u-95% CI eff.samp
Month:Batch     4.695    1.741    8.649    45000

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units    0.8223   0.5518    1.114    45000

 Location effects: Y ~ Month 

            post.mean l-95% CI u-95% CI eff.samp   pMCMC    
(Intercept)  102.5577 100.9234 104.1365    45961 < 2e-05 ***
Month         -0.4994  -0.7563  -0.2401    45000 0.00467 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Details on random effects.

Again a small calculation for the last item shows 0.737419049 + 12*0.011875820 = 0.8799289, against 0.91 and 0.88 for PROC MIXED and lme4 respectively. The random effects are quite close. (The same check in code follows the listing below.)

colMeans(MCMCglmm1$Sol)

        (Intercept)               Month Batch.Month.Batch.1 Batch.Month.Batch.2 
      102.557650677        -0.499353562        -0.000757751        -0.013238892 
Batch.Month.Batch.3     Month:Batch.0.1     Month:Batch.1.1     Month:Batch.3.1 
        0.011875820         0.218504027        -2.577571780        -2.317749883 
    Month:Batch.6.1     Month:Batch.9.1    Month:Batch.12.1     Month:Batch.0.2 
        1.873068384        -1.245499371         0.768104528        -0.008015210 
    Month:Batch.1.2     Month:Batch.3.2     Month:Batch.6.2     Month:Batch.9.2 
       -2.278469327         2.929929878         1.705485501        -1.995548530 
   Month:Batch.12.2     Month:Batch.0.3     Month:Batch.1.3     Month:Batch.3.3 
       -3.183017853         1.960380450        -0.828147383         0.483852028 
    Month:Batch.6.3     Month:Batch.9.3    Month:Batch.12.3 
        1.158358351         2.551876100         0.737419049 
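The same hand calculation can be done directly on the posterior means, using the coefficient names as they appear above:
cm <- colMeans(MCMCglmm1$Sol)
# batch 3 slope times 12 plus the month 12 deviation for batch 3
12 * cm["Batch.Month.Batch.3"] + cm["Month:Batch.12.3"]  # about 0.88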

Difference in fixed effects between all functions

All functions now put the fixed Month effect at approximately -0.5, and all have it significant at the conventional 5% level. Still, there are quite some differences: PROC MIXED and the lme4 simulation give p-values close to 5%, the lme4 anova and MCMCglmm put the p-value in the 0.005 region, and nlme gives 0.0002. This is worrying; the moment I perform such calculations on real data I need to be more certain about my p-values than is implied here.
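One route to better calibrated p-values, not part of the comparison above, would be the Kenward-Roger approximation or the packaged parametric bootstrap in the pbkrtest package; a sketch, assuming it accepts the lmer fits used above:
# Kenward-Roger adjusted F test and parametric bootstrap for the fixed Month effect
library(pbkrtest)
KRmodcomp(lmer1, lmer1b)              # Kenward-Roger approximation
PBmodcomp(lmer1, lmer1b, nsim=1000)   # parametric bootstrap, like pboot() above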

2 comments:

  1. In case you (or your readers) have not seen it, the book _Linear Mixed Models: A Practical Guide Using Statistical Software_ does an awesome job describing how to fit mixed models in SAS, SPSS, Stata, R, and HLM. In a series of case studies, they fit typical models, compare the results of each software package, and explain any differences in the output between the software. The web site is http://www-personal.umich.edu/~bwest/almmussp.html

     Reply: That certainly looks like a great resource. I had never heard of it, but then, there is so much published, who can cope?

     Thanks for the advice.