Anomalie dans test post GLMM via glmer

par Camille Gauliard Lun 4 Avr 2016 - 17:12

Bonjour tout le monde,
je pose le contexte :
j'ai toutes les conditions pour faire un glm() mais ayant des variables explicatives imbriquées les unes dans les autres on me conseil une approche glmm pour intégrer des random-effects; jusqu'ici tout va bien .
variable réponse : nombre d'individus
facteur 1: stations (14 en tout)
facteur 2 : des habitats (4 qui regroupe les 14 stations)
facteur 3 : les années ( 3 ans)

Code:: > glm_taxa=read.table(file.choose(),sep=";",header=T) > glm_taxa=glm_taxa[complete.cases(glm_taxa),] > glm_taxa recrues stations habitats annees 1 13 fra FR 2012 2 14 fra FR 2012 (...) 837 4 bz OS 2014 838 7 bz OS 2014 839 5 bz OS 2014 840 3 bz OS 2014 > recrues<-as.numeric(glm_taxa$recrues) > summary(glm_taxa) recrues stations habitats annees Min. : 0.00 ax : 60 BR:116 Min. :2012 1st Qu.: 4.00 bx : 60 FR:234 1st Qu.:2012 Median : 10.00 by : 60 MS:115 Median :2013 Mean : 34.35 frc : 60 OS:356 Mean :2013 3rd Qu.: 24.00 az : 59 3rd Qu.:2014 Max. :697.00 brb : 59 Max. :2014 (Other):463

je partitionne mon raisonnement en deux axes :
(i) y a t'il de la variabilité spatiale?
(ii) y a t'il de la variabilité temporelle?

donc je fais deux modèles avec glmer() - car dans se que j'ai pu lire sur différent média glmmPQL n'est pas la solution la plus pertinente-

Code:: > library(lme4) > data1=glmer(recrues~habitat*station*an+(1|habitat/station/an),family=poisson,data=glm_taxa) > summary(data1) Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod] Family: poisson ( log ) Formula: recrues ~ habitat * station * an + (1 | habitat/station/an) Data: glm_taxa AIC BIC logLik deviance df.resid 12248 12460 -6079 12158 776 Scaled residuals: Min 1Q Median 3Q Max -13.8122 -1.6854 -0.4472 1.3480 17.0545 Random effects: Groups Name Variance Std.Dev. an:(station:habitat) (Intercept) 0 0 station:habitat (Intercept) 0 0 habitat (Intercept) 0 0 Number of obs: 821, groups: an:(station:habitat), 42; station:habitat, 14; habitat, 4 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.54553 0.06262 40.65 < 2e-16 *** habitatFR 0.84224 0.07490 11.24 < 2e-16 *** habitatMS -0.30819 0.09767 -3.16 0.00160 ** habitatOS 0.34484 0.08185 4.21 2.52e-05 *** stationay -0.15276 0.07755 -1.97 0.04885 * stationaz -0.32542 0.08139 -4.00 6.38e-05 *** stationbra -1.29277 0.13493 -9.58 < 2e-16 *** (...) stationmsa -0.21580 0.11064 -1.95 0.05112 . an2013 -0.57145 0.10424 -5.48 4.20e-08 *** an2014 -0.25353 0.09613 -2.64 0.00835 ** habitatFR:an2013 -0.15129 0.12735 -1.19 0.23483 habitatMS:an2013 2.21361 0.13235 16.73 < 2e-16 *** habitatOS:an2013 0.29847 0.13150 2.27 0.02323 * habitatFR:an2014 2.06831 0.10592 19.53 < 2e-16 *** habitatMS:an2014 3.29188 0.12300 26.76 < 2e-16 *** habitatOS:an2014 -0.06423 0.12584 -0.51 0.60975 stationay:an2013 -0.84472 0.14171 -5.96 2.50e-09 *** stationaz:an2013 -1.82197 0.20377 -8.94 < 2e-16 *** (...) stationbz:an2014 -1.50390 0.15432 -9.75 < 2e-16 *** stationfra:an2014 -0.79819 0.07583 -10.53 < 2e-16 *** stationfrb:an2014 1.28715 0.07057 18.24 < 2e-16 *** stationfrc:an2014 0.10152 0.06812 1.49 0.13613 stationmsa:an2014 -1.11079 0.11653 -9.53 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation matrix not shown by default, as p = 42 > 20. Use print(x, correlation=TRUE) or vcov(x) if you need it (ii)VARIABILITE TEMPORELLE > dataTEMP=glmer(recrues~an+(1|habitat/station/an), family=poisson,data=glm_taxa) > summary(dataTEMP) Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod] Family: poisson ( log ) Formula: recrues ~ an + (1 | habitat/station/an) Data: glm_taxa AIC BIC logLik deviance df.resid 12448.4 12476.7 -6218.2 12436.4 815 Scaled residuals: Min 1Q Median 3Q Max -13.8079 -1.6858 -0.4614 1.3277 17.0650 Random effects: Groups Name Variance Std.Dev. an:(station:habitat) (Intercept) 8.744e-01 0.9351100 station:habitat (Intercept) 3.880e-10 0.0000197 habitat (Intercept) 3.509e-01 0.5923501 Number of obs: 821, groups: an:(station:habitat), 42; station:habitat, 14; habitat, 4 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.7640 0.3932 7.030 2.07e-12 *** an2013 -0.5506 0.3546 -1.553 0.121 an2014 0.5037 0.3544 1.421 0.155 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation of Fixed Effects: (Intr) an2013 an2013 -0.450 an2014 -0.450 0.499

Mon problème se trouve lorsque je veux avoir le détail des variabilités et donc des différences significatives; je procède à un test de TUKEY HSD via la fonction glht() :

Code:: > library(multcomp) > summary(glht(data1, linfct = mcp(habitat = "Tukey"))) Fit: glmer(formula = recrues ~ habitat * station * an + (1 | habitat/station/an), data = glm_taxa, family = poisson) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) FR - BR == 0 0.84224 0.07490 11.244 < 0.001 *** MS - BR == 0 -0.30819 0.09767 -3.155 0.00874 ** OS - BR == 0 0.34484 0.08185 4.213 < 0.001 *** MS - FR == 0 -1.15043 0.08548 -13.458 < 0.001 *** OS - FR == 0 -0.49740 0.06684 -7.442 < 0.001 *** OS - MS == 0 0.65303 0.09163 7.127 < 0.001 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Adjusted p values reported -- single-step method) Warning message: In mcp2matrix(model, linfct = linfct) : covariate interactions found -- default contrast might be inappropriate > summary(glht(data1, linfct = mcp(an = "Tukey"))) Fit: glmer(formula = recrues ~ habitat * station * an + (1 | habitat/station/an), data = glm_taxa, family = poisson) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) 2013 - 2012 == 0 -0.57145 0.10424 -5.482 <0.001 *** 2014 - 2012 == 0 -0.25353 0.09613 -2.637 0.0226 * 2014 - 2013 == 0 0.31792 0.11074 2.871 0.0113 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Adjusted p values reported -- single-step method)

Quand je fais ce test je trouve donc que tous les habitats sont significatifs => variabilité spatiale
et aussi que toutes les années sont significatives => variabilité temporelle
en faisant cela sur mon premier modèle datat1.

Mais quand je le fais sur mon modèle 2 (dataTEMP) je trouve :

Code:: > summary(glht(dataTEMP, linfct = mcp(an = "Tukey"))) Simultaneous Tests for General Linear Hypotheses Multiple Comparisons of Means: Tukey Contrasts Fit: glmer(formula = recrues ~ an + (1 | habitat/station/an), data = glm_taxa, family = poisson) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) 2013 - 2012 == 0 -0.5506 0.3546 -1.553 0.2664 2014 - 2012 == 0 0.5037 0.3544 1.421 0.3298 2014 - 2013 == 0 1.0543 0.3549 2.971 0.0086 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Adjusted p values reported -- single-step method)

Ici une seule différence significative....

je ne comprends pas d'où vient la coquille... (idem quand je fais un modele pour les stations => glmer(recrues~station ...ect) ; aucunes n'est significative alors que les habitats - qui regroupent les stations- sont significatifs... je suis perdue)

désolée pour la taille du message, mais je préfère mettre toutes les infos pour aider à la compréhension de mon problème qui doit surement venir de mon résonnement .

Camille

par Florent Aubry Mar 5 Avr 2016 - 9:38

Camille

Première remarque concernant le résultat data1. Quand je lis :

Random effects:
Groups Name Variance Std.Dev.
an:(station:habitat) (Intercept) 0 0
station:habitat (Intercept) 0 0
habitat (Intercept) 0 0

je me pose des questions sur la pertinence du modèle. En effet, il est impossible qu'un effet aléatoire soit de variance nulle car cela signifie qu'il n'est pas aléatoire. Donc, il me semble que glmer pourrait alors fonctionner comme glm.

En ce qui concerne dataTEMP, tu as comme résultats pour l'effet aléatoire :

Random effects:
Groups Name Variance Std.Dev.
an:(station:habitat) (Intercept) 8.744e-01 0.9351100
station:habitat (Intercept) 3.880e-10 0.0000197
habitat (Intercept) 3.509e-01 0.5923501

et donc réellement des effets aléatoires.

Il y a donc en premier lieu un problème de modélisation et je ne sais pas si glmer fonctionne correctement même si la procédure ne générère pas d'erreur, en l'absence d'effets aléatoires.

par Camille Gauliard Mar 5 Avr 2016 - 10:04

Bonjour,

d'accord, j'avais remarqué cette "bizarrerie" pour mon premier modèle mais je ne savais pas comment l'interpréter.
Ce premier modèle à pour but réel de me donner les échelles spatiales qui ont un effet significatif sur ma variable (soit habitat, soit station) pour que je me concentre sur une échelle unique afin de faciliter la traitement des données.

D'après mon jeu de données la pertinence irait vers l’échelle station mais il reste à le démontrer.

Florent Aubry a écrit:il est impossible qu'un effet aléatoire soit de variance nulle car cela signifie qu'il n'est pas aléatoire. Donc, il me semble que glmer pourrait alors fonctionner comme glm.

Oui je comprends mais les stations étant imbriquées dans les habitats qui sont toutes deux regroupées par les années, j'ai peur qu'il y ai de la perte d'information via un simple GLM.

Ou alors y a t'il une façon d’insérer dans un modèle (sous forme de glm) la hierarchisation des facteurs sans pour autant les considérer comme aléatoires?

Merci pour votre aide Smile

par Contenu sponsorisé

Anomalie dans test post GLMM via glmer

Anomalie dans test post GLMM via glmer

Re: Anomalie dans test post GLMM via glmer

Re: Anomalie dans test post GLMM via glmer

Re: Anomalie dans test post GLMM via glmer