A unification of multivariate
methods for meta-analysis of genetic association studies
Pantelis
G.
Bagos
Methods for multivariate meta-analysis of genetic
association studies are reviewed, summarized and presented in a
unified framework. Modifications of standard models are described in
detail in order to be applied in genetic association studies. The
model based on summary data is uniformly defined for both discrete
and continuous outcomes and analytical expressions for the covariance
of the two jointly modeled outcomes are derived for both cases. The
models based on the binary nature of the data are fitted using both
prospective and retrospective likelihood. Furthermore, formal tests
for assessing the genetic model of inheritance are developed based on
standard normal theory. The general model is compared to the recently
proposed genetic model-free bivariate approach (either using summary
or binary data), and it is clearly shown that the estimates provided
by this approach are nearly identical to the estimates derived by the
general bivariate model using the aforementioned tests for the
genetic model. The methods developed here as well as the tests, are
easily implemented in all major statistical packages, escaping the
need of self written software. The methods are applied in several
already published meta-analyses of genetic association studies (with
both discrete and continuous outcomes) and the results are compared
against the widely used univariate approach as well as against the
genetic model free approaches. Illustrative examples of code in Stata
are given in the Appendix. It is anticipated that the methods
developed in this work will be widely applied in the meta-analysis of
genetic association studies.
The Stata programs (do-files) for
fitting the models proposed in this work, are available below:
The program for performing bivariate meta-analysis
with discrete-outcome data is here.
**Stata code for fitting the various models described in the manuscript
**The commands can be run interactively or in batch mode (i.e. in a do-file)
**The mvmeta and gllamm packages are required in order to run the code
**Fixed effects models are not considered here, for simplicity
**Data input for the discrete outcome meta-analysis
**Data are taken from the study of Kato et al, 1999
**Data are listed in the form of Table 1 of the manuscript
clear
set more off
input study aa0 ab0 bb0 aa1 ab1 bb1
1 3 34 44 2 20 83
2 4 30 49 3 17 62
3 17 84 48 3 30 31
4 5 32 63 5 23 52
5 12 62 119 8 39 133
6 9 48 47 6 34 68
7 18 134 363 20 214 483
end
** calculation of the log(Odds Ratios)
gen b1=log(( ab1/ aa1)/( ab0/aa0))
gen b2=log(( bb1/ aa1)/( bb0/aa0))
** calculation of the variances
gen V11=1/aa0 +1/ab0+1/aa1 +1/ab1
gen V22=1/aa0 +1/aa1+1/bb0 +1/bb1
** calculation of the covariance
gen V12=1/aa0 +1/aa1
** fitting the model of Equation 3.7 using REML
mvmeta b V,vars(b1 b2)
** retrieve the covariance matrix
matrix v=e(V)
**calculation of lambda
di _b[b1]/_b[b2]
**calculation of standard error of lambda – Equation 5.6
di sqrt( v[1,1]/_b[b2]^2 + v[2,2]*_b[b1]^2/_b[b2]^4 -2*v[1,2]*_b[b1]/_b[b2]^3 )
**automatic calculation of confidence intervals for lambda
nlcom _b[b1]/_b[b2]
**calculation of d
di _b[b2] - _b[b1]
**calculation of standard error of d – Equation 5.2
di sqrt( v[1,1] + v[2,2] - 2*v[1,2] )
**automatic calculation of confidence intervals for d
lincom _b[b2] - _b[b1]
The program for performing bivariate meta-analysis
with continuous-outcome data is here
**Data input for the continuous outcome meta-analysis
**Data are taken from the study of Boekholdt, et al, 2005
**Data are listed in the form of Table 2 of the manuscript
clear
input study gen11 gen12 gen22 x11 sd11 x12 sd12 x22 sd22
1 181 296 88 1.18 .32 1.24 .33 1.33 .36
2 500 896 317 .79 .25 .84 .25 .9 .27
3 328 596 210 1.09 .31 1.12 .29 1.25 .4
4 845 1236 409 1.19 .37 1.22 .37 1.3 .41
5 168 256 100 1.36 .36 1.38 .38 1.48 .42
6 237 380 150 1.13 .21 1.19 .24 1.27 .22
7 284 369 139 1.16 .35 1.2 .39 1.22 .41
8 256 335 115 .89 .21 .92 .21 1.03 .27
9 1121 1693 575 .97 .22 1 .23 1.07 .24
10 500 797 299 1.07 .256 1.13 .24 1.17 .24
end
** calculation of the mean differences
gen b1 = x12 - x11
gen b2 = x22 - x11
** calculation of the variances
gen V11 = (sd12)^2/gen12 + (sd11)^2/gen11
gen V22 = (sd22)^2/gen22 + (sd11)^2/gen11
** calculation of the covariance
gen V12 = (sd11)^2 /gen11
** fitting the model of Equation 3.7 using REML
mvmeta b V,vars(b1 b2)
** retrieve the covariance matrix
matrix v=e(V)
**calculation of lambda
di _b[b1]/_b[b2]
**calculation of standard error of lambda - Equation 5.6
di sqrt( v[1,1]/_b[b2]^2 + v[2,2]*_b[b1]^2/_b[b2]^4 -2*v[1,2]*_b[b1]/_b[b2]^3 )
**automatic calculation of confidence intervals for lambda
nlcom _b[b1]/_b[b2]
**calculation of d
di _b[b2] - _b[b1]
**calculation of standard error of d - Equation 5.2
di sqrt( v[1,1] + v[2,2] - 2*v[1,2] )
**automatic calculation of confidence intervals for d
lincom _b[b2] - _b[b1]
A self-written program for performing bivariate
meta-analysis using the ml command is here
**Stata program for performing bivariate meta-analysis using ML
**The program should be run in batch mode (i.e. in a do-file)
**When there is a small number of studies, the between studies correlation is
**poorly estimated (i.e. -1 or +1), in such a case the program may not reach
**convergence and the user will have to run it again
**Data input for the discrete outcome meta-analysis
**Data are taken from the study of Kato et al, 1999
**Data are listed in the form of Table 1 of the manuscript
clear
set more off
input study aa0 ab0 bb0 aa1 ab1 bb1
1 3 34 44 2 20 83
2 4 30 49 3 17 62
3 17 84 48 3 30 31
4 5 32 63 5 23 52
5 12 62 119 8 39 133
6 9 48 47 6 34 68
7 18 134 363 20 214 483
end
** calculation of the log(Odds Ratios)
gen b1=log(( ab1/ aa1)/( ab0/aa0))
gen b2=log(( bb1/ aa1)/( bb0/aa0))
** calculation of the variances
gen V11=1/aa0 +1/ab0+1/aa1 +1/ab1
gen V22=1/aa0 +1/aa1+1/bb0 +1/bb1
** calculation of the covariance
gen V12=1/aa0 +1/aa1
** initial values for input to the bivariate model are obtained from the
** fixed-effects estimates
gen _wz1=1/V11
gen _wz2=1/V22
gen _z1wz1=b1*_wz1
gen _z2wz2=b2*_wz2
summarize _z1wz1
local muz1=r(sum)
summarize _wz1
local swz1=r(sum)
local muz1=`muz1'/`swz1'
local tauz1=1/`swz1'
summarize _z2wz2
local muz2=r(sum)
summarize _wz2
local swz2=r(sum)
local muz2=`muz2'/`swz2'
local tauz2=1/`swz2'
corr _z1wz1 _z2wz2
local r=r(rho)
local tauz1=log(`tauz1')
local tauz2=log(`tauz2')
local r=0.5*log((1+`r')/(1-`r'))
drop _wz1 _wz2 _z1wz1 _z2wz2
program drop _all
** here is the main program for evaluating the log-likelihood
program define LL
args lnl mub1 mub2 logtb1 logtb2 r
quietly {
scalar taub1=exp(`logtb1')
scalar taub2=exp(`logtb2')
scalar rho=(exp(2*`r')-1)/(exp(2*`r')+1)
gen double vb1=taub1 + V11
gen double vb2=taub2 + V22
gen double corr=(rho*sqrt(taub1*taub2) + V12)/sqrt(vb1*vb2)
gen double L=log(vb1*vb2*(1-corr^2))+ /*
*/ ( (b1-`mub1')^2/vb1 + /*
*/ (b2-`mub2')^2/vb2 - /*
*/ 2*corr*(b1-`mub1')*(b2-`mub2')/sqrt(vb1*vb2)) /*
*/ /(1-corr^2)
replace `lnl'=-0.5*L - 0.5*log(2*_pi)
drop vb1 vb2 corr L
}
end
** model fitting - Equation 3.7
** the results should be identical to running: mvmeta b V,vars(b1 b2) ml
ml model lf LL (b1:) (b2:) (logvar1:) (logvar2:) (z_rho:)
ml init b1:_cons=`muz1' b2:_cons=`muz2' logvar1:_cons=`tauz1' logvar2:_cons=`tauz2' z_rho:_cons=`r'
ml check
ml search,repeat(10)
ml maximize, difficult iter(30)
A gllamm program for fitting meta-analysis models using the prospective likelihood is here
**Stata code for fitting the various models described in the manuscript
**The commands can be run interactively or in batch mode (i.e. in a do-file)
**The mvmeta and gllamm packages are required in order to run the code
**Fixed effects models are not considered here, for simplicity
**Data input for the discrete outcome meta-analysis
**Data are taken from the study of Kato et al, 1999
**Data are listed in the form of Table 1 of the manuscript
clear
set more off
input study aa0 ab0 bb0 aa1 ab1 bb1
1 3 34 44 2 20 83
2 4 30 49 3 17 62
3 17 84 48 3 30 31
4 5 32 63 5 23 52
5 12 62 119 8 39 133
6 9 48 47 6 34 68
7 18 134 363 20 214 483
end
**data re-arrangement in a format suitable for logistic regression
reshape long aa ab bb, i(study) j(group)
gen id = _n
rename aa count1
rename ab count2rename bb count3
reshape long count, i(id) j(gen)
drop id
gen wt1=count
xi i.study i.gen
**fitting the model based on the prospective likelihood – Equation 4.1
eq slope2: _Igen_2
eq slope3: _Igen_3
gllamm group _Igen_2 _Igen_3 _Istudy_*,fam(binom) link(logit) i(study ) nrf(2) eqs(slope2 slope3) w(wt) adapt nip(8)
**automatic calculation of confidence intervals for lambda
nlcom _b[_Igen_2]/_b[_Igen_3]
**automatic calculation of confidence intervals for d
lincom _b[_Igen_2] - _b[_Igen_3]
A gllamm program for fitting meta-analysis models using the
retrospective likelihood is here
**Stata code for fitting the various models described in the manuscript
**The commands can be run interactively or in batch mode (i.e. in a do-file)
**The mvmeta and gllamm packages are required in order to run the code
**Fixed effects models are not considered here, for simplicity
**Data input for the discrete outcome meta-analysis
**Data are taken from the study of Kato et al, 1999
**Data are listed in the form of Table 1 of the manuscript
clear
set more off
input study aa0 ab0 bb0 aa1 ab1 bb1
1 3 34 44 2 20 83
2 4 30 49 3 17 62
3 17 84 48 3 30 31
4 5 32 63 5 23 52
5 12 62 119 8 39 133
6 9 48 47 6 34 68
7 18 134 363 20 214 483
end
**data re-arrangement in a format suitable for logistic regression
reshape long aa ab bb, i(study) j(group)
gen id = _n
rename aa count1
rename ab count2rename bb count3
reshape long count, i(id) j(gen)
drop id
gen wt1=count
**data re-arrangement in a format suitable for polytomous (multinomial) logistic regression
sort study group gen
gen patt=_n
expand 3
sort patt
qui by patt: gen alt=_n
gen chosen=alt==gen
tab alt,gen(a)
gen a2group=a2*group
gen a3group=a3*group
**fitting the model based on the retrospective likelihood – Equation 4.4
eq slope2:a2group
eq slope3:a3group
gllamm alt group _Istudy_* , expand(patt chosen m) i(study) link(mlogit) family(binom) nrf(2) eqs(slope2 slope3) nip(8) weight(wt) adapt
**automatic calculation of confidence intervals for d
lincom [c3]group - [c2]group
**automatic calculation of confidence intervals for lambda
nlcom [c2]group/[c3]group
**alternative parameterization of the retrospective likelihood model
xi i.study i.study*a2 i.study*a3
gllamm alt a2 a3 _IstuXa* a2group a3group, nocons expand(patt chosen o) i(study)
link(mlogit) family(binom) nrf(2) eqs(slope2 slope3 ) nip(8) weight(wt) adapt
**automatic calculation of confidence intervals for lambda
nlcom _b[a2group]/_b[a3group]
**automatic calculation of confidence intervals for d
lincom _b[a3group] -_b[a2group]
A gllamm program for fitting meta-analysis models of continuous outcomes using the
linear mixed model is here
**Data input for the continuous outcome meta-analysis
**Data are taken from the study of Boekholdt, et al, 2005
**Data are listed in the form of Table 2 of the manuscript
clear
input study gen11 gen12 gen22 x11 sd11 x12 sd12 x22 sd22
1 181 296 88 1.18 .32 1.24 .33 1.33 .36
2 500 896 317 .79 .25 .84 .25 .9 .27
3 328 596 210 1.09 .31 1.12 .29 1.25 .4
4 845 1236 409 1.19 .37 1.22 .37 1.3 .41
5 168 256 100 1.36 .36 1.38 .38 1.48 .42
6 237 380 150 1.13 .21 1.19 .24 1.27 .22
7 284 369 139 1.16 .35 1.2 .39 1.22 .41
8 256 335 115 .89 .21 .92 .21 1.03 .27
9 1121 1693 575 .97 .22 1 .23 1.07 .24
10 500 797 299 1.07 .256 1.13 .24 1.17 .24
end
**data re-arrangement in a format suitable for fitting the linear mixed model
rename gen11 gen1
rename gen12 gen2
rename gen22 gen3
rename x11 x1
rename sd11 sd1
rename x12 x2
rename sd12 sd2
rename x22 x3rename sd22 sd3
reshape long gen x sd,i(study ) j(z)
xi i.study i.z
**calculation of the variance
gen v=sd^2/ gen
**gllamm for stability parameterizes the residual variance using the logarithm of
the standard error
gen s=log(sqrt(v))
**setting up one equation for each study
eq wgt: s
**creating constraints needed for gllamm in order to perform inverse-variance weighting
constraint define 1 [lns1]s=1
** fitting the model of Equation 3.10
eq s2: _Iz_2
eq s3: _Iz_3
gllamm x _Iz_2 _Iz_3 _Istudy* , i(study) nrf(2) eqs(s2 s3) s(wgt) constraint(1 ) nip(8) noest
matrix A=M_initf
gllamm x _Iz_2 _Iz_3 _Istudy* , i(study) nrf(2) eqs(s2 s3) s(wgt) constraint(1 ) nip(8) from(A) long adapt
**automatic calculation of confidence intervals for lambda
nlcom _b[ _Iz_2]/_b[ _Iz_3]
**automatic calculation of confidence intervals for d
lincom _b[ _Iz_3] - _b[ _Iz_2]
A self-written program for performing bivariate
meta-analysis under the genetic model-free approach using the ml
command is here
**Stata program for fitting the genetic model-free approach of Minelli et al.
**The program should be run in batch mode (i.e. in a do-file)
**Data input for the discrete outcome meta-analysis
**Data are taken from the study of Kato et al, 1999
**Data are listed in the form of Table 1 of the manuscript
clear
set more off
input study aa0 ab0 bb0 aa1 ab1 bb1
1 3 34 44 2 20 83
2 4 30 49 3 17 62
3 17 84 48 3 30 31
4 5 32 63 5 23 52
5 12 62 119 8 39 133
6 9 48 47 6 34 68
7 18 134 363 20 214 483
end
** calculation of the log(Odds Ratios)
gen b1=log(( ab1/ aa1)/( ab0/aa0))
gen b2=log(( bb1/ aa1)/( bb0/aa0))
** calculation of the variances
gen V11=1/aa0 +1/ab0+1/aa1 +1/ab1
gen V22=1/aa0 +1/aa1+1/bb0 +1/bb1
** calculation of the covariance
gen V12=1/aa0 +1/aa1
** initial values for input to the bivariate model are obtained from the
** fixed-effects estimates
gen _wz1=1/V11
gen _wz2=1/V22
gen _z1wz1=b1*_wz1
gen _z2wz2=b2*_wz2
summarize _z1wz1
local muz1=r(sum)
summarize _wz1
local swz1=r(sum)
local muz1=`muz1'/`swz1'
local tauz1=1/`swz1'
summarize _z2wz2
local muz2=r(sum)
summarize _wz2
local swz2=r(sum)
local muz2=`muz2'/`swz2'
local tauz2=1/`swz2'
local lambda=`muz1'/`muz2'
drop _wz1 _wz2 _z1wz1 _z2wz2
local tauz2=log(`tauz2')
program drop _all
** here is the main program for evaluating the log-likelihood
program define LL
args lnl lambda mub2 logtb2
quietly {
scalar taub2=exp(`logtb2')
scalar mub1=`mub2'*`lambda'
gen double vb1=`lambda'^2*taub2 + V11
gen double vb2=taub2 + V22
gen double cov=`lambda'*taub2 + V12
gen double corr=cov/sqrt(vb2*vb1)
gen double L=log(vb1*vb2*(1-corr^2)) + /*
*/ ( (b1 - mub1 )^2/vb1 + /*
*/ (b2 -`mub2')^2/vb2 - /*
*/ 2*corr*(b1 - mub1)*(b2 - `mub2')/sqrt(vb2*vb1) )/*
*/ /(1-corr^2)
replace `lnl'=-0.5*L - 0.5*log(2*_pi)
drop vb2 vb1 corr cov L
}
end
** model fitting– Equation 3.13
ml model lf LL (lambda:) (b2:) (logvar2:)
ml init lambda:_cons=`lambda' b2:_cons=`muz2' logvar2:_cons=`tauz2'
ml check
ml search, repeat(10)
ml maximize, difficult iter(30)
The programs were developed in Stata
8.0 although they are probably functional with older versions
(6.0 and 7.0).
They require the gllamm
module for fitting the random-effects models.
The methods
implemented here are described in:
Bagos PG. A
unification of multivariate methods for meta-analysis of genetic
association studies. 2008, Statistical Applications in Genetics and
Molecular Biology, 7(1), Article 13 [PDF] [Pubmed] [Google Scholar]