Statistical Methods for analysis and meta-analysis of microarrays


Microarray approaches are widely used high-throughput techniques to assess simultaneously the expression of thousands of genes under certain conditions and study the effects of certain treatments, diseases, and developmental stages. The traditional way to perform such experiments is to design oligonucleotide hybridization probes that correspond to specific genes and then measure the expression of the genes in order to determine which of them are up- or down-regulated compared to a condition that is used as a control. Hitherto, individual experiments cannot capture the bigger picture of how a biological system works and, therefore, data integration from multiple experimental studies and external data repositories is necessary to understand the function of genes and their expression patterns under certain conditions. Therefore, the development of methods for handling, integrating, comparing, interpreting and visualizing microarray data is necessary. The selection of an appropriate method for analyzing microarray datasets is not an easy task. Here, we provide an overview of the various methods developed for microarray data analysis, as well as suggestions for choosing the appropriate method for microarray meta-analysis.

Methods of analysis

Parametric tests, so called t-tests, compare groups, usually two, at the same time. The t-test assesses whether the means of two groups have statistically significant differences. This analysis is used to compare the means of two groups, and especially to analyse the two-group posttest-only randomized experimental design. A drawback of the t-test in microarray data analysis is that most microarray experiments contain only a few samples in each group and the assumption of normality does not hold. Thus, several alternatives to the t-test have been proposed in the literature.

  • Bootstrap and permutation

    • Bootstrap and permutation methods are popular resampling methods readily available in major statistical packages like Stata and R. There are various implementations of the Bootstrap available in Stata (bootstrap command) and in R (boot command). Permutation can be also performed using the permute and permtest (for paired observations) commands in Stata, as well as the perm command in R. Below we give examples of performing bootstrapping and permutation t-test is Stata.
  • Bayesian methods

    • The Bayes Factor method of Rouder and coworkers, which is known as the Jeffreys–Zellner–Siow (JZS) t-test, is available as a web-calculator (, as well as an R package (
    • The Savage–Dickey (SD) t-test, proposed by Wetzels and coworkers is inspired by the JZS t-test and retains its key concepts. It is, however, applicable to a wider range of statistical problems, since it allows researchers to test order restrictions and applies to two-sample situations with unequal variance. The SD t-test is also implemented into an R package that uses WinBUGS (
    • The BEST (Bayesian Estimation Supersedes the t-test) software package, which provides a Bayesian alternative to t-test, providing much richer information than a simple p-value, such as complete distributions of credible values for the effect size, difference of mean between groups, difference of standard deviations, and the normality of the data within the groups. The BEST package is implemented in R (, and is also available as an online calculator ( Moreover, the BEST method is implemented in the Bayesian First Aid package ( that aims to provide user friendly Bayesian alternatives to the most widely used estimation commands
  • Regularized t-test

Methods of meta-analysis

Meta-analysis is the statistical technique for combining data from multiple independent but related studies. In particular, meta-analysis can be used to identify a treatment effect that is consistent among studies. In case the treatment effect varies among studies, meta-analysis may be used to identify the cause for this variation. Hypotheses cannot be inferred and validated based solely on the results of a single study, as the results typically vary between studies; instead, data across studies should be combined. Here, we focus on the two-class comparison of microarrays where the objective is to identify genes expressed differentially between two different conditions. In such cases as this, there are three broad categories of statistical methods for meta-analysis that make use of effect sizes, p-values and ranks
  • Effect Size

    • The first statistical method is a standard approach for meta-analysis using fixed or random effects. In principle, any suitable effect size can be used in meta-analysis; in practice, however, most authors advocate the standardized mean difference. In order to handle the problem of small sample size and non-normal data, most authors suggest a type of correction for calculating the statistical significance. Therefore, instead of relying on the normal approximation included in the standard methods, they propose the permutation test.Although Choi and coworkers suggest permutations to calculate p-values, a faster solution is offered in their Bioconductor package GeneMeta, which assumes a normal distribution on the z-scores after checking the reliability of this hypothesis by a Q–Q plot. In general, all the aforementioned resampling methods can be used, with bootstrapping being, probably, the most advantageous since it requires a smaller number of replications. The bootstrap or the permutation methods can also be used in different settings. One option would be to perform an analysis for each study separately, obtain a corrected estimate of variance and then use this, with standard software, in order to calculate the weights for the meta-analysis. Another option would be to perform the analysis in a single step using the resampling strategy (bootstrap or permutation) in a stratified manner, in which the strata are the studies. Examples in Stata are given below.
    • The aforementioned methods, since they are standard methods for meta-analysis, can be easily extended to a Bayesian framework. The WinBUGS code to fit the models of Conlon and coworkers is available at 
    • Finally, another promising approach is to use the moderated effect sizes calculated by methods such as limma, instead of the typical effect sizes, in the traditional meta-analysis. This is a two-step method relying in the first step on an advanced method for regularized t-test. Then, provided that that t=d√n, a traditional random effects meta-analysis is performed.This approach is implemented in the R package metaMA. The most complete package among them is MetaDE, which also offers functionality for preprocessing the data, as well as for displaying the results graphically.
  • Ranks

    • Another class of methods for meta-analysis consists of methods that combine ranks. There are several different approaches, the common denominator of which is that if the same gene is repeatedly at the top of the list ordered by up- or down-regulated genes in replicate experiments, the gene is more likely to be declared as differentially expressed. The Rank Product method, which we have already described in the context of a single study, uses FC to rank genes and calculates the products of ranks across samples and studies. A similar method, Rank Sum, uses the sum of ranks instead, but all other calculations are identical. The RankProd software is available at:
    • A related method, termed METRADISC (Meta-analysis of Rank Discovery Dataset), is based on the same principle, but it is more general. The ranking within each study is performed with any available method (FC, t-test, p-value etc.) and then the average rank of a particular gene across studies is calculated. The overall mean can be weighted or unweighted; the weighted overall mean resembles the traditional methods for meta-analysis. The between-study heterogeneity of the study-specific ranks can also be computed. METRADISC is implemented in R ( and it is also available as a stand-alone application (
  • P-values

  • Another class of methods that is popular in meta-analysis of microarray studies involves combination of p-values. It is widely accepted that Fisher’s seminal work on the combination of p-values was the origin of meta-analysis. Fisher noted that since p-values from k independent samples are uniform random variables, the sum of their logarithm will follow a χ2 distribution with 2k degrees of freedom. Other popular methods are the methods of Edgington which use the sum of the p-values in order to obtain a pooled estimate. These methods are implemented in the metap command available in Stata and R
  • A more sophisticated method was presented by Zaykin and coworkers, the so-called truncated product method (TPM). Their procedure was to use the product of only those p-values less than a specific cut-off value to evaluate the probability of such a product, or a smaller value, under the overall hypothesis that all k hypotheses are true. Source code for implementing TPM can be obtained from
  • Nevertheless, combining p-values presents serious problems relative to combining effect sizes, as in the case of testing different null hypotheses. Moreover, in the combination of p-value, the direction of the association is not taken into consideration and therefore all p-values have to be one-sided, otherwise up- and down-regulated genes have to be combined separately. Finally, these methods cannot quantify the magnitude of the association (the effect size), and, most importantly, do not allow between-study heterogeneity. A method developed by Stouffer partially overcomes these limitations, by combining the equivalent Z-scores instead of p-value. 

Stata code

Program for performing bootstrap analysis with t-test. The program is needed because we need to recenter the observation in order to have zero mean (i.e. the value under the null hypothesis)

program define ttestboot, rclass
version 10.1
    syntax , x(varlist numeric max=1) type(varlist numeric max=1) [ reps(real 100) var(string )  ]
set more off

di in ye "______________________________________________________________________________"

di in ye "   Calculation of Achieved Significance Level (ASL) using the bootstrap"
di in ye "   The idea is to recenter the two samples to the combined sample mean"
di in ye "   so that the data now conform to the null hypothesis but that the"
di in ye "   variances within the samples remain unchanged"
di in ye "______________________________________________________________________________"

ttest `x',by(`type') uneq
tempname tobs omean
scalar `tobs' = r(t)
qui summarize `x', meanonly
scalar `omean' = r(mean)
qui summarize `x' if `type'==0, meanonly
qui replace `x' = `x' - r(mean) + scalar(`omean') if `type'==0
qui summarize `x' if `type'==1, meanonly
qui replace `x' = `x' - r(mean) + scalar(`omean') if `type'==1
tempfile boot

bootstrap t=r(t),nolegend nowarn notable reps(`reps') strata(`type') saving(`boot'): ttest `x',by(`type') `var'

use `boot',clear
qui generate indicator = abs(t)>=abs(scalar(`tobs'))
qui summarize indicator, meanonly
display in ye "ASLboot = " r(mean)
return scalar p=r(mean)

Program for performing permutation analysis with t-test. This is straightforward because permutation calculates directly the p-value.

permute type t=r(t), reps(1000):   ttest x,by(type) uneq
mat li r(p)
mat p =r(p)
di abs(invnormal(p[1,1]/2))

Program for performing meta-analysis using the standardized difference (d). The metan command cannot be used, because it requires the data collapsed (i.e. it uses only the mean and the standard deviation), and in order to use bootstrap or permutation (see the next program) we need the full dataset.

program define myDL3, rclass
        version 10.1
        syntax , x(varlist numeric max=1) Type(varlist numeric max=1) Study(varlist numeric max=1) [ Effect(string random)  ]


tempvar last
qui bysort `study': gen `last'=_n==_N
qui sum `study'
local max=r(max)
tempvar x0 x1 sd0 sd1 n0 n1 d sed
qui gen double `x0'=.
qui gen double `x1'=.
qui gen double `sd0'=.
qui gen double `sd1'=.
qui gen double `n0'=.
qui gen double `n1'=.
qui gen double `d'=.
qui gen double `sed'=.

forvalues i=1(1)`max' {
   qui  sum `x' if `study'==`i' & `type'==0
   qui  replace `x0'=r(mean) if `study'==`i'
   qui  replace `sd0'=r(sd) if `study'==`i' 
   qui  replace `n0'=r(N) if `study'==`i'
   qui  sum `x' if `study'==`i' & `type'==1
   qui  replace `x1'=r(mean) if `study'==`i'
   qui  replace `sd1'=r(sd) if `study'==`i'
   qui  replace `n1'=r(N) if `study'==`i'
   qui     replace `d'=(`x1' - `x0')/(sqrt( (    (`n1'- 1)*(`sd1')^2 +(`n0'-1)*(`sd0')^2)/( `n1' + `n0' -2  ))) if `study'==`i'
   qui     replace `sed'= sqrt( (`n1' + `n0')/(`n1'*`n0') +(`d')^2/(2*(`n1'+`n0'))) if `study'==`i'

qui keep if `last'
qui keep if `x' !=.

metan `n1' `x1' `sd1' `n0' `x0' `sd0', `effect' nograph
**metan `n1' `x1' `sd1' `n0' `x0' `sd0',fixed nograph hedges
**metan `d' `sed' , `effect' nograph
**metan `d' `sed' , random nograph


return scalar d=r(ES)
return scalar se=r(seES)


Program for performing 1-step bootstrap meta-analysis using d (calls the above-mentioned program "myDL3.ado"). Note that the case-control status (type) and the study variable are treated as strata in the bootstrap.

set more off
file open meta using resultsbootstrap.txt, write append
file write meta "gene"
file write meta ","
file write meta "d"
file write meta ","
file write meta "se" _n

foreach var of varlist a1bg-zfhx3{
    di ""
    di ""
    di in ye "Performing meta-analysis for `var' gene"
    di ""

    bootstrap d=r(d), nowarn nohead reps(100) strata(study type): myDL3,x(`var') type(type) study(study) effect(random)

    mat d=e(b)
    local d=d[1,1]
    mat se=e(se)
    local se=se[1,1]

    file write meta "`var'"
    file write meta ","
    file write meta "`d'"
    file write meta ","
    file write meta "`se'"_n


file close meta

In case one wants to used permutation instead, the program is very similar. Note that in this case only the study variable is used as strata, because the case-control variable is used for permuting the labels.

set more off
file open meta using resultspermute.txt, write append
file write meta "gene"
file write meta "    "
file write meta "d"
file write meta "    "
file write meta "p" _n

foreach var of varlist a1bg-aacs{

    permute typed=r(d), strata(study) reps(1000): myDL3,x(`var') type(type) study(study) effect(random)

    mat d=r(b)
    local d=d[1,1]
    mat p=r(p)
    local p=p[1,1]

file write meta "`var'"
file write meta ","
file write meta "`d'"
file write meta ","
file write meta "`p'"_n


file close meta