Markov chains are used in a variety of
problems in biological sequence analysis. Higher-order models are richer and
thus capable of producing better results. However, the number of freely estimated parameters grows exponentially with the order of the model, so
in many applications a higher-order model cannot be estimated adequately. Approximations
of a higher-order Markov chain, such as the interpolated Markov models (IMMs)
or the variable length Markov chains (VLMCs), have also been proposed and
applied in different settings. The Mixture Transition Distribution
(MTD) model is another method for approximating a higher-order Markov chain,
in which the transition probabilities are approximated using a linear combination
of the single-step transition probabilities. We report here the application of an MTD
model in biological sequence analysis. For parameter estimation we use a
version of the EM algorithm. We apply the method to two different problems,
protein domain recognition and
prediction of bacterial outer membrane proteins, and we show that this simple
method can be a powerful competitor even to the sophisticated top-scoring methods. Other
potential applications are discussed. For the latest version of the source code, see here. A web server implementing the method to discriminate bacterial beta-barrel outer membrane proteins from soluble proteins can be found below:
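To make the MTD approximation described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common single-matrix form of the model, in which the probability of the next symbol given the previous l symbols is a weighted sum of single-step transition probabilities, with one non-negative weight per lag and the weights summing to 1. The function names, the toy two-state matrix and the weights are all hypothetical, chosen only for illustration. The sketch also compares the number of free parameters of a full order-l chain with that of the assumed MTD form.

```python
def mtd_transition_prob(context, nxt, Q, lambdas):
    """P(nxt | context) under the assumed single-matrix MTD form.

    context: previous states, most recent first (context[0] is lag 1)
    Q:       row-stochastic single-step transition matrix, Q[prev][nxt]
    lambdas: one non-negative weight per lag, summing to 1
    """
    return sum(lam * Q[prev][nxt] for lam, prev in zip(lambdas, context))


def n_free_params_full(m, order):
    """Free parameters of a full Markov chain of the given order on m states."""
    return m ** order * (m - 1)


def n_free_params_mtd(m, order):
    """Free parameters of the assumed MTD form: one m x m matrix plus lag weights."""
    return m * (m - 1) + (order - 1)


# Toy two-state example (values are illustrative only).
Q = [[0.9, 0.1],
     [0.2, 0.8]]
lambdas = [0.7, 0.3]   # lag-1 weight, lag-2 weight
context = (0, 1)       # previous state 0, state before that 1

p_next_1 = mtd_transition_prob(context, 1, Q, lambdas)  # 0.7*0.1 + 0.3*0.8

# Parameter counts for a 20-letter amino acid alphabet at order 3.
full_count = n_free_params_full(20, 3)
mtd_count = n_free_params_mtd(20, 3)
```

Under these assumptions, a third-order chain over the 20 amino acids has 152000 free parameters, while the MTD form has 382, which illustrates why the approximation can be estimated from far less data.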