MIGMA: A mixture transition distribution Markov model-based software for biological sequence analysis

Markov chains are used in a variety of problems in biological sequence analysis. Higher-order models are richer and can therefore produce better results. However, the number of freely estimated parameters grows exponentially with the order of the model: for an alphabet of m symbols, an order-k chain has (m-1)m^k free parameters. Consequently, in many applications a higher-order model cannot be estimated adequately. Approximations of a higher-order Markov chain, such as interpolated Markov models (IMMs) and variable length Markov chains (VLMCs), have also been proposed and applied in different settings. The Mixture Transition Distribution (MTD) model is another way to approximate a higher-order Markov chain: the transition probabilities are approximated by a linear combination of single-step transition probabilities. We report here the application of an MTD model in biological sequence analysis. For parameter estimation, we use a version of the EM algorithm. We apply the method to two different problems, protein domain recognition and prediction of bacterial outer membrane proteins, and show that this simple method can be a powerful competitor even to sophisticated top-scoring methods. Other potential applications are discussed.
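The idea can be illustrated with a minimal sketch. It is not MIGMA's source code. It assumes Raftery's original single-matrix MTD form, P(x_t = i | x_{t-1}, ..., x_{t-k}) = sum over g of lambda_g * Q[x_{t-g}][i], where the lag weights lambda_g sum to 1 and Q is one row-stochastic m x m matrix. EM is run by treating the lag that "generated" each symbol as a hidden variable. The abstract says only that MIGMA uses "a version of the EM algorithm", so its actual variant may differ. All function names (fit_mtd, mtd_prob, mtd_loglik, n_params_full, n_params_mtd) are hypothetical. The random initialisation, the tiny pseudocount, the fixed iteration count and the log-likelihood-difference scoring are illustrative assumptions.

```python
import math
import random


def n_params_full(m, k):
    """Free parameters of a full order-k Markov chain over m symbols."""
    return (m - 1) * m ** k


def n_params_mtd(m, k):
    """Free parameters of a single-matrix MTD model: one m x m matrix plus k lag weights."""
    return m * (m - 1) + (k - 1)


def mtd_prob(hist, nxt, lambdas, Q):
    """P(x_t = nxt | past) = sum_g lambda_g * Q[x_{t-g}][nxt].

    hist[g] holds x_{t-g-1}, i.e. hist[0] is the immediately preceding symbol.
    """
    return sum(lam * Q[hist[g]][nxt] for g, lam in enumerate(lambdas))


def _normalise(v):
    s = sum(v)
    return [a / s for a in v]


def fit_mtd(seqs, alphabet, order, n_iter=50, seed=0):
    """Estimate (lambdas, Q) by EM, treating the generating lag as a hidden variable."""
    rng = random.Random(seed)
    idx = {a: i for i, a in enumerate(alphabet)}
    m = len(alphabet)
    coded = [[idx[c] for c in s] for s in seqs]
    if not any(len(x) > order for x in coded):
        raise ValueError("need at least one sequence longer than the model order")
    lambdas = [1.0 / order] * order
    # A random positive start breaks the symmetry of a uniform Q.
    Q = [_normalise([0.5 + rng.random() for _ in range(m)]) for _ in range(m)]
    for _ in range(n_iter):
        lam_acc = [0.0] * order
        q_acc = [[1e-9] * m for _ in range(m)]  # tiny pseudocount keeps every Q entry > 0
        for x in coded:
            for t in range(order, len(x)):
                # E-step: posterior responsibility of each lag for symbol x[t].
                w = [lambdas[g] * Q[x[t - g - 1]][x[t]] for g in range(order)]
                z = sum(w)
                for g in range(order):
                    r = w[g] / z
                    lam_acc[g] += r
                    q_acc[x[t - g - 1]][x[t]] += r
        # M-step: turn the expected counts into probabilities.
        lambdas = _normalise(lam_acc)
        Q = [_normalise(row) for row in q_acc]
    return idx, lambdas, Q


def mtd_loglik(seq, model):
    """Log-likelihood of seq[k:] conditioned on its first k symbols."""
    idx, lambdas, Q = model
    x = [idx[c] for c in seq]
    k = len(lambdas)
    return sum(
        math.log(mtd_prob([x[t - g - 1] for g in range(k)], x[t], lambdas, Q))
        for t in range(k, len(x))
    )


if __name__ == "__main__":
    # 20 amino acids, order 3: full chain vs. MTD parameter counts.
    print(n_params_full(20, 3), n_params_mtd(20, 3))  # 152000 382
    model_a = fit_mtd(["AB" * 20], "AB", order=2)
    model_b = fit_mtd(["AABB" * 10], "AB", order=2)
    # A positive difference means the query looks more like model_a's training data.
    print(mtd_loglik("ABABABAB", model_a) - mtd_loglik("ABABABAB", model_b))
```

For amino acids (m = 20) at order 3, the full chain needs 152,000 free parameters, while this MTD form needs only 382. A two-class task, such as separating outer membrane proteins from soluble proteins, could be scored by fitting one model per class and comparing the log-likelihoods, as in the last lines. Whether MIGMA scores sequences exactly this way is an assumption.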

For the latest version of the source code, see here

Other similar tools for biological sequence analysis based on variable length Markov chains:

A webserver implementing the method to discriminate bacterial beta-barrel outer membrane proteins from soluble proteins can be found below:

Pantelis Bagos,
Jul 17, 2016, 4:47 AM