Our research focuses on the computational analysis of biological sequences (DNA, RNA and proteins). Our main research interests include the prediction of protein structure and function from the primary sequence, the development of stochastic models for analyzing biological sequences, large-scale genome analyses, and the development of methodology for genetic association and gene expression studies.

Prediction of protein structure and function

Protein secondary and tertiary structure is largely determined by the protein's primary sequence. Thus, one of the early goals of computational biology was to develop methods for predicting the secondary and tertiary structure of proteins from the information contained in their primary sequence. For some classes of proteins, such as transmembrane proteins, prediction is even more important, since these proteins are difficult to study by experimental means (e.g. X-ray crystallography). One of the main research interests of our lab is the development of computational methods for predicting the structure and function of membrane proteins. In particular, we are involved in:

Machine learning algorithms and probabilistic models for biological sequences

Machine learning comprises a large class of algorithms and computational techniques that recognize complex patterns and make decisions by learning from large amounts of (usually labeled) data. Computational biology and bioinformatics make extensive use of machine learning algorithms because of the complexity of the biological systems under study. We mainly study one particular class of machine learning algorithms, namely Hidden Markov Models, as well as other related Markovian models, for applications to biological data. Markovian models are suitable for analyzing biological sequences because they account for the sequential nature of such data. In particular, we are involved in:
  • Development of maximum likelihood and conditional maximum likelihood training algorithms for Hidden Markov Models
  • Development of decoding (recognition) algorithms for Hidden Markov Models
  • Development of maximum likelihood parameter estimation algorithms for other classes of Markovian models, especially for higher-order Markov chain models
  • Development of hybrid methods (e.g. hybrids of Hidden Markov Models and Neural Networks)
  • Development of semi-supervised training algorithms (i.e. utilizing both labeled and unlabeled data)
  • Applications in various problems of analyzing biological sequences (DNA, RNA and proteins)
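
As an illustration of the decoding (recognition) step mentioned above, here is a minimal sketch of the standard Viterbi algorithm, which finds the single most probable state path of a Hidden Markov Model. It is not the lab's own implementation. The two-state model (M for membrane, L for loop), the two-letter alphabet (h for hydrophobic, p for polar) and all probabilities are hypothetical values chosen only to make the example runnable.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its log-probability.

    Assumes a non-empty observation sequence and strictly positive
    probabilities (so that every logarithm is finite).
    """
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of state s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model: M = membrane segment, L = loop region
states = ["M", "L"]
log_start = to_log({"M": 0.5, "L": 0.5})
log_trans = to_log({"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}})
# h = hydrophobic residue, p = polar residue (a deliberately reduced alphabet)
log_emit = to_log({"M": {"h": 0.8, "p": 0.2}, "L": {"h": 0.2, "p": 0.8}})

path, score = viterbi("hhhhpppp", states, log_start, log_trans, log_emit)
print("".join(path))  # MMMMLLLL
```

Working in log space avoids numerical underflow on long sequences. The other decoding methods used with Hidden Markov Models, such as posterior decoding, are built on the forward and backward recursions rather than on this maximization.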

Genetic Epidemiology

The rapidly developing field of genetic epidemiology, a fusion of traditional genetics and epidemiology, studies the genetic components of disease as well as the joint effects of genetic factors and environmental determinants in large populations. Whereas traditional genetic studies (e.g. linkage studies, segregation studies) are usually used to identify the major determinants of monogenic diseases, which are caused by rare variants, modern genetic association studies aim to decipher the role played by a large number of common genetic variants in the development of common (multifactorial) diseases (e.g. diabetes, heart disease, cancer). We are involved in both applied and methodological research in the area, with particular emphasis on the meta-analysis of genetic association studies. The continuously increasing number of published genetic association studies has made it imperative to collect and synthesize the available information on a particular gene-disease association, providing a quantitative overall estimate in a procedure known as meta-analysis. In particular, we are involved in: