Introduction
Computational prediction of signal peptides is of great importance in computational biology. In addition to the general secretory pathway (Sec), Bacteria, Archaea and chloroplasts possess another major pathway that utilizes the Twin-Arginine translocase (Tat), which recognizes longer and less hydrophobic signal peptides carrying a distinctive pattern of two consecutive Arginines (RR) in the n-region. A major functional difference between the Sec and Tat export pathways is that the former translocates secreted proteins in an unfolded state through a protein-conducting channel, whereas the latter translocates completely folded proteins using an unknown mechanism. The purpose of this work is to develop a novel method for predicting Sec and Tat signal peptides and discriminating between them with better accuracy. We report the development of a novel method, PRED-TAT, which is capable of discriminating Sec from Tat signal peptides and predicting their cleavage sites. The method is based on Hidden Markov Models (HMMs) and possesses a modular architecture suitable for both Sec and Tat signal peptides. On an independent test set of experimentally verified Tat signal peptides, PRED-TAT clearly outperforms the previously proposed methods TatP and TATFIND, whereas, when evaluated as a Sec signal peptide predictor, it compares favorably to top-scoring predictors such as SignalP and Phobius.

Tables
Table 1. Results obtained from the Tat predictors in the training set. The MCC is computed by comparing Tat signal peptides vs. non-Tat sequences (Sec signal peptides, cytoplasmic and TM sequences)
* The results concerning PRED-TAT were obtained from the 30-fold cross-validation procedure
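The table captions report a Matthews correlation coefficient (MCC) for a two-class comparison (for example, Tat signal peptides vs. non-Tat sequences). As a minimal sketch of how such a value can be computed from confusion-matrix counts, the function below uses the standard MCC definition. The function name and the example counts are assumptions made for illustration; they are not taken from the PRED-TAT software or from the tables.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp: positives (e.g. Tat signal peptides) predicted as positive
    tn: negatives (e.g. non-Tat sequences) predicted as negative
    fp: negatives predicted as positive
    fn: positives predicted as negative
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal total is zero,
    # since the coefficient is otherwise undefined.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the tables).
    print(matthews_corrcoef(tp=90, tn=880, fp=20, fn=10))
```

The MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than plain accuracy when the positive class is much smaller than the negative class.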
Table 2. Results obtained from the Tat predictors in the independent test set. The MCC is computed by comparing Tat signal peptides vs. non-Tat sequences (Sec signal peptides, cytoplasmic and TM sequences)
* The test set contains no sequences similar to those included in the training set.
Table 3. Results obtained from the Sec predictors in the training set. The MCC is computed by comparing Sec signal peptides vs. non-Sec sequences (cytoplasmic and TM sequences)
* The results concerning PRED-TAT were obtained from the 30-fold cross-validation procedure. The training set includes a significant portion of the sequences used to train the remaining predictors. The set does not include Tat signal peptides; had they been included, the specificity of the Sec signal peptide predictors would be lower.
Table 4. Results obtained from the Sec predictors in the independent test set. The MCC is computed by comparing Sec signal peptides vs. non-Sec sequences (cytoplasmic and TM sequences)
* The test set contains no sequences similar to those included in the training set. The set does not include Tat signal peptides; had they been included, the specificity of the Sec signal peptide predictors would be lower.

Supplementary tables
Supplementary Tables with additional results are available here.

Datasets
The datasets on which the method is trained and tested are available here.

Genome analysis
The results from the analysis of bacterial genomes are available here.

Profile HMMs
Instead of using the custom HMM, you can also download the profile Hidden Markov Models (pHMMs) generated by HMMER.

Reference
Pantelis G. Bagos, Elisanthi P. Nikolaou, Theodore D. Liakopoulos and Konstantinos D. Tsirigos. Combined prediction of Tat and Sec signal peptides with Hidden Markov Models. Bioinformatics, 2010.