
Supplementary material


Introduction


Computational prediction of signal peptides is of great importance in computational biology. In addition to the general secretory pathway (Sec), Bacteria, Archaea and chloroplasts possess another major pathway that utilizes the Twin-Arginine translocase (Tat), which recognizes longer and less hydrophobic signal peptides carrying a distinctive pattern of two consecutive arginines (RR) in the n-region. A major functional difference between the Sec and Tat export pathways is that the former translocates secreted proteins unfolded through a protein-conducting channel, whereas the latter translocates completely folded proteins using an unknown mechanism. The purpose of this work is to develop a method for predicting Sec and Tat signal peptides and discriminating between them with improved accuracy. We report the development of a novel method, PRED-TAT, which is capable of discriminating Sec from Tat signal peptides and predicting their cleavage sites. The method is based on Hidden Markov Models (HMMs) and possesses a modular architecture suitable for both Sec and Tat signal peptides. On an independent test set of experimentally verified Tat signal peptides, PRED-TAT clearly outperforms the previously proposed methods TatP and TATFIND, whereas, when evaluated as a Sec signal peptide predictor, it compares favorably to top-scoring predictors such as SignalP and Phobius.
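
To make the twin-arginine pattern concrete, the following minimal Python sketch scans the N-terminal part of a sequence for consecutive arginines. It is only an illustration of the RR motif described above and is not how PRED-TAT works (PRED-TAT uses an HMM). The function name, the 35-residue window and the example sequences are assumptions chosen for illustration.

```python
import re
from typing import List

def find_twin_arginine(sequence: str, n_terminal_window: int = 35) -> List[int]:
    """Return 0-based start positions of every 'RR' pair in the N-terminal window.

    Hypothetical helper: the window length is an arbitrary illustrative choice,
    not a parameter taken from PRED-TAT.
    """
    window = sequence.upper()[:n_terminal_window]
    # A zero-width lookahead reports overlapping pairs (e.g. in 'RRR') separately.
    return [match.start() for match in re.finditer(r"(?=RR)", window)]

# Made-up sequences, used only to exercise the function.
with_motif = "MSRRQFLKAAGIAT"
without_motif = "MKKTAIAIAVALAG"

print(find_twin_arginine(with_motif))     # positions of RR pairs, if any
print(find_twin_arginine(without_motif))  # empty list when no RR pair is present
```

A plain motif scan like this cannot separate Tat from Sec signal peptides on its own, since an RR pair can occur by chance; it only shows what the motif looks like at the sequence level.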










Fig. 1. The sub-model corresponding to the Tat signal peptides. States that share the same emission probabilities are depicted with the same shading and symbol (the letter denotes the dominant amino acid, but only the R states within the Tat motif are invariant). The cleavage site is shown as a dashed vertical line between states A and 1. Allowed transitions are depicted with arrows.


Tables

Table 1. Results obtained from the Tat predictors in the training set. The MCC is computed by comparing Tat signal peptides vs. non-Tat sequences (Sec signal peptides, cytoplasmic and TM sequences)

Method           Tat SPs            Sec SPs            Cyto               TMs                MCC
PRED-TAT         148/150 (98.67%)   319/328 (97.26%)   288/288 (100.00%)  140/140 (100.00%)  0.96
PRED-TAT HMMER   148/150 (98.67%)   312/328 (95.12%)   288/288 (100.00%)  139/140 (99.29%)   0.93
TATFIND          134/150 (89.33%)   326/328 (99.39%)   287/288 (99.65%)   140/140 (100.00%)  0.92
TatP             130/150 (86.67%)   284/328 (86.59%)   283/288 (98.26%)   133/140 (95.00%)   0.73
PF10518          15/150 (10.00%)    328/328 (100.00%)  288/288 (100.00%)  140/140 (100.00%)  0.29
TIGR01409        105/150 (70.00%)   327/328 (99.70%)   288/288 (100.00%)  140/140 (100.00%)  0.81

* The results concerning PRED-TAT were obtained from the 30-fold cross-validation procedure
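
As a worked check of how the MCC column relates to the per-class counts, the sketch below recomputes the value for the PRED-TAT row of Table 1. It assumes (this is an interpretation, not something stated in the table) that each count in the Sec SPs, Cyto and TMs columns is the number of sequences correctly kept out of the Tat class, so the remainder are false positives. The function and variable names are illustrative.

```python
from math import sqrt

def mcc(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a marginal is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# PRED-TAT row of Table 1 (training set).
tat_correct, tat_total = 148, 150
# Non-Tat classes as (correct, total) for Sec SPs, Cyto and TMs.
# Assumption: "correct" here means "not predicted as Tat".
non_tat = [(319, 328), (288, 288), (140, 140)]

tp = tat_correct
fn = tat_total - tat_correct
tn = sum(correct for correct, _ in non_tat)
fp = sum(total - correct for correct, total in non_tat)

pred_tat_mcc = mcc(tp, fn, tn, fp)
print(f"TP={tp} FN={fn} TN={tn} FP={fp} MCC={pred_tat_mcc:.2f}")
```

Under this reading the result rounds to the tabulated 0.96, and the same calculation on the TatP row (130 true positives, 20 false negatives, 700 true negatives, 56 false positives) rounds to 0.73.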



Table 2.  Results obtained from the Tat predictors in the independent test set. The MCC is computed by comparing Tat signal peptides vs. non-Tat sequences (Sec signal peptides, cytoplasmic and TM sequences)

Method           Tat SPs            Sec SPs            Cyto               TMs                MCC
PRED-TAT         71/75 (94.67%)     265/273 (97.07%)   598/601 (99.50%)   190/192 (98.96%)   0.89
PRED-TAT HMMER   72/75 (96.00%)     259/273 (94.87%)   601/601 (100.00%)  190/192 (98.96%)   0.88
TATFIND          60/75 (80.00%)     270/273 (98.90%)   599/601 (99.67%)   192/192 (100.00%)  0.85
TatP             62/75 (82.67%)     231/273 (84.62%)   594/601 (98.84%)   177/192 (92.19%)   0.61
PF10518          9/75 (12.00%)      273/273 (100.00%)  601/601 (100.00%)  192/192 (100.00%)  0.34
TIGR01409        47/75 (62.67%)     272/273 (99.63%)   601/601 (100.00%)  192/192 (100.00%)  0.77

* The test set contains no similar sequences to those included in the training set


Table 3.  Results obtained from the Sec predictors in the training set. The MCC is computed by comparing Sec signal peptides vs. non-Sec sequences (cytoplasmic and TM sequences)

Method           Sec SPs            Cyto               TMs                MCC
PRED-TAT         315/328 (96.04%)   265/288 (92.01%)   130/140 (92.86%)   0.88
PRED-TAT HMMER   285/328 (86.89%)   285/288 (98.96%)   130/140 (92.86%)   0.88
RPSP             303/328 (92.38%)   287/288 (99.65%)   116/140 (82.86%)   0.87
PrediSi          317/328 (96.65%)   280/288 (97.22%)   108/140 (77.14%)   0.87
SignalPv3 (NN)   323/328 (98.48%)   280/288 (97.22%)   117/140 (83.57%)   0.91
SignalPv3 (HMM)  325/328 (99.09%)   283/288 (98.26%)   114/140 (81.43%)   0.91
Phobius          318/328 (96.95%)   281/288 (97.57%)   129/140 (92.14%)   0.93
Philius          318/328 (96.95%)   274/288 (95.14%)   132/140 (94.29%)   0.91

* The results concerning PRED-TAT were obtained from the 30-fold cross-validation procedure. The training set includes a significant portion of the sequences used to train the remaining predictors. The set does not include Tat signal peptides; if it did, the specificity of the Sec signal peptide predictors would be lower.


Table 4.  Results obtained from the Sec predictors in the independent test set. The MCC is computed by comparing Sec signal peptides vs. non-Sec sequences (cytoplasmic and TM sequences)

Method           Sec SPs            Cyto               TMs                MCC
PRED-TAT         252/273 (92.31%)   570/601 (94.84%)   167/192 (86.98%)   0.82
PRED-TAT HMMER   238/273 (87.18%)   597/601 (99.33%)   174/192 (90.62%)   0.86
RPSP             249/273 (91.21%)   601/601 (100.00%)  146/192 (76.04%)   0.83
PrediSi          260/273 (95.24%)   579/601 (96.34%)   114/192 (59.38%)   0.76
SignalPv3 (NN)   252/273 (92.31%)   599/601 (99.67%)   150/192 (78.12%)   0.85
SignalPv3 (HMM)  264/273 (96.70%)   593/601 (98.67%)   134/192 (69.79%)   0.83
Phobius          249/273 (91.21%)   594/601 (98.84%)   154/192 (80.21%)   0.84
Philius          253/273 (92.67%)   582/601 (96.84%)   181/192 (94.27%)   0.88

* The test set contains no sequences similar to those included in the training set. The set does not include Tat signal peptides; if it did, the specificity of the Sec signal peptide predictors would be lower.



Supplementary tables

Supplementary Tables with additional results are available here.

Datasets

The datasets on which the method is trained and tested are available here.

Genome analysis

The results from the analysis of bacterial genomes are available here.

Profile HMMs

Instead of using the custom HMM, you can also download the profile Hidden Markov Models (pHMMs) generated by HMMER:

Reference

Pantelis G. Bagos, Elisanthi P. Nikolaou, Theodore D. Liakopoulos and Konstantinos D. Tsirigos.
Combined prediction of Tat and Sec signal peptides with Hidden Markov Models.
2010, Bioinformatics