Supplementary Table 1 Results obtained with PRED-TAT when simultaneously
classifying the protein sequences into three categories (Tat signal peptides,
Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic
proteins), presented as 3x3 confusion matrices. A: Results obtained by 30-fold
cross-validation on the training set. B: Results obtained on the independent
test set. C: Results obtained on the training set of TatP. PRED-TAT uses the
Viterbi algorithm, so no fine-tuning is required.
A
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 148 (98.67%) | 0 (0.00%) | 2 (1.33%) | 150 (100%)
Non-signal | 0 (0.00%) | 395 (92.29%) | 33 (7.71%) | 428 (100%)
Signal Peptide | 9 (2.74%) | 4 (1.22%) | 315 (96.04%) | 328 (100%)
B
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 71 (94.67%) | 1 (1.33%) | 3 (4.00%) | 75 (100%)
Non-signal | 5 (0.65%) | 732 (92.31%) | 56 (7.04%) | 793 (100%)
Signal Peptide | 8 (2.93%) | 13 (4.76%) | 252 (92.31%) | 273 (100%)
C
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 105 (100%) | 0 (0.00%) | 0 (0.00%) | 105 (100%)
Non-signal | 6 (1.30%) | 435 (93.95%) | 22 (4.75%) | 463 (100%)
Signal Peptide | 6 (4.62%) | 0 (0.00%) | 124 (95.38%) | 130 (100%)
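The percentages in these confusion matrices are row-wise: each cell is divided by the number of observed proteins in its row. The following is a minimal sketch of that arithmetic, using the counts of Supplementary Table 1A. The function name `row_percentages` and the constant names are illustrative assumptions and are not part of PRED-TAT.

```python
# Counts from Supplementary Table 1A (rows: observed class, columns: predicted class).
LABELS = ["Tat", "Non-signal", "Signal Peptide"]
TABLE_1A = [
    [148, 0, 2],
    [0, 395, 33],
    [9, 4, 315],
]


def row_percentages(matrix):
    """Return each cell as a percentage of its row total, rounded to 2 decimals."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([round(100.0 * cell / total, 2) for cell in row])
    return result


if __name__ == "__main__":
    for label, counts, pcts in zip(LABELS, TABLE_1A, row_percentages(TABLE_1A)):
        cells = ", ".join(f"{c} ({p:.2f}%)" for c, p in zip(counts, pcts))
        print(f"{label}: {cells} | total {sum(counts)}")
```

The diagonal cells give the per-class fraction of correctly classified proteins, and the off-diagonal cells show where the remaining proteins of that class were assigned.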
Supplementary Table 2 Results obtained with PRED-TATHMMER when simultaneously
classifying the protein sequences into three categories (Tat signal peptides,
Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic
proteins), presented as 3x3 confusion matrices. A: Results obtained by 30-fold
cross-validation on the training set. B: Results obtained on the independent
test set. C: Results obtained on the training set of TatP. Since we have two
independent profile HMMs, the final decision here is obtained by choosing the
model with the highest score (i.e. if both scores of a protein are larger than
zero, the highest-scoring model is chosen).
A
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 148 (98.66%) | 2 (1.33%) | 0 (0.00%) | 150 (100%)
Non-signal | 1 (0.24%) | 414 (96.73%) | 13 (3.03%) | 428 (100%)
Signal Peptide | 3 (0.92%) | 41 (12.5%) | 284 (86.59%) | 328 (100%)
B
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 70 (93.33%) | 2 (2.67%) | 3 (4.00%) | 75 (100%)
Non-signal | 1 (0.13%) | 771 (97.23%) | 21 (2.64%) | 793 (100%)
Signal Peptide | 4 (1.46%) | 33 (12.09%) | 236 (86.45%) | 273 (100%)
C
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 105 (100%) | 0 (0.00%) | 0 (0.00%) | 105 (100%)
Non-signal | 1 (0.22%) | 460 (99.35%) | 2 (0.43%) | 463 (100%)
Signal Peptide | 2 (1.54%) | 6 (4.62%) | 122 (93.84%) | 130 (100%)
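The highest-score decision rule described in the caption of Supplementary Table 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name `classify_highest_score`, the argument names and the example scores are hypothetical, a protein with only one positive score is assumed to go to that model, a protein with no positive score is assumed to be non-signal, and ties are assumed to go to Tat. None of these details are confirmed by the caption beyond the threshold of zero and the choice of the highest-scoring model.

```python
def classify_highest_score(tat_score, sec_score, threshold=0.0):
    """Combine two independent profile-HMM scores with a highest-score rule.

    tat_score / sec_score are hypothetical names for the scores of the Tat
    and Sec profile HMMs; threshold=0.0 follows the "larger than zero"
    condition in the caption.
    """
    tat_hit = tat_score > threshold
    sec_hit = sec_score > threshold
    if tat_hit and sec_hit:
        # Both models match: keep the higher-scoring one (tie -> Tat, an assumption).
        return "Tat" if tat_score >= sec_score else "Signal Peptide"
    if tat_hit:
        return "Tat"
    if sec_hit:
        return "Signal Peptide"
    return "Non-signal"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(classify_highest_score(12.3, 4.1))    # Tat
    print(classify_highest_score(2.5, 7.0))     # Signal Peptide
    print(classify_highest_score(-3.0, -1.2))   # Non-signal
```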
Supplementary Table 3 Results obtained with PRED-TATHMMER when simultaneously
classifying the protein sequences into three categories (Tat signal peptides,
Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic
proteins), presented as 3x3 confusion matrices. A: Results obtained by 30-fold
cross-validation on the training set. B: Results obtained on the independent
test set. C: Results obtained on the training set of TatP. Since we have two
independent profile HMMs, the final decision here is obtained by giving
priority to the HMM for Tat substrates (i.e. if the Tat HMM score of a protein
is larger than zero, the protein is classified as Tat irrespective of the
other HMM).
A
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 150 (100.00%) | 0 (0.00%) | 0 (0.00%) | 150 (100%)
Non-signal | 1 (0.24%) | 414 (96.73%) | 13 (3.03%) | 428 (100%)
Signal Peptide | 14 (4.27%) | 41 (12.5%) | 271 (82.62%) | 328 (100%)
B
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 72 (96.00%) | 2 (2.67%) | 1 (1.33%) | 75 (100%)
Non-signal | 1 (0.13%) | 771 (97.23%) | 21 (2.64%) | 793 (100%)
Signal Peptide | 14 (5.13%) | 33 (12.09%) | 226 (82.78%) | 273 (100%)
C
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 105 (100%) | 0 (0.00%) | 0 (0.00%) | 105 (100%)
Non-signal | 1 (0.22%) | 460 (99.35%) | 2 (0.43%) | 463 (100%)
Signal Peptide | 8 (6.15%) | 6 (4.62%) | 116 (89.23%) | 130 (100%)
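The Tat-priority rule described in the caption of Supplementary Table 3 differs from a highest-score rule only when both HMMs give a positive score. A minimal sketch, assuming hypothetical names (`classify_tat_priority`, `tat_score`, `sec_score`) and assuming that a protein with no positive score is labelled non-signal:

```python
def classify_tat_priority(tat_score, sec_score, threshold=0.0):
    """Combine two independent profile-HMM scores, giving priority to Tat.

    The Tat model is consulted first; the Sec model only decides when the
    Tat score does not exceed the threshold (zero, as in the caption).
    """
    if tat_score > threshold:
        return "Tat"
    if sec_score > threshold:
        return "Signal Peptide"
    return "Non-signal"


if __name__ == "__main__":
    # Made-up scores: the Sec model scores higher, but Tat still wins.
    print(classify_tat_priority(2.5, 7.0))    # Tat
    print(classify_tat_priority(-0.4, 7.0))   # Signal Peptide
    print(classify_tat_priority(-0.4, -2.0))  # Non-signal
```

With this rule, a Sec signal peptide that also receives a weak positive Tat score is labelled Tat. This is consistent with the larger Signal Peptide to Tat counts in Supplementary Table 3 compared with Supplementary Table 2.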
Supplementary Table 4 Results obtained with TatP and SignalP3-NN when
simultaneously classifying the protein sequences into three categories (Tat
signal peptides, Sec signal peptides and non-signal peptides, i.e.
transmembrane and cytoplasmic proteins), presented as 3x3 confusion matrices.
A: Results obtained on the training set. B: Results obtained on the independent
test set. C: Results obtained on the training set of TatP (not
cross-validated). Since we have two independent predictors, the final decision
here is obtained by giving priority to TatP (i.e. if the TatP score of a
protein is larger than the cutoff, the protein is classified as Tat
irrespective of SignalP’s output).
A
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 130 (86.67%) | 14 (9.33%) | 6 (4.00%) | 150 (100%)
Non-signal | 12 (2.80%) | 389 (90.88%) | 27 (6.32%) | 428 (100%)
Signal Peptide | 44 (13.41%) | 5 (1.53%) | 279 (85.06%) | 328 (100%)
B
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 62 (82.67%) | 7 (9.33%) | 6 (8.00%) | 75 (100%)
Non-signal | 22 (2.77%) | 732 (92.31%) | 39 (4.92%) | 793 (100%)
Signal Peptide | 42 (15.38%) | 20 (7.33%) | 211 (77.29%) | 273 (100%)
C
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 103 (98.10%) | 1 (0.95%) | 1 (0.95%) | 105 (100%)
Non-signal | 5 (1.08%) | 448 (96.76%) | 10 (2.16%) | 463 (100%)
Signal Peptide | 15 (11.54%) | 2 (1.54%) | 113 (86.92%) | 130 (100%)
Supplementary Table 5 Results obtained with TatP and SignalP3-NN when
simultaneously classifying the protein sequences into three categories (Tat
signal peptides, Sec signal peptides and non-signal peptides, i.e.
transmembrane and cytoplasmic proteins), presented as 3x3 confusion matrices.
A: Results obtained on the training set. B: Results obtained on the independent
test set. C: Results obtained on the training set of TatP (not
cross-validated). Since we have two independent predictors, the final decision
here is obtained by choosing the predictor with the highest score (i.e. if
both scores of a protein are larger than the respective cutoffs, the
highest-scoring predictor is chosen).
A
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 54 (36.00%) | 14 (9.33%) | 82 (54.67%) | 150 (100%)
Non-signal | 9 (2.12%) | 389 (90.88%) | 30 (7.00%) | 428 (100%)
Signal Peptide | 0 (0.00%) | 5 (1.52%) | 323 (98.48%) | 328 (100%)
B
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 27 (36.00%) | 7 (9.33%) | 41 (54.67%) | 75 (100%)
Non-signal | 18 (2.27%) | 732 (92.31%) | 43 (5.42%) | 793 (100%)
Signal Peptide | 2 (0.73%) | 20 (7.33%) | 251 (91.94%) | 273 (100%)
C
Observed (rows) / Predicted (columns) | Tat | Non-signal | Signal Peptide | Total
Tat | 52 (49.52%) | 1 (0.96%) | 52 (49.52%) | 105 (100%)
Non-signal | 4 (0.86%) | 448 (96.76%) | 11 (2.38%) | 463 (100%)
Signal Peptide | 0 (0.00%) | 2 (1.54%) | 128 (98.46%) | 130 (100%)
Supplementary Table 6 Detailed results for the 44 proteins that contain the RR
motif but are experimentally verified not to be Tat substrates.
Uniprot AC | PRED-TAT | TatP | TatFind | TIGR01409 | PF10518 | PRED-TATHMMER
Q9RL54 | N | Y | N | N | N | N
Q9RJG0 | N | N | N | N | N | N
Q9RK12 | N | Y | N | N | N | N
Q9RCY0 | N | N | N | N | N | N
Q9FCD7 | N | Y | N | N | N | Y
Q9KZN6 | N | Y | N | N | N | N
Q9RJ44 | Y | Y | N | N | N | N
Q9Z517 | N | Y | N | N | N | N
Q93JJ3 | N | N | N | N | N | N
Q9KZ11 | N | Y | N | N | N | N
Q9RDQ1 | N | N | N | N | N | N
Q9L0A0 | N | N | N | N | N | N
Q9L1I5 | N | Y | N | N | N | N
Q9L1Z8 | Y | Y | N | N | N | N
Q9L068 | N | Y | N | N | N | N
Q93J76 | N | Y | N | N | N | N
Q9RKH6 | N | Y | N | N | N | N
Q9ADP5 | N | Y | N | N | N | N
Q9KZV9 | Y | N | N | N | N | Y
Q9KZU9 | N | Y | N | N | N | Y
Q9AK41 | N | Y | N | N | N | N
Q9ADD7 | N | Y | N | N | N | N
Q93IU2 | N | N | N | N | N | N
Q9L1F8 | Y | Y | N | N | N | Y
Q9L1E4 | Y | Y | N | N | N | Y
O50503 | N | N | N | N | N | N
O86690 | N | N | N | N | N | N
Q9X7P4 | N | Y | N | Y | N | Y
Q9L178 | N | N | N | Y | N | Y
Q9AK42 | N | Y | N | N | N | N
P04957 | N | N | N | N | N | N
P24141 | N | N | N | N | N | N
O31773 | N | N | N | N | N | N
O05497 | N | N | N | N | N | N
O34313 | N | N | N | N | N | N
P54602 | N | N | N | N | N | N
O34654 | N | N | N | N | N | N
O32108 | N | N | N | N | N | N
P94522 | N | Y | N | N | N | N
P42061 | N | N | N | N | N | N
P10475 | N | Y | N | N | N | N
P39848 | N | N | N | N | N | N
P96499 | N | N | N | N | N | N
Q07833 | N | N | N | N | N | N
TOTAL | 39/44 (88.64%) | 22/44 (50%) | 44/44 (100%) | 42/44 (95.45%) | 44/44 (100%) | 37/44 (84.1%)
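The TOTAL row can be reproduced by tallying the N calls in each column. The sketch below assumes that Y means the method predicts the protein as a Tat substrate (a false positive for this set) and N means it correctly rejects it, which is how the TOTAL row appears to be counted. The function name `summarize_calls` is hypothetical. The example uses the TIGR01409 column, in which only the 28th and 29th proteins (Q9X7P4 and Q9L178) are called Y.

```python
def summarize_calls(calls):
    """Tally the correct rejections (N) in a sequence of Y/N calls.

    Returns (number of N calls, total calls, percentage of N calls rounded
    to 2 decimals).
    """
    total = len(calls)
    correct = sum(1 for call in calls if call == "N")
    return correct, total, round(100.0 * correct / total, 2)


if __name__ == "__main__":
    # TIGR01409 column: 27 N, then Y for Q9X7P4 and Q9L178, then 15 N.
    tigr01409 = "N" * 27 + "YY" + "N" * 15
    correct, total, pct = summarize_calls(tigr01409)
    print(f"{correct}/{total} ({pct:.2f}%)")  # 42/44 (95.45%)
```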