Tables





Supplementary Table 1 Results obtained with PRED-TAT when simultaneously classifying the protein sequences into three categories (Tat signal peptides, Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic proteins), in the form of a 3x3 confusion matrix. A: Results obtained by 30-fold cross-validation on the training set. B: Results obtained on the independent test set. C: Results obtained on the training set of TatP. PRED-TAT uses the Viterbi algorithm and thus no fine-tuning is required.

 

A

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 148 (98.67%) | 0 (0.00%) | 2 (1.33%) | 150 (100%) |
| Non-signal | 0 (0.00%) | 395 (92.29%) | 33 (7.71%) | 428 (100%) |
| Signal Peptide | 9 (2.74%) | 4 (1.22%) | 315 (96.04%) | 328 (100%) |

B

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 71 (94.67%) | 1 (1.33%) | 3 (4.00%) | 75 (100%) |
| Non-signal | 5 (0.65%) | 732 (92.31%) | 56 (7.04%) | 793 (100%) |
| Signal Peptide | 8 (2.93%) | 13 (4.76%) | 252 (92.31%) | 273 (100%) |

C

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 105 (100%) | 0 (0.00%) | 0 (0.00%) | 105 (100%) |
| Non-signal | 6 (1.30%) | 435 (93.95%) | 22 (4.75%) | 463 (100%) |
| Signal Peptide | 6 (4.62%) | 0 (0.00%) | 124 (95.38%) | 130 (100%) |
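
The percentages in these confusion matrices are relative to the row total, i.e. to the number of observed proteins in each class. The following is a minimal sketch of that arithmetic, using the counts of panel A above. The function names are illustrative and are not part of PRED-TAT, and the overall accuracy is a derived quantity that is not reported in the table.

```python
# Counts from Supplementary Table 1, panel A.
# Rows are observed classes, columns are predicted classes.
LABELS = ["Tat", "Non-signal", "Signal Peptide"]
TABLE_1A = [
    [148, 0, 2],
    [0, 395, 33],
    [9, 4, 315],
]


def row_percentages(matrix):
    """Express every cell as a percentage of its row (observed class) total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in matrix]


def overall_accuracy(matrix):
    """Percentage of all sequences that fall on the diagonal (derived, not reported)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for label, row in zip(LABELS, row_percentages(TABLE_1A)):
    print(label, [f"{value:.2f}%" for value in row])
print(f"overall accuracy: {overall_accuracy(TABLE_1A):.2f}%")
```

The diagonal of the row-percentage matrix is the per-class sensitivity, which is the quantity most directly comparable across panels A, B and C.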

 

 

Supplementary Table 2 Results obtained with PRED-TATHMMER when simultaneously classifying the protein sequences into three categories (Tat signal peptides, Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic proteins), in the form of a 3x3 confusion matrix. A: Results obtained by 30-fold cross-validation on the training set. B: Results obtained on the independent test set. C: Results obtained on the training set of TatP. Since we have two independent profile HMMs, the final decision here is obtained by choosing the model with the highest score (i.e. if both scores of a protein are larger than zero, the highest-scoring model is chosen).

 

 

A

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 148 (98.66%) | 2 (1.33%) | 0 (0.00%) | 150 (100%) |
| Non-signal | 1 (0.24%) | 414 (96.73%) | 13 (3.03%) | 428 (100%) |
| Signal Peptide | 3 (0.92%) | 41 (12.5%) | 284 (86.59%) | 328 (100%) |

B

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 70 (93.33%) | 2 (2.67%) | 3 (4.00%) | 75 (100%) |
| Non-signal | 1 (0.13%) | 771 (97.23%) | 21 (2.64%) | 793 (100%) |
| Signal Peptide | 4 (1.46%) | 33 (12.09%) | 236 (86.45%) | 273 (100%) |

C

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 105 (100%) | 0 (0.00%) | 0 (0.00%) | 105 (100%) |
| Non-signal | 1 (0.22%) | 460 (99.35%) | 2 (0.43%) | 463 (100%) |
| Signal Peptide | 2 (1.54%) | 6 (4.62%) | 122 (93.84%) | 130 (100%) |

 

 

 

Supplementary Table 3 Results obtained with PRED-TATHMMER when simultaneously classifying the protein sequences into three categories (Tat signal peptides, Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic proteins), in the form of a 3x3 confusion matrix. A: Results obtained by 30-fold cross-validation on the training set. B: Results obtained on the independent test set. C: Results obtained on the training set of TatP. Since we have two independent profile HMMs, the final decision here is obtained by giving priority to the HMM for Tat substrates (i.e. if the Tat HMM score of a protein is larger than zero, the protein is classified as Tat irrespective of the score of the other HMM).

 

 

A

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 150 (100.00%) | 0 (0.00%) | 0 (0.00%) | 150 (100%) |
| Non-signal | 1 (0.24%) | 414 (96.73%) | 13 (3.03%) | 428 (100%) |
| Signal Peptide | 14 (4.27%) | 41 (12.5%) | 271 (82.62%) | 328 (100%) |

B

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 72 (96.00%) | 2 (2.67%) | 1 (1.33%) | 75 (100%) |
| Non-signal | 1 (0.13%) | 771 (97.23%) | 21 (2.64%) | 793 (100%) |
| Signal Peptide | 14 (5.13%) | 33 (12.09%) | 226 (82.78%) | 273 (100%) |

C

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 105 (100%) | 0 (0.00%) | 0 (0.00%) | 105 (100%) |
| Non-signal | 1 (0.22%) | 460 (99.35%) | 2 (0.43%) | 463 (100%) |
| Signal Peptide | 8 (6.15%) | 6 (4.62%) | 116 (89.23%) | 130 (100%) |
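
Supplementary Tables 2 and 3 differ only in how the scores of the two profile HMMs are combined into one call. The sketch below contrasts the two rules. The function names, the argument names and the handling of cases the captions do not spell out (only one score above zero, no score above zero, tied scores) are assumptions made for illustration and do not come from the PRED-TATHMMER implementation.

```python
def classify_highest_score(tat_score, sec_score, threshold=0.0):
    """Rule of Supplementary Table 2: the highest-scoring model wins.

    Assumption: a model only competes if its score exceeds the threshold;
    if neither does, the protein is called Non-signal. Ties go to Tat
    (arbitrary choice, not stated in the caption).
    """
    tat_hit = tat_score > threshold
    sec_hit = sec_score > threshold
    if tat_hit and sec_hit:
        return "Tat" if tat_score >= sec_score else "Signal Peptide"
    if tat_hit:
        return "Tat"
    if sec_hit:
        return "Signal Peptide"
    return "Non-signal"


def classify_tat_priority(tat_score, sec_score, threshold=0.0):
    """Rule of Supplementary Table 3: the Tat HMM has priority.

    A Tat score above the threshold yields Tat regardless of the Sec score.
    """
    if tat_score > threshold:
        return "Tat"
    if sec_score > threshold:
        return "Signal Peptide"
    return "Non-signal"


# Hypothetical scores: both models fire, the Sec model scores higher.
print(classify_highest_score(5.0, 8.0))
print(classify_tat_priority(5.0, 8.0))
```

The two rules disagree only when both scores exceed the threshold and the Sec model scores higher, which is consistent with the tables: the priority rule moves sequences from the Signal Peptide column into the Tat column and leaves the Non-signal column unchanged.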

 

 

Supplementary Table 4 Results obtained with TatP and SignalP3-NN when simultaneously classifying the protein sequences into three categories (Tat signal peptides, Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic proteins), in the form of a 3x3 confusion matrix. A: Results obtained on the training set. B: Results obtained on the independent test set. C: Results obtained on the training set of TatP (not cross-validated). Since we have two independent predictors, the final decision here is obtained by giving priority to TatP (i.e. if the TatP score of a protein is larger than the cutoff, the protein is classified as Tat irrespective of SignalP’s output).

 

 

A

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 130 (86.67%) | 14 (9.33%) | 6 (4.00%) | 150 (100%) |
| Non-signal | 12 (2.80%) | 389 (90.88%) | 27 (6.32%) | 428 (100%) |
| Signal Peptide | 44 (13.41%) | 5 (1.53%) | 279 (85.06%) | 328 (100%) |

B

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 62 (82.67%) | 7 (9.33%) | 6 (8.00%) | 75 (100%) |
| Non-signal | 22 (2.77%) | 732 (92.31%) | 39 (4.92%) | 793 (100%) |
| Signal Peptide | 42 (15.38%) | 20 (7.33%) | 211 (77.29%) | 273 (100%) |

C

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 103 (98.10%) | 1 (0.95%) | 1 (0.95%) | 105 (100%) |
| Non-signal | 5 (1.08%) | 448 (96.76%) | 10 (2.16%) | 463 (100%) |
| Signal Peptide | 15 (11.54%) | 2 (1.54%) | 113 (86.92%) | 130 (100%) |

 

 

Supplementary Table 5 Results obtained with TatP and SignalP3-NN when simultaneously classifying the protein sequences into three categories (Tat signal peptides, Sec signal peptides and non-signal peptides, i.e. transmembrane and cytoplasmic proteins), in the form of a 3x3 confusion matrix. A: Results obtained on the training set. B: Results obtained on the independent test set. C: Results obtained on the training set of TatP (not cross-validated). Since we have two independent predictors, the final decision here is obtained by choosing the predictor with the highest score (i.e. if both scores of a protein are larger than the respective cutoffs, the highest-scoring predictor is chosen).

 

A

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 54 (36.00%) | 14 (9.33%) | 82 (54.67%) | 150 (100%) |
| Non-signal | 9 (2.12%) | 389 (90.88%) | 30 (7.00%) | 428 (100%) |
| Signal Peptide | 0 (0.00%) | 5 (1.52%) | 323 (98.48%) | 328 (100%) |

B

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 27 (36.00%) | 7 (9.33%) | 41 (54.67%) | 75 (100%) |
| Non-signal | 18 (2.27%) | 732 (92.31%) | 43 (5.42%) | 793 (100%) |
| Signal Peptide | 2 (0.73%) | 20 (7.33%) | 251 (91.94%) | 273 (100%) |

C

| Observed \ Predicted | Tat | Non-signal | Signal Peptide | Total |
|---|---|---|---|---|
| Tat | 52 (49.52%) | 1 (0.96%) | 52 (49.52%) | 105 (100%) |
| Non-signal | 4 (0.86%) | 448 (96.76%) | 11 (2.38%) | 463 (100%) |
| Signal Peptide | 0 (0.00%) | 2 (1.54%) | 128 (98.46%) | 130 (100%) |

 

 

 

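Supplementary Table 6 Detailed results for the 44 proteins that contain the RR motif but are experimentally verified not to be Tat substrates. Y: predicted as a Tat substrate; N: not predicted as a Tat substrate. The TOTAL row gives the number (and percentage) of N entries per method, i.e. of correct rejections.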

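| Uniprot AC | PRED-TAT | TatP | TatFind | TIGR01409 | PF10518 | PRED-TATHMMER |
|---|---|---|---|---|---|---|
| Q9RL54 | N | Y | N | N | N | N |
| Q9RJG0 | N | N | N | N | N | N |
| Q9RK12 | N | Y | N | N | N | N |
| Q9RCY0 | N | N | N | N | N | N |
| Q9FCD7 | N | Y | N | N | N | Y |
| Q9KZN6 | N | Y | N | N | N | N |
| Q9RJ44 | Y | Y | N | N | N | N |
| Q9Z517 | N | Y | N | N | N | N |
| Q93JJ3 | N | N | N | N | N | N |
| Q9KZ11 | N | Y | N | N | N | N |
| Q9RDQ1 | N | N | N | N | N | N |
| Q9L0A0 | N | N | N | N | N | N |
| Q9L1I5 | N | Y | N | N | N | N |
| Q9L1Z8 | Y | Y | N | N | N | N |
| Q9L068 | N | Y | N | N | N | N |
| Q93J76 | N | Y | N | N | N | N |
| Q9RKH6 | N | Y | N | N | N | N |
| Q9ADP5 | N | Y | N | N | N | N |
| Q9KZV9 | Y | N | N | N | N | Y |
| Q9KZU9 | N | Y | N | N | N | Y |
| Q9AK41 | N | Y | N | N | N | N |
| Q9ADD7 | N | Y | N | N | N | N |
| Q93IU2 | N | N | N | N | N | N |
| Q9L1F8 | Y | Y | N | N | N | Y |
| Q9L1E4 | Y | Y | N | N | N | Y |
| O50503 | N | N | N | N | N | N |
| O86690 | N | N | N | N | N | N |
| Q9X7P4 | N | Y | N | Y | N | Y |
| Q9L178 | N | N | N | Y | N | Y |
| Q9AK42 | N | Y | N | N | N | N |
| P04957 | N | N | N | N | N | N |
| P24141 | N | N | N | N | N | N |
| O31773 | N | N | N | N | N | N |
| O05497 | N | N | N | N | N | N |
| O34313 | N | N | N | N | N | N |
| P54602 | N | N | N | N | N | N |
| O34654 | N | N | N | N | N | N |
| O32108 | N | N | N | N | N | N |
| P94522 | N | Y | N | N | N | N |
| P42061 | N | N | N | N | N | N |
| P10475 | N | Y | N | N | N | N |
| P39848 | N | N | N | N | N | N |
| P96499 | N | N | N | N | N | N |
| Q07833 | N | N | N | N | N | N |
| TOTAL | 39/44 (88.64%) | 22/44 (50%) | 44/44 (100%) | 42/44 (95.45%) | 44/44 (100%) | 37/44 (84.1%) |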
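
The TOTAL row can be re-derived from the per-protein calls by counting the N entries in a column. The snippet below is a small sketch of that check for the PRED-TAT column, with the 44 calls written in table order. The helper name is illustrative only.

```python
# PRED-TAT calls for the 44 proteins, in the order they appear in the table
# (Y = predicted as Tat substrate, N = not predicted as Tat substrate).
PRED_TAT_CALLS = "NNNNNNYNNNNNNYNNNNYNNNNYY" + "N" * 19


def correct_rejections(calls):
    """Return (number of N calls, total calls, percentage of N calls)."""
    rejected = calls.count("N")
    return rejected, len(calls), 100.0 * rejected / len(calls)


rejected, total, percent = correct_rejections(PRED_TAT_CALLS)
print(f"{rejected}/{total} ({percent:.2f}%)")
```

Because none of these proteins is a Tat substrate, every Y is a false positive, so the percentage is the specificity of each method on this set.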