5. Lipoprotein signals


The purpose of this practical is to predict the presence of lipoprotein signal peptides in bacterial proteins using regular expressions and Hidden Markov Models.
    Signal peptides in Bacteria fall into two main classes: secretory signal peptides, which are cleaved by Signal Peptidase I (SPase I), and lipoprotein signal peptides, which are cleaved by Signal Peptidase II (SPase II or Lsp) and characterize the membrane-bound lipoproteins. Secretory signal peptides have been studied extensively for years, revealing a structure composed of a short, positively charged N-region, a hydrophobic H-region that spans the membrane, a C-region of mostly small, uncharged residues, and a cleavage site (known as the A-X-A motif, in which A stands for Alanine and X for any amino acid) that is recognized by the peptidase, which cleaves the peptide and releases the mature protein. The signal peptide of bacterial lipoproteins has a similar structure; the main differences are its comparatively shorter length and the distinctive pattern in the C-region (commonly denoted [LVI]-[AST]-[GA]-C and termed the “lipobox”) that is recognized for cleavage by SPase II. The Cysteine in the last position of this pattern is indispensable in both Gram-positive and Gram-negative bacteria and is necessary for membrane anchoring.
    The post-translational lipid modification involves three enzymes that act sequentially: the prolipoprotein diacylglyceryl transferase (Lgt), which transfers a diacylglyceride to the cysteine sulfhydryl group; the signal peptidase II (SPase II or Lsp), which cleaves the signal peptide immediately before the cysteine, forming an apolipoprotein; and the apolipoprotein N-acyltransferase (Lnt), which acylates the N-terminal cysteine of the apolipoprotein, forming the mature lipoprotein.
    Proteins carrying a secretory signal peptide can be directed to the membrane through the action of the Sec translocase. A second major pathway uses the Twin-Arginine (TAT) translocase, which recognizes (generally longer) signal peptides carrying a distinctive pattern of two consecutive Arginines (R-R) in the N-region. Translocation of lipoproteins through the TAT pathway had been postulated on the basis of sequence analysis, but has only recently been proven for Bacteria (Desulfovibrio vulgaris) and Archaea (Haloferax volcanii).
    The discovery of globomycin, a specific inhibitor of SPase II, was a major breakthrough in the biochemical study of lipoprotein maturation. Bacteria treated with globomycin, as well as SPase II-deficient strains, accumulate lipid-modified prolipoproteins. Nevertheless, extensive studies in SPase II-deficient strains showed that the absence of SPase II has rather pleiotropic effects on the composition of the extracellular proteome: some prolipoproteins were released into the medium, whereas the synthesis of others was strongly reduced. In contrast, only in Lgt-deficient strains are significantly more lipoproteins observed in the growth medium. The most convincing proof that a protein is a lipoprotein, however, is labelling with [3H] or [14C] palmitate in the presence and absence of globomycin (or in wild-type and SPase II- or Lgt-deficient strains), combined with immunoblotting, immunoprecipitation, protein fractionation and protease accessibility assays to investigate its extracellular localization.


Download the 63 sequences from Gram-negative Bacteria used in the LipoP method. The file is in SwissProt format, and the precise location of the cleavage site is given in the FT SIGNAL field, as in the following excerpt:

KW   Cellulose degradation; Hydrolase; Glycosidase; Zymogen; Membrane;
KW   Lipoprotein; Signal; Palmitate.
FT   SIGNAL        1     19
FT   PROPEP       20     45
FT   CHAIN        46    426       Endoglucanase.
FT   LIPID        20     20       N-palmitoyl cysteine.
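
As a minimal sketch of how the annotated cleavage site could be read from such an entry, the Python snippet below scans the feature table for the first FT SIGNAL line and returns its start and end positions. The function name `signal_peptide_range` and the regular expression are illustrative assumptions, and the code only handles the fixed-column layout shown in the excerpt above (positions written as plain integers).

```python
import re

# Matches feature-table lines in the layout shown in the excerpt,
# e.g. "FT   SIGNAL        1     19" (key, start, end).
FT_LINE = re.compile(r"^FT\s+(\S+)\s+(\d+)\s+(\d+)")


def signal_peptide_range(entry_text):
    """Return (start, end) of the first FT SIGNAL feature, or None."""
    for line in entry_text.splitlines():
        match = FT_LINE.match(line)
        if match and match.group(1) == "SIGNAL":
            return int(match.group(2)), int(match.group(3))
    return None


# The feature lines from the excerpt above.
example = """\
KW   Lipoprotein; Signal; Palmitate.
FT   SIGNAL        1     19
FT   PROPEP       20     45
FT   CHAIN        46    426       Endoglucanase.
FT   LIPID        20     20       N-palmitoyl cysteine.
"""

start, end = signal_peptide_range(example)
print(start, end)  # 1 19
```

In this entry the signal peptide ends at residue 19, so the lipid-modified cysteine is residue 20, in agreement with the FT LIPID line. Entries whose positions are not plain integers would be skipped by this sketch and need separate handling.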

First, use the scripts that you wrote in the previous practicals to convert the sequences to Fasta format. To make the subsequent calculations easier, you could choose a single-line Fasta format. Afterwards, write a simple program that tests for the presence of the lipobox. For details of the regular expression patterns that could be used, see the references below. For instance, a test for the von Heijne pattern could be coded so that a matching sequence triggers:

    print "lipoprotein\n";
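
A complete test could look roughly like the following Python sketch. It does not use any of the published patterns; it simply encodes the generic lipobox consensus [LVI]-[AST]-[GA]-C quoted in the introduction. The function name `find_lipobox`, the 40-residue search window and the toy sequence are assumptions made for illustration only, and the published patterns in the references are more restrictive.

```python
import re

# Generic lipobox consensus from the introduction: [LVI]-[AST]-[GA]-C.
LIPOBOX = re.compile(r"[LVI][AST][GA]C")

# Assumption: only the N-terminal part of the protein is searched.
# The value 40 is an arbitrary illustrative window, not a published cutoff.
MAX_WINDOW = 40


def find_lipobox(sequence, window=MAX_WINDOW):
    """Return the 1-based position of the lipobox cysteine, or None."""
    match = LIPOBOX.search(sequence.upper()[:window])
    if match is None:
        return None
    # match.end() is the 0-based index just past the cysteine,
    # which equals the 1-based position of the cysteine itself.
    return match.end()


# Hypothetical toy sequence, not taken from the data set.
toy = "MKKTLLALSLLAGCSSN"
position = find_lipobox(toy)
if position is not None:
    print("lipoprotein")
    print("cysteine at position", position)
```

Returning the position of the cysteine rather than a plain yes/no makes it straightforward to compare the prediction with the annotated cleavage site (the end of FT SIGNAL plus one).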

Check which of the proposed patterns performs best and compare the results with the SwissProt annotations shown above. Afterwards, remove the mature part of each protein, keeping only the sequence of the signal peptide (including the Cysteine), and write a program that aligns the sequences to the right. For instance, if there were two sequences with signal peptide lengths of 25 and 30 respectively, the former should have five gaps (-) preceding the initial Methionine. This special form of multiple alignment should be used to analyse the amino acid frequencies in the lipobox; an easy way to perform such an analysis is the WebLogo server (http://weblogo.berkeley.edu/).

In the last step of the analysis, create a profile HMM from the multiple alignment using the HMMER software (http://hmmer.janelia.org/). The profile HMM should be used to search for signal peptides in the 63 lipoproteins, and the results should be compared with the regular expression patterns and with the HMMs and neural networks (NNs) reported in the LipoP paper. As an additional negative set (of proteins with no lipoprotein signal peptide) you can use the transmembrane proteins that you already used in the previous practicals.
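
The right-alignment step described above can be sketched in Python as follows. The snippet pads each signal peptide on the left with gap characters so that all the cysteines end up in the last column, and then tallies the residues in the last four columns (the lipobox). The function names `right_align` and `column_counts` and the two toy peptides are hypothetical and serve only as an illustration.

```python
from collections import Counter


def right_align(peptides, gap="-"):
    """Pad each peptide on the left so that all have the same length."""
    width = max(len(p) for p in peptides)
    return [p.rjust(width, gap) for p in peptides]


def column_counts(aligned, n_last=4):
    """Count residues in each of the last n_last alignment columns."""
    counts = []
    for offset in range(-n_last, 0):
        counts.append(Counter(row[offset] for row in aligned))
    return counts


# Hypothetical signal peptides (each ending with the lipobox cysteine).
peptides = ["MKKLLAGC", "MRTLSLVSAC"]

aligned = right_align(peptides)
for row in aligned:
    print(row)

for column, counter in zip(range(-4, 0), column_counts(aligned)):
    print(column, dict(counter))
```

The aligned rows can then be written out in Fasta format and used as input for the sequence logo and for building the profile HMM.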


  • Juncker AS, Willenbrock H, von Heijne G, et al. Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Science. 2003;12:1652-1662.
  • Sutcliffe IC, Harrington DJ. Pattern searches for the identification of putative lipoprotein genes in Gram-positive bacterial genomes. Microbiology (Reading, England). 2002;148(Pt 7):2065-2077.
  • Bagos PG, Tsirigos KD, Liakopoulos TD, Hamodrakas SJ. Prediction of lipoprotein signal peptides in Gram-positive bacteria with a Hidden Markov Model. Journal of Proteome Research. 2008;7(12):5082-5093.