Little is known about the abundance or function of small proteins (SPs) in any organism. These proteins, defined as those containing 50 or fewer amino acids, are difficult to isolate and identify using standard biochemical techniques, so they are commonly missed in protein identification studies. The short genes (sORFs) that encode SPs are also difficult to identify bioinformatically. As a result, genome annotators have adopted an arbitrary cutoff and ignore potential genes that encode proteins of fewer than 50 amino acids. The number of SPs made by most organisms has therefore likely been significantly underestimated. Our own research and that of others have shown that bacteria synthesize many more SPs than previously thought. The total size of the small-protein proteome in E. coli, or in any other organism, is still unknown.
The lab is currently pursuing two main projects:
1) Characterizing the role of small transmembrane proteins in cytochrome bd oxidase activity
2) Discovering new sORFs encoded in the E. coli genome
Characterizing cytochrome bd oxidase small proteins
Two of the most highly conserved small proteins identified in E. coli are YbgT and YccB. Similar proteins are found in more than 50 species of bacteria. This widespread conservation among bacterial species strongly suggests that the small proteins have important functions in the cell. YbgT and YccB have similar amino acid sequences, and both are encoded within cytochrome bd oxidase operons. Because bacterial genes encoding proteins with related functions are often grouped together in a single operon, the location of ybgT and yccB in these operons points to a role for the small proteins in cytochrome bd oxidase activity.
Cytochrome oxidase is a large membrane protein complex that functions in the electron transport chain and is found in eubacteria, archaea and eukaryotes. A subset of cytochrome oxidases, the cytochrome bd oxidases, is found only in prokaryotes, where these enzymes play important roles in cell survival under low-oxygen conditions and in bacterial pathogenesis. YbgT is encoded in the cydAB-ybgT-ybgE operon, whereas YccB is encoded in the appC-appB-yccB-appA operon (Fig. 2). We have experimentally confirmed that both SPs are expressed, and their expression patterns are consistent with each sORF belonging to the same operon as the neighboring cytochrome bd oxidase genes. Together, these data suggest that YbgT and YccB are involved in cytochrome bd oxidase function in E. coli and other bacterial species.
Identifying new small proteins in E. coli
There is substantial evidence that the E. coli genome contains additional SPs that remain to be identified. In our original study, we identified 2150 potential sORFs in E. coli but tested only a fraction of them to determine whether they truly encode SPs. For many of the remaining sORFs, we now have new bioinformatic evidence suggesting that they also encode SPs. Systematically identifying and verifying these proteins is therefore a necessary step toward understanding both the prevalence of SPs in E. coli and the biological roles of individual proteins. Put simply, we cannot study these proteins if we don't know they are there.