Codon Bias and Protein Interaction Networks 

The genetic information carried by mRNA and translated into proteins is encoded in nucleotide triplets called codons. Since mRNA is composed of four nucleotide bases (A, U, C, G), there are 64 possible codons (61 sense codons and 3 stop codons), which encode only 20 standard amino acids: the genetic code is therefore redundant. Codons that code for the same amino acid (synonymous codons) are used with different frequencies, a phenomenon known as codon usage bias (CUB). While the biological meaning and origin of CUB are not yet fully understood, the degeneracy of the genetic code might provide an additional degree of freedom to modulate the accuracy and efficiency of translation. Indeed, highly expressed genes show an extreme bias, using a small subset of codons optimized by translational selection, whereas non-optimal codons, which persist in less-expressed sequences possibly as a result of genetic drift, cause long pauses during protein synthesis and can play a key role in protein folding.
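To make the redundancy concrete, the short Python sketch below builds the standard genetic code (NCBI translation table 1, written with U as in mRNA) and groups the sense codons into synonymous families. The helper names `synonymous_families` and `codon_counts` are illustrative and not part of the original work.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# U, C, A, G at each position, third position varying fastest; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_families(code):
    """Group sense codons by the amino acid they encode (stop codons excluded)."""
    families = {}
    for codon, aa in code.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    return families


def codon_counts(mrna):
    """Count in-frame sense codons of a coding sequence; stops and invalid triplets are skipped."""
    triplets = (mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3))
    return Counter(t for t in triplets if CODE.get(t, "*") != "*")


families = synonymous_families(CODE)
print(len(CODE), sum(len(f) for f in families.values()), len(families))  # 64 61 20
print(codon_counts("AUGGCUGCAUAA"))
```

Leucine, serine and arginine each have six synonymous codons, while methionine and tryptophan have only one; these unequal family sizes are why codon bias measures are usually normalized within each synonymous family.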

Different indices have been proposed to measure CUB. The Codon Adaptation Index (CAI) is based on the relative usage of synonymous codons in a reference set of highly expressed genes; the tRNA Adaptation Index (tAI) builds on the adaptation of codon usage to tRNA availability; the Effective Number of Codons (Nc) measures how evenly synonymous codons are used, much like an entropy of the codon usage distribution. In this work we proposed a novel codon bias index named Competition Adaptation Index (CompAI), based on tRNA availability and on the competition between cognate and near-cognate tRNAs. Unlike CAI and tAI, CompAI is a self-consistent, parameter-free index that does not require a set of reference genes for its calibration. We performed a genome-wide analysis of Escherichia coli (E. coli), revealing that both gene conservation across species and gene essentiality correlate with codon bias, especially when the bias is measured by CompAI. In this work we extended these observations to a set of unrelated bacterial species. Our analysis revealed that genes that are more conserved among bacterial species are also more likely to be essential; moreover, codon usage in these conserved genes is, in general, more optimized than in less conserved genes.
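The standard CAI definition can be sketched as follows: each codon gets a relative adaptiveness w, its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of w over its codons. This is a minimal sketch, not the authors' code: the pseudocount that avoids zero weights, the exclusion of Met and Trp, and the helper names are assumptions made here, and CompAI's own formula is not reproduced.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq):
    """Split a coding sequence into in-frame triplets, dropping a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count / count of the most used synonymous codon in the reference genes.

    The pseudocount (an assumption of this sketch) keeps unused codons from getting w = 0.
    """
    counts = Counter(c for s in reference_seqs for c in codons(s))
    best = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0.0), counts[codon] + pseudocount)
    return {codon: (counts[codon] + pseudocount) / best[aa]
            for codon, aa in CODE.items() if aa != "*"}


def cai(seq, w):
    """Geometric mean of w over the gene's codons; Met, Trp and stop codons are skipped."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w and CODE[c] not in ("M", "W")]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set of highly expressed genes, and two short test genes.
w = relative_adaptiveness(["GCUGCUGCUGCA"])
print(cai("GCUGCU", w), round(cai("GCAGCA", w), 3))  # 1.0 0.429
```

Because the weights are calibrated on the reference genes, the CAI of every gene depends on that choice of reference set, which is exactly the dependency that a reference-free index avoids.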

Relative synonymous codon usage (RSCU) values in different species. Both genomes and groups of codons are clustered by similarity of codon usage.
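RSCU is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally, so a value of 1 means no bias. The sketch below computes an RSCU profile and, as an illustrative assumption, compares two profiles with a Euclidean distance; the similarity measure and clustering method behind the figure are not specified here, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous families: amino acid -> list of its sense codons.
FAMILIES = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(seq):
    """RSCU = observed count / count expected if synonymous codons were used uniformly."""
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total:  # amino acids absent from the sequence stay undefined
            for c in family:
                values[c] = counts[c] * len(family) / total
    return values


def rscu_distance(a, b):
    """Euclidean distance over codons defined in both profiles (an illustrative choice)."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in shared))


profile = rscu("GCUGCUGCUGCA")
print(profile["GCU"], profile["GCA"], profile["GCC"])  # 3.0 1.0 0.0
```

A matrix of such pairwise distances between genomes (or between codons across genomes) is the kind of input a hierarchical clustering would take.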

To carry out their biological activities in living cells, proteins work in association with other proteins, giving rise to protein-protein interaction (PPI) networks. In this work we also studied codon bias in relation to the connectivity patterns of the E. coli PPI network, showing that translational selection systematically favors the genes of proteins that have the highest number of interactions and that belong to the most densely connected community of the network. Importantly, similarity in CUB between genes increases the likelihood that the corresponding proteins interact, compared with an appropriate null model.
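A simple way to check whether codon bias tracks connectivity is to correlate each protein's number of interaction partners (its degree) with a codon bias score of its gene. The sketch below uses a Spearman rank correlation with tie-aware ranks; the toy edge list, the bias values and the choice of Spearman are illustrative assumptions, and community structure is not addressed.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Number of distinct interaction partners of each protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {p: len(s) for p, s in partners.items()}


def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Pearson correlation of the ranks (undefined if either variable is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy data: a PPI edge list and one codon bias score per gene.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
bias = {"A": 0.8, "B": 0.6, "C": 0.5, "D": 0.3}
deg = degrees(edges)
proteins = sorted(deg)
rho = spearman([deg[p] for p in proteins], [bias[p] for p in proteins])
print(round(rho, 3))  # 0.949
```

A rank correlation is used here because degree distributions of PPI networks are heavily skewed, so a linear correlation would be dominated by a few hubs.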

Z-score of the link probability between proteins as a function of the codon bias distance between the corresponding genes, using the configuration model as the null model.
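One way to compute the quantity in this figure, sketched here under explicit assumptions, is to bin all protein pairs by the distance between their genes' codon bias scores and compare, in each bin, the observed number of links with the number expected under the configuration model. For brevity the sketch uses a single scalar bias score per gene, the absolute difference as distance, and the common analytic approximation p_ij = min(1, k_i k_j / 2m) for the configuration-model link probability (k = degree, m = number of links), treating pairs as independent Bernoulli trials; the original analysis may instead sample randomized networks, and `link_zscores` is a hypothetical name.

```python
import math
from collections import defaultdict
from itertools import combinations


def link_zscores(edges, bias, bin_width):
    """Z-score of observed vs expected links, per bin of codon bias distance.

    Assumed null: configuration-model link probability p_ij = min(1, k_i * k_j / 2m).
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    two_m = sum(len(s) for s in partners.values())  # twice the number of links
    stats = defaultdict(lambda: [0, 0.0, 0.0])  # bin -> [observed, expected, variance]
    for i, j in combinations(sorted(bias), 2):
        k_i, k_j = len(partners[i]), len(partners[j])
        p = min(1.0, k_i * k_j / two_m) if two_m else 0.0
        s = stats[int(abs(bias[i] - bias[j]) // bin_width)]
        s[0] += int(j in partners[i])
        s[1] += p
        s[2] += p * (1.0 - p)
    # Bins with zero variance (no possible links under the null) are skipped.
    return {b: (o - e) / math.sqrt(v) for b, (o, e, v) in sorted(stats.items()) if v > 0}


# Hypothetical toy network: pairs with similar bias (A-B, C-D) are the linked ones,
# so bin 0 (small distance) gets a positive z-score and bin 1 a negative one.
z = link_zscores([("A", "B"), ("C", "D")], {"A": 0.10, "B": 0.15, "C": 0.90, "D": 0.95}, 0.5)
print(z)
```

The per-bin z-score is positive when genes with that range of codon bias distance encode interacting proteins more often than degree alone would predict.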

In this work we extended this analysis to a large set of unrelated bacterial species, providing general observations on the co-evolution of CUB and the connectivity features of bacterial interactomes. These findings suggest that CUB is a relevant parameter for predicting unknown PPIs from genomic information. In this work we broadened these observations by studying how conservation, essentiality and functional annotation at the gene level relate to protein connectivity in the PPI networks of different bacterial species. We revealed a functional transition in the PPI network, whereby the genes of highly connected proteins are under selective pressure, conserved, and essential. Moreover, the connectivity distribution of each bacterial PPI network features a ubiquitous and almost invariant structure of conserved hubs, mainly due to ribosomal protein complexes.
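A transition of this kind can be probed by binning proteins by degree and computing, in each bin, the fraction whose genes are flagged as essential (or conserved). The sketch below uses logarithmic, powers-of-two degree bins; the function name, the binning scheme and the toy data are assumptions for illustration only.

```python
def fraction_by_degree(degree, flagged):
    """Fraction of proteins whose gene carries a flag (e.g. essential), per log2 degree bin.

    Bin b collects degrees 2**b to 2**(b + 1) - 1; proteins with degree 0 are ignored.
    """
    counts = {}
    for protein, k in degree.items():
        if k > 0:
            b = k.bit_length() - 1
            flagged_n, total = counts.get(b, (0, 0))
            counts[b] = (flagged_n + int(protein in flagged), total + 1)
    return {b: f / n for b, (f, n) in sorted(counts.items())}


# Hypothetical toy data: degrees from a PPI network and a set of essential genes.
degree = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 5, "f": 8}
essential = {"d", "e", "f"}
print(fraction_by_degree(degree, essential))  # {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.0}
```

A sharp rise of this fraction across consecutive bins would be consistent with a transition toward essential, conserved genes at high connectivity; logarithmic bins keep the sparsely populated high-degree tail from being split into many near-empty bins.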

Resources