Origins of scaling in genetic code
The principle of least effort in communications has been shown, by Ferrer i Cancho and Solé, to explain the emergence of power laws (e.g., Zipf’s law) in human languages. This paper applies the principle and the information-theoretic model of Ferrer i Cancho and Solé to genetic coding. The principle is applied by equating the ambiguity of signals used by “speakers” with codon usage, on the one hand, and the effort of “hearers” with the needs of amino acid translation mechanics, on the other hand. The re-interpreted model captures the case of typical (vertical) gene transfer, and confirms that Zipf’s law can be found in the transition between referentially useless systems (i.e., ambiguous genetic coding) and indexical reference systems (i.e., zero-redundancy genetic coding). As with linguistic symbols, arranging genetic codes according to Zipf’s law is observed to be the optimal solution for maximising referential power under the effort constraints. Thus, the model identifies the origins of scaling in genetic coding as a trade-off between codon usage and the needs of amino acid translation. Furthermore, the paper extends the Ferrer i Cancho-Solé model to multiple inputs, reaching out toward the case of horizontal gene transfer (HGT), where multiple contributors may share the same genetic coding. Importantly, the extended model also leads to a sharp transition between referentially useless systems (ambiguous HGT) and indexical reference systems (zero-redundancy HGT). Zipf’s law is also observed to be the optimal solution in the HGT case.
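As a rough illustration of the trade-off the model optimises, the sketch below evaluates the cost function from Ferrer i Cancho and Solé's original least-effort formulation, Ω(λ) = λ H_m(R|S) + (1 − λ) H_n(S), for a binary signal-object matrix. The formula is recalled from that earlier work rather than stated in this abstract. The function name `effort_cost`, the uniform prior over objects, and the toy matrices are assumptions made purely for illustration; how rows and columns map onto codons and amino acids is left to the paper's re-interpretation.

```python
import math


def effort_cost(a, lam):
    """Cost lam * H_m(R|S) + (1 - lam) * H_n(S) for a binary matrix.

    a[i][j] == 1 means signal i can refer to object j (hypothetical layout:
    n rows are signals, m columns are objects). Objects are assumed to be
    equally likely, and each object picks uniformly among its signals.
    """
    n, m = len(a), len(a[0])
    if n < 2 or m < 2:
        raise ValueError("need at least two signals and two objects")

    # omega[j]: number of signals linked to object j.
    omega = [sum(a[i][j] for i in range(n)) for j in range(m)]
    if any(w == 0 for w in omega):
        raise ValueError("every object needs at least one signal")

    # Joint probability p(s_i, r_j) = p(r_j) * p(s_i | r_j), with p(r_j) = 1/m.
    joint = [[a[i][j] / (omega[j] * m) for j in range(m)] for i in range(n)]
    p_s = [sum(row) for row in joint]  # marginal p(s_i)

    # Signal entropy H_n(S), in base n so that it lies in [0, 1].
    h_s = -sum(p * math.log(p, n) for p in p_s if p > 0)

    # Conditional entropy H_m(R|S) = -sum_ij p(s_i, r_j) log_m p(r_j | s_i).
    h_r_given_s = -sum(
        joint[i][j] * math.log(joint[i][j] / p_s[i], m)
        for i in range(n)
        for j in range(m)
        if joint[i][j] > 0
    )
    return lam * h_r_given_s + (1 - lam) * h_s


# Two extreme codes on three signals and three objects.
one_to_one = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # zero redundancy
one_for_all = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]  # fully ambiguous

cost_indexical = effort_cost(one_to_one, 0.4)
cost_ambiguous = effort_cost(one_for_all, 0.4)
```

Under these assumptions the one-to-one code costs 1 − λ and the fully ambiguous code costs λ, so which extreme is cheaper flips as λ crosses 1/2. This is only a toy view of the two regimes the abstract contrasts; it does not reproduce the paper's optimisation procedure or its multi-input (HGT) extension.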