Sequence diversity and functional conformity

Published: Oct. 16, 2015, 11 a.m.

At least four phylogenetically distinct groups of bacteria encode repeat proteins that share the ability to bind specific DNA sequences using a unique but conserved code. Repeats are typically 33-35 amino acids long. Each repeat binds a single DNA base, and its specificity is determined by the amino acid residue at position 13. A comparison of repeat sequences across all groups reveals that only three positions are hyper-conserved. Nevertheless, repeats from different groups are in most cases functionally compatible, so they can be assembled into a single chimeric array. This functional conformity and inter-compatibility are the result of structural conservation. The repeat arrays of these proteins have been shown or predicted to form almost identical tertiary structures: a right-handed superhelix that wraps around the DNA double helix, with the base-specifying residue of each repeat positioned in the major groove next to its cognate base. The mechanism of DNA binding is therefore conserved.
The first group to be discovered, which gave its name to the rest, comprises the Transcription Activator Like Effectors (TALEs) of plant-pathogenic Xanthomonas bacteria. The eukaryotic transactivation domain, which lends this group its name, allows TALEs to activate specifically targeted host genes for the benefit of the bacterial invader. The other groups, discovered after the TALEs, are the RipTALs of Ralstonia solanacearum, the Bats of Burkholderia rhizoxinica, and MOrTL1 and MOrTL2 of unknown marine bacteria. Together they are designated TALE-likes, and each designation alludes to the TALEs: RipTAL stands for Ralstonia injected protein TALE-like, the Bats are Burkholderia TALE-likes, and the MOrTLs are Marine Organism TALE-likes. This shared terminology belies the diverse lifestyles of these bacteria and the different biological roles fulfilled by these proteins. The TALEs have already been researched extensively.
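To illustrate how a one-repeat, one-base code turns a repeat array into a predicted DNA target, here is a minimal Python sketch. It reads the residue at position 13 of each repeat, in array order, and looks up a preferred base. The lookup table is an assumption based on commonly cited TALE code assignments (I→A, D→C, G→T, N→G), which this text does not itself list, and it is deliberately simplified. The 34-residue repeat backbone and the helper names (`residue_13`, `predict_target`, `with_residue_pair`) are hypothetical and serve only as illustration.

```python
from typing import Dict, List

# ASSUMPTION: simplified table of commonly cited TALE code assignments,
# keyed on the residue at repeat position 13 (1-based). Real preferences
# can be broader (for example, N13 repeats also accept A).
RESIDUE13_TO_BASE: Dict[str, str] = {
    "I": "A",
    "D": "C",
    "G": "T",
    "N": "G",
}


def residue_13(repeat: str) -> str:
    """Return the residue at 1-based position 13 of a single repeat."""
    if len(repeat) < 13:
        raise ValueError("repeat is shorter than 13 residues")
    return repeat[12]


def predict_target(repeats: List[str]) -> str:
    """Predict one base per repeat, in array order; 'N' marks an unlisted residue."""
    return "".join(RESIDUE13_TO_BASE.get(residue_13(r), "N") for r in repeats)


# Hypothetical 34-residue repeat backbone; positions 12-13 are swapped per repeat.
BACKBONE = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"


def with_residue_pair(pair: str) -> str:
    """Build an illustrative repeat carrying `pair` at positions 12 and 13."""
    return BACKBONE[:11] + pair + BACKBONE[13:]


array = [with_residue_pair(p) for p in ("HD", "NG", "NI", "NN", "HD")]
print(predict_target(array))  # CTAGC
```

A plain dictionary lookup keeps the sketch close to the idea of a residue-to-base code; residues missing from the table are reported as 'N' rather than guessed.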
The code that describes the relationship between the base-specifying residues and their cognate bases is often referred to as the TALE code. This code was deciphered independently by two groups and published in 2009, a year before I began my doctoral work. Since then, research into TALEs has not slowed, and a great deal has been learnt about both the native biology and the biotechnological uses of TALEs. My work has focused on the other TALE-like groups, none of which had been characterized in terms of their DNA recognition properties before I began. RipTALs are effector proteins delivered during bacterial wilt disease caused by R. solanacearum strains. This devastating disease affects numerous crop species worldwide, and characterizing the molecular properties of the RipTALs is a first step towards uncovering their role in it. The Bats and MOrTLs are primarily of interest as comparison groups to the TALEs and RipTALs, and as sources of sequence diversity for future TALE repeat engineering.
The introduction of this dissertation explores TALE biology, with a particular focus on the DNA-binding properties of TALEs and how these can be put to use in TALE technology. The RipTALs, Bats and MOrTLs are then each introduced, summarizing what is known about their provenance and sequence features. Next, the aims of my doctoral work are listed and explained in turn. The proximal goal of my doctoral work was to carry out a comparative molecular characterization of each group of non-TALE TALE-likes. In doing so, we hoped to gain insights into the principles underlying TALE-like DNA binding, the evolutionary history of the different groups, and their potential uses in biotechnology. In the case of the RipTALs, this work should begin to unravel the role these proteins play in bacterial wilt disease, as a step towards fighting this devastating pathogen.
The articles I contributed to on the molecular characterization of the RipTALs, Bats and MOrTLs are then presented in turn. Working together with others, I was able to show that repeats from each group of TALE-likes mediate sequence-specific DNA binding, revealing a conserved code in each case. This code reliably links the residue at position 13 of any TALE-like repeat to a specific DNA base preference. I will argue that the TALE-likes represent a fascinating case of conserved structure and function in a diverse sequence space. In addition, the TALEs and RipTALs may represent only one face of the TALE-likes, a protein family that may fulfil as yet unknown biological roles as bacterial DNA-binding proteins.