Unnatural Base Design and Characterization

The structure of duplex DNA is based on the complementary Watson-Crick hydrogen bonding patterns of adenine with thymine (dA:dT base pair) and guanine with cytosine (dG:dC base pair). Unnatural base pairs that rely on hydrogen bonding interactions have shown thermodynamic orthogonality (e.g. iso-C:iso-G, κ:X), but each of these unnatural bases has a stable tautomeric form whose hydrogen bond donor and acceptor pattern is compatible with one of the natural nucleotides, and thus could lead to mispairing. Because hydrophobic interactions are strong forces in aqueous biological systems, unnatural pairs based on hydrophobicity could provide long-term storage of genetic information and a unique opportunity to study molecular recognition in the context of DNA replication. In addition, the development of an unnatural base pair will offer insights into the complex mechanism of DNA replication. Finally, a third base pair would expand the genetic alphabet for a wide variety of in vitro biotechnology applications, and would also lay the foundation for the in vivo expansion of an organism’s genetic code.

In an effort to develop an orthogonal third base pair, over 100 hydrophobic bases have been designed, synthesized as the triphosphate and phosphoramidite, and characterized. A number of ‘first generation’ self-pairs and hetero-pairs have been identified with promising properties (see image below and slideshow, above).


Remarkably, we have found that neither H-bonding nor large aromatic surface area is required for base pair stability in duplex DNA or for polymerase-mediated replication. Thus, while our early efforts focused mostly on nucleobase analogs with large aromatic surface area, later efforts have focused on pairs formed between suitably substituted benzene rings. The 3-fluorobenzene (3FB) self-pair shown above is one of the most efficiently and selectively replicated of these smaller base pairs. The 3FB triphosphate is inserted opposite 3FB in the template, and the nascent primer terminus is extended, with rates and selectivities that begin to rival those of natural synthesis. We solved both the NMR and the X-ray structures of duplex DNA containing the 3FB self-pair in collaboration with B. Geierstanger and G. Spraggon (GNF), respectively. The structure of the self-pair contained hints as to why it is so efficiently replicated by DNA polymerases, and also offered hints about how to further optimize the self-pair. For example, we found that the addition of judiciously placed nitrogen atoms or methyl and cyano groups has a profound effect on base pair stability and replication.

3rd generation pairs

In a different approach to identifying base pairs among the many unnatural nucleobases that we had synthesized, we conducted two independent high-throughput screens. From 3,600 possible candidate pairs, both screens identified a single unnatural heteropair, formed between dSICS and dMMO2 (Leconte, 2008), which is synthesized and extended by DNA polymerases of different families with remarkable efficiency. With further optimization, we identified the d5SICS-dMMO2 and d5SICS-dNaM heteropairs (see image above). Characterization of the d5SICS-dNaM base pair showed that the efficiencies of both steps of unnatural base pair synthesis are within 4- to 13-fold of those of a natural base pair, and that the overall fidelity when unnatural base pair synthesis and extension are combined is at least 10^4. The synthesis of d5SICS-dNaM by either Kf or Taq polymerase is efficient and selective. Taq inserts both dNaMTP opposite d5SICS and d5SICSTP opposite dNaM only 10-fold less efficiently than a natural base pair in the same sequence context. Moreover, none of the natural dNTPs is inserted efficiently opposite either d5SICS or dNaM in the template, resulting in fidelities of 150-fold or greater for this step alone. d5SICS-dNaM is also efficiently transcribed into RNA in both orientations. Lastly, DNA containing the d5SICS-dNaM unnatural base pair may be PCR amplified billions of times with error rates in the range of 10^-3 to 10^-5.
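
As a rough illustration of what error rates in this range imply, the sketch below estimates the fraction of amplified strands that still carry the unnatural pair after a number of PCR doublings. It assumes that the quoted error rate applies per doubling and that loss events are independent; neither assumption is stated above. The function name `retention_after` and the choice of 30 doublings are hypothetical and serve only as an example.

```python
def retention_after(error_rate: float, doublings: int) -> float:
    """Estimate the fraction of strands that retain the unnatural pair.

    Assumes (for illustration only) that `error_rate` is the probability
    of losing the pair in one doubling and that doublings are independent.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie between 0 and 1")
    if doublings < 0:
        raise ValueError("doublings must be non-negative")
    return (1.0 - error_rate) ** doublings


if __name__ == "__main__":
    # 30 doublings is a hypothetical example value, not a figure from the text.
    for rate in (1e-3, 1e-4, 1e-5):
        kept = retention_after(rate, 30)
        print(f"error rate {rate:.0e}: {kept:.4%} retained after 30 doublings")
```

Under these assumptions, retention falls off geometrically with the number of doublings, so the low end of the quoted error range matters most when amplification is extensive.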


In addition, we have generated oligonucleotide libraries wherein the d5SICS-dNaM base pair is surrounded by randomized natural nucleotides. These libraries were amplified by PCR up to 10^24-fold, after which we analyzed the diversity of the population via deep sequencing (Illumina GAIIx). Detailed analysis revealed almost no sequence bias even after this massive amplification. Taken together with the high efficiency and fidelity of the unnatural pair, which approach those of fully natural DNA, these data demonstrate that d5SICS-dNaM is a fully functional base pair. Thus, the dA-dT, dC-dG and d5SICS-dNaM base pairs represent the first truly expanded genetic alphabet.
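
To give a sense of scale, the short sketch below converts a fold-amplification into the equivalent number of ideal PCR doublings. It reads the amplification figure above as 10^24-fold and assumes perfect doubling in every cycle, which is an idealization; the function name `doublings_for_fold` is hypothetical.

```python
import math


def doublings_for_fold(fold: float) -> float:
    """Return the number of ideal doublings that give `fold` amplification.

    Assumes every cycle exactly doubles the population (an idealization).
    """
    if fold < 1:
        raise ValueError("fold amplification must be at least 1")
    return math.log2(fold)


if __name__ == "__main__":
    # 1e24 reflects reading the amplification above as 10^24-fold.
    print(f"{doublings_for_fold(1e24):.1f} ideal doublings")
```

With ideal doubling, 10^24-fold amplification corresponds to roughly 80 doublings, far more than a single PCR run provides, so amplification on this scale implies repeated rounds of dilution and re-amplification.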


We are currently exploring a variety of applications of the expanded genetic alphabet – all of which are made possible by the fact that both the 5SICS and MMO2 scaffolds can be "decorated" with linkers bearing useful functionalities, allowing efficient and site-specific labeling of both DNA and RNA (Seo, 2011).

Structural studies are providing insight into the origins of the efficient replication of the d5SICS-dNaM base pair. This work has been done in collaboration with Prof. Tammy Dwyer (Univ. of San Diego) and Prof. Andreas Marx (University of Konstanz). We found that the d5SICS-dNaM base pair adopts a partially intercalated structure in duplex DNA. This is not surprising, since both the d5SICS and dNaM nucleobases are large, hydrophobic, and unable to form hydrogen bonds. The structure of the unnatural base pair looks more like a dA-dA mispair and is not consistent with the accepted mechanism of DNA replication: DNA polymerases evolved over eons to recognize a very specific planar Watson-Crick geometry and to reject others. Such geometric selection is one of the most important contributors to high-fidelity replication. This apparent conundrum was resolved by determining the structure of the d5SICS-dNaM base pair in the active site of a DNA polymerase. In the polymerase, the d5SICS-dNaM base pair adopts a planar geometry that is virtually superimposable on that of a natural dG-dC pair.


The structural data also made it clear that formation of the d5SICS-dNaM base pair induces essentially the same conformational changes in the polymerase as formation of a natural base pair. From these data, we conclude that the DNA polymerase is not only capable of selecting for the correct geometry, it is also capable of enforcing the correct geometry – at least in the case of unnatural base pairs that have sufficient plasticity to adapt to the changes.

These results further demonstrate that the determinants of a functional unnatural base pair may be designed into predominantly hydrophobic nucleobases that have little to no structural similarity to the natural purines or pyrimidines. Importantly, the results reveal that the unnatural base pairs may function within an expanded genetic alphabet and make possible many in vitro applications.

In addition to our work to design unnatural base pairs, we are actively pursuing the directed evolution of polymerases that are tailored to better recognize them.

Work on this project is supported by the NIH and also benefits from collaborations with the TSRI Center for Protein and Nucleic Acid Research and its Director, Dr. Phillip Ordoukhanian; Prof. Ali Torkamani (Scripps Translational Science Institute); Prof. Tammy Dwyer (University of San Diego); and Prof. Andreas Marx (University of Konstanz).
