Biopolymer



3 Nucleic Acid Model Building

The Biopolymer module provides two pulldowns (Nucleic_Acid and Nucleotide) whose commands are used for building and modifying molecular models of nucleic acids.

Nucleic acids carry the genetic information required for the correct functioning and reproduction of living organisms. In eukaryotic organisms, each cell contains a nucleus in which the genomic DNA is distributed across a number of chromosomes. In prokaryotic organisms, the DNA is not contained in a separate organelle such as a nucleus. Retroviral genomic information is contained in RNA.


Background

This section describes some of the definitions and basic chemistry required for an appreciation of nucleic acid structure and model building.

The definition of nucleic acids

Nucleic acids are linear polymers of four basic subunits called nucleotides. A nucleotide consists of a five-carbon furanose sugar joined to both a phosphate and a planar nitrogenous base. The phosphate and the nitrogenous base are bonded to the O5' and C1' ends of the sugar, respectively (see Figure 4). The sugar atom names end with a prime to distinguish them from base atom names.

There are two types of nucleic acids: ribonucleic acid (RNA), which uses the sugar ribose, and deoxyribonucleic acid (DNA), which uses the related sugar 2-deoxyribose. A nucleic acid polymer or strand is formed by joining the free 3' OH of nucleotide n with the phosphate group of nucleotide n + 1.

Bases

Five different bases are found in nucleic acids: the two purines, adenine and guanine, and the three pyrimidines, cytosine, uracil, and thymine (Figure 3). Adenine, cytosine, and guanine are used in both RNA and DNA. Uracil is used only in RNA and is replaced by thymine in DNA. Adenine and guanine are attached to the sugar at the N9 position; cytosine, uracil, and thymine at N1. This arrangement permits very specific and favorable hydrogen bonding within certain base pairs: adenine with uracil (or thymine) and cytosine with guanine.

Figure 3. Chemical structure of bases

The structure of nucleic acids

Nucleic acids adopt helical structures containing one to four strands (chains). DNA is normally double-stranded, or duplex, with the chains oriented anti-parallel to each other. These strands are joined by Watson-Crick, or ordinary, base pairing. RNA is generally single-stranded. However, most RNAs are folded into complex shapes containing numerous intra-strand Watson-Crick helices (Saenger 1984). DNA or RNA duplexes in which one strand is all purines and the other strand is all pyrimidines can accept a third strand of pyrimidines oriented parallel to the purine strand. This strand uses Hoogsteen base pairing (Watson 1976). Short synthetic sequences containing only guanines can form a four-stranded parallel helix.

Figure 4. Fragment of ribonucleic acid (RNA)

Fragment of ribonucleic acid (RNA) with the sequence guanosine (G), cytidine (C), uridine (U), and adenosine (A) linked by 3',5'-phosphodiester bonds. Chain direction is shown by an arrow.

Nucleic acids are highly symmetric molecules. A base and its Watson-Crick complement exhibit pseudo-dyad symmetry. In addition, the base pair at n is closely related to the pair at n + 1 by a coordinate transformation. Thus, to a first approximation, the backbones of both strands of a Watson-Crick duplex are identical. The third or Hoogsteen strand resembles the Watson-Crick pyrimidine strand, but differs enough so that it cannot be generated by a coordinate transformation.

The two strands of a Watson-Crick duplex resemble each other, but they usually have different sequences. One strand is called the sense strand, and the other is called the antisense strand. Strand assignment is usually arbitrary with the sense strand containing the sequence of interest. Addition of a third or Hoogsteen strand fixes the assignment of the other two. The sense strand is the one with both parallel and anti-parallel pairs.
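
The relationship between the two strands of a Watson-Crick duplex can be sketched in a few lines of Python. This is only an illustration of the pairing rules and the anti-parallel orientation described above; the function name and the rna switch are hypothetical and are not part of Insight II.

```python
# Watson-Crick partners: A pairs with T (or U in RNA), and G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(sense, rna=False):
    """Return the antisense strand, written 5' to 3', for a sense strand
    that is also written 5' to 3' (hypothetical helper)."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    # Complement each base, reading the sense strand backwards because
    # the two strands run anti-parallel to each other.
    return "".join(pairs[base] for base in reversed(sense.upper()))


print(antisense("GATTC"))            # GAATC
print(antisense("GCUA", rna=True))   # UAGC
```

Because the assignment of sense and antisense is usually arbitrary, applying the function twice returns the original sequence.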

A Watson-Crick base pair defines two coordinate frames. The local frame is specified by the line joining the two C1' atoms, which is the y axis, and a line normal to the plane containing both nitrogenous rings, which is the z axis. The x axis is their cross product. The second, or global, frame is determined by a base pair and the helix that this base pair generates. The z axis of the global frame lies along the global helical axis of the duplex. The two frames are related by translation and rotation about the x axes: the local frame is displaced along the positive global x axis and then undergoes a small (1-20°) rotation. This displacement from the global helical axis results in two distinct grooves between the strands: a large major groove and a smaller minor groove.

There are two major helical geometries, the right-handed A- and B-forms, and one minor geometry, the left-handed Z-form (see Figure 5). Observed Z-DNA consists only of alternating guanine and cytosine bases, which may serve as a target for DNA binding proteins. RNA and RNA-DNA heteroduplexes can form only A-helices while DNA can adopt A-, B-, or Z-form depending on cellular conditions and sequence. The helical properties of various forms are listed in Table 1.

Figure 5. Structural forms of DNA

Structural forms of DNA include B, A, and Z. B DNA (left) is a right-handed helix. A DNA (middle) is also right-handed, but the bases are shifted away from the axis of the helix and are inclined with respect to that axis. Z DNA (right) is a left-handed helix with a zigzag backbone that gives the structure its name.

Table 1. Helical properties (a)

                          B-form          A-form           Z-form
Description               Tall, thin      Short, broad     Very thin
Helix                     Right-handed    Right-handed     Left-handed
Rise per base pair (Å)    3.38            2.5 - 3.0        3.5 - 3.9
Twist (°)                 36              30 - 33          Dimer
Tilt (°)                  -1              19               -9
Helix Axis                Base pair       Major groove     Minor groove
Major Groove              Wide            Narrow, deep     Very flat
Minor Groove              Narrow          Wide, shallow    Very narrow, deep

(a) Watson, 1976
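
The helical parameters in Table 1 can be turned into simple derived quantities. The sketch below reads the B-form entries as 3.38 Å of axial advance and 36° of rotation per base pair (an interpretation of the table, not a statement from it) and computes the number of base pairs per turn and the axial length of one turn. The function names and the 10 Å radius are illustrative assumptions. A positive twist produces a right-handed helix in this construction.

```python
import math


def base_pairs_per_turn(twist_deg):
    """Number of base pairs needed for one full 360° turn."""
    return 360.0 / twist_deg


def turn_length(rise, twist_deg):
    """Axial length (Å) of one full helical turn."""
    return rise * base_pairs_per_turn(twist_deg)


def helix_points(n, rise, twist_deg, radius):
    """Return n points on an ideal helix around the z axis.

    Successive points advance by `rise` along z and rotate by
    `twist_deg` about z; `radius` is an arbitrary illustrative value.
    """
    points = []
    for i in range(n):
        angle = math.radians(i * twist_deg)
        points.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       i * rise))
    return points


print(base_pairs_per_turn(36.0))           # 10.0
print(round(turn_length(3.38, 36.0), 2))   # 33.8
```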


Methodology and implementation

Here we describe some of the tools available for building and manipulating nucleic acid models using the Biopolymer module.

Building and extending nucleotide strands

The Append command (in the Nucleotide pulldown) is used to create new nucleic acids or extend existing ones. The Append command requires three things: the helical geometry, the append point, and a base pair (1, 2, or 3 nucleotides, depending on the helix chosen). While Watson-Crick nucleic acids have no inherent sense and antisense strands, the Append command requires one strand to be designated as the sense strand so that the location of the next base can be determined. The nucleotides are appended by joining the free 3' OH of nucleotide n with the phosphate group of nucleotide n + 1 (Figure 6).

Figure 6. Structure of a DNA fragment

Insight II builds nucleic acids by joining the free 3' OH group of nucleotide n with the phosphate group of nucleotide n + 1. Residue numbering direction is also shown.

The Append command can create molecules with different geometries and numbers of strands (e.g., a B-form DNA duplex connected to an A-form DNA triplex and then back to a B-form DNA duplex). The Append command always connects to the sense strand of the base pair it added. It connects the other strands only if the distance between the O3' and the P is less than 3 Å. Longer bonds must be made manually. The single-stranded geometries provided by the Append command are merely the sense strand of the corresponding duplex. They are intended for building short single-stranded regions of otherwise duplex molecules, which can be used to create gaps or overhangs. The geometries of all the nucleotide fragments are taken from the work of Arnott (Arnott and Hukins 1972, Arnott et al. 1972, Arnott and Selsing 1974).
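
The 3 Å criterion for connecting the non-sense strands can be illustrated with a small distance test. This is a sketch of the rule as stated above, not Insight II code; the function name and the sample coordinates are made up.

```python
import math

BOND_CUTOFF = 3.0  # Å, the O3'-to-P limit quoted in the text


def should_connect(o3_prime, phosphorus):
    """Return True if the O3' and P atoms are close enough to be bonded
    automatically under the 3 Å rule (hypothetical helper)."""
    return math.dist(o3_prime, phosphorus) < BOND_CUTOFF


# Made-up coordinates in Å.
print(should_connect((0.0, 0.0, 0.0), (1.6, 0.0, 0.0)))   # True
print(should_connect((0.0, 0.0, 0.0), (0.0, 3.5, 0.0)))   # False
```

Pairs that fail this test correspond to the longer bonds that must be made manually.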

Whenever Insight II builds a duplex nucleic acid molecule (for example, one called MYDNA), the structure is created in the "standard" form for nucleic acids, which has the following features:

Nucleic Acid Standard Form

1.   The DNA is composed of two distinct molecules, one for each strand.

2.   The first strand is called MYDNA_1.

3.   The second strand is called MYDNA_2.

4.   The two strands are in an assembly called MYDNA.

Many of the nucleic acid functions in Insight II (for example, the Measure command) require the structure to be in this standard form. If you import a nucleic acid structure from another program and it is not in the standard form, you must convert it. This may require that you Unmerge the two strands, Rename them to conform to the standard form, and Associate the two molecules into an assembly.
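
The naming convention of the standard form is easy to check outside Insight II, for example when preparing an imported structure. The following sketch works on plain strings only and does not call any Insight II function; both helper names are hypothetical, and only the duplex case described above is covered.

```python
def standard_form_names(assembly):
    """Return the strand names expected for a duplex assembly,
    e.g. MYDNA -> ['MYDNA_1', 'MYDNA_2']."""
    return [f"{assembly}_1", f"{assembly}_2"]


def is_standard_form(assembly, molecule_names):
    """Return True if the assembly holds exactly the two expected strands."""
    return sorted(molecule_names) == standard_form_names(assembly)


print(standard_form_names("MYDNA"))                        # ['MYDNA_1', 'MYDNA_2']
print(is_standard_form("MYDNA", ["MYDNA_2", "MYDNA_1"]))   # True
print(is_standard_form("MYDNA", ["MYDNA"]))                # False
```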

Deleting or replacing nucleotides

Use the Nucleotide/Delete command to remove a single nucleotide from one strand of a nucleic acid. The remaining residues (if any) are not renumbered. If the deleted base had both a 5' and 3' neighbor, the result is a gap, since the surrounding bases are not bonded.
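
The effect of deleting without renumbering can be pictured with a toy model in which a strand is a mapping from residue number to base. This is only an illustration: the data layout and function names are assumptions, and a break in consecutive numbering is used here as a stand-in for a missing bond.

```python
def delete_nucleotide(strand, number):
    """Return a copy of the strand without the given residue.
    The remaining residues keep their original numbers."""
    remaining = dict(strand)
    del remaining[number]
    return remaining


def gaps(strand):
    """Return (before, after) residue-number pairs that flank a gap."""
    numbers = sorted(strand)
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


strand = {1: "G", 2: "A", 3: "T", 4: "C"}
print(gaps(delete_nucleotide(strand, 3)))   # [(2, 4)]
print(gaps(delete_nucleotide(strand, 4)))   # []
```

Deleting an interior residue leaves a gap between its former neighbors, whereas deleting a terminal residue simply shortens the strand.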

The Nucleotide/Replace command is used to change the nitrogenous base of a nucleotide. The new base is superimposed on the old base according to the criteria specified in Table 2.

Table 2. Superimposition basis

New base      Old base      Pairs (a)
pyrimidine    purine        c6,c8    n1,n9    c2,c4
purine        purine        c8,c8    n9,n9    c4,c4
purine        pyrimidine    c8,c6    n9,n1    c4,c2
pyrimidine    pyrimidine    c6,c6    n1,n1    c2,c2

(a) In each pair, the first atom belongs to the new base and is moved onto the second atom, which belongs to the old base.
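
The atom pairs of Table 2 amount to a lookup keyed on the class of the new and old base. The sketch below encodes the table as a Python dictionary; the names are hypothetical, and the code only selects the atom pairs. It does not perform the superimposition itself.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

# (new base class, old base class) -> list of (new atom, old atom) pairs,
# copied from Table 2.
SUPERIMPOSE_PAIRS = {
    ("pyrimidine", "purine"): [("c6", "c8"), ("n1", "n9"), ("c2", "c4")],
    ("purine", "purine"): [("c8", "c8"), ("n9", "n9"), ("c4", "c4")],
    ("purine", "pyrimidine"): [("c8", "c6"), ("n9", "n1"), ("c4", "c2")],
    ("pyrimidine", "pyrimidine"): [("c6", "c6"), ("n1", "n1"), ("c2", "c2")],
}


def base_class(base):
    """Classify a one-letter base code as purine or pyrimidine."""
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base: {base!r}")


def atom_pairs(new_base, old_base):
    """Return the atom pairs used to place new_base onto old_base."""
    return SUPERIMPOSE_PAIRS[(base_class(new_base), base_class(old_base))]


# Replacing a guanine (purine) with a cytosine (pyrimidine):
print(atom_pairs("C", "G"))   # [('c6', 'c8'), ('n1', 'n9'), ('c2', 'c4')]
```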

Measuring nucleotide geometric properties

The Nucleotide/Measure command (in the Nucleotide pulldown) is used to compute several angles and distances involving pairs of bases and the global helical axis. It is based on Dickerson's NEWHEL91 program suite.

Nucleotide/Measure applies only to the Watson-Crick base pairs of a duplex or triplex nucleic acid molecule. Single-stranded molecules and the Hoogsteen strand of a triplex cannot be measured. Two base pairs must be specified; together they define a region, which is the continuous stretch of base pairs between them.

Because capping changes the nucleotides at the 5' and 3' ends of the strand, capped residues cannot be chosen as input for the Measure_5_Prime and Measure_3_Prime parameters.

The Measure command further requires that the two strands in the nucleic acid duplex are in the standard form for nucleic acids. A complete description of this standard form is given in the discussion of the Append command above. Specifically, the two strands need to be in two separate molecules, and the molecules need to be associated in an assembly.

Modifying existing nucleic acid structures

After nucleic acid molecules are built using the Append command (in the Nucleotide pulldown), the molecules should be properly terminated before further work (e.g., minimization) is done.

The Cap command (in the Nucleic_Acid pulldown) is used to replace the 5' terminal phosphate of a nucleic acid strand with a hydroxyl. It also adds a hydrogen to the free 3' terminal oxygen. The Cap command must be applied to each strand of a nucleic acid molecule.

The Append command requires that the sense strand of a nucleic acid molecule have a free 3' terminal atom. In addition, the 5' end of the antisense strands must have a free phosphate group. The Prime command (in the Nucleic_Acid pulldown) is used to convert the 5' terminal hydroxyl to a phosphate group. It also removes the hydrogen from the 3' terminal oxygen. Each strand must be primed in order to append to it.
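
The Cap and Prime commands act as inverse operations on the two ends of a strand. The toy model below tracks only the end groups described above; the class, its fields, and the function names are assumptions made for illustration and do not correspond to Insight II data structures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StrandEnds:
    five_prime: str        # "phosphate" or "hydroxyl"
    three_prime_h: bool    # True if the 3' terminal oxygen carries a hydrogen


def cap(ends):
    """Model of Cap: 5' phosphate becomes hydroxyl, 3' oxygen gains a hydrogen."""
    return replace(ends, five_prime="hydroxyl", three_prime_h=True)


def prime(ends):
    """Model of Prime: 5' hydroxyl becomes phosphate, 3' hydrogen is removed."""
    return replace(ends, five_prime="phosphate", three_prime_h=False)


def is_primed(ends):
    """A strand can be appended to only when it is in the primed state."""
    return ends.five_prime == "phosphate" and not ends.three_prime_h


built = StrandEnds(five_prime="phosphate", three_prime_h=False)
capped = cap(built)
print(is_primed(built), is_primed(capped), is_primed(prime(capped)))
# True False True
```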

Ligation of DNA is a reaction in which a phosphodiester bond is formed between the 5' phosphoryl and the 3' hydroxyl groups of two DNA strands. This reaction is catalyzed by a DNA ligase enzyme. The Ligate command (in the Nucleic_Acid pulldown) is used to ligate two different strands of a nucleic acid.




Last updated March 15, 2000 at 04:03PM Pacific Standard Time.
Copyright © 2000, Molecular Simulations Inc. All rights reserved.