Blast Select an organism from the list to perform a short nucleotide Blast using the NCBI server. A separate window is opened for each organism and/or sequence that is used for a Blast search. A Blast window is only opened when the organism is changed (so click Select, then the organism again to open an identical Blast window twice). Note that on the Design a ZF page and the Predict Target Site page, there are also links to NCBI Blast.
Contiguous or separated target sites These two radio buttons control what type of target sites will be found. The relevant parameters are listed to the right of the radio buttons for each type of target. Contiguous target sites are a stretch of uninterrupted triplets on a single strand. The minimum length of the triplet chain found is specified by the minimum target size parameter. A search for separated target sites seeks two sites, each exactly the half-site size in length, that are separated by a user-specified core that is not targeted for binding by ZFPs. Furthermore, the ZFPs can be situated on the DNA such that either their N-termini or their C-termini are juxtaposed (note that for the Fok I nuclease domain, the C-termini should be closest). For more information on separated target sites such as those used in nuclease design, click here.
Coverage map The coverage map displays all of the target sites found on the input sequence. Since many of the target sites may overlap, the coverage map can only be used to get a feel for which regions of the input sequence have target sites, but does not necessarily display each target site independent of all others. See the list below the coverage map for individual targets.
Fixed sequence The default sequences provided are what we use. The fixed sequence is used to link the zinc finger protein to the remainder of the protein. We mutate the codons in this sequence to have a 5' XhoI site and a 3' SpeI site for cloning. The fixed sequences also include parts of the linker found between ZF domains (EKP on the N-term linker and TG on the C-term linker). See the effector domains section for more detailed information on cloning. Also see backbone sequence descriptions and a diagram of how all the sequences fit together.
IUPAC bases
A, C, G and T
N=A, C, G or T
Y=C or T
R=A or G
S=G or C
W=A or T
K=T or G
M=A or C
B=C, G or T
D=A, G or T
H=A, C or T
V=A, C or G
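The IUPAC table above can be expressed as a simple lookup. The following is a minimal Python sketch, not code from ZF Tools; the names IUPAC and matches are hypothetical and chosen only for illustration.

```python
# Each IUPAC code maps to the set of concrete bases it stands for,
# exactly as listed in the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",
    "Y": "CT", "R": "AG", "S": "GC", "W": "AT", "K": "TG", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def matches(pattern, sequence):
    """Return True if a concrete DNA sequence satisfies an IUPAC pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )
```

For example, matches("GNG", "GAG") is true, while matches("YYYNN", "ATCAG") is false because A is neither C nor T.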
Position This is the starting position of the first base in the target site relative to the input sequence. On the search page for contiguous target sites, the results can be sorted by increasing or decreasing position, as well as by increasing or decreasing score.
Parse target site When searching for contiguous target sites, only the minimum target site size can be specified. Due to the large number of triplets available, sometimes a very long target site is found. Click on the "parse" arrow to break down long target sites into shorter ones by "sliding" a window in 3-bp increments along the entire target site. NOTE: information contained in the parse page is more easily accessed by using the new sub-parse feature to access subsites. However, the parse page allows you to alter the size of the target site.
Minimum target size This parameter indicates in bases the minimum length of contiguous target sites to be identified. The "Contiguous targets" radio button must be checked for this parameter to affect the search. Sites larger than this minimum may also be found. In cases where the identified sites are larger than the desired length, a subset may be chosen; however, this subset must be "in-frame" in that only multiples of triplets can be removed from either end. When a subset of the site is desired, the corresponding protein sequence can be obtained by using the "Design a Zinc Finger Protein" tool. Note that the minimum length, rather than the absolute length, is specified to prevent the output list of target sites from rapidly becoming very long and confusing as to which targets are adjacent.
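To make the idea of a minimum (rather than absolute) length concrete, here is a minimal Python sketch of a contiguous search on one strand. It is an illustration under stated assumptions, not the ZF Tools implementation: the function name contiguous_sites is hypothetical, the allowed triplet set is supplied by the caller, only the given strand is scanned, and each reading frame is treated independently.

```python
def contiguous_sites(seq, allowed, min_size):
    """Find maximal runs of allowed triplets that are at least min_size bases long.

    Returns a sorted list of (1-based start position, site sequence).
    """
    seq = seq.upper()
    sites = []
    for frame in range(3):
        run_start = None
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in allowed:
                if run_start is None:
                    run_start = pos  # a new run of valid triplets begins
            else:
                if run_start is not None and pos - run_start >= min_size:
                    sites.append((run_start + 1, seq[run_start:pos]))
                run_start = None
            pos += 3
        # Close a run that extends to the end of the sequence.
        if run_start is not None and pos - run_start >= min_size:
            sites.append((run_start + 1, seq[run_start:pos]))
    return sorted(sites)
```

With a minimum target size of 6 and the hypothetical triplet set {"GAA", "GCG", "GTA"}, the input TTGAAGCGGTACC yields a single 9-base site starting at position 3, which is larger than the requested minimum.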
Schematic of ZFP-DNA interactions
Schematic of ZF protein-DNA interaction. Note that finger 1 (N-term) binds to the 3' region of the target site.
Score and Top Score Due to the large number of target sites that ZF Tools may identify, scoring functions were implemented to assist users in selecting promising target sites. As a rule in ZF Tools, the higher the score, the better. However, different scoring functions may generate conflicting results, so wise choice of an appropriate scoring function is essential (see below). Currently, several scoring functions have been implemented that attempt to measure the predicted specificity or other attributes. The "Base" function is the most sophisticated and is the default function. The scores determined on this website should be viewed with skepticism since we do not yet have enough empirical data to make a robust scoring function, nor have the scoring functions been validated experimentally (that is work in progress). Also, multi-target specificity (ELISA) assays were performed for each finger in the context of a 3-finger protein that targets, for instance, GCGNNNGCG, where NNN was the triplet recognized by the evaluated finger. This type of assay may not be completely generalizable when the finger is placed into a new context. Also note that scores do not account for DNA accessibility or affinity. In vivo, factors such as heterochromatin or other DNA binding proteins may significantly affect the binding of a ZFP.
The score reported on the contiguous search page ("Top Score") reflects the best score within a set of subsites. Click on the "+" symbol to expand the list of subsites. Upon expanding the list of subsites, each target site is shown with its individual score, and the subsite responsible for the top score can be determined. The list of subsites can be sorted by either score or position.
SCORES GENERATED USING DIFFERENT SCORING FUNCTIONS CANNOT BE COMPARED since the scoring functions are all unique. In other words, a score of 10 found using the "XNN" method is not necessarily better than a score of 1 from the "Base" method. Sometimes even scores generated by the same function should not be compared: Scores generated by the "GNN" and "GACT" methods are direct functions of the target site length, and thus target site lengths must be identical for meaningful score comparisons.
The scoring functions use the following formulas:
Base: The Base function is the default function and uses data from our ELISA specificity graphs to predict the total number of off-target sites that a given ZFP may recognize. Clearly, the fewer the off-target sites, the better. Click for a sample ELISA graph, which may aid your understanding of the explanation below. The function works by accessing a pre-calculated database of scores for each ZF domain that represents the predicted number of triplets that a given domain can recognize. The predicted promiscuity is determined by analyzing each ELISA graph to determine how many bases are tolerated in each of the three triplet positions. Each position is analyzed while holding the other two constant and equal to the bases the domain is supposed to recognize. Specifically, if a triplet has an ELISA value of at least 0.15, then the base is assumed to be tolerated in that position, and a value of "1" is assigned. The numbers of bases tolerated in the three positions are multiplied to obtain a score ranging from 1 (best) to 64 (complete promiscuity). In practice, triplets in the database at the time of this writing have scores ranging from 1 (e.g. TAA and GAT) to 36 (GGT). In theory, the number of target sites that a ZFP of N fingers can recognize is the product of the individual scores of its component fingers, and ranges from 1 to Max_Score^N (i.e. 36^N). However, it would be desirable to have a final score that is not dependent upon the number of fingers comprising the ZFP, and that also reflects the variance in the finger scores. Therefore, the total score is determined by finding the geometric (not arithmetic) mean of all the fingers. This number essentially represents the average score that each finger contributes to the final product. The geometric mean must be used since we are concerned with a product, not a sum of numbers.
To account for the variance, the geometric mean is multiplied by one geometric (not arithmetic) standard deviation, which gives a value that 68% of scores are less than (the standard deviation would be added in the case of an arithmetic mean). The score is next inverted so that a higher score is better (to maintain logical consistency with the "GNN" and "GACT" functions). To invert the score, it is subtracted from 64, which is simply an arbitrary reference point large enough to avoid negative scores.
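The arithmetic described for the "Base" function can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code ZF Tools runs: the function names are hypothetical, the ELISA values in the example are made up, and the population (rather than sample) standard deviation is assumed because the text does not say which is used.

```python
import math


def finger_promiscuity(elisa_by_position, threshold=0.15):
    """Multiply the number of tolerated bases at each of the three positions.

    elisa_by_position holds three dicts (one per triplet position) mapping
    each base to its ELISA value, with the other two positions held at the
    intended bases.
    """
    score = 1
    for values in elisa_by_position:
        score *= sum(1 for v in values.values() if v >= threshold)
    return score


def base_score(finger_scores, reference=64.0):
    """Geometric mean times one geometric standard deviation, inverted at 64."""
    logs = [math.log(s) for s in finger_scores]
    mean_log = sum(logs) / len(logs)
    # Assumption: population variance of the log scores.
    var_log = sum((x - mean_log) ** 2 for x in logs) / len(logs)
    geometric_mean = math.exp(mean_log)
    geometric_sd = math.exp(math.sqrt(var_log))
    return reference - geometric_mean * geometric_sd
```

Under these assumptions, a ZFP whose fingers all have a promiscuity of 1 scores 63, while a two-finger protein with finger scores of 1 and 36 has a geometric mean of 6 and a geometric standard deviation of 6, giving 64 - 36 = 28.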
XNN: The "XNN" function is less sophisticated than the "Base" function. It is also based upon ELISA data, but uses less data to determine its score, and in the case of the TNN domains, our data is lacking. This function is meant to characterize the average promiscuity of the composite ZFP. Click for a sample ELISA graph, which may aid your understanding of the explanation below. The white bars in an ELISA graph (other than the TNN ELISAs) represent the ability of a domain to recognize all 16 triplets generated by systematic substitution of the 2nd and 3rd bases (while holding the 1st base constant). In the case of the TNN ELISAs, the 2nd and 3rd bases are also fixed, so only the variability of the 1st base is actually measured. To determine a promiscuity for the finger, we would like to know how many of the 64 possible triplets it can bind to (1 is best, and 64 worst). The white bars for the GNN, ANN, and CNN ELISAs give us a feel for this number. However, since the white bar that has the same 5' base as the triplet the domain is designed to bind will have a value dominated by this triplet, we must first mathematically correct the height of this particular white bar to compute the contribution of only the other 15 triplets, thus removing the value for the intended target (e.g. removing the contribution of GAA from the GNN white bar). For GAA, this correction would be accomplished by adding up the ELISA values for the other 15 GNN triplets and dividing by the sum of the values of all 16 GNNs. The white GNN bar is then scaled by this value. We now have a value ranging from 0 to 1 for each white bar that reflects the finger's ability to bind to all triplets except that which it was designed to recognize, or in other words, its promiscuity (the TNNs are still an exception, as noted above, but are treated the same as the other triplets). A score for each finger is determined by summing the values of the four white bars.
For a ZFP with N domains, we then take the arithmetic mean of all N fingers and add two standard deviations to obtain a score that encompasses ~95% of all scores. While the geometric mean would also make intuitive sense here, the geometric mean cannot be computed when any value is 0, and XNN scores range from 0 to 4. The score is next inverted so that a higher score is better (to maintain logical consistency with the "GNN" and "GACT" functions). To invert the score, it is subtracted from 12, which is the theoretical maximum (largest mean + 2*largest standard deviation), to avoid negative scores.
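The "XNN" calculation can be sketched the same way. Again, this is a minimal Python illustration rather than the ZF Tools source: the function names are hypothetical, the ELISA values in the example are invented, and the population standard deviation is an assumption.

```python
import math


def corrected_bar(bar_height, family_elisa, target):
    """Scale the white bar that shares its 5' base with the intended triplet.

    family_elisa maps each triplet of that family (e.g. the 16 GNNs) to its
    ELISA value; the bar is scaled by (sum of the others) / (sum of all).
    """
    total = sum(family_elisa.values())
    others = total - family_elisa[target]
    return bar_height * others / total


def xnn_score(finger_scores, reference=12.0):
    """Arithmetic mean plus two standard deviations, inverted at 12.

    Each finger score is the sum of its four (corrected) white bars, 0 to 4.
    """
    n = len(finger_scores)
    mean = sum(finger_scores) / n
    # Assumption: population standard deviation.
    sd = math.sqrt(sum((s - mean) ** 2 for s in finger_scores) / n)
    return reference - (mean + 2 * sd)
```

For instance, three fingers that each score 1 give 12 - 1 = 11, and a perfectly specific ZFP (all finger scores 0) gives the maximum of 12.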
GNN: The "GNN" function simply counts the number of GNN domains present. The higher the score, the greater the number of GNN domains. Since GNN domains were the first to be developed and characterized (citations), some investigators have a preference for using them. The score reported is actually a 4-fold multiple of the number of GNN domains; this is the same coefficient used for GNN domains in the "GACT" scoring function, which makes comparisons between the two functions easier.
GACT: The "GACT" function counts the number of GNN, ANN, CNN, and TNN domains, and multiplies the number of each type of domain by a coefficient. This function is slightly more sophisticated than the "GNN" function in that it recognizes the desire of some investigators to use GNN and ANN domains, followed by CNN and TNN domains, simply as a matter of preference. The coefficients emphasize the following order: GNN>ANN>CNN=TNN. Specifically, the coefficients are: GNN=4, ANN=3, CNN=2, and TNN=2. The coefficients were chosen arbitrarily.
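Both counting functions are simple enough to sketch directly. The following minimal Python example uses the coefficients stated above; the function names are hypothetical and the target site is assumed to be a whole number of triplets.

```python
# Coefficients from the text: GNN=4, ANN=3, CNN=2, TNN=2.
GACT_COEFFICIENTS = {"G": 4, "A": 3, "C": 2, "T": 2}


def split_triplets(site):
    """Split a target site into its consecutive triplets."""
    site = site.upper()
    if len(site) % 3 != 0:
        raise ValueError("target site length must be a multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


def gnn_score(site):
    """Four times the number of GNN triplets in the site."""
    return 4 * sum(1 for t in split_triplets(site) if t[0] == "G")


def gact_score(site):
    """Sum of the per-triplet coefficients, keyed on each triplet's 5' base."""
    return sum(GACT_COEFFICIENTS[t[0]] for t in split_triplets(site))
```

For the 12-base site GAAACGCTTTGA (triplets GAA, ACG, CTT, TGA), the "GNN" score is 4 and the "GACT" score is 4 + 3 + 2 + 2 = 11. Because both scores grow with the number of triplets, they are only comparable between sites of identical length.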
Separated target site parameters Three parameters control the search for separated target sites. The "Separated targets" radio button must be checked for these parameters to affect the search. The first parameter is the half-site size. This parameter defines the exact size (in base pairs) of each of the two ZFP target sites. Note that this definition differs from that of the minimum target size parameter for contiguous targets, which defines the minimum target size. The core parameter permits specification of the desired core sequence that separates the two target sites. For a diagram of the core location, click here. Note that you can enter either a number or a sequence using IUPAC bases. If you enter a number, it will be interpreted as the number of bases of type "N" to search for (where N is equal to any base). For instance, entering "5" is equivalent to typing "NNNNN", whereas entering "ACGTA" specifies the core sequence exactly. Entering "YYYNN" would specify a core of 3 bases consisting of either C or T followed by 2 bases of any type. You may also select from a list of common 6-base restriction sites using the select box to the right of the core sequence box. Restriction sites often aid in the analysis of ZFN efficiency (for example, see citations from the Carroll group). The juxtapose option defines whether the N- or C-termini of the ZFPs should be oriented closest. Effectively, the parameter toggles which strand (top or bottom) half-site #1 and half-site #2 occupy. For nuclease design using the Fok I catalytic domain, the C-termini are typically aligned.
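The two ways of entering the core (a number or an IUPAC sequence) can be illustrated with a short Python sketch. This is an assumption about how such input might be interpreted, not the ZF Tools code; expand_core and core_regex are hypothetical names.

```python
import re

# IUPAC codes written as regular-expression character classes.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",
    "Y": "[CT]", "R": "[AG]", "S": "[GC]", "W": "[AT]", "K": "[TG]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
}


def expand_core(core):
    """Turn a numeric core such as "5" into "NNNNN"; leave sequences as-is."""
    core = core.strip().upper()
    if core.isdigit():
        return "N" * int(core)
    return core


def core_regex(core):
    """Compile the core specification into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES[c] for c in expand_core(core)))
```

Here core_regex("YYYNN").fullmatch("CTTAG") succeeds, while core_regex("ACGTA").fullmatch("ACGTT") returns None because an exact core must match base for base.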
Single letter amino acid codes
G A V L I S T C M P H R N Q E D F W Y K
Subsites
A search for contiguous target sites specifies the minimum target size of the target site to be found. Typically, the discovered target site is longer than the desired target site size. The minimum length, rather than the absolute length, is specified to prevent the output list of target sites from rapidly becoming very long and confusing as to which targets are adjacent. A subsite is defined as a sequence where the length is equal to the minimum specified size. For instance, if the minimum target size requested is 6 bp, and a target site is found that is 9 bp, there will be two subsites of 6 bp (one from positions 1-6 and another from positions 4-9). Subsites can be accessed by clicking on the button next to the position tag. The information that is then brought up is the same as that which can be found on the parse page.
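The subsite definition amounts to sliding a window of the minimum target size along the site in 3-bp steps. A minimal Python sketch follows; the function name subsites is hypothetical and positions are reported 1-based relative to the start of the target site.

```python
def subsites(target, min_size):
    """List every in-frame subsite of exactly min_size bases.

    Returns (start, end, sequence) tuples with 1-based, inclusive positions.
    """
    if min_size % 3 != 0 or len(target) % 3 != 0:
        raise ValueError("sizes must be multiples of 3")
    return [
        (i + 1, i + min_size, target[i:i + min_size])
        for i in range(0, len(target) - min_size + 1, 3)
    ]
```

For the 9-bp site GAAGCGGTA and a minimum target size of 6, this returns the two subsites described above: positions 1-6 (GAAGCG) and positions 4-9 (GCGGTA).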
Target site The target site is a DNA sequence that must be composed of valid triplets. The triplets are recognized by ZF protein helices. A target site of 18 bases is generally considered long enough to be unique within the human genome.
Target site overlap (TSO) TSO interactions can potentially decrease the independent modularity of the triplets. The zinc finger domain is generally considered to be modular in nature, with each finger recognizing a 3 bp subsite. This idea is supported by our ability to recombine zinc finger domains in any desired sequence, yielding polydactyl proteins recognizing extended sequences. However, it should be noted that, at least in some cases, zinc finger domains appear to specify overlapping 4 bp sites rather than individual 3 bp sites. This event typically occurs with GNG triplets. Therefore, GNG triplets are best followed by either a GNN or a TNN triplet; if this requirement is not met, there are potential target site overlap issues. Offending triplets are highlighted in red. ZF Tools only evaluates the triplets within the target site. So even though ZF Tools might report that there are no TSO incompatibilities for a particular target site, you should remember to evaluate the site in the context of the surrounding DNA sequence. In other words, if finger #1 is a GNG, and a C or A base follows the site in the genome, then TSO incompatibilities might occur. For more information, see Beerli et al. Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14628-33. Note that prior to February 7, 2006, ZF Tools flagged sites with a TNN as having potential TSO issues, but this restriction was subsequently loosened.
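The TSO rule can be sketched as a check on each GNG triplet and the base that follows it. This minimal Python example is an interpretation of the rule as worded above, not the ZF Tools implementation: the function name tso_flags is hypothetical, "followed by" is assumed to mean the next triplet in the 5' to 3' direction, and the optional downstream_base argument stands in for the genomic base immediately after the site.

```python
def tso_flags(site, downstream_base=None):
    """Return 0-based indices of GNG triplets not followed by a G or T.

    The triplet after each GNG should be a GNN or TNN. For the last
    triplet, downstream_base (if given) is checked instead.
    """
    site = site.upper()
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    flagged = []
    for i, triplet in enumerate(triplets):
        if triplet[0] == "G" and triplet[2] == "G":
            if i + 1 < len(triplets):
                next_base = triplets[i + 1][0]
            else:
                next_base = downstream_base  # may be None if unknown
            if next_base is not None and next_base.upper() not in ("G", "T"):
                flagged.append(i)
    return flagged
```

For example, GAGCAAGCG flags the first triplet (GAG is followed by CAA), whereas GAGGCGGTG flags nothing unless the base following the site in the genome is supplied and is a C or an A.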
Triplets to search These checkboxes permit selection of the set of triplets allowed in the search. If, for instance, target sites consisting only of GNG triplets are desired, then the GNG checkbox alone should be selected. Note that a search can only use triplets for which we have developed helices, so the available set may be smaller than the theoretical one. For instance, there are 15 ANN triplets available, not 16.
ZFP Click to obtain the amino acid sequence encoding the zinc finger protein predicted to recognize the DNA sequence listed.
ZF Set The total set consists of all triplets and is the most comprehensive available. It is recommended for general usage. The 23-21-19 library is a subset of 16 GNNs, 5 ANNs and 3 TNNs published by Blancafort et al. (Nature Biotech., V. 21, 2003). This set is included to help determine the maximum number of targets a library selection with this library may find. This is a maximum number because all 25 triplets are assumed to be present for each finger, but F1 has 23 triplets, F2 has 21, and F3 has 19. F1-F3 are replicated in the 6 finger library. Click here to see how the library was built. Note that this library uses some suboptimal helices and that the TAA domain has poor specificity. When ZF Tools is asked to design the ZFP for a DNA target found from a search using the 23-21-19 library, only optimal amino acid helix domains are used, and so the amino acid sequence may not correspond to those selected from the library.
Disclaimer: While every effort has been made to ensure the accuracy of the information reported on this site, we make no guarantees and strongly suggest that you double check all sequences and information obtained here before using them.