Xsight



4       Methodology


Getting started

It is assumed that you have properly installed the Xsight package and that you have defined the $CRYSTALDATA environment variable.

When you first begin work with Xsight you need to go through the following steps:

1.   Start the Insight II software by typing:


>	insightII

at the system prompt and pressing <Enter>.

2.   Now go to the Xsight module by either typing:


>	Command: Xsight

and pressing <Enter> or by clicking the MSI logo and selecting the Xsight module. The Utilities pulldown is the only active pulldown when you first go to the Xsight module. The rest of the module is only available once you define a project.

3.   To create a crystal file select the Utilities/Crystal command. The directory defined by the environment variable $CRYSTALDATA is displayed in the Crystal_Directory parameter box. This does not need to be changed.

If you then enter the name of a crystal file in the Crystal parameter box and press <Enter>, the parameter block expands and allows you to enter the title, cell parameters, and space group. To enter the proper space group, click the Space_Group parameter and then click the appropriate Crystal_Class value. To load the desired space group from the value-aid into the Space_Group parameter, click it with the mouse. To create the file, select Execute.

4.   To create a project select the Utilities/Project command. If you enter a name for the project in the Project_Name parameter field and press <Enter>, the parameter block expands, giving you access to the Crystal parameter field and the Project_Directory. The crystal file that you created in Step 3 should be listed in the Crystal_List value-aid. To load the file into the Crystal parameter field, simply click it with the mouse. To create a new project, enter the path name of the directory that contains the data that you wish to work with in the Project_Directory parameter field, and then select Execute.

5.   Next time... The next time you use Xsight you should go through the following steps as appropriate:


Importing structure factor data

File formats used by the Xsight module

All of the crystallographic programs contained in the Xsight package read the same types of reflection data file. The Xsight system (including the XtalView component) uses two main file formats for storing reflection data--Fin and Phase files. Fin and Phase files may be converted to CNX reflection format.

The Fin file format

This file format is used for crystallographic applications which use the observed structure factor amplitudes but which do not require phase angles. These applications include most molecular replacement calculations and refinements of atomic co-ordinates.

The Fin file is an ASCII file that contains data items

h k l F1 SIG1 F2 SIG2

where h, k, l are the reflection indices, F1 and F2 are the observed structure factor amplitudes and SIG1 and SIG2 are the standard deviations associated with the structure factor amplitudes.

F1 and F2 are usually used to record Bijvoet reflection pairs as separate data items. If the Bijvoet pairs have been merged together (as is often the case) then the reflection is usually recorded as the F1 and SIG1 data items with dummy values of 0.0 and 9999.00 for the F2 and SIG2 data items.

The FORTRAN style file format (3i4,4f8.2) is used to write Fin format reflection files although almost all programs in the Xsight module read the data in Fin format reflection files as free format items.

The file extension .fin is used to indicate a Fin formatted file.

The Phase file format

This file format is used for crystallographic calculations which require phased structure factor data. These applications include density modification calculations and model-fitting. For model-fitting applications that employ maps obtained with phases calculated from an atomic model it is sufficient to supply a `fake' phase file, with dummy values for the actual phase values, since the phases may be calculated within the Xfit program (see tutorial lesson 6 for an illustrative example). Phased structure factor data in the Phase file format is the outcome of MIR and MAD phasing calculations. Structure refinement with the ProLSQ95 program also provides a phased structure factor file that contains amplitudes and phases calculated from the atomic model.

The Phase file format is an ASCII file that usually contains the data items

h k l Fobs FOM Phase

where h, k, l are the reflection indices, Fobs is the observed structure factor amplitude (i.e., usually the merged values of F1 and F2), FOM is a figure of merit that indicates the reliability of the phase value and Phase is the phase of the reflection.

Sometimes the Phase file will contain data items

h k l Fobs Fcalc Phase

where Fcalc and Phase are calculated structure factor amplitudes and phases from an atomic model.

All of the programs in the Xsight package read the data in the Phase format reflection files as free format items.

The file extension .phs is used to indicate a Phase formatted file.
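As an illustration of the relationship between the two formats, the following Python sketch turns one Fin record into a `fake' Phase record of the form h k l Fobs FOM Phase. It is only a sketch under stated assumptions: the function name, the dummy FOM of 1.0, the dummy phase of 0.0, the output column widths, and the simple unweighted average used to merge F1 and F2 are all illustrative choices, not the behaviour of any Xsight program.

```python
def fin_to_fake_phase(fin_line, dummy_fom=1.0, dummy_phase=0.0):
    """Build a 'fake' Phase record (h k l Fobs FOM Phase) from a Fin record."""
    fields = fin_line.split()
    h, k, l = (int(x) for x in fields[:3])
    f1, sig1, f2, sig2 = (float(x) for x in fields[3:7])

    # F2 = 0.0 with SIG2 = 9999.00 marks a reflection whose Bijvoet pair
    # has already been merged into F1.
    if f2 == 0.0 and sig2 >= 9999.0:
        fobs = f1
    else:
        # Assumption: a plain unweighted average of the two amplitudes.
        fobs = 0.5 * (f1 + f2)

    return "%4d %4d %4d %8.2f %6.3f %7.2f" % (h, k, l, fobs, dummy_fom, dummy_phase)


print(fin_to_fake_phase("   1  -2   3  125.50    4.25    0.00 9999.00"))
print(fin_to_fake_phase("   0   0   2  100.00    2.00  110.00    2.00"))
```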

Data files for the CNX program

In applications that employ the CNX program you may supply a Fin or Phase file and an automatic conversion to the CNX reflection format will be carried out. Alternatively, you may use the Data_Control/File_Import pulldown to convert your reflection data to the CNX format (see below). With an appropriate file format specification it will usually also be possible to back-convert reflection data files from the CNX file format to the Fin and Phase formats.

Converting a reflection datafile to the Xsight file formats

When you obtain a new set of reflection data your first task is usually to convert this data file to the Fin or Phase formats for use in the Xsight system. An exception to this requirement is the MADSYS phasing program (interfaced in the MAD pulldown), which uses the original files of processed data.

To carry out file format conversions and perform other data manipulations, go to the Data_Control/Import_Data menu item.

The Input_File_Format options indicate the possible arrangements of reflection data that may be read as free-format items. You may want to note that the hkl_F1_SIG1_DELANOM option could be used to read the R-AXIS II file format, the hkl_F1_SIG1_F2_SIG2 option is the Fin file format, the hkl_F1_FOM_PHS option is the Phase file format, the hkl_I_SIG option could be used to read files used by the SHELXL refinement program, and the hkl_I1_SIG1_I2_SIG2 option could be used to read data from a format commonly used by the DENZO data processing package. To import data from the binary files used by the CCP4 program package you should first use the CCP4 utility program to dump the binary file to an ASCII file. This ASCII file may then be read using the Data_Control/Import_Data menu item.

The Use_Format_Spec option allows you to specify a FORTRAN style format specification for the input reflection file. This option allows you to read additional types of datafiles by using FORTRAN space descriptors (x's) to bypass characters or unwanted data items in the file.
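The effect of the x descriptor can be seen in a small Python sketch that applies a FORTRAN style format specification to one fixed-width record. This is an illustrative reader only, not the Xsight implementation: the function name is hypothetical, and only the i, f and x descriptors with optional repeat counts are handled.

```python
import re


def read_fixed_record(line, format_spec):
    """Read one record with a simplified FORTRAN format such as (3i4,8x,2f8.2)."""
    values = []
    pos = 0
    # Each match is (repeat count, descriptor letter, field width).
    pattern = r"(\d*)([ifx])(\d*)(?:\.\d+)?"
    for count, kind, width in re.findall(pattern, format_spec.lower()):
        count = int(count) if count else 1
        if kind == "x":
            pos += count  # nX skips n characters without reading them
            continue
        width = int(width)
        for _ in range(count):
            field = line[pos:pos + width]
            values.append(int(field) if kind == "i" else float(field))
            pos += width
    return values


# Eight unwanted characters sit between the indices and the data items.
line = "   1  -2   3 JUNK!!!  125.50    4.25"
print(read_fixed_record(line, "(3i4,8x,2f8.2)"))
```

Here the 8x descriptor steps over the unwanted characters so that only the indices and the two data items are returned.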

Other options in the Import_Data control panel are available for common tasks such as filtering the input data, sorting the reflection data, moving reflections to a unique reciprocal-space asymmetric unit, and re-indexing the data to eliminate `hand' differences between files containing equivalent data.

When you select Execute in the Import_Data control panel, some diagnostic information appears on the command line. Additional information and a record of the input options appear in the data_import_xsight.log output file.


Phase determination by multiple wavelength anomalous diffraction (MAD)

The MADSYS program (Hendrickson, 1991) for phase determination from multiple wavelength anomalous scattering data is available as a separately licensed component of Xsight. This program provides a complete and exact analytical solution to the MAD phasing problem. The user interface to MADSYS provides facilities for determining the expected amount of scattering for the sets of MAD data, combining and scaling the MAD data sets together, determining the anomalous scattering signal from the data, refining parameters for the anomalous scattering atoms and phase determination.

Assuming the number and approximate locations of the anomalous scatterers have been determined, the sequence of commands used for phase determination by the MAD method would be:

(1) Calc_Absolute_Scatt - calculate the expected protein and anomalous scattering from input sequence information and the number of anomalous scatterers.

(2) Combine_MAD_Data - combine and scale the MAD data sets together to produce a single output file containing all experimental information.

(3) Calc_Anom_Diff - using the output file from step (2) solve the MAD equations to determine the anomalous scattering and then merge the data to produce an output file of merged data and anomalous differences.

(4) Refine_Anom_Sites - refine the anomalous scatterer parameters using the output file from step (3).

(5) Determine_Phases - carry out phase determination using the output file from step (3) and the refined anomalous scatterer parameters to give a set of phased structure factors for map calculations and model-building.

Note that steps (2), (3) and (5) are the main MAD phasing steps. Step (1) provides some information which may be useful for selecting input parameters for the phasing process and step (4) is taken to improve the anomalous scatterer sites prior to use in the final phasing step.

Determining the expected scattering

When the protein sequence and the number of anomalous scatterers in the crystal asymmetric unit are known, it is possible to estimate the expected amount of scattering and changes in scattering at the wavelengths used in the data collection. The number of anomalous scatterers will be known if you have already determined the anomalous scatterer sites from Patterson maps or if you are using a Se-Met containing protein for MAD phasing. The calculated scattering information may be employed when scaling data sets together for the extraction of experimental anomalous differences.

The Calc_Absolute_Scatt command in the MAD pulldown is used to carry out this calculation. When you click on the Calc_Absolute_Scatt command you will enter the Job_Control operation in which you will enter the lower and upper resolution limits over which you wish to calculate the anomalous scattering and the number of wavelengths for which you have experimental data.

Next, from the Overall_Params operation you will enter an input sequence file containing the protein amino acid sequence, the number of protein copies per crystal asymmetric unit, the number of anomalous scatterers per asymmetric unit, estimates for the average temperature factors for the protein and anomalous scatterers and the anomalous scattering element.

In the Wavelength_Params operation you should enter the values of f´ and f´´ for each of the wavelengths that was used in the data collection. The values that you enter will be reported in a table. Once these values have been entered they will be saved for use in other parts of the phasing process.

Finally, the Run operation is used to execute the actual calculation of the scattering from the protein and anomalous scatterers at different wavelengths. This calculation takes only a few seconds to complete. The absolute root-mean-square value of the scattering and the predicted wavelength-dependent changes in scattering will be output from this run. These results will be saved for use in other parts of the phasing process. After the program is completed you may use the Quit_Command operation to leave the Calc_Absolute_Scatt command.

Combining and scaling of the MAD data sets

At entry into the MADSYS system the MAD data are provided in separate files, with each file corresponding to a set of processed data collected at a particular wavelength. Phase determination with the MADSYS program requires at least three sets of measurements at different wavelengths and is more successful if four or more data sets are available. For the MAD phasing to proceed these files need to be combined and an experimental geometry specified for each batch of data within these files. Since MAD phase determination is based on careful extraction of what are often relatively small differences between data sets, careful scaling of data sets is required to remove systematic sources of error. The MADSYS program employs a parameterized local scaling method to reduce noise in Bijvoet reflection differences and applies further scaling to remove spurious wavelength-dependent differences between data sets.

The steps of combining and scaling the MAD data sets are carried out using the Combine_MAD_Data command. The result from the Combine_MAD_Data command is a single file that contains scaled sets of all of the input data. The convention in Xsight is that the file extension .cmb is used to denote this file.

The Combine_MAD_Data command is divided into a series of sequential operations that step through the various processes of data combination and scaling. In the Job_Control operation you must provide the resolution limits for your input data and the number of wavelengths (equal to the number of input data files) at which you measured diffraction data. The Num_Data_Groups parameter is used to provide a means for assigning different experimental information to batches of data within a file. For example, the first 60 frames of data in each file may have been collected with the crystal in a particular setting and then the crystal orientation was altered for the final 30 frames of data collection. In this case there are two batch groups. The information needed to define the experimental parameters for these groups is specified in the Define_Data_Groups operation, described below.

The Import_Data_Files operation is used to enter the format and names of the files containing the processed MAD data. The input data files must contain the hkl indices of each reflection and the batch number for the reflection, followed by the intensity and standard deviation of the reflection. Data may be read in directly in the SCALEPACK format (i.e., data processed using the `NO MERGE ORIGINAL INDEX' scaling output option of the SCALEPACK program), as an ASCII output format from the CCP4/AGROVATA program, or in an arbitrary format given by a user-provided FORTRAN style format specification. Use the File_Format parameter to specify which of these three options to apply. The file names may be entered in the WL_File_Name parameter block using the WL_File_Names value-aid. If you have already used the Calc_Absolute_Scatt command the order in which you enter the data files should be the same as the order in which you entered the f´ and f´´ values. As you enter the data files the wavelength numbers and file names will be reported in a table.

Next, go to the Define_Data_Groups operation. For each batch group set the batch numbers corresponding to that particular group and the experimental geometry. The Pairing_Tolerance parameter is used to define the maximum number of data frames by which equivalent reflections in the different data sets may be expected to differ. The Geometry_Type option is used to specify the experimental geometry used for the MAD data collection.

Once all the necessary parameters that define the data collection are set you will go to the Scale_Bijvoets operation for scaling Bijvoet differences within data sets and the Scale_Wavelength operation for scaling data between the different files. Both of these operations provide mechanisms for defining resolution shells of data for scaling. The Num_Bv_Selections and Num_WL_Selections parameters are used to specify the number of resolution shells that will be used. For both of these operations you also have the option of selecting which of the batch groups to include in the scaling calculations. In most cases you will include all of the groups but you may wish to exclude a particular batch group if you suspect some problem with the data in a particular range (for example crystal slippage or the onset of radiation damage). In many cases you will wish to use the same selections for scaling both the Bijvoet differences and the separate sets of data. For this reason a Use_Bv_Selection option has been supplied in the Scale_Wavelength operation for automatically duplicating this information. Information on these scaling selections will be reported in tables.

The Absolute_rmsF parameter in the Scale_Wavelength operation will be automatically provided if you already ran the Calc_Absolute_Scatt command. The parameter is used to set the experimental data on an approximately absolute scale. If you did not run the Calc_Absolute_Scatt command you may provide an arbitrary value for the Absolute_rmsF parameter (for example, 500) but when you go to the Calc_Anom_Diff command you should be careful not to reject data using criteria that depend on the absolute values of the reflections.

After using the Run operation to run the Combine_MAD_Data command you may use the Analyze command to check key results from the resulting log file. To leave the Combine_MAD_Data command you should use the Quit_Command operation.

Determination of anomalous scattering from the experimental data

The purpose of the Calc_Anom_Diff command is to take the combined and scaled data file produced by the Combine_MAD_Data command and create a file in which anomalous differences are extracted, merged and weighted for use in phase determination by the Determine_Phases command. The output file will also be used for the refinement of parameters for the anomalous scatterers. The Xsight convention is that the file extension .mrg is used to denote this output data file. The Calc_Anom_Diff command may also be used to provide a set of anomalous differences corresponding to the anomalous scatterer `FA' amplitudes in the .fin file format. These amplitudes may be used as coefficients in Patterson map calculations in order to check or determine the positions of the anomalous scatterers. These amplitudes may also be used as input for the automated heavy atom search program provided through the MIR/Find_Heavy_Atoms command.

When you enter the Calc_Anom_Diff command you will be in the Job_Control operation. In this operation you may set upper and lower resolution limits for the calculations. The Select_Data operation allows you to specify the input data file (i.e., the output file from the Combine_MAD_Data command) and the output file containing the unique set of merged data with anomalous differences extracted for phase determination. If the Write_FIN_File option is toggled on then you may also write out a file in the .fin format containing the `FA' amplitudes. These anomalous scattering differences may be used as Patterson coefficients to check or locate the anomalous scattering sites by using the MIR/Calc_Fourier menu item.

The Overall_Params operation contains various cutoffs that may be used to limit the data used in these calculations. The default values will not usually need changing. In the Wavelength_Params operation you may elect to refine the relative scale factor between the scattering at different wavelengths, the values of f´ and the values of f´´. The Select_Groups operation gives you the option of eliminating certain batch groups of data from these calculations. The Merge_FA operation applies some additional cutoffs for computing merged and weighted anomalous differences. The default parameters will not normally need to be changed.

After using the Run operation to run the Calc_Anom_Diff command you may use the Analyze command to check key results from the resulting log file. To leave the Calc_Anom_Diff command you should use the Quit_Command operation.

Phase determination

The Determine_Phases command is used to complete the phasing process by taking the output data file produced by the Calc_Anom_Diff command and a file containing the sites for the anomalous scatterers and producing a set of phased reflection data in the Xsight .phs file format.

When you enter the Determine_Phases command you will set the resolution limits for phasing calculations in the Job_Control operation. Next, you will go to the Calculate_Phases operation to specify the input and output files and various cutoff parameters for eliminating data from the phasing calculations. Usually it will not be necessary to change these parameters. If you have refined the atomic parameters for the anomalous scatterers, the Fcalc_Scale parameter will be entered automatically. Fcalc_Scale controls the scaling between the model and observed anomalous scattering.

After using the Run operation to run the Determine_Phases command you may use the Analyze command to check key results from the resulting log file. To leave the Determine_Phases command you should use the Quit_Command operation.

The output file of phased data from this operation will have been automatically re-formatted in the `Phase' format ready for use in the Model_Building pulldown. Before attempting model-building it is usually useful to use the MIR/Calc_Fourier and MIR/Contour_Map commands to gain an overview of the quality of the electron density map. A good map should show solvent volumes with few density features and continuous density features in the protein volume. The anomalous scatterers should show up as strong peaks in this map. You may also wish to use the density modification facilities in the Density_Modification pulldown to improve the phase set before beginning model-building.

Refinement of parameters for the anomalous scatterers

The determination of phases by the MAD method requires an atomic model of the anomalous scatterers in the crystal. The MADSYS program includes the ability to refine this atomic model against the data in the output file from the Calc_Anom_Diff command.

The file containing the anomalous scatterers is in the same format as the `solutions' file used by the MIR commands contained in the MIR pulldown. The MADSYS program actually uses a slightly different format but the Xsight system automatically carries out a format conversion to ensure compatibility across applications. This file contains three lines of header information followed by lines for each atom, containing an atom identification code, fractional atomic co-ordinates, element code, occupancy, and temperature factor. For example:


DERIVATIVE
CRYSTAL madex
FILE MADSYS
ATOM H1 -0.25717 0.96833 0.23004 SE 1.000 24.560
ATOM H2 -0.73186 0.89547 0.22508 SE 1.000 14.014
ATOM H3 -0.45629 0.06980 0.15692 SE 1.000 18.529
ATOM H4 -0.46770 0.11249 0.13427 SE 1.000 9.032
ATOM H5 -0.21994 0.41690 0.12930 SE 1.000 20.440
ATOM H6 -0.32728 0.43199 0.07393 SE 1.000 11.608
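A file laid out like the example above can be read with a few lines of Python. The sketch below is illustrative only; the function name and the dictionary keys are hypothetical, and it assumes that every non-ATOM line is a header line whose first word is its keyword.

```python
def parse_solution_file(text):
    """Split a solutions file into header entries and a list of atom records."""
    header = {}
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ATOM":
            atoms.append({
                "label": fields[1],
                "xyz": tuple(float(v) for v in fields[2:5]),  # fractional co-ordinates
                "element": fields[5],
                "occupancy": float(fields[6]),
                "b_factor": float(fields[7]),  # temperature factor
            })
        else:
            # Header line: keyword followed by an optional value.
            header[fields[0]] = " ".join(fields[1:])
    return header, atoms


example = """DERIVATIVE
CRYSTAL madex
FILE MADSYS
ATOM H1 -0.25717 0.96833 0.23004 SE 1.000 24.560
ATOM H2 -0.73186 0.89547 0.22508 SE 1.000 14.014
"""
header, atoms = parse_solution_file(example)
print(header)
for atom in atoms:
    print(atom["label"], atom["xyz"], atom["element"], atom["occupancy"], atom["b_factor"])
```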

In the Job_Control operation you specify the resolution limits for the data to be used in the refinement of the atomic model. In the Refine_Files operation you specify the name of the input data file, input atomic model and output atomic model. An automatic system is used to create new generation numbers for the file containing the output atomic model based on the file name for the input atomic model.

In the Refine_Params operation you may specify various cutoffs to be applied to the data in the refinement. The default values for these cutoffs will normally be used. The Anom_Refine_Type specifies which atomic parameters are to be refined. All options carry out positional refinement but the Scale, Occupancy and B_Factor options additionally refine the scale, individual atomic occupancy and individual atomic temperature factor parameters. You should normally refine the scale first (this value will automatically be transferred to the Fcalc_Scale parameter for later refinement runs) and then you may wish to use the B_Factor and, if appropriate, Occupancy refinement types in later runs with this command.

The Run operation is used to execute the command and the Analyze operation is used to report back the R-factor for the heavy atom model as a function of refinement cycle. You should exit from the Refine_Anom_Sites command with the Quit_Command operation.


Phase determination by isomorphous replacement

The Xsight package contains a complete set of routines for solving crystal structures by isomorphous replacement. Options are available for scaling the heavy atom data set to the native data set, automatically locating heavy atom sites, comparing the predicted heavy atom positions against difference Patterson maps, refining the heavy atoms, and for calculating phases. Some of the routines used for these tasks are available through the incorporation of the XtalView package within Xsight.

The routines for phase determination by multiple isomorphous replacement are located in the MIR pulldown. These routines are discussed in detail in the following sections, and Chapter 5, Tutorial, includes a tutorial example on their use.

Scaling and merging native and derivative data sets

To merge together two sets of data, go to the MIR/Merge_Data command. In this command you can use the value-aid to enter .fin files that contain your native and derivative structure factors into the Fin_File_1 and Fin_File_2 parameter blocks. Enter a name for the output .fin file, which will contain both sets of data, in the Merge_Output_File parameter block. Select Execute to load these files and spawn the Xmerge menu.

From the Xmerge menu you can choose whether to scale the two data sets in the Single or Anisotropic mode. The difference between the two options is whether a single or anisotropic temperature factor is applied to remove resolution-dependent differences between the two data sets. When you click the Scale button, the scaling and merging begin. When the two data sets are scaled together (usually in a few seconds), graphs comparing the structure factor differences and merging R factors as a function of resolution automatically appear.

Calculation of difference Patterson functions

To calculate a difference Patterson from a .fin file that contains a native and derivative data set, go to the MIR/Calc_Fourier command. Use the value-aid to enter your .fin file in the Phase_File parameter block. Enter a filename for the Patterson map in the Map_File parameter block. Select Execute to load these files and spawn the Xfft menu.

Now set the Map Type to Fo*Fo(Patterson). Since the Xfft menu is also used for conventional Fourier map calculations with .phs files, hold MB3 on the Phase File Type button and go to the Extras for Pattersons only menu. Select the fin(Fp s(Fp) Fph s(Fph) option. This option instructs the program to carry out a difference Patterson calculation from a .fin file containing two sets of data. You can calculate an origin removed Patterson map by selecting the Remove Patterson Origin checkbox.

This menu also contains a Resolution Filter that allows you to change the resolution limits of the data that will be used in the calculation and an Outlier Filter to eliminate reflection pairs with very large differences from the calculation. If you edit the values used in these filters, press <Enter> after making each edit to ensure that your new value is entered. When you click Calculate, the difference Patterson calculation begins. The map is usually calculated within a few seconds.

If your .fin file contains MAD `FA' structure factor coefficients in the F1 column and zero values in the F2 column then this same procedure may also be used to calculate a pseudo-difference Patterson map for the anomalous scatterers.

Examining difference Patterson maps

Two methods are available in Xsight for visualizing difference Patterson maps.

To examine a full 3D representation of the difference Patterson map, select the MIR/Contour_Map_3D command. The Contour_Operation parameter is initially set to Setup. You should set the Map_File_Type parameter to Patterson_Map to display a difference Patterson function. Use the value-aid to enter the name of your Patterson map in the Map_File parameter block.

The Contour_Sigma_Level parameter is used to set the number of standard deviations above the mean density at which the Patterson map will be contoured. A contour level of about three standard deviations is often appropriate for identifying significant features in the Patterson map. If the Two_Level_Contour option is toggled on a second contour level will also be set in the display. This option is useful for highlighting very significant peaks in the map.
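What a sigma-based contour level means numerically can be shown with a minimal Python sketch. The function name is hypothetical, and the use of the population standard deviation over all map grid values is an assumption made for illustration; the program may compute its statistics differently.

```python
import statistics


def contour_level(density_values, n_sigma=3.0):
    """Return the density value lying n_sigma standard deviations above the mean."""
    mean = statistics.mean(density_values)
    sigma = statistics.pstdev(density_values)  # population standard deviation (assumption)
    return mean + n_sigma * sigma


# A toy "map": mostly flat density with one strong peak.
grid = [0.0, 0.0, 0.0, 0.0, 10.0]
print(contour_level(grid, 3.0))
```

A second, higher level for a two-level display could be obtained by calling the same function with a larger n_sigma.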

For a new and uninterpreted difference Patterson map the Input_Atom_Sites option should be toggled off since the atomic sites for the heavy atoms are still to be determined. If you do have a trial solution for the heavy atom derivative (i.e., a set of atomic coordinates for the heavy atom constellation in the .sol file format) then you may toggle the Input_Atom_Sites option on and use the In_Solution_File parameter to enter the name of the file containing the heavy atom co-ordinates. This trial solution could have been generated using the MIR/Find_Heavy_Atoms command (see the sub-section below) or by using an external Patterson interpretation or direct methods program.

When you select Execute the full three-dimensional representation of the difference Patterson map will be generated and the Contour_Operation parameter will step to the Edit_Sites operation. In the graphical display of the Patterson map the Harker sections (if present) are denoted by a coarse grid of green lines. This demarcation of the Harker sections is intended to aid in the identification of self-vectors, which result in peaks on the Harker sections.

If a file containing putative heavy atom co-ordinates was input in the In_Solution_File parameter block in the Setup step then the interatomic vectors between these sites will have been automatically calculated and displayed on the Patterson map in a unique quadrant of the crystal cell. This Patterson volume (one-quarter of the full unit cell) is outlined with dark blue edge lines. The interatomic vectors are also reported in a pickable table, with the self-vectors marked in yellow and the cross-vectors marked in purple in the three-dimensional display.
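The distinction between self-vectors and cross-vectors can be illustrated with a short Python sketch. It is not the Xsight calculation: the function names are hypothetical, the site co-ordinates are made-up values, and space group P2(1) with the symmetry mate (-x, y+1/2, -z) is assumed purely as an example. A self-vector joins a site to its own symmetry mate, while a cross-vector joins two different sites.

```python
def wrap(vector):
    """Bring each fractional component into the range [0, 1)."""
    return tuple(round(c % 1.0, 4) % 1.0 for c in vector)


def cross_vector(site_a, site_b):
    """Patterson vector between two different sites (fractional units)."""
    return wrap(tuple(a - b for a, b in zip(site_a, site_b)))


def p21_mate(site):
    """Symmetry mate in space group P2(1): (-x, y + 1/2, -z). Assumed example."""
    x, y, z = site
    return (-x, y + 0.5, -z)


def self_vector(site):
    """Patterson vector between a site and its P2(1) symmetry mate."""
    return cross_vector(site, p21_mate(site))


site_1 = (0.10, 0.20, 0.30)  # illustrative co-ordinates only
site_2 = (0.40, 0.25, 0.15)

print(self_vector(site_1))           # falls on the v = 1/2 Harker section
print(cross_vector(site_1, site_2))
```

In this assumed space group every self-vector has v = 1/2 whatever the site co-ordinates, which is why such peaks collect on a Harker section.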

If you wish to change the contour levels used in the display of the Patterson map, click on the Redo_Contour option of the Contour_Operation parameter, fill in new parameters for the contour levels and select Execute. To leave the MIR/Contour_Map_3D command, set the Contour_Operation option to Quit_Command and click on Execute.

An alternative method for displaying difference Patterson maps is to use the Xcontour menu from the XtalView program. This menu gives a section-based representation of the difference Patterson function and allows for making hard-copy plots of the map in PostScript format.

To examine the difference Patterson map using this menu, go to the MIR/Contour_Map command. Enter the name of your map with the Map_File value-aid. When you select Execute, the Xcontour menu is spawned. Commands in the Xcontour menu allow you to change the view plane to either X, Y, or Z. You may use the Prev Section and Next Section buttons to scroll through the map a section at a time.

There are also options for viewing sections as stacks and for changing contour levels. As with the three-dimensional representation, the most useful thing to do is to examine the Harker sections for peaks that might indicate vectors between heavy atoms.

Automatic location of heavy atom sites

To automatically locate the heavy atom sites in your first heavy atom derivative, select the MIR/Find_Heavy_Atoms command. The Set_Operation parameter will initially be set to the Set_Parameters option and this menu is where you will provide all the program input. Use the value-aid to enter the name of your .fin file containing the scaled native and derivative data in the Input_Fin_File parameter block. If you are determining the sites of anomalous scatterers for MAD phase determination with the MADSYS program then the .fin file should contain the `FA' structure factor coefficients. This file will normally have been written using the Write_FIN_File option in the MAD/Calc_Anom_Diff command as part of the MAD phase determination process.

The Input_Data_Type parameter should be set to the Isomorphous_Diff option for heavy atom phasing with heavy atom derivative data. The Bijvoet_Diff option is available if you wish to try to locate heavy atom sites using the Bijvoet differences within a heavy atom derivative data set. In this case the F1 and F2 columns in the .fin data file should contain the Bijvoet mates for each structure factor. The MAD_FA_Coeffs option is used for determining the sites for anomalous scatterers using as input a .fin file containing the MAD `FA' structure factor coefficients.

In the Output_Sol parameter block you should enter the file name for the solution file that will contain the heavy atom sites located by the heavy atom search program. The heavy atom search program writes files with three lines of header information followed by lines containing the fractional co-ordinates of each of the located atoms. For example:


DERIVATIVE
NAME AU
FILE /net/iris41/usr/people/jb/Tests/ccp_n1_au1.fin
ATOM H1 0.3673 0.1228 0.3838 AU 1.000 20.0
ATOM H2 0.1224 0.3509 0.8990 AU 1.000 20.0

This format is readable by the heavy atom refinement and phase determining program Xheavy, which is a part of XtalView and which is accessed in Xsight through the MIR/Determine_Phases command. For the refinement of anomalous scatterer sites with the MADSYS program, the MADSYS interface will automatically obtain the anomalous scatterer parameters from this type of solution file and include them in the MADSYS input script.
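If you want to post-process the search results in your own scripts, a solution file in the layout shown above can be read with a few lines of Python. The following is a minimal sketch and is not part of Xsight: the function name parse_solution is hypothetical, and the last two numbers on each ATOM line are assumed here to be the occupancy and temperature factor.

```python
def parse_solution(text):
    """Parse a heavy atom solution file into header fields and atom records."""
    header = {}
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ATOM":
            # Assumed columns: label, fractional x, y, z, element,
            # then two numbers taken here as occupancy and B-factor.
            label, x, y, z, element, occ, b = fields[1:8]
            atoms.append({
                "label": label,
                "xyz": (float(x), float(y), float(z)),
                "element": element,
                "occupancy": float(occ),
                "b_factor": float(b),
            })
        else:
            # Header line: a keyword followed by an optional value.
            header[fields[0]] = " ".join(fields[1:])
    return header, atoms


example = """DERIVATIVE
NAME AU
FILE /net/iris41/usr/people/jb/Tests/ccp_n1_au1.fin
ATOM H1 0.3673 0.1228 0.3838 AU 1.000 20.0
ATOM H2 0.1224 0.3509 0.8990 AU 1.000 20.0
"""
header, atoms = parse_solution(example)
```

In this sketch the header keywords become dictionary keys, so header["NAME"] holds the derivative name and atoms holds one record per located site.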

The Scattering_Element parameter is connected to a value-aid which contains atomic element codes for most of the atoms commonly used for making derivative crystals. It is not absolutely critical that the correct element type is selected, although using the correct element type may lead to more accurate heavy atom site determination from isomorphous derivative data.

The Low_Res_Cutoff and High_Res_Cutoff parameters are the resolution limits of the data that will be used in the heavy atom search. The heavy atom search program will run much faster and is sometimes more successful if the high resolution limit of the data is restricted to about five Angstroms. The Search_Limits are the range in fractional units of the crystal cell that will be searched for heavy atom sites. The heavy atom search program will finish more quickly if only the minimal volume that is needed to contain the crystal asymmetric unit is searched. The minimal search ranges that you may set will depend on the space-group symmetry of the crystal.
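To see how resolution limits select reflections, the resolution d of a reflection (h, k, l) can be computed from the cell edges. The sketch below assumes an orthogonal cell, for which 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2; a non-orthogonal cell needs the full reciprocal metric. The function names and cell dimensions are illustrative assumptions, not Xsight parameters.

```python
import math


def d_spacing(hkl, cell):
    """Resolution in Angstroms of reflection hkl for an orthogonal cell (a, b, c)."""
    h, k, l = hkl
    a, b, c = cell
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)


def within_limits(hkl, cell, low_res, high_res):
    """True if the reflection lies between the low and high resolution cutoffs."""
    return high_res <= d_spacing(hkl, cell) <= low_res


cell = (50.0, 60.0, 70.0)  # hypothetical orthorhombic cell edges in Angstroms
reflections = [(1, 0, 0), (5, 5, 5), (10, 12, 14), (20, 0, 0)]
# Keep data between 20 and 5 Angstroms, as suggested for a faster search.
kept = [r for r in reflections if within_limits(r, cell, 20.0, 5.0)]
```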

You may use the Search_Type parameter to control the extent of the heavy atom search. For example, if a preliminary examination of the difference Patterson function indicated that there were only one or two bound atoms, the Search_Type parameter should be set to the Single_Site or Double_Site option so as to terminate the search after the first one or two heavy atom sites have been located. If the Search_Type parameter is set to the Multi_Site option then the search algorithm will continue to attempt to find additional heavy atom sites after the first two sites have been located.

The Min_Search_Cor_On_I is a threshold parameter that controls the number of potential single sites that are picked up on the initial pass of the program (see the Theory chapter for information on the search algorithm). This parameter is the minimal correlation between observed and predicted difference Patterson coefficients for a potential site to be retained by the search program. Tests indicate that it is safest to use a relatively low value (for example, 0.04) of this parameter for multi-site searches and for searches using data for which the differences between native and derivative data are weak. The amount of CPU time used by the program will be somewhat reduced if a higher value for the Min_Search_Cor_On_I is used. A larger value (perhaps 0.1 - 0.2) may be sufficient for reliable single- or two-site searches if the differences between the native and derivative data are very significant and large peaks in the difference Patterson function were observed.
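The effect of such a threshold can be illustrated with a short Python sketch. It assumes the standard linear (Pearson) correlation coefficient between observed and predicted coefficients; the exact expression used by the search program is given in the Theory chapter and may differ. The function names are hypothetical.

```python
import math


def correlation(observed, predicted):
    """Linear correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def retain_site(observed, predicted, min_cor=0.04):
    """Keep a trial site only if its correlation reaches the threshold."""
    return correlation(observed, predicted) >= min_cor
```

A lower min_cor keeps more trial sites for the later stages of the search, which is why it costs more CPU time but is safer for weak differences.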

When you click on Execute the input parameters for the heavy atom search program will all be set and the Heavy_Operation option will move to the Run menu. You should select the required MIR_Run_Mode option depending on whether you wish to run the program immediately (Run_Now) or save the script to run later (Save_Script_Only). Run times for the heavy atom search program typically range from a few minutes to about an hour and depend mainly on the size of the crystal volume to be searched and the maximum resolution of the data. After the heavy atom search program run is complete, search diagnostics will be found in an output log file called find_heavy.log.

Once the heavy atom search is complete the resulting set of heavy atom sites should be checked for consistency with the difference Patterson function using one of the visualization methods available in Xsight. In particular, for multi-site heavy atom searches the search program will usually terminate after finding all of the correct sites but there is a possibility that additional false sites, lower down the heavy atom list, will be identified. These sites may be deleted if comparisons with the difference Patterson map indicate no corresponding density, and if trial occupancy refinements indicate a tendency for their occupancy parameters to shrink to very low relative values.

Refinement of heavy atom sites and phase determination

The refinement of the heavy atom sites and calculation of phase angles is carried out by the MIR/Calc_phases command. Use the value-aid to enter a Solution_File, which contains the positions of the heavy atoms in the crystal. Also enter the name for the output .phs file in the Phase_Output parameter block. When you select Execute, the Xheavy menu is spawned.

The Xheavy program allows you to set the element types for the heavy atoms and change the resolution limits of the data. The method of refinement is to carry out a correlation search on the positional parameters for the heavy atoms. The individual atomic occupancies and global non-isomorphism parameters are also refined during this optimization process. If the Refine B's checkbox is toggled on, the individual atomic anisotropic temperature factors will also be included in the refinement process. Depending on the maximum resolution of the data and the number of parameters to be refined, the refinement will usually take less than five minutes. When the refinement is complete, use the Save_Derivative button to update the atomic parameters.

With MB3, hold the cursor next to Method and select the Calculate Protein Phases option. When you click Apply, Xsight begins to calculate a set of phases based on this derivative.


Molecular replacement

This section describes a general approach to X-ray crystal structure determination by the method of molecular replacement provided in Xsight. The Mol_Replacement pulldown within the Insight II program implements the suite of programs for molecular replacement described by Tong (1993). It contains a program, GLRF, for general and locked rotation function calculations including self-rotation and cross-rotation functions, and a program, TF, for translation function calculations based on R-factors, correlation coefficient searches, Patterson correlation, and electron density matching. Routines are available for calculating structure factors from a model, for calculating packing functions, for contouring maps, and for manipulating atomic coordinates and rotation matrices.

Self-rotation function

The self-rotation function determines whether there is non-crystallographic symmetry (that is, multiple protein subunits with similar conformations) in the asymmetric unit of an unknown crystal. Running a self-rotation function requires only a set of structure factor data.

In a self-rotation search, the map values are scaled such that matching the Patterson map onto itself without any rotation gives a height of 1000. This puts the rotation map values on an "absolute" scale, so that you can compare results from different runs of the program. For cross-rotation searches, a rough scale factor is estimated before the search is carried out. After the search, the maximum value of the map is re-scaled to 1000. In both cases, the maximum, minimum, average, and standard deviation from the average are reported by the program.
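The re-scaling and summary statistics described above for a cross-rotation map can be sketched in Python as follows. This is an illustration only, not the GLRF implementation; the function name is hypothetical and the map is represented as a flat list of values with a positive maximum.

```python
import statistics


def rescale_map(values, peak=1000.0):
    """Scale map values so that the maximum equals peak, then summarize them."""
    factor = peak / max(values)
    scaled = [v * factor for v in values]
    summary = {
        "max": max(scaled),
        "min": min(scaled),
        "mean": statistics.mean(scaled),
        "sigma": statistics.pstdev(scaled),  # deviation from the average
    }
    return scaled, summary


scaled, summary = rescale_map([2.0, 4.0, 8.0])
```

For a self-rotation map the reference value would be the height of the unrotated (origin) peak rather than the map maximum.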

To run a self-rotation function search go to the Mol_Replacement/Rotation_Func command. Then:

1. Begin in the Setup operation. A default Title and Rot_Job_Name is loaded but may be changed. Set the Rotation_Func_Type to Self_Rotation. The default Search_Angle for a self-rotation function is Polar. When you run a rotation search in polar angle coordinates (the most convenient angle system for self-rotation functions), the Fast_Rotation option should be turned off. Then select Execute.
2. You will now be in the Crystal_A operation. For a self-rotation function you should use From_Crystal_File for the A_Crystal_Values parameter. This enters the correct Unit cell values and Space_Group for your crystal. Now use the value-aid to enter your structure factor data file in the Refl_File parameter block. The Sigma_Cutoff, Max_F and Min_F parameters are used to eliminate some data from the calculation--these may normally be left at the default values. Then select Execute. The Mol_Replacement/Rot_Crystal_B command is not used in a self-rotation function search, so go to the Search_Parameters operation.
3. In the Search_Parameters operation you should enter the beginning, ending, and incremental angles for the self-rotation search. The Auto_Search_Params option is only applicable for searches that use Euler angles and should be toggled off for a search in polar angles. The resolution limits for the data used in the search (parameters D_Max and D_Min) may be changed from the default values if required. When these parameters are set, select Execute.
4. In the Run operation you should toggle on the Run_Contour option to generate a PostScript file of the contoured map sections. When you select Execute you will create an input parameter file for the GLRF program. If the Run_Rot_Now option is toggled on then the self-rotation function will be run immediately. A conventional self-rotation function requires from several minutes to several hours to run.

Structure factor calculations

Before you run a cross-rotation function or a locked cross-rotation function you need to calculate a set of structure factors from your search model. Typically you will create this set of structure factors by placing the search model in a large unit cell with triclinic symmetry.

The calculation of the model structure factors is carried out with the translation function program (TF) using the Mol_Replacement/Search_Model command. Default parameters are supplied for the Search_Job_Name and Title. You will normally set the Crystal_Values parameter to Triclinic_Box. The box size is usually chosen to be at least twice as large in every dimension as the protein structure. The Model_File_Format that is used for importing the trial model may either be a Protein Data Bank file (PDB) or an Insight II Molecule.
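As a rough guide to choosing the box size, the extent of the model along each axis can be measured and doubled. The following Python sketch is illustrative; the function name and the coordinates are hypothetical, and cell angles of 90 degrees are assumed.

```python
def triclinic_box(coords, factor=2.0):
    """Return cell edges (a, b, c) that are factor times the model extent."""
    edges = []
    for axis in range(3):
        values = [xyz[axis] for xyz in coords]
        edges.append(factor * (max(values) - min(values)))
    return tuple(edges)


# Hypothetical Cartesian coordinates in Angstroms.
coords = [(0.0, 0.0, 0.0), (30.0, 10.0, 5.0), (12.0, 25.0, 40.0)]
a, b, c = triclinic_box(coords)
```

Here the model spans 30, 25, and 40 Angstroms, so the sketch suggests a box of at least 60 x 50 x 80 Angstroms.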

The Temp_Factor_Modify options are available for changing the temperature factors for the atoms in the trial atomic model. You may use the Replace_All option to apply an overall temperature factor to the model. It may be that your trial model was obtained from a homology model-building program and some atoms were created with zero temperature factors. The Replace_0_Values option is available to reset the temperature factors for these atoms.

The Resolution Limits parameters control the resolution range over which you will calculate structure factors from the model. For a cross-rotation function it is usually best to leave out the very lowest resolution data (which are strongly influenced by bulk solvent scattering) and the very highest resolution data (which will be sensitive to small conformation differences between the trial model and the true structure). The default resolution range may be tried in an initial cross-rotation function. When the Direct_Summation option is toggled off, the structure factors will be calculated by FFT methods. This is very much faster than the calculation by direct summation methods and should be used for most applications.

If the Run_Search_Now option is toggled on the calculation will proceed as soon as you select Execute. The calculation normally takes less than one minute to complete and will result in a .fin file of calculated structure factors with a name based on the Search_Job_Name.

Cross-rotation function

The purpose of the cross-rotation function is to find the orientation of a trial atomic model in an unknown crystal.

You may run the cross-rotation search in the absence of non-crystallographic symmetry constraints or with the inclusion of known symmetry in a locked function mode. To run a standard cross-rotation function search go to the Mol_Replacement/Rotation_Func command. Then:

1.   In the Setup operation set the Rotation_Func_Type to Cross_Rotation. It will usually be best to work with the Search_Angle set to Euler. To use the fast rotation function toggle on the Fast_Rotation option. For the fast rotation function the search will be carried out in Euler angles using the zyz convention. The fast rotation function is much quicker than the conventional "slow" rotation function but is somewhat less discriminating. Then select Execute.

2.   In the Rot_Crystal_A operation you will usually set the A_Crystal_Values parameter to Triclinic_Box. This will set the Unit Cell and Space_Group values for the search model. The convention used by the GLRF program is that crystal A contains the search model and crystal B contains the unknown crystal. You should make sure that you use the same cell parameters as were used in the Mol_Replacement/Search_Model command. With the Refl_File value-aid load the .fin file containing the calculated structure factors. The Max_F and Min_F parameters control the selection of reflections for the calculation and may normally be left at the default values. Then select Execute.

3.   In the Crystal_B operation you should set the B_Crystal_Values parameter to From_Crystal_File. This will set the Unit Cell and Space_Group values for your unknown crystal. Use the Refl_File value-aid to enter your structure factor data file. The Sigma_Cutoff, Max_F and Min_F parameters control the selection of structure factor data. The Large_Term_Cutoff will normally be set to ~0.2 for a fast rotation function and ~1.5 for a slow rotation function. Then select Execute.

4.   In the Search_Parameters operation you can use the Auto_Search_Params option to automatically set the search range over a unique volume of Euler angular space. For an initial cross-rotation search over the whole angular space you can try using a relatively coarse grid (parameters T1_inc, T2_inc and T3_inc for the three search angles). The accuracy of the solution may be improved later by re-running the search on a finer grid around the solutions or by using the Refine operation. You can use the Resolution Limits parameters to change the resolution limits of the data used in the cross-rotation search. The Integration_Radius controls the range of vectors considered in the rotation function. This value should be set smaller than the diameter of the molecule. Then select Execute.

5.   In the Run command you can generate a PostScript file of contoured map sections by toggling Run_Contour to on (see Map contouring for further information). You may also wish to set the Output_Map_File option on to save the search map for the 3-dimensional contouring options in the Analysis command. Select Execute to create the input parameter file for GLRF. If the Run_Rot_Now option was selected the program will run immediately. Conventional cross-rotation functions require several minutes to several hours to run whereas a fast rotation function will normally be completed within 1-2 minutes.

Translation function

Translation functions are used to position the trial atomic model in the unknown crystal once the orientation has been obtained from the rotation function.

To run a translation function search, select the Mol_Replacement/Translation_Func command. Then:

1.   In the Setup operation you can change the Tran_Job_Name and Title parameters. The translation functions that are supported through the interface are R_Factor_on_F, R_Factor_on_I, Correlation_on_F, Correlation_on_I, and Patterson_Cor. Toggle on the type of translation function that you wish to run and select the angle system for orienting the trial model. Then select Execute.

2.   In the Input_Data operation use the Refl_File value-aid to enter the file name for your set of structure factor data. The Sigma_Cutoff, Min_F and Max_F parameters are available to limit the data that will be used in the translation search--the default parameters can normally be used. Then select Execute.

3.   In the Model operation you will enter the coordinate file containing your trial model. The Model_File_Format for your model may be a Protein Data Bank file (PDB) or an Insight II Molecule. Select the Coord_Center option to center the molecule at (0.,0.,0.). To set the orientation of the molecule use the Coord_Rot_Type parameters. You must choose the angle type to match the output from the cross-rotation function. Set the Coord_Position to Moving to carry out a translation search with this model. You can enter multiple models if the position and orientation of one or more molecules in the asymmetric unit is already known, but only one molecule may be active (Moving) during the search. You can use the Mol_Size parameter to eliminate the close approach of model atoms within the specified radius in the search model. A negative value disables this parameter. By using Mol_Size you will avoid searching volumes of space where molecules in the crystal overlap. When your model(s) have been entered toggle the End_Models option on and select Execute.

4.   In the Search_Parameters operation you can enter the beginning, ending, and incremental limits for the translation search in fractions of the cell edge. Also under this operation are the resolution limits for the data to be used in the translation search. When you have set these parameters select Execute.

5.   The Run operation provides an Output_Map_File option to produce a map file and a Run_Contour option for producing PostScript contoured sections of the translation function. Toggle these options on to obtain these output files. The Direct_Summation option will normally be toggled off, in which case structure factor calculations will be carried out by the much faster FFT methods. Select Execute to create the input parameter file for the TF program. If the Run_Tran_Now option was toggled on, the translation search will begin immediately.
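To make the R-factor search criterion concrete, the following Python sketch scores one trial position with the conventional crystallographic R-factor on F, R = sum|Fo - k Fc| / sum Fo, with the scale k taken as sum Fo / sum Fc. It is a sketch under these assumptions and not the TF implementation; the function name and the amplitudes are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor on amplitudes with a single linear scale factor."""
    scale = sum(f_obs) / sum(f_calc)
    residual = sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return residual / sum(f_obs)


f_obs = [10.0, 20.0, 30.0]
# Calculated amplitudes for two hypothetical trial positions of the model.
trials = {
    (0.0, 0.0, 0.0): [3.0, 2.0, 1.0],
    (0.25, 0.0, 0.5): [1.0, 2.0, 3.0],
}
best = min(trials, key=lambda shift: r_factor(f_obs, trials[shift]))
```

In an R-factor search the position with the lowest value is the best candidate, whereas in a correlation search the highest value is sought.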

Locked rotation function

The GLRF program has the facility for running the rotation function search in locked mode, in which known local (non-crystallographic) symmetry elements are input and used in the cross-rotation search. When non-crystallographic symmetry is available the signal-to-noise ratio of the rotation function increases by the square root of the number of protein copies.

To specify the local symmetry use the Local_Symmetry operation in the Mol_Replacement/Rotation_Func command. The symmetry elements are specified by Polar or Euler angles, Cartesian coordinates of the end point of a vector from the origin, or as direction cosines of the vector. Note that the angle conventions selected above in the self-rotation or cross-rotation function search setup still apply to the local symmetry. You can enter a maximum of two local symmetry elements with the menu interface. To specify the local symmetry:

1.   Select the Local_Symmetry command and toggle on Specify_Local_Symm.

2.   Specify the type of input under Symmetry_Type and specify a non-crystallographic symmetry axis by entering either: three rotation angles, the Cartesian coordinates of the end point of a vector, or the direction cosine values. You can choose the amount of rotation about the unique axis under Angle_Spec, for all rotations except polar angle rotations. You can enter the rotation amount as either a rotation in degrees about the specified axis, or as a divisor of 360° to define an N-fold rotation (that is, N = 2 for a two-fold rotation, 360/2 = 180°). The angle convention must match that chosen in the rotation function setup operation.

3.   If necessary, select Local_Symmetry_2 and enter the values for a second axis.

4.   Select Symmetry_Expand to generate all local symmetry elements (you should not choose this for improper rotations).
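The relation between an N-fold axis, its rotation angle, and the expanded set of symmetry operations can be sketched in Python. The example below builds the rotation matrix for a rotation about a unit vector given as direction cosines (the standard axis-angle formula) and generates the N rotations of a single proper N-fold axis. The function names are hypothetical, and the sketch does not combine two axes into a full point group.

```python
import math


def axis_rotation(axis, angle_deg):
    """Rotation matrix for angle_deg about a unit axis given as direction cosines."""
    l, m, n = axis
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    v = 1.0 - c
    return [
        [c + l * l * v, l * m * v - n * s, l * n * v + m * s],
        [m * l * v + n * s, c + m * m * v, m * n * v - l * s],
        [n * l * v - m * s, n * m * v + l * s, c + n * n * v],
    ]


def expand_nfold(axis, n_fold):
    """All rotations of a proper N-fold axis, in steps of 360/N degrees."""
    step = 360.0 / n_fold
    return [axis_rotation(axis, k * step) for k in range(n_fold)]


# A two-fold axis along z: the identity and a 180 degree rotation.
two_fold = expand_nfold((0.0, 0.0, 1.0), 2)
```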

Fast rotation function

To invoke the fast rotation function, toggle on Fast_Rotation in the Setup operation of the Mol_Replacement/Rotation_Func command. The fast rotation function can be used for the ordinary self-rotation and cross-rotation function searches, but not for the locked search. In the fast rotation function, you must calculate the search in Euler angles and set the angle convention to ZYZ.
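For reference, a ZYZ Euler rotation is a rotation about z, then about y, then about z again. The Python sketch below composes the matrix as Rz(t1) Ry(t2) Rz(t3); the order of multiplication and the sense of rotation used by GLRF are not stated here, so treat this composition as an assumption and check it against the program documentation. The function names are hypothetical.

```python
import math


def rot_z(deg):
    """Rotation matrix about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation matrix about the y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(t1, t2, t3):
    """ZYZ Euler rotation, assumed here to be Rz(t1) * Ry(t2) * Rz(t3)."""
    return matmul(rot_z(t1), matmul(rot_y(t2), rot_z(t3)))
```

When the middle angle is zero the two z rotations simply add, which is why only the sum t1 + t3 is determined for such orientations.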

Refinement of rotation function solutions

The GLRF program contains a facility for automatically carrying out rotation function searches on a fine grid around the possible solutions. The program is also able to automatically carry out Patterson correlation refinement calculations on a set of possible solutions. In the implementation of the Patterson Correlation refinement it is possible to split up the model into groups of atoms with independent rotational degrees of freedom. This option may be useful for multi-domain molecules in which the relative orientations of the domains are different in the search model and the unknown crystal. These automatic refinement options are found in the Refinement operation within the Mol_Replacement/Rotation_Func command.

If you toggle on the Auto_Fine_Search option you will automatically carry out fine grid searches (1-degree increments) using the slow rotation function around a set of peaks found in your initial search. The Search_N_Peaks parameter sets the number of top solutions that you wish to investigate with the fine grid search and the Search_Cutoff is the large term cutoff for the slow rotation function.

The PC_Refine_Params option sets up the number of rotation function peaks, number of cycles and large term cutoff for carrying out Patterson Correlation refinement. Additional options for specifying separate groups are found under the PC_Refine_Groups option. Atomic groups can be selected with the Start_Residue and Stop_Residue parameters using residue numbers. In order to include chain identifiers in the group specification, you can use, for example, the syntax A:1 to identify residue 1 of chain A.

Map contouring

Map contouring is enabled within the Run operations of both the Mol_Replacement/Rotation_Func and the Mol_Replacement/Translation_Func commands. You must toggle on the Run_Contour parameter before setting the contour levels. If a map file does not exist from a previous run, you must generate one by running a rotation or translation search on the map sections of interest.

Visualization of rotation and translation functions

Both Mol_Replacement/Rotation_Func and Mol_Replacement/Translation_Func commands contain Analyze operations. In the Analyze operation you can enter the rotation or translation function map and the print file from the run. The result is a three-dimensional representation of the search function and a table containing the top peaks. By clicking the mouse on entries in the table of peaks you will be able to locate the various peaks in the three-dimensional representation of the rotation or translation function. The three-dimensional visualization is useful for determining if a possible solution of the rotation or translation function is really a distinct maximum or if it is simply a local maximum in a large region of search space where the function value is high.

Generating the structure solution

To write out the coordinates for the correctly placed model structure, use the Mol_Replacement/Apply_Transform command. This command reads the model coordinates used in the search, applies the rotation and translation results, and writes out the coordinates for the structure solution. Note that the definition of angles and conventions must be consistent with those used in the rotation and translation searches.
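The operation performed on each atom is a rotation followed by a translation, x' = R x + t. The Python sketch below illustrates this for an orthogonal cell, where a fractional shift is converted to Angstroms by multiplying by the cell edges. It is not the Apply_Transform implementation; the function name and the values are hypothetical.

```python
def apply_transform(coords, rotation, frac_shift, cell):
    """Rotate Cartesian coordinates, then add a fractional shift (orthogonal cell)."""
    shift = [f * edge for f, edge in zip(frac_shift, cell)]
    placed = []
    for x, y, z in coords:
        placed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift[i]
            for i, row in enumerate(rotation)
        ))
    return placed


# A 90 degree rotation about z, followed by a shift of half a cell edge along a.
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
placed = apply_transform([(1.0, 0.0, 0.0)], rotation, (0.5, 0.0, 0.0), (10.0, 20.0, 30.0))
```

The rotation matrix must be built with the same angle convention as the rotation search, and the model must be centered in the same way as in the translation search.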


Density modification

The Modify_Density pulldown contains tools for improving a set of approximate phase angles using density modification techniques. Available density modification procedures include solvent-flattening, histogram matching, and the averaging of densities over sub-units related by non-crystallographic symmetry. An automatic method is available for determining protein/solvent envelopes from the initial electron density maps. Options are also provided for interactively checking and editing these envelopes. We have provided an extensive set of tools for defining the relation between non-crystallographically equivalent units and for generating, checking, and editing the envelope that defines the unique molecular unit.

Determining a protein mask

To automatically determine a molecular envelope that defines the protein volume select the Modify_Density/Auto_Density_Mask command.

Use the value-aid to enter the Phase_File for your structure. You will need to provide names for the output Map_File and Density_Mask_File. The Lower_Resolution and Upper_Resolution parameters set the resolution limits for the structure factor data that will be used in the calculations. It is usually best to include as much low resolution data as is available but not to extend the high resolution cutoff for the data beyond the point where you have useful phase information. Set the Solvent_Fraction parameter to the fraction of the unit cell volume that you expect to be filled with solvent. (This value is usually estimated from the crystal cell volume and the molecular weight of the protein.) Some crystallographers like to err on the side of underestimating the solvent volume to reduce the risk of generating a protein/solvent envelope that cuts into the protein. The Smoothing_Radius parameter controls the volume over which the program evaluates the density variance to determine if a grid point belongs to the protein or the solvent. A value approximately three times the Upper_Resolution limit will often be good.
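One common way to estimate the solvent fraction is through the Matthews coefficient, Vm = V / (n x MW), where V is the cell volume in cubic Angstroms, n is the number of molecules in the unit cell, and MW is the molecular weight in daltons; the solvent fraction is then approximately 1 - 1.23/Vm. The Python sketch below applies this estimate. The function names and the example values are illustrative assumptions.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def solvent_fraction(volume, mol_weight, n_molecules):
    """Approximate solvent fraction from the Matthews coefficient."""
    vm = volume / (mol_weight * n_molecules)
    return 1.0 - 1.23 / vm


# Hypothetical crystal: four 25 kDa molecules in a 50 x 60 x 70 Angstrom cell.
volume = cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
fraction = solvent_fraction(volume, 25000.0, 4)
# A smoothing radius of about three times the upper resolution limit.
smoothing_radius = 3.0 * 3.0  # for data to 3.0 Angstroms
```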

Click the Execute button to begin the calculation. This calculation will normally take just a few minutes.

Checking and editing a protein mask

To check and, if necessary, edit the protein mask select the Modify_Density/Edit_Density_Mask command.

When you first enter this command you will be in the Read_Files operation. Use the Map_File_List value-aid to enter the filenames for both the electron density map and the protein mask. Then select Execute. Once the files are read you will automatically move to the Set_Slab operation. In the default setting, red contours mark the protein density, white contours mark the protein/solvent interface as defined by the mask, and red and white dots are used to aid demarcation of the protein and solvent regions. The default contour level is the root mean square density fluctuation of the map. Under Slab_Direction you can change the orientation of the slab of density that you are viewing. The density can be viewed in the XY, YZ, XZ or a Skew plane. The Slab_Operation options allow you to change the dimensions and position of the density slab. In many cases the default setting in which a slab of the entire unit cell is viewed down the XY plane will be satisfactory. Then select Execute. You will automatically move to the Edit_Mask operation.

The Edit_Mask operation contains tools for editing the protein mask. The overall aim of the editing process is to remove any small "solvent" holes from the protein region of the mask and to remove small isolated "protein" regions from the solvent region. The Filter_Mask option will automatically remove small defects over the entire mask by expanding and contracting the mask by the number of grid points specified by the Num_Filter_Steps parameter. The Undo_Filter option provides a way of reverting to the original mask if the results of this automated filtering are unsatisfactory.
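The idea of filtering by expanding and contracting the mask can be sketched in Python on a small two-dimensional grid. Expanding then contracting fills small solvent holes; contracting then expanding removes small protein islands. This sketch is not the Xsight implementation: the function names are hypothetical, the grid is treated as periodic, and only the four nearest neighbours are considered.

```python
def dilate(mask):
    """Expand the protein region (value 1) by one grid point, with wrap-around."""
    ny, nx = len(mask), len(mask[0])
    out = [[0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            neighbours = (mask[y][x], mask[(y - 1) % ny][x], mask[(y + 1) % ny][x],
                          mask[y][(x - 1) % nx], mask[y][(x + 1) % nx])
            out[y][x] = 1 if any(neighbours) else 0
    return out


def erode(mask):
    """Contract the protein region: the complement of dilating the solvent."""
    inverse = [[1 - v for v in row] for row in mask]
    return [[1 - v for v in row] for row in dilate(inverse)]


def fill_small_holes(mask, steps=1):
    """Expand then contract, removing small solvent holes inside the protein."""
    for _ in range(steps):
        mask = dilate(mask)
    for _ in range(steps):
        mask = erode(mask)
    return mask


def remove_small_islands(mask, steps=1):
    """Contract then expand, removing small protein islands in the solvent."""
    for _ in range(steps):
        mask = erode(mask)
    for _ in range(steps):
        mask = dilate(mask)
    return mask
```

The steps argument plays the role of the number of filter steps: larger values remove larger defects but also smooth away more genuine detail.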

For more specific mask editing a set of manual editing tools is available. When the Add_Point operation is on, you can use the mouse to draw polygons around any protein or solvent features in the mask that you wish to alter. To close a polygon, pick the first polygon point or select the Close_Polygon option. When you select the Mark_As_Protein or the Mark_As_Solvent options you will change the description of the mask region inside the polygon to protein or solvent. When the mask is altered in one region, crystallographically equivalent regions are automatically updated. After editing a feature the polygon outline is retained until you select the Reset_Polygon option. By using the Step_Forward and Step_Backward options you can work through the entire map, checking and editing any unsatisfactory features in the protein mask.

When you have finished editing the mask, go to the Save_Mask operation to save your work. Enter a Density_Mask_Output filename and select Execute to write out the file containing the mask. Finally, go to the Quit_Command operation and select Execute to leave the Modify_Density/Edit_Density_Mask command.

Solvent flattening and histogram matching

Once you have obtained a satisfactory protein mask select the Modify_Density/Apply_Density_Mask command to carry out the density modification.

Use the value-aid to enter the name of your phase file in the Phase_Input parameter block. Enter a filename for the refined phase angles in the Phase_Output parameter block. Use the value-aid to add your mask file to the Density_Mask_File parameter block.

The Extend_Phases option should be toggled on only if you wish to extend your phase set to include unphased data at higher resolution. If this option is toggled on then you will need to enter the name of a .phs file that contains structure factor data extending to higher resolution as the Master_Data_File parameter. This file will usually contain structure factor amplitude information only, with figures of merit set to zero and initial phase angles set to nominal values of zero.

When you click Execute the Apply_Operation parameter will step to the Parameters operation and you will be provided with a menu in which to enter the parameters that will control the density modification calculations.

If the Extend_Phases option was toggled on you will use the Modify_Schedule options to design a schedule for the phase extension. Typically such a schedule will consist of extending the high resolution limit of the data that is used in the density modification in a sequential series of thin resolution shells. The schedule that you created is summarized in a table. When you have finished creating the density modification schedule select the End_Schedule option and click on Execute. For calculations including phase extension each of the parameters below the schedule handling options is applied to each stage in the schedule. Otherwise, these parameters are applied to the single pass density modification run.
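A phase extension schedule of this kind can be planned with a short Python sketch that lists the high resolution limit for each stage. The spacing used here, equal steps in reciprocal resolution 1/d, is an assumption chosen for illustration; the function name is hypothetical and the schedule itself is still entered through the Modify_Schedule options.

```python
def extension_schedule(d_start, d_end, n_steps):
    """High resolution limits in Angstroms, evenly spaced in 1/d, ending at d_end."""
    s_start, s_end = 1.0 / d_start, 1.0 / d_end
    limits = []
    for i in range(1, n_steps + 1):
        s = s_start + (s_end - s_start) * i / n_steps
        limits.append(1.0 / s)
    return limits


# Extend phases from 3.0 to 2.0 Angstroms in four stages.
schedule = extension_schedule(3.0, 2.0, 4)
```

For this example the limits are approximately 2.67, 2.40, 2.18, and 2.00 Angstroms.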

The Lower_Resolution and Upper_Resolution parameters are the resolution limits for the data that you wish to use in the density modification process.

Under the Apply_Operation options you can select from three possible density modification methods. The Solvent_Flatten option eliminates features from the solvent region but leaves the protein volume unchanged. The Histogram_Match option changes the distribution of density values in the protein region to conform to an expected histogram of protein density values. This option also applies a histogram to the solvent area which attenuates but does not completely remove density variations from the solvent region. If the Both option is selected then solvent flattening and histogram matching are carried out consecutively in each cycle. You may wish to try the Both option in your first attempt at density modification.

The Number_of_cycles parameter controls the number of iterative refinement cycles to be carried out. The default value of four cycles should reach convergence for most problems.

The Solvent_Flip parameter may be used to flip the direction of density fluctuations in the solvent volume of the crystal. A value of 0.0 indicates that no solvent flipping will be carried out. The magnitude of this parameter scales the flipped solvent density fluctuation and will normally be less than unity.
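One common way to write solvent flipping is shown below. Whether Xsight uses exactly this form is an assumption; the symbols are introduced here for illustration, with k standing for the Solvent_Flip value and the barred quantity for the mean solvent density.

```latex
\rho'(\mathbf{x}) \;=\; \bar{\rho}_{\mathrm{s}} \;-\; k\,\bigl[\rho(\mathbf{x}) - \bar{\rho}_{\mathrm{s}}\bigr],
\qquad \mathbf{x} \in \text{solvent region}
```

In this form k = 0 leaves a flat solvent with no flipped component, and values of k below unity return an attenuated, sign-reversed copy of the original solvent fluctuations.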

When the density modification parameters are set you should click on Execute. The Apply_Operation will now be set to the Run option and the Modify_Run_Mode may be used to select whether to run the density modification process immediately (Run_Now) or save the script for use later (Save_Script_Only). If the Run_Now option is toggled on the density modification procedure will start when you select Execute. For most problems the density modification procedure will be complete in a few minutes.

When the density modification procedure has finished you may set the Apply_Operation option to Analyze to obtain a statistical summary of the density modification run from the log file. Relevant information on the progress of the density modification will be reported in a log file. You should also use the Xfit program (obtained from the Model_Building/Density_Fitting command) to examine in detail the electron density map computed from the new phases. The new map should be more interpretable than the map computed from the original phase set. It is important to note that the statistics produced by the density modification programs describe convergence to some point in parameter space more accurately than correctness--you should always examine these maps by eye to determine if the density modification produced a useful improvement in map quality.

Non-crystallographic symmetry

Procedures for defining non-crystallographic symmetry relations, defining the mask that contains a unique copy of the molecular subunit, and averaging density related by non-crystallographic symmetry are contained in the Modify_Density/Set_NonCryst_Symm, Modify_Density/Set_Subunit_Mask and Modify_Density/Average_NCS_Density commands respectively.

Defining the non-crystallographic symmetry relations

In the Setup operation you will read in a file containing your electron density map. The Use_Partial_Model option is available if you intend to use an atomic model to define the non-crystallographic symmetry operation by using the relationships between symmetry-equivalent atoms. If the Use_Partial_Model option is toggled on you may use the Get_Molecule option to enter the Protein Data Bank file containing the atomic model into the Insight II system. If you have already done some work towards creating a mask that defines a unique non-crystallographic subunit in a previous session, you can read this mask using the Read_NCS_Mask_File command.

After selecting Execute you automatically move to the Set_Slab operation. This operation is very similar to the Modify_Density/Edit_Density_Mask command and allows you to choose the volume and position of the slab of density that you wish to view. When you select Execute the parameters that control the slab size and position will be set and the NCS_Operation parameter will step to the Specify_Transforms operation. You will now define the non-crystallographic relations between the equivalent molecular sub-units.

The Specification_Type menu in the Specify_Transforms operation provides four options for defining the spatial relations between sub-units related by the non-crystallographic symmetry operators.

(i) If you have previously created a file describing the non-crystallographic symmetry you can use the Read_NCS_File option to read in this file.

(ii) If you have already obtained the non-crystallographic symmetry matrices from some external program, the Edit_Matrix option allows you to directly type in these matrices.

(iii) If you are able to build a partial atomic model that contains a few equivalent atoms from each of the non-crystallographically equivalent sub-units you can use the atomic coordinates to determine the non-crystallographic symmetry matrices with the Partial_Model option. When you use this option you are provided with the NCS_Unit_Fragment parameter block in which to enter the atom specification for the atoms in the unique sub-unit and the NCS_Target_Fragment parameter block in which to enter an atom specification for atoms in a sub-unit related by non-crystallographic symmetry. Suppose, for example, that you are working with a partial model with Insight II molecule name "CCP" and that you want to generate a non-crystallographic symmetry matrix by overlapping the CA atoms in residues 100-120 with the CA atoms in residues 1-20. Use the syntax CCP:1-20:CA to enter the unique set of atoms in the NCS_Unit_Fragment parameter block and the syntax CCP:100-120:CA to enter the related group of atoms in the NCS_Target_Fragment parameter block. When you select Execute, a non-crystallographic symmetry matrix will be generated by superimposing the two sets of atoms. To generate additional symmetry matrices you need to enter additional groups of atoms in the NCS_Target_Fragment parameter block.

(iv) If you wish to determine the symmetry relations by direct inspection of the densities, we have provided NCS_Unit_Marker and Target_Marker commands. These commands allow you to place markers in parts of the map that are related by non-crystallographic symmetry. It is often the case that the heavy atom sites in the derivative crystal will obey the same non-crystallographic symmetry as the protein sub-units. These sites may serve as a useful guide as to where to place the markers.
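Superimposing two sets of equivalent positions, whether atoms or markers, is conventionally posed as a least-squares problem. The formulation below is a sketch under that assumption; the source states only that the matrix is generated by superposition, and the symbols are introduced here for illustration. The x_i are the positions in the unique sub-unit and the y_i are the paired positions in the related sub-unit.

```latex
(\mathbf{R}, \mathbf{t}) \;=\; \arg\min_{\mathbf{R},\,\mathbf{t}}
\sum_{i=1}^{n} \bigl\lVert \mathbf{R}\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \bigr\rVert^{2},
\qquad \mathbf{R}^{\mathsf{T}}\mathbf{R} = \mathbf{I}, \quad \det \mathbf{R} = 1
```

Positions are paired in order, so under this formulation both selections need to contain the same number of positions, and at least three non-collinear pairs are needed for the rotation to be determined.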

The Add, Update, and Delete options in the Transform_Operation menu allow you to manipulate the storage of the non-crystallographic symmetry matrices. In the majority of cases it will be possible (and useful) to define a rotational symmetry axis relating the copies of the protein sub-units to each other. The Define_N_Fold command is available for obtaining this information and will draw the N-fold axis on the map. When you have defined the relations between the non-crystallographic symmetry units select Execute.
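The relation between a rotation matrix and an N-fold axis can be checked by hand. The sketch below is independent of the Define_N_Fold command and uses hypothetical function names: it recovers the rotation angle from the trace of a 3x3 rotation matrix and reports the N for which 360/N lies closest to that angle.

```python
import math


def rotation_angle_deg(matrix):
    """Rotation angle in degrees, from trace = 1 + 2*cos(angle)."""
    trace = matrix[0][0] + matrix[1][1] + matrix[2][2]
    cosine = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cosine))


def nearest_fold(matrix, max_fold=12):
    """Return (N, deviation in degrees) for the best match of 360/N."""
    angle = rotation_angle_deg(matrix)
    best = min(range(2, max_fold + 1), key=lambda n: abs(360.0 / n - angle))
    return best, abs(360.0 / best - angle)
```

A large deviation suggests either that the matrix is not a single step of a proper N-fold rotation (for example, two steps of a five-fold axis give 144 degrees) or that the operator includes a significant error.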

You can now use the Save_Transforms operation to write the non-crystallographic symmetry relationships to a file. Files containing information on non-crystallographic symmetry are usually identified with the extension .ncs.

Once you have defined the non-crystallographic symmetry you may wish to go back to the Set_Slab command (see previous section) to view the density in a skew plane to carry out a visual check on the result.

If you wish to terminate your work on density averaging at this point you should use the Quit_Command operation to leave the Modify_Density pulldown. If you intend to continue directly to the Modify_Density/Set_Subunit_Mask command, where you will establish the mask that defines the unique molecular sub-unit, do not use Quit_Command.

Setting up the protein subunit envelope

Once the non-crystallographic symmetry operations have been specified, the Modify_Density/Set_Subunit_Mask command provides tools for generating an envelope that contains the unique molecular subunit. When you enter this command the Subunit_Operation parameter is initially set to the Setup operation where you will establish the files and information that you will use for defining the molecular subunit.

If you did not use the Quit_Command option in the Modify_Density/Set_NonCryst_Symm command, you may toggle on the Use_Current_Map and Use_Current_Trans options to use the map and NCS transformations that were just established. If you left the Modify_Density/Set_NonCryst_Symm command using the Quit_Command then you will need to read the NCS relations from either the .ncs file containing the information (NCS_Transform_File) or from the crystal file (Use_Crystal_NCS). If you wish to use a partial model in order to define the NCS sub-unit mask, the Use_Partial_Model option should be toggled on. Once all of these files and information are established, click on Execute and you will be stepped to the Set_Slab operation.

The Set_Slab operation in the Modify_Density/Set_Subunit_Mask command is identical to the Set_Slab operation in the Modify_Density/Set_NonCryst_Symm command and is used to establish the view direction and amount of density that will be displayed. Once the parameters for this operation are set select Execute and you will be stepped to the Edit_NCS_Unit operation where you will create the envelope around the unique molecular sub-unit.

The simplest way to set up an initial envelope is to build a rough atomic model of a single sub-unit using the Xfit program in the Model_Building/Density_Fitting command. This model may be extremely crude--the important thing is that the atoms fill up most of the volume that you believe to be contained in the unique sub-unit. The Mask_From_Model option in the NCS_Edit_Operation menu is used to identify all grid points within the Mask_Radius of these atoms as belonging to the unique protein unit. The value of the Mask_Radius parameter will depend on the level of detail of the atomic model--if your model is only a chain trace of CA atoms then a value of about 4 angstroms might be needed to fill up the subunit volume. For an all-atom model a radius of 1.8 angstroms would be appropriate. It would even be possible to use a very small number of atoms with very large radii to fill out the approximate space occupied by the protein.
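The idea behind building a mask from a model can be sketched in a few lines. This is an illustration only, not the Xsight code: the function name is hypothetical, and atoms and grid points are assumed to be given as Cartesian coordinates in angstroms with no unit cell or symmetry handling.

```python
def mask_from_model(atoms, grid_points, mask_radius):
    """Flag each grid point that lies within mask_radius of any atom.

    atoms and grid_points are sequences of (x, y, z) tuples in angstroms.
    Returns a list of booleans in the same order as grid_points.
    """
    radius_sq = mask_radius ** 2
    mask = []
    for gx, gy, gz in grid_points:
        inside = any(
            (gx - ax) ** 2 + (gy - ay) ** 2 + (gz - az) ** 2 <= radius_sq
            for ax, ay, az in atoms
        )
        mask.append(inside)
    return mask
```

With a CA trace, a larger radius compensates for the missing side-chain and main-chain atoms; with an all-atom model a smaller radius keeps the envelope from swelling into the solvent.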

The remaining Edit_NCS_Unit options work in a very similar way to the Modify_Density/Edit_Density_Mask command and can be used to edit the mask that defines the unique sub-unit. If you did not generate your initial mask from an atomic model then you can use these options to mark out the complete sub-unit boundary. If you intend to draw out the mask ab initio you will normally first wish to return to the Set_Slab operation to orient the density so that you are viewing the map down the skew plane perpendicular to the N-fold axis relating the non-crystallographic sub-units.

The Apply_NCS_Sym and Apply_Cryst_Sym operations are provided to generate and delete copies of the unique subunit that are related by non-crystallographic and crystallographic symmetry. We have chosen to color the envelope around the unique sub-unit white and to use blue to identify the copies related by non-crystallographic symmetry. When the crystallographic symmetry is applied, the copies of the unique subunit are colored gray and the copies of the non-crystallographic replicates are colored cyan. Areas of overlap between protein envelopes are flagged by yellow contours. The best way to work may be to initially leave the crystallographic symmetry off and generate, one at a time, the sub-units related by non-crystallographic symmetry. For each sub-unit you should return to the Edit_NCS_Unit operation to correct the unique sub-unit envelope in volumes of overlap. Once these problems are corrected you may wish to generate copies related by crystallographic symmetry using the Apply_Cryst_Symm operation and correct any problem areas with the crystallographic contacts. When you have finished, use the Save_Mask operation to write out a file containing the mask.

To leave the Modify_Density/Set_Subunit_Mask command use the Quit_Command option.

Running the non-crystallographic symmetry averaging

To average densities over the non-crystallographic symmetry go to the Modify_Density/Average_NCS_Density command.
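In general terms, averaging replaces the density at each point of the unique sub-unit by the mean over all N copies related by the non-crystallographic symmetry operators. The expression below is the standard textbook form, given as a sketch rather than a description of the Xsight code; the symbols are introduced here, with the first operator taken as the identity.

```latex
\rho_{\mathrm{avg}}(\mathbf{x}) \;=\; \frac{1}{N} \sum_{n=1}^{N} \rho\bigl(\mathbf{R}_n \mathbf{x} + \mathbf{t}_n\bigr),
\qquad \mathbf{x} \in \text{unique sub-unit mask}, \quad \mathbf{R}_1 = \mathbf{I},\ \mathbf{t}_1 = \mathbf{0}
```

Random errors in the map are reduced roughly in proportion to the square root of N, which is why the procedure works best when the number of sub-units is large.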

Use the Phase_Input value-aid to enter your phased structure factor file (.phs). In the Phase_Output parameter block you should supply a name for the phased reflection file after density averaging. Use the value-aids to enter filenames for the NCS_Mask_File and for the NCS_Transform_File. The program will refine the non-crystallographic symmetry matrices by a local density search before running the averaging. In the Refined_Transform parameter block enter the name for the file that will contain the refined non-crystallographic symmetry matrices.

If you wish to extend your phase set to higher resolution data, toggle on the Extend_Phases option. Phase extension by non-crystallographic symmetry averaging is often successful, particularly if the number of protein sub-units is large. If the phase extension option is on, you will need to enter the name of a .phs file that contains structure factor data extending to higher resolution as the Master_Data_File parameter. This file will usually contain structure factor amplitude information only, with figures of merit set to zero and initial phase angles set to nominal values of zero.

When all the filenames and options are set in the Setup operation, click on Execute to step to the Parameters operation. In the Parameters operation you will specify the parameters that control the averaging over non-crystallographically related densities.

If the Extend_Phases option was toggled on you may use the Modify_Schedule options to design a schedule for the phase extension. Typically such a schedule will consist of extending the high resolution limit of the data that is used in the density modification in a sequential series of thin resolution shells. The schedule that you created is summarized in a table. When you have finished creating the density modification schedule select the End_Schedule option and click Execute. For phase extension each of the parameters below the schedule handling options is applied to each stage in the schedule.

With the Phase_Combination option you can carry out phase combination or simply accept the new phases from the symmetry-averaged map. Unless you have extremely high symmetry, it is safer to carry out phase combination. There are two possible ways in which you can treat the density that is not contained by the sub-unit envelope or its replicates. You can either leave the external density as it is or flatten this density. The Solvent_Flatten option allows you to flatten the density that is not contained in the sub-unit envelope or its symmetry equivalents.

The Lower_Resolution and Upper_Resolution parameters set the resolution limits for the data that will be included in phase refinement calculations.

The Number_Avg_Cycles is the number of density averaging cycles that will be run.

The Refine_NCS_Matrix option is available for refining the non-crystallographic symmetry matrices before averaging the map (this is advisable). If this option is toggled on then the Refine_Box parameter is the search range, in angstroms, for the matrix refinement. If you are doing calculations involving phase extension it is normally advisable to refine this matrix at each resolution step.

When the density averaging parameters are set you should click on Execute. The Average_Operation will now be set to the Run option and the Modify_Run_Mode may be used to select whether to run the density modification process immediately (Run_Now) or save the script for use later (Save_Script_Only). If the Run_Now option is toggled on the density averaging procedure will start when you select Execute. For problems that do not involve phase extension, one complete cycle (including symmetry matrix refinement) of density averaging will not normally take more than a few minutes.

When the density averaging procedure has finished you may set the Average_Operation option to Analyze to obtain a statistical summary of the density modification run from the log file. Relevant information on the progress of the density modification will be reported in a log file. You should also use the Xfit program (obtained from the Model_Building/Density_Fitting command) to examine in detail the electron density map computed from the new phases. The new map should be more interpretable than the map computed from the original phase set. It is important to note that the statistics produced by the density modification programs describe convergence to some point in parameter space more accurately than correctness--you should always examine these maps by eye to determine if the density modification produced a useful improvement in map quality.


Model fitting: Xfit

Tasks that involve interactive model building and examination are carried out using the Xfit (McRee, 1992) program. Xfit is a component of both the Xsight module and the XtalView package. Unlike most of the commonly used model-building programs, Xfit contains built-in FFT routines for computing electron density maps from the observed structure factor data and the atomic model. This feature greatly speeds up and simplifies model building and the examination of electron density maps. However, it is also possible to compute maps from a set of previously phased structure factor data and to import pre-computed maps. The main interactive model-building tasks are described in some detail in Chapter 5, Tutorial.

Viewing models and electron density maps

Select the Model_Building/Density_Fitting command to enter the Xfit program. With the Pass_Files option toggled on (the default) you will most commonly use a value-aid to enter a Protein Data Bank coordinate file, up to two .phs files containing structure factor data or up to two map files containing precomputed electron density maps obtained through the MIR/Calc_Fourier command.

Although you need to enter a .phs file to make use of Xfit's built-in FFT routines, this file does not need to contain the correct phase values since the phases may be rapidly computed from the atomic model. If you only have a .fin file available, you can reformat it as a "fake" .phs file using the Phase(.Phs) output option in the Data_Control/Import_Data command, and then use this file as input for Xfit. When you select Execute you spawn the main Xfit menu, the graphics window, and a floating menu bar.

If you entered a .phs file in the Model_Building/Density_Fitting command the Xfit Fast Fourier menu will be spawned when you enter Xfit. This menu enables you to compute electron density maps from a set of phased reflections. By dragging down with MB3 on the Coefficients button you can select from a variety of commonly used map types. By clicking the Apply button you initiate the map calculation (usually lasting a few seconds) which then appears in the graphics window.

If you entered a "fake" .phs file when you entered Xfit, you need to calculate phase angles from your atomic model. To do this, click the SFCalc button in the main Xfit menu; the Structure Factors menu appears. If you click the Calculate All and Scale button, a set of structure factor amplitudes and phases will be calculated from the model. If the Auto FFT after calculation button is toggled to Yes, this calculation will be followed by an electron density map calculation. If the Auto FFT after calculation button is toggled to No, the calculated structure factors will be stored for use with the Xfit Fast Fourier menu.

When electron density maps are calculated they are automatically contoured in levels of the map standard deviation. To change the contour levels, contour colors and the size of the volume that is to be contoured, click the Contours button in the main Xfit menu. The Contour Maps menu that is spawned will give you control over all of these parameters.
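Contouring in units of the map standard deviation can be illustrated as follows. This is a minimal sketch with a hypothetical function name; it assumes the levels are measured from the map mean, which for a map computed without the F(000) term is zero.

```python
import math


def sigma_levels(map_values, multiples=(1.0, 2.0, 3.0)):
    """Return (sigma, absolute contour levels) for a list of map values.

    sigma is the root-mean-square deviation of the map from its mean;
    each contour level is mean + multiple * sigma.
    """
    count = len(map_values)
    mean = sum(map_values) / count
    sigma = math.sqrt(sum((v - mean) ** 2 for v in map_values) / count)
    return sigma, [mean + m * sigma for m in multiples]
```

Expressing levels this way makes contour settings comparable between maps that are on different absolute scales.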

You will frequently want to view atomic fragments that are related by crystallographic symmetry to the unique molecule. The Symm Atoms button in the main Xfit menu provides options for generating or removing the symmetry related fragments from view.

To leave the Xfit program, click the Quit button in the main Xfit menu.

Graphics handling

Colors

Xfit loads the standard colors from the file $XTALVIEWHOME/data/colors.dat. However, Xfit first looks in the local directory for a copy of colors.dat. Therefore, you can override the default file by putting a modified copy of colors.dat in your local directory. This is normally done to change a color to something that looks better in hardcopy. For example, when generating a PostScript file for use in making slides, it is better to add a little white to the red shades and it is better to make the blue shades close to cyan.

PostScript coloring is done in terms of RGB triples, and the color for the foreground is taken from the colors.dat file. Depth cueing is done by multiplying the RGB values by a scale factor that ranges from x to 1.0. You can change the value of x in the Files/Plot Properties popup.
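The effect of the depth-cueing scale factor can be sketched as below. The linear ramp and the convention that the front of the view keeps full intensity are assumptions made for illustration; the source states only that the factor ranges from x to 1.0. The function name is hypothetical.

```python
def depth_cue(rgb, depth, x):
    """Scale an RGB triple (components 0..1) according to depth.

    depth runs from 0.0 (front, scale factor 1.0) to 1.0 (back, scale
    factor x). The ramp between the two is assumed to be linear.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie between 0.0 and 1.0")
    scale = 1.0 - (1.0 - x) * depth
    return tuple(component * scale for component in rgb)
```

A smaller x gives stronger contrast between near and far parts of the picture, at the cost of darker colors at the back.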

Plotting

You can easily plot a current display to a PostScript file or printer. Plotting is controlled from within the Files menu. To send a plot directly to a printer, enter the name of the plotter in the Plotter (file) text box. To save the plot to a PostScript file for future printing or previewing, enter the name of the plot file in the Plotter (file) text box.

You can modify the plot in a number of ways using the Properties menu. This menu is automatically activated by clicking the Plot button. You can choose to plot in color, in black and white, or in black and white with depth cueing. In the depth-cued mode, the width of the line indicates the distance from front to back, with the lines in back being thinner. To plot additional information as a second page, select the Print stat page option. This option adds to the plot a second page that contains the viewpoint and the names of the objects used, as well as the map levels.

If files are saved as color PostScript images, you can send them directly to a commercial slide service. The advantage of this procedure is that the exposure is always correct and the resolution is usually much higher than on the screen. The disadvantage is that the screen colors and the slide colors may differ significantly, especially for dark colors. In particular, blue shows up unusually dark on slides (it is better to use cyan).

If you save the PostScript file, you can easily edit it, if you understand the PostScript language. The file format is set up to allow easy editing. Each object is bracketed by the two commands:


BEGIN type # name

and:


END

where type, #, and name specify the object in question. You search for each object by looking for the string BEGIN. The most common field to edit is the line width. In an XtalView PostScript file the line width command is defined near the top of the file as:


/W {setlinewidth} def

Later in the file, any occurrence of /W indicates the line width for a line that was just specified. You can then modify the value associated with the /W command, if you want.

You can also change the gray level by modification of the setgray command. A gray level of 0.0 is black, a gray level of 1.0 is white and a value of 0.5 is gray.

The labels are located at the end of the file. It is common practice to move the labels around a bit to prevent them from overlapping with other objects. Objects in the file are drawn starting from the front of the file and going to the end. Thus an object located near the end of the file will overwrite an earlier one if they occupy the same space. You can change the order of the objects in the file in order to reverse the effect of overlap. Labels have a shadow drawn around them so that you can read them, even if they fall on an object of the same color.
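Before editing a saved plot by hand it can help to list the objects it contains. The sketch below scans the text of a PostScript file for the BEGIN markers described above. The function name is hypothetical, and the exact layout of the marker lines (for example, whether they sit inside PostScript comments) is an assumption, so the code accepts the string BEGIN anywhere on a line.

```python
def list_objects(ps_text):
    """Return a (type, number, name) tuple for every BEGIN marker line.

    A marker is assumed to look like 'BEGIN type # name', possibly
    preceded by other characters such as a comment sign. The name may
    contain spaces. Lines without three fields after BEGIN are skipped.
    """
    objects = []
    for line in ps_text.splitlines():
        start = line.find("BEGIN")
        if start < 0:
            continue
        fields = line[start + len("BEGIN"):].strip().split(None, 2)
        if len(fields) == 3:
            objects.append(tuple(fields))
    return objects
```

The returned list is in drawing order, so the last entry is the object drawn on top of any others that occupy the same space.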

Vu objects

Xfit is both a program for building protein models from electron density and a program for generating graphical views of protein complexes. These two functions can interfere with one another, so Xfit was designed to minimize the inherent conflict.

In the Xfit paradigm, the model is never changed for viewing; rather, vu objects are constructed and the model is hidden if desired. A vu object is simply a subset of the currently displayed image which can no longer be addressed by residue number or atom number. In other words, a vu object is simply a display, composed of a collection of vectors that make up a portion (or all) of the current model. Vu objects are created in the View menu display, while manipulation of models, maps, or vu objects is done with the Show menu.

There are three basic file types that Xfit can display: models, maps, and vu objects. In a sense, models are sacrosanct; they are either built from maps and thus are experimentally derived, or they are imported as known structures. Vu objects are generated from currently loaded models, and typically only contain a subset of the atoms in the model.

If you wish to generate a view of the model that contains only C atoms, you can either delete all of the other atoms in the model, and take the chance that you could accidentally overwrite the model file, or create a vu object that contains only C atoms and hide the model. Up to 200 vu objects can be displayed at one time, giving you great flexibility in your ability to create interesting displays.

It is useful to activate the Show menu when generating vu objects. The Show menu lists all of the current model, map, and vu files that have been loaded or created. Files that are currently displayed are toggled on; you can hide files by toggling them off. As each vu object is created, it appears as an entry under Vu Objects. The text string associated with each vu object indicates what is in the vu object, which model it was created from, the sequence number of the vu object, the range of residues for the vu object, and the name of the model file from which it was generated.


Model fitting: waters

The Model_Building/Waters command contains special tools tailored to the task of locating and fitting ordered water molecules in electron density maps.

The methodology behind the Model_Building/Waters command is to first select peaks from an electron density difference map that are significantly above background noise as putative water sites. These peak positions are compared to the positions of other atoms in the structure to check that they are neither too close nor too far from any other atom. Additional control over the selection process is available through an option that allows you to only retain waters that are sufficiently close to potential hydrogen bonding sites. You can rapidly check by eye the water sites that satisfy these crystallographic and geometric criteria and accept or reject the putative water molecule. The elimination of water molecules related by crystallographic symmetry and the operations of appending the new waters to the protein coordinate file are handled automatically. After leaving the Model_Building/Waters command you can immediately re-start the model refinement.

Obtaining a difference map

The positions of putative water molecules are obtained by finding peaks in a conventional Fourier difference map. To obtain this map, you need to first obtain a .phs file containing calculated structure factors. This file is automatically created when ProLSQ95 is run, after the final refinement cycle. The name of the .phs file is based on the name of the output coordinate file. If you ran refinement with the CNX program from the Xsight interface then a reflection file with extension .fob and a root identical to that of the final output coordinate file will have been output at the endpoint of the refinement run. The Data_Control/Import_Data command can then be used to convert this reflection file to the required "Phase" file format. To carry out this conversion the Input_File_Format should be set to hkl_F1_SIG1_FC_PHS and the FTN_Format_Spec should be given as (6x,3i5,6x,f10.3,17x,f10.3,/,25x,2f10.3). This format specification is needed to ensure that the data items in this particular type of CNX reflection file are parsed correctly.
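To see what the format specification selects, the sketch below applies the same column layout to one two-line record. It is an illustration of how the Fortran edit descriptors map to character columns, not a replacement for the Data_Control/Import_Data command. The function name is hypothetical, and the assignment of the seven numbers to h, k, l, F1, SIG1, FC, and PHS simply follows the order in the Input_File_Format name, which is an assumption.

```python
def parse_record(line1, line2):
    """Read one reflection using (6x,3i5,6x,f10.3,17x,f10.3,/,25x,2f10.3).

    First line:  skip 6, three 5-character integers, skip 6,
                 one 10-character real, skip 17, one 10-character real.
    Second line: skip 25, two 10-character reals.
    Returns (h, k, l, f1, sig1, fc, phs).
    """
    h = int(line1[6:11])
    k = int(line1[11:16])
    l = int(line1[16:21])
    f1 = float(line1[27:37])
    sig1 = float(line1[54:64])
    fc = float(line2[25:35])
    phs = float(line2[35:45])
    return h, k, l, f1, sig1, fc, phs
```

The skipped columns are where the reflection file carries its text labels, which is why a free-format read would not work for this file type.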

Now, go to the MIR/Calc_Fourier command and set the Map Type to an Fo-Fc map. In Xsight the usual convention is to use the suffix .map for an output map file. When you select Execute a difference map will be calculated for use with the Model_Building/Waters command.

The Model_Building/Waters command also provides the option of reading in a second map for visual comparison with the putative water sites. Many crystallographers may also wish to calculate a 2Fo-Fc map in the MIR/Calc_Fourier command and also compare this map against the list of possible water sites.
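For reference, the two map types mentioned here are the standard Fourier syntheses below, written with observed amplitudes, calculated amplitudes, and calculated phases. The notation is introduced here for illustration; V is the unit cell volume and the sum runs over all reflections h.

```latex
\rho_{F_o - F_c}(\mathbf{x}) = \frac{1}{V} \sum_{\mathbf{h}}
\bigl(\lvert F_o(\mathbf{h})\rvert - \lvert F_c(\mathbf{h})\rvert\bigr)\,
e^{\,i\varphi_c(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}}

\rho_{2F_o - F_c}(\mathbf{x}) = \frac{1}{V} \sum_{\mathbf{h}}
\bigl(2\lvert F_o(\mathbf{h})\rvert - \lvert F_c(\mathbf{h})\rvert\bigr)\,
e^{\,i\varphi_c(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}}
```

Atoms missing from the model, such as ordered waters, appear as positive peaks in the first map and as ordinary density in the second.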

Automatic location of water molecules

Go to the Model_Building/Waters command. When you first enter the Model_Building/Waters command you will be in the Get_Molecule operation. Use the value-aid to select your coordinate file for the PDB_File parameter and select Execute to read in the coordinates. You will now be in the Read_Map operation. Use the value-aid to select the difference map that you calculated using the MIR/Calc_Fourier pulldown. The Read_Second_Map option is available for reading an additional electron density map. Then select Execute to read in the map files.

You now have entered the Automatic operation. In this operation you can select the crystallographic and geometric criteria that are to be applied for locating water molecules. The Peak_Threshold is the number of standard deviations the peak density must be above the difference map density fluctuation to be recorded as a potential water peak. The default value of three may be tried in a first pass. If you set this parameter to a larger value you will ensure that only the most significant peaks are located but risk missing real water molecules for which the density peaks are nearer to the noise level in the map. The best value for the Peak_Threshold parameter will depend on the resolution and accuracy of your data set and the degree to which the model has been refined. As a very approximate rule of thumb, you would not normally expect to have more than one ordered water molecule per amino acid in the final set of coordinates.

The Min_Dist_From_Mol and Max_Dist_From_Mol parameters specify the minimum and maximum distances from the molecule that are considered reasonable for ordered water molecules. The default values for these parameters recognize the fact that most ordered water molecules are in direct contact with the protein surface but also allow for some uncertainty in the position of the surface atoms. Ordered water molecules are almost always connected to the protein by hydrogen bonds.

The H_Bond_Test option lets you add a further criterion to the automatic check of the putative water sites: each water molecule must be close to a potential hydrogen bonding partner. When the H_Bond_Test function is toggled on, the Max_Hbond_Dist parameter controls the maximum allowed distance between the putative water molecule and a hydrogen bonding site on the protein.

When you select the Execute button a search for water molecules will be carried out according to these crystallographic and geometric criteria. It normally will only take about a minute to obtain the water sites.
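The selection logic described in this section can be summarized in a short sketch. It is not the Xsight search itself: the function and argument names are hypothetical, coordinates are assumed to be Cartesian in angstroms, crystallographic symmetry is ignored, and no default distances are supplied because the source does not state them.

```python
import math


def nearest_distance(point, sites):
    """Distance from point to the closest of the given (x, y, z) sites."""
    return min(math.dist(point, site) for site in sites)


def select_waters(peaks, atoms, sigma, peak_threshold, min_dist, max_dist,
                  hbond_sites=None, max_hbond_dist=None):
    """Filter difference-map peaks into putative water sites.

    peaks is a list of (x, y, z, height); sigma is the standard deviation
    of the difference map. A peak is kept if its height is at least
    peak_threshold * sigma, its nearest atom lies between min_dist and
    max_dist, and (when hbond_sites is given) a hydrogen bonding site lies
    within max_hbond_dist. Results are sorted by decreasing peak height.
    """
    accepted = []
    for x, y, z, height in peaks:
        if height < peak_threshold * sigma:
            continue
        distance = nearest_distance((x, y, z), atoms)
        if distance < min_dist or distance > max_dist:
            continue
        if hbond_sites is not None:
            if nearest_distance((x, y, z), hbond_sites) > max_hbond_dist:
                continue
        accepted.append((x, y, z, height))
    accepted.sort(key=lambda peak: peak[3], reverse=True)
    return accepted
```

Raising peak_threshold or switching on the hydrogen bond test shortens the list that has to be reviewed by eye, at the risk of discarding weaker but genuine water sites.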

Checking the water sites

Once the search for waters is completed you will move to the Review operation. The putative water sites are listed in a table, with the current site highlighted in yellow. The table is ordered according to the peak height in the difference map. The protein atomic structure and the density map around the current site will be displayed on the screen. The Display_Options options allow you to change the contour levels of the density maps and the styles and colors used in the graphics.

If you want to accept all the waters without carrying out any visual check, then toggle on the Accept_All option and select Execute. This option is only recommended if you applied stringent criteria in the automatic selection of water molecules. More usually you will wish to do a visual inspection of the water sites.

Until a decision is made regarding the status of a water molecule, the molecule is colored grey. In order to tentatively accept a water molecule toggle on the Accept option. When you do this the color of the water will change to purple and its status will be updated in the table. If you decide not to accept this water toggle on the Delete option. In this case the color of the water molecule will change to blue. You should note that decisions made regarding the status of water molecules are considered temporary until you are ready to end the session. To work through the entire list of putative water sites, either select the Next option or pick a site entry in the table.

The Adjust option allows you to change the position of any of the water molecules that were located. When you select the Adjust option, a slide-bar and parameter block containing the positions of the current water molecule will appear. Depending on whether you use the Water_X, Water_Y, or Water_Z parameter block, you can adjust the water molecule in any direction with the slide-bar. When you have finished adjusting the water position select Execute.

A common occurrence in interpreting difference maps, particularly at lower density levels, is that you may wish to re-interpret an elongated density feature as two water molecules rather than one. To add new waters to the existing list, use the Add option. With this option you can use the slide-bar to move an existing water molecule to the site where you wish to add the new water. When you select Execute, the new water is added to the list and the original position of the existing water is retained.

If you decide that one of the putative water molecules in your structure is really an ion you can use the Change_To_Ion option to convert the water to one of the commonly occurring ion types. These ions will be automatically recognized by the CNX and ProLSQ95 refinement programs.

When you have finished working through the entire list of waters, toggle on the Add_Waters_To_PDB option in the Water_Operation menu. You can supply a filename for the output .pdb file or accept the default name. When you select Execute, the accepted waters are automatically appended to a copy of the input .pdb coordinate file. To leave the Model_Building/Waters command, toggle on the Quit_Command operation and select Execute.


Refinement

Simulated annealing

Since simulated annealing refinement has a greater radius of convergence than least-squares minimization, it is the method of choice for refining the positional parameters of a poor initial protein model. To carry out simulated annealing refinement within the Xsight package we have supplied an extensive interface to the refinement capabilities of the CNX program. This interface includes the ability to import stereochemical parameters for ligand molecules and to apply non-crystallographic symmetry restraints to the structure during the refinement. A powerful and flexible spreadsheet-style interface is available for setting up simulated annealing schedules. To use the simulated annealing refinement option with the CNX program you will use the Refinement/Setup_CNX and Refinement/SA_Refine commands.

Setting up files for refinement with the CNX program

The first task that needs to be accomplished before running crystallographic refinement with the CNX program is to create a protein structure file (PSF), which contains information on the protein stereochemistry. At the same time you will create a CNX-formatted coordinate file with hydrogens attached to the polar atoms. The Xsight system uses the Engh-Huber parameters to define the protein stereochemistry. The CNX-formatted coordinate file is almost identical to the input Protein Data Bank coordinate file except that chain information is removed and a new segid identifier is included that is based on the chain code. Once these files have been created you may use them for all your refinement calculations with the CNX program, provided that you do not add or remove any atoms from the structure.

The Refinement/Setup_CNX command scans the input coordinate file for atom names OT1 and OT2 and renames them to O and OXT, respectively, in accordance with the atom-naming scheme adopted in the topology file $MSI_CNX_TOPPAR/protein.top. Unlike the Setup_XPLOR command in previous versions of Xsight, the Setup_CNX command leaves the PDB records CHAINID, SEGID, and ALT (discrete disorder) intact.
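The renaming rule can be sketched in a few lines of Python. This is an illustration only, not the Setup_CNX implementation: the function name is hypothetical, and the sketch assumes standard fixed-column Protein Data Bank records in which the atom name occupies columns 13-16.

```python
# Renaming rule stated in the text: OT1 -> O, OT2 -> OXT.
RENAMES = {"OT1": "O", "OT2": "OXT"}

def rename_terminal_oxygens(pdb_line):
    """Rename OT1/OT2 in one ATOM/HETATM record, leaving all other
    columns (chain, segid, alternate location) untouched."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    name = pdb_line[12:16].strip()          # atom name, columns 13-16
    if name not in RENAMES:
        return pdb_line
    new_field = (" " + RENAMES[name]).ljust(4)
    return pdb_line[:12] + new_field + pdb_line[16:]

# Hypothetical record, for illustration only.
record = "ATOM    100  OT1 ALA A  10      11.000  12.000  13.000  1.00 20.00"
renamed = rename_terminal_oxygens(record)
```

Only the four-character atom name field is rewritten, so the record length and every other column stay the same.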

You need to enter the name of your input Protein Data Bank coordinate file in the PDB_Input_File parameter. Filenames are automatically generated for the output PSF file in the PSF_Output_File parameter block and for the output CNX-formatted coordinate file in the PDBX_Output_File parameter block.

The Ligands step of the Generate_PSF_File operation provides the opportunity for entering stereochemical information files for any ligands in the structure. The Add option supplies you with a Ligand_Code parameter in which you need to specify the three-letter `amino-acid' code that is used for the ligand molecule in the co-ordinate file. The Ligand_Topol_File and the Ligand_Param_File parameters are used to enter the topology and parameter files for the ligand. These files could have been generated using the Xsight Refinement/Dictionary command or could have been obtained from some external source. When you have included stereochemical information for all of the ligands in your structure you should use the End_Ligands option to exit from the Ligands menu.

To run the CNX program for the Generate_PSF_File task, go to the Run step and click on Execute with the Run_Mode option set to Run_Now. This run usually takes only a few seconds, except for structures containing many sub-units (i.e., water molecules or independent molecules in the asymmetric unit), when the run could take 2-3 minutes.

Running refinement calculations with the CNX program

The Refinement/SA_Refine command is available for carrying out simulated annealing and energy minimization refinements of atomic coordinates with the CNX program. All refinement calculations use the intensity-based maximum-likelihood target. In multistage refinements the maximum-likelihood parameters are updated at the start of each stage. The Babinet bulk solvent-scattering correction is also automatically applied, since this provides a physically more realistic representation of the total crystal scattering and leads to somewhat more convergent refinements. The Engh-Huber parameter set with a quartic nonbond potential provides the chemical energy term.

The first operation (Job_Control) that you encounter when you enter the SA_Refine command allows you to enter the input coordinate file, input PSF file, and input reflection file for the refinement. The reflection file may be supplied in any of the Xsight formats and is automatically converted to the CNX reflection file format. The output file containing the refined coordinates is also named in this operation. At the end of the refinement, a reflection file with extension .fob and root based on the output coordinate filename is automatically written.

If your structure contains noncrystallographic symmetry, you may want to restrain equivalent atoms to similar conformations using the NCS_Restraints operation. This option is particularly valuable when you have relatively low-resolution data, since it reduces the number of free parameters relative to the amount of diffraction data available. A series of trial refinements with cross-validation (the free R value) may be used to obtain the best value for NCS_Weight.

The selection of reflection data for the refinement is controlled by the Reflection_Data operation. For most refinements you should reserve about 10% of the data for free R value calculations (i.e., Free_R_Statistic should be on) to verify the convergence of the refinement. Since the maximum-likelihood target used for refinement with the CNX program has a strict requirement for a set of cross-validation data, the CNX program automatically reserves 10% of the data for cross-validation purposes if these data are not reserved through the interface. However, if the cross-validation data are already explicitly defined in the reflection file using the TEST tag (see the CNX documentation for more information on the CNX reflection file format), then the cross-validation data should not be reserved through the Xsight/SA_Refine interface. Since the Babinet bulk solvent-scattering correction is automatically applied during the calculation, it should be possible to include all the very low-resolution data in the refinement.
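To make the idea of reserving a cross-validation set concrete, here is a minimal Python sketch that flags about 10% of the reflections as test data. The text does not describe how the CNX program chooses its test reflections, so the random selection, the function name, and the seed are assumptions made for the example.

```python
import random

def assign_free_r_flags(n_reflections, test_fraction=0.10, seed=0):
    """Return one boolean per reflection: True marks a reflection set
    aside for free R calculations and excluded from refinement."""
    rng = random.Random(seed)               # fixed seed for reproducibility
    n_test = round(n_reflections * test_fraction)
    test_indices = set(rng.sample(range(n_reflections), n_test))
    return [i in test_indices for i in range(n_reflections)]

flags = assign_free_r_flags(2000)
n_test = sum(flags)                # reflections reserved for the free R value
n_work = len(flags) - n_test       # reflections used in the refinement
```

Whatever the selection method, the same test set should be kept throughout a series of refinements so that the free R values remain comparable.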

The Refine_Schedule operation provides a simple yet powerful interface for setting up a refinement schedule. The schedule is divided into a series of stages, which consist of either cycles of least-squares minimizations or a period of molecular dynamics. To build a schedule, make sure that Schedule_Operation is set to Add_Stage. You may then select between Minimization, Dynamics, or Torsion_Dynamics for the Stage_Type. Depending on your choice, you will be presented with a menu of parameters for controlling the stage. When the parameters are set, click Execute to enter that stage into the schedule. As the schedule is built, the parameters are reported in a table. The simplest possible schedule would contain one stage of minimization. A typical simulated annealing schedule would contain an initial stage of minimization, a stage of dynamics, and a final stage of re-minimization.
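A schedule of this kind is simply an ordered list of stages. The Python sketch below models the typical three-stage schedule; the add_stage helper and the parameter names (steps, start_temperature) are hypothetical and the values are placeholders, not recommended settings or the actual Xsight parameter names.

```python
# Stage types offered by Stage_Type, as listed in the text.
STAGE_TYPES = {"Minimization", "Dynamics", "Torsion_Dynamics"}

def add_stage(schedule, stage_type, **params):
    """Append one stage to the schedule, mirroring Add_Stage + Execute."""
    if stage_type not in STAGE_TYPES:
        raise ValueError(f"unknown stage type: {stage_type}")
    schedule.append({"stage_type": stage_type, **params})
    return schedule

schedule = []
# Placeholder parameter names and values, for illustration only.
add_stage(schedule, "Minimization", steps=50)
add_stage(schedule, "Dynamics", start_temperature=3000)
add_stage(schedule, "Minimization", steps=100)

for number, stage in enumerate(schedule, start=1):
    print(number, stage)
```

Keeping the schedule as plain data is also what makes it natural to save a schedule and reload it later with different weights.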

To facilitate setup of the simulated annealing schedule, the file $BIOSYM/tutorial/xray/sa_schedule.inp is now provided with the program. It contains a canned simulated annealing protocol that can be imported by using the Refinement/SA_Refine/Refine_Schedule/Get_Schedule option.

Once you have set up the schedule, you may save it for future use using the Put_Schedule option under the Schedule_Operation options. This may be particularly useful if you want to experiment with values for the X-ray weights in the refinement. In particular, you may want to optimize the Weight_F parameter by carrying out several similar refinements with different values for Weight_F and using the free R factor to monitor the results. The initial estimate for the Weight_F parameter may have been obtained using the Find_Xray_Weight operation in the Setup_CNX command.

When you finish setting up the refinement schedule use the End_Schedule option to proceed to the Run command. The Run parameter block enables you to save the CNX script for future use (perhaps on a different and more powerful computer) if you do not want to submit the refinement job for immediate execution.

Single rigid-body refinement

In a rigid-body refinement the translational and rotational parameters of the protein as a whole are adjusted to improve the fit to the diffraction data. Rigid-body refinements are useful for improving the solution of a molecular replacement search (which determines rotation and translational parameters in sequential 3D searches) and in cases where a variant of a known protein crystal appears with slightly different cell dimensions. We have implemented a slightly modified version of the RotLSQ program (Hendrickson, unpublished program) for carrying out least-squares optimizations of protein rigid-body parameters.

Setting up a rigid-body refinement

Select the Refinement/Rigid_Body_Refinement command to set up a rigid body refinement. When you first enter this command you will be in the Job_Control operation. Use the value-aid to select your coordinate file for the PDB_Input_File parameter block. Use the value-aid to enter your reflection file for the Fin_Input_File parameter block. A default name for the PDB_Output_File will be supplied. The Number_of_cycles parameter controls the number of refinement cycles that will be run. About eight cycles are usually sufficient. Then select Execute to move to the Reflection_Data operation.

The Low_Res_Cutoff and High_Res_Cutoff parameters control the resolution limits of the data that will be included in the refinement. Rigid body refinement programs are able to correct large positional errors (greater than 1 angstrom) if carried out correctly. However, the presence of high resolution data will reduce the possibility of producing large positional shifts. As a rule of thumb, you should truncate the data at a high resolution limit that is about four times the expected positional shift. You may want to do an initial rigid-body refinement at relatively low resolution and then do a second run with higher resolution data included. The Minimum_Fobs and the Min_Fobs_over_Sigma parameters allow you to exclude small and unreliable data from the calculation; the default is to include all data.
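The rule of thumb can be worked through in Python. The function names and the d-spacings below are hypothetical and serve only to illustrate the arithmetic; they are not part of the Rigid_Body_Refinement command.

```python
def suggested_high_res_cutoff(expected_shift, factor=4.0):
    """Rule of thumb from the text: truncate at about four times the
    expected positional shift (both in angstroms)."""
    return factor * expected_shift

def within_resolution(d_spacing, low_res_cutoff, high_res_cutoff):
    """True if a reflection's d-spacing lies between the two cutoffs."""
    return high_res_cutoff <= d_spacing <= low_res_cutoff

# An expected shift of 1.5 angstroms suggests a 6.0 angstrom cutoff.
cutoff = suggested_high_res_cutoff(1.5)

# Hypothetical d-spacings in angstroms, for illustration only.
d_spacings = [12.0, 8.0, 6.5, 4.0, 2.5]
first_pass = [d for d in d_spacings if within_resolution(d, 15.0, cutoff)]
```

A second run would then use a smaller High_Res_Cutoff value so that the higher-resolution reflections are included once the large shifts have been corrected.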

If you want to attempt to use all the lowest resolution data in the refinement, set the Solvent_Scale and Solvent_Smooth parameters. These parameters provide a first approximation to modeling the effects of bulk solvent scattering. In the absence of this correction the calculated structure factors are much too large at very low resolution. By using the parameters it is often possible to compensate for bulk solvent scattering, allowing the lowest resolution data to be included in the refinement in a more reliable way. The expected value for Solvent_Scale is the average solvent density (which usually corresponds to the composition of the mother liquor for your crystal) divided by the average protein electron density. This value is usually between 0.8 and 1.0. Results are not very sensitive to the Solvent_Smooth parameter--a value of 400 is usually good.
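The expected value of Solvent_Scale is a simple ratio, as the following Python sketch shows. The two electron densities are hypothetical round numbers chosen only to illustrate the calculation; you would substitute values appropriate to your own mother liquor and protein.

```python
def solvent_scale(solvent_density, protein_density):
    """Expected Solvent_Scale: average solvent electron density divided
    by average protein electron density (same units for both)."""
    return solvent_density / protein_density

# Hypothetical densities in electrons per cubic angstrom.
scale = solvent_scale(solvent_density=0.36, protein_density=0.43)
print(f"Solvent_Scale estimate: {scale:.2f}")
```

With these assumed inputs the estimate falls inside the 0.8 to 1.0 range quoted above.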

After you select Execute you will automatically move to the Refine_Variables operation.

In the Refine_Variables operation you must choose whether to refine each parameter or freeze it at an initial value. The initial value for the scale factor between observed and calculated structure factors is given by the Scale_Factor parameter. The overall temperature factor for the model is given by the Overall_B_Factor parameter. By changing the Theta1, Theta2, and Theta3 Euler angle values you can rotate the starting position of the molecule. Similarly, by changing the Tx, Ty and Tz angstrom positional parameters you can translate the starting position of the molecule. Options are available for refining or freezing any of these parameters. It is often inappropriate to refine the overall temperature factor in these refinements because its value is highly coupled with the overall scale factor. In some space groups, translations along particular axes are undefined. The program will automatically freeze the refinement along these axes. After you select Execute you will automatically move to the Start_Job operation.

If the Run_Rigid_Now command is toggled on, the job will start as soon as you select Execute. Printed output from the run is directed to an output file called rigid.log. The rigid-body refinement will usually take a few minutes to complete.

Convergence of the refinement can be checked using the cycle-to-cycle changes in R value and the magnitudes of the parameter shifts. A correlation matrix provides information on the coupling between the refineable parameters. You should note that the Euler angle system used for recording the parameter shifts contains a singularity when the shift in the second angle approaches zero. Thus cases where the parameter shifts in the second and third Euler angles are large and have opposite signs do not imply large rotational shifts in the orientation of the protein.

Multiple rigid-body refinement

If you performed a molecular replacement calculation for a crystal with multiple proteins in the crystal asymmetric unit, or if the single protein has well-defined domains, you may wish to refine the atomic structure in terms of multiple rigid-bodies. The Xsight system employs the CNX program for multi-rigid-body refinement calculations.

The initial set up for multi-rigid-body refinement calculations uses the Refinement/Setup_CNX command as described in Setting up files for refinement with the CNX program. To run the multi-rigid-body refinement calculation you should then go to the Refinement/Multi_Body_Refine command.

The first operation (Job_Control) that you will encounter when you enter the Multi_Body_Refine command allows you to enter the input co-ordinate file, the input PSF file and the input reflection file. The reflection file may be supplied in any of the Xsight formats and will be automatically converted to the CNX reflection file format. The output file containing the refined co-ordinates is also specified in this operation.

In the next operation (Define_Groups) you will define the groups of atoms that are considered to comprise the rigid-bodies. A residue based system, in which you define a Start_Residue and Stop_Residue for each group is used to specify the atomic groups. To include the segid identifier in the residue specification use, for example, A20 to indicate residue 20 with segid A. While the Multibody_Groups option is set to Add you may continue to add groups by clicking Execute after each group is assigned. Change the Multibody_Groups option to End_Groups to finish entering groups.
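The residue specification can be illustrated with a small Python parser. This is a sketch of the convention described above, not the parser used by Xsight: the function name is hypothetical, and the sketch assumes that the segid consists of letters and precedes the residue number, as in A20.

```python
import re

def parse_residue_spec(spec):
    """Split a specification such as 'A20' into (segid, residue number).
    A bare number such as '20' gives an empty segid."""
    match = re.fullmatch(r"([A-Za-z]*)(-?\d+)", spec.strip())
    if match is None:
        raise ValueError(f"cannot parse residue specification: {spec!r}")
    segid, number = match.groups()
    return segid, int(number)

# Two hypothetical rigid groups, each given as Start_Residue, Stop_Residue.
groups = [("A1", "A120"), ("B1", "B120")]
parsed = [(parse_residue_spec(start), parse_residue_spec(stop))
          for start, stop in groups]
```

Each entry in parsed corresponds to one group that would be entered with Multibody_Groups set to Add.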

The Reflection_Data and Refine_Variables operations contain parameters for selecting reflection data and for controlling the refinement. It is usually best to restrict the upper resolution limit of the refinement to 3 or 4 angstroms to allow larger shifts and a more rapid convergence of the refinement.

The execution of the refinement is controlled by the Run operation. These refinements are often completed in one to two minutes or less, so you will normally wish to run the refinement immediately. The results of the refinement may be extracted from the resulting log file and graphed using the Analyze operation.

Least-squares minimization

Once an approximately correct protein model has been obtained the positional and thermal parameters of all the protein atoms, water molecules, ions and ligands in the structure need to be included and refined. The preferred method for this final refinement is now maximum likelihood energy minimization with the CNX program. However, a version of Prof. Wayne Hendrickson's ProLSQ program (Hendrickson, 1985) that was developed at MSI--ProLSQ95--to carry out restrained least-squares refinement of protein structures is still available as part of the Xsight module. Salient features of ProLSQ95 are:

ProLSQ95 is available through an interface called Xprolsqtool. With this interface you can very rapidly set up and run refinement jobs and obtain a direct summary of the progress of the refinement. For more information see the Xprolsqtool program heading.

To run Xprolsqtool, select the Refinement/Minimize command. Use the value-aid to enter your structure factor file in the Fin_File_1 parameter block and enter your current coordinate set in the PDB_File parameter block. The standard stereochemical dictionary is a file called ideals.dat, which may be found in $XTALVIEWHOME/data. Select Execute to spawn the Xprolsqtool interface.

Selecting reflection data for the refinement

When you first enter Xprolsqtool, the Reflection Data menu is available. At other times, click the Reflection Limits button to open the Reflection Data menu. This menu allows you to select the reflection data that you wish to include in the refinement. There are also options for requesting a subset of data for a free R value analysis and an option for applying a correction for bulk solvent scattering.

The scale factor between observed and calculated structure factors and the weight on the X-ray "energy" term in the function that is minimized will be determined automatically if you do not supply these items. The weight on the X-ray energy term is the key parameter for most refinements. The weighting scheme that is automatically supplied is most appropriate near the end-point of a refinement. If you have a high R value that is not significantly reduced on refinement, you can try reducing the value of AFSIG by a factor of up to about one-half and setting BFSIG to zero. By altering the weighting scheme in this way the refinement will tend to optimize the X-ray energy term and the result will be a lower R value. After running several refinement cycles with these adjusted weights you will probably need to increase the value of AFSIG again to improve the protein geometry.

If you do not intend to apply the bulk solvent correction you should eliminate the lowest resolution data from the refinement. In the Resolution Limits section of the menu it is usual to change the lower resolution limit to a value between 5 and 10 Å resolution. The Solvent Correction parameters are the same as described in Setting up a rigid-body refinement. When you click the Apply button the parameters in the Reflection Data menu will be set and the menu will disappear.

Selecting a refinement strategy

When you first enter Xprolsqtool, the Refinement Strategy menu will be available. At other times use MB3 to click the Refinement Parameters button and with MB1 click the Refinement Strategy button. This menu allows you to set up a strategy file containing a series of refinement cycles. The default strategy is to run five cycles of coordinate refinement followed by three cycles of temperature factor refinement followed by five cycles in which positional and temperature factors are jointly refined. This strategy is appropriate in the final stages of the refinement. For a model in the early stages of refinement you will only refine the positional parameters.
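The default strategy can be written out as a short list of stages. The Python sketch below is only a way of visualizing it; the data layout and names are hypothetical and do not reflect the format of the Xprolsqtool strategy file.

```python
# Default strategy from the text: 5 coordinate cycles, 3 temperature
# factor cycles, then 5 cycles refining both together.
DEFAULT_STRATEGY = [
    {"cycles": 5, "refine": ("xyz",)},
    {"cycles": 3, "refine": ("b",)},
    {"cycles": 5, "refine": ("xyz", "b")},
]

# Early-stage alternative: positional parameters only (cycle count is
# a placeholder).
EARLY_STRATEGY = [{"cycles": 5, "refine": ("xyz",)}]

def expand(strategy):
    """List what is refined in each individual cycle."""
    return [stage["refine"] for stage in strategy for _ in range(stage["cycles"])]

per_cycle = expand(DEFAULT_STRATEGY)
print(len(per_cycle), "cycles in total")
```

Expanding the default strategy gives thirteen cycles in total, of which only the last five refine positions and temperature factors together.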

The Calculation Method option allows you to choose between FFT and Summation methods for calculating structure factors and gradients. The FFT method is usually one to two orders of magnitude faster than the summation method and should be used except for refinements at extremely high resolution.

Changing the stereochemical restraints

If you click with MB3 on the Refinement Parameters command, you can access the weighting sigmas for the various types of geometric restraints. Restraints on the protein covalent geometry are found under the Geometry Restraints menu. The default values will not normally need to be changed. Under the Non-bonded Contacts menu you will find the restraints against excessively close contacts between nonbonded atoms. These restraint values are not normally changed. The Thermal (B Values) menu allows you to control the variation in temperature factors between bonded atoms. The default values for the temperature refinement restraints are rather conservative and would be appropriate for a structure at medium resolution or the early stages of a high resolution refinement. Relaxation of these restraints may be possible for a high resolution structure but should be verified using cross validation tests (for example, the free R value test).

Running the refinement

To begin the refinement, click the Start Refinement button of the main Xprolsqtool menu. Most refinements will be fast enough to run interactively, taking less than one minute per cycle. However, the Create Shell File Only option is supplied in case you wish to write out the ProLSQ95 script and run the job later.

As the refinement proceeds, a summary of the run will appear in the textport window. Most refinement problems occur in the Protin program, which sets up the stereochemical restraint information for ProLSQ95. If you receive the "Protin failed" message, the View Protin Log File button in the Refinement Parameters menu is automatically activated. This button allows you to examine the protin log file to assist in diagnosing the problem. The most likely cause of the error is that the coordinate file contains misnamed atoms (atom names not consistent with the stereochemical dictionary) or is constructed in an irregular way. Strict adherence to the Brookhaven Protein Data Bank format and nomenclature is encouraged in order to minimize potential difficulties.

ProLSQ95 produces a number of output files. If the input .pdb file was called my_protein.pdb, then the output coordinate file after the first series of refinement cycles will be called my_protein.r1.pdb. A .phs file containing the final set of structure factors called my_protein.r1.phs will also be produced. Besides the information that appears in the textport, refinement statistics will appear in a file called my_protein.r1.hist. In addition to the output coordinate file, my_protein.r1.pdb, produced by Xprolsqtool, a copy of the output coordinates is also produced in a file called final.pdb. This file is useful if you are running the refinement as a background job, without Xprolsqtool.
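The naming pattern can be summarized with a short Python helper, which may be handy when scripting around background jobs. The function name is hypothetical, and the extension of the pattern to later series (r2, r3, and so on) is an assumption; the text only describes the names produced after the first series.

```python
from pathlib import Path

def prolsq_output_names(input_pdb, series=1):
    """Derive the output filenames described in the text from the
    input coordinate filename."""
    tag = f"{Path(input_pdb).stem}.r{series}"
    return {
        "coordinates": f"{tag}.pdb",
        "structure_factors": f"{tag}.phs",
        "statistics": f"{tag}.hist",
        "coordinate_copy": "final.pdb",   # fixed name, independent of input
    }

names = prolsq_output_names("my_protein.pdb")
for role, filename in names.items():
    print(f"{role}: {filename}")
```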

Setting up stereochemical dictionaries for new ligands

Protein structures frequently contain ligands and co-factors that need to be refined along with the protein and water atoms. The ProLSQ95 restraint dictionary (usually named ideals.dat) that is supplied with Xsight contains entries for the commonly occurring amino acids, the heme group, water molecules, and a collection of commonly occurring ions. The CNX topology and parameter files that are supplied with Xsight also contain entries for the commonly occurring amino acids, water molecules, and a collection of commonly occurring ions.

Under the Refinement/Dictionary command we have provided tools for importing additional ligand structures, automatically setting up stereochemical restraint information, creating topology/parameter files for the CNX program and modifying the ProLSQ95 and Xfit stereochemical dictionaries to accommodate these ligands. A modified version of the CONEXN program (Pahler and Hendrickson 1990) is used to introduce the restraints into the ProLSQ95 dictionary. You should note two restrictions present in the Refinement/Dictionary command that have been made in order to simplify the support of all of these programs. First, atom names should not be longer than three characters (a warning message and the Edit_Atom_Names option are provided to detect and correct this situation). Second, hydrogen atoms are removed from the ligand - this simplification is justified by the very approximate nature of the ligand parameters and the fact that hydrogens are frequently eliminated from crystallographic refinements.

Before using the Refinement/Dictionary command, you need to obtain an atomic model of the ligand that you wish to include in the restraint dictionaries. This model could come from the Insight II Builder, a small molecule database, or some other source. This model will be used to establish the ideal geometry of the ligand. Use the Insight II Molecule/Get command to read in the ligand. Since the Refinement/Dictionary uses hydrogen atoms for the automatic definition of chiral centers, you may wish to use the Insight II Biopolymer/Modify/Hydrogens command to add hydrogens to the ligand if they are not already present.

When you are setting up restraints for a new ligand the Refinement/Dictionary command will begin from the Automatic_Params operation. Select Execute to obtain an automatic definition of the stereochemical restraints. Groups of atoms in planes and chiral centers will be automatically defined. Planes are represented on the screen by a plane of criss-cross lines with the atoms in the plane enclosed in small boxes. Chiral centers are marked by enclosing the chiral atom in a small pyramid. By clicking on the Edit_Chiral_Centers or the Edit_Planes operations you will be able to change the atoms that are included in a plane or which were determined to be chiral centers. With the Edit_Torsions operation you can define a torsion angle by clicking on the four atoms involved in the torsion angle. The Edit_Atom_Codes operation allows you to manually enter code numbers for elements not normally found in proteins. The number of scattering electrons is used as the code for identifying the correct scattering factors for use with the CNX and ProLSQ95 programs. For users of the CNX program these specially set scattering factors are written to a small file which will be automatically picked-up and written into the CNX script by the CNX refinement interfaces in Xsight. If the Insight II Builder module was used to create the ligand then this option will not normally be needed for refinements with the CNX program. For ProLSQ95, the atomic element codes are incorporated in the stereochemical dictionary file and scattering factors are automatically assigned by the ProLSQ95 program.

The Update_Dictionary operation enables you to write the atomic parameters and restraint list to a dictionary file (.dct). This operation also allows you to update the stereochemical dictionaries for the ProLSQ95 and Xfit programs and to create topology/parameters files for the CNX program. The Dictionary_File has been designed to mimic a Protein Data Bank file with various keywords to specify the restraints. Once a dictionary file is written you will be able to import it back into the Refinement/Dictionary command using the Read_Dict_File operation. This mechanism may be useful if you are dealing with structures containing a series of very similar ligands. In this case it may be quicker to edit the dictionary file directly than to redetermine the restraints for each ligand. You will then only need to use the Update_Dictionary operation to add the ligand and its restraints to the ProLSQ95 and Xfit dictionaries and create topology/parameter files for the CNX program.


Checking the stereochemical quality of a structure

Once an atomic model is built and has been subjected to refinement you will wish to check the structure for regions of poor stereochemistry. If the stereochemical parameters of the model are significantly different from expected values then you should re-check the dubious regions against electron density omit maps to determine if that part of the model needs to be re-fitted to the density. To check the stereochemical quality of atomic models the ProStat command is available.

Before entering ProStat use the Insight II Molecule/Get command to read your molecule. Now go to the ProStat/Struct_Check command. By toggling on the Check_Bonds, Check_Angles, and Check_Torsions options you can check the covalent bond lengths, angles between bonded atoms, and torsion angles against standard values in the Engh-Huber stereochemical parameter set (Engh and Huber 1991). Within each of these options you can toggle further sub-options to check particular classes of parameter. The N_Std_Dev parameter controls the threshold for reporting the stereochemical parameter's difference from an ideal value. The threshold is measured in number of standard deviations from the ideal value.
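The N_Std_Dev threshold can be illustrated with a short Python sketch. The function name and the numerical values are hypothetical and chosen only for the example; they are not taken from the dictionary used by ProStat.

```python
def flag_outliers(observations, n_std_dev):
    """Report parameters whose deviation from the ideal value exceeds
    n_std_dev standard deviations.

    Each observation is (name, actual value, ideal value, standard deviation).
    """
    flagged = []
    for name, actual, ideal, sigma in observations:
        deviation = abs(actual - ideal) / sigma
        if deviation > n_std_dev:
            flagged.append((name, deviation))
    return flagged

# Hypothetical bond lengths in angstroms, for illustration only.
bonds = [
    ("bond A", 1.335, 1.329, 0.014),   # within half a standard deviation
    ("bond B", 1.400, 1.329, 0.014),   # about five standard deviations out
]
outliers = flag_outliers(bonds, n_std_dev=3.0)
```

Raising N_Std_Dev shortens the report to the most severe deviations, while lowering it lists milder irregularities as well.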

Under Output_Method are a set of options for obtaining the results of the analysis. Options are List_To_Textport, Color_Molecule and Residue_Table. The List_To_Textport and Residue_Table options list numerical information on the ideal and actual values of the stereochemical parameter and the deviation from ideality. The Color_Molecule option allows you to color the irregular regions of structure on the display of the three-dimensional model.


Visualization of crystal symmetry

It is important to remember that the protein atomic models that we obtain represent molecules in the crystalline state. In two commonly occurring situations, visualization tools are needed to understand the crystal symmetry.

To handle these situations we have developed the Symmetry pulldown for the visualization of crystal symmetry.

Crystal packing diagrams

The Symmetry/Packing command is used to generate crystal packing diagrams showing many whole protein molecules in and around the crystal unit cell.

The Get_Molecule operation is used to read in the Protein Data Bank file containing the atomic co-ordinates and select the type of representation (i.e., all atoms, a backbone trace or a CA trace) for displaying the crystal contents. After you click Execute you will transfer to the Generate operation.

The Generate operation allows you to set the limits (in fractional coordinates) of the crystal unit cell that you wish to visualize. The default values will result in the display of all molecules in a single unit cell. If you are checking a molecular replacement solution you should select limits that will cover all possible contacts with your unique molecule. If the center of mass of a symmetry related copy of the molecule lies within the specified limits then the copy of the molecule will be displayed. If you enter limits less than zero or greater than one you will be able to view copies of the molecule in more than one crystal unit cell.
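The selection rule based on the center of mass can be sketched in Python. This is an illustration under assumptions, not the Symmetry/Packing implementation: the function name is hypothetical, and the limits are treated as inclusive at both ends.

```python
import itertools
import math

def translations_in_limits(center_frac, limits):
    """Return the whole-cell translations (integer triples) that place a
    copy's fractional center of mass inside the given limits.

    limits is ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    """
    ranges = []
    for c, (lo, hi) in zip(center_frac, limits):
        t_min = math.ceil(lo - c)       # smallest shift that reaches lo
        t_max = math.floor(hi - c)      # largest shift that stays below hi
        ranges.append(range(t_min, t_max + 1))
    return list(itertools.product(*ranges))

center = (0.25, 0.5, 0.75)              # hypothetical center of mass

one_cell = translations_in_limits(center, ((0, 1), (0, 1), (0, 1)))
wider = translations_in_limits(center, ((-1, 2), (0, 1), (0, 1)))
```

With limits of 0 to 1 along every axis only the untranslated copy is shown, whereas widening the limits along x to run from -1 to 2 brings in the copies from the two neighboring cells as well.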

The Cell_Box option lets you mark the vertices of the crystal cell. It is normally useful to have this option turned on. The Label_Replicates and Symmetry_Table options provide methods for identifying the symmetry operation that corresponds to a copy in the display. Both of these options can be toggled off if you are simply interested in checking the molecular packing in a molecular replacement solution. When you select Execute, the replicates within the given set of limits appear on the screen.

If you are constructing a packing diagram for a publication you may wish to delete one or more of the replicates to improve the appearance of the display. The Delete operation allows you to select and delete individual copies of the molecule.

To improve display speed, the symmetry-related copies of the protein in the packing diagram are graphical objects rather than "real" molecules. If you wish to use some of Insight II's sophisticated rendering capabilities you will need to convert these graphical objects to "real" molecules. The Transform operation contains an Add_Molecule option that allows you to do this. After converting one of the crystallographically related replicates to a real molecule you may also wish to create an extended coordinate set that contains the coordinates of more than one molecule in the crystal. This is useful if, for example, the biologically interesting aggregate that you wish to study is a dimer but, in the crystal, the dimer is generated by a crystallographic symmetry operation. The Symmetry/Merge_Molecule command has been provided to accomplish this task.

Analyzing the atomic interactions between molecules

With the Symmetry/Contacts command you can examine the intermolecular interactions between the unique protein molecule and amino acids in neighboring molecules in the crystal.

The Get_Molecule operation is used to read the Protein Data Bank file containing the atomic coordinates of the molecule. After you select Execute you are transferred to the Generate operation. You need only provide a Contact_Distance to obtain all crystallographically related amino acids that are within contact distance of the unique molecule.
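The selection by contact distance can be pictured with a short Python sketch. It is a brute-force illustration of the idea only and makes no claim about how the Symmetry/Contacts command works internally: the function name residues_in_contact, the (residue label, Cartesian position) layout, and the coordinates are assumptions made for the example.

```python
import math


def residues_in_contact(unique_atoms, neighbor_atoms, contact_distance):
    """Return the sorted residue labels of neighbor atoms that lie within
    contact_distance (in angstroms) of any atom of the unique molecule."""
    hits = set()
    for res_label, pos in neighbor_atoms:
        for _, unique_pos in unique_atoms:
            if math.dist(pos, unique_pos) <= contact_distance:
                hits.add(res_label)
                break  # one close atom is enough to select the residue
    return sorted(hits)


# Hypothetical Cartesian coordinates in angstroms: (residue label, (x, y, z)).
unique_atoms = [
    ("GLU 10", (0.0, 0.0, 0.0)),
    ("LYS 11", (3.8, 0.0, 0.0)),
]
neighbor_atoms = [
    ("ASP 45", (0.0, 0.0, 3.0)),
    ("SER 46", (0.0, 0.0, 9.0)),
]

print(residues_in_contact(unique_atoms, neighbor_atoms, 4.0))
```

A larger contact distance selects more of the neighboring residues; a much smaller one could be used in the same way to pick out excessively close contacts.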

The Show_Close_Contacts option provides a way of identifying excessively close contacts between adjacent molecules. The Label_Fragments and Symmetry_Table options provide a way of identifying a particular molecular fragment with the crystal symmetry operation which generated it. The crystallographically related fragments are stored as "real" atoms so that Insight II analysis tools can be used to investigate the contacts between the fragments and the unique molecule.




Last updated January 28, 2000 at 03:49PM Pacific Standard Time. Copyright © 2000, Molecular Simulations Inc. All rights reserved.