File Formats


Xmipp/Spider Format
Byte Format
Selection Files
Document Files
These are the file formats used in Xmipp. The byte format is used when digitizing micrographs; once the particles have been selected from the micrographs, they are saved in Xmipp format (which is exactly the same as the Spider one). Hence, all images and volumes are kept in Spider format to maintain 100% compatibility with that package. The selection files tell you which images of a set take part in a process and which do not. The document files are also inherited from Spider, although we use an extension of that format. Basically, they are used as text files for interchanging numerical values between programs.
 


Xmipp/Spider Format

This is the most important file format used in Xmipp. Spider has created a "standard" file format that is also used by Xmipp. Each pixel in the image or each voxel in the volume is stored as a float and a header is added at the beginning of the file.

The Spider file is (cols x rows x slices) * 4 + header size bytes long.

 Note:
   - 4 is used because a float number is represented by 4 bytes.
   - slices = 1 in the case of 2D images.
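
The file length formula can be sketched in C++ as follows. The function name and the headerBytes parameter are illustrative assumptions, not part of the Xmipp library; the header size is passed in by the caller rather than derived from the header fields.

```cpp
#include <cstdint>

// Expected length in bytes of a Spider/Xmipp image or volume file.
// headerBytes is a hypothetical parameter: this sketch does not try to
// compute the header size, it only applies the formula given above.
std::uint64_t spiderFileSize(std::uint64_t cols, std::uint64_t rows,
                             std::uint64_t slices, std::uint64_t headerBytes) {
    const std::uint64_t bytesPerValue = 4;  // each pixel/voxel is a 4-byte float
    return cols * rows * slices * bytesPerValue + headerBytes;
}
```

For a 2D image, call it with slices = 1.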

The image header is a structure that holds important information about the image. Some of the header fields are not used by Xmipp; however, Xmipp preserves the integrity of that information in order to keep the files compatible with Spider programs. The Xmipp library and programs handle the header internally and automatically, so that you do not have to interact directly with this complicated structure (see the imageXmipp and volumenXmipp classes).

Description of the file header (the fields are defined as C data types):
 
 

     Dimension information (Also used by Xmipp)

     /* 1*/  float fNslice;            // NUMBER OF SLICES (PLANES) IN VOLUME
                                                      // (=1 FOR AN IMAGE)
    /* 2 */  float fNrow;             // NUMBER OF ROWS PER SLICE (Rows Y)

    Internal information for a complicated calculation of file length.

    /* 3 */  float fNrec;              // TOTAL NUMBER OF RECORDS .
    /* 4 */  float fNlabel;           // AUXILIARY NUMBER TO COMPUTE TOTAL NUMBER OF RECS

    File type information (Also used by Xmipp)

    /* 5 */  float fIform;             // FILE TYPE SPECIFIER.
                                                     // +3 FOR A 3-D FILE
                                                     // +1 FOR A 2-D IMAGE
                                                     // -1 FOR A 2-D FOURIER TRANSFORM
                                                     // -3 FOR A 3-D FOURIER TRANSFORM
                                                     // -5 FOR A NEW 2-D FOURIER TRANSFORM
                                                     // -7 FOR A NEW 3-D FOURIER TRANSFORM
                                                     // +8 FOR A 2-D EIGHT BIT IMAGE FILE
                                                     // 11 FOR A 2-D EIGHT BIT COLOR IMAGE FILE
 
     Flag  and statistical information never used by Xmipp (set to 0 except for fAv that is set to -1) 

    /* 6 */  float fImami;           // MAXIMUM/MINIMUM FLAG. IS SET AT 0 WHEN THE
                                                     // FILE IS CREATED, AND AT 1 WHEN THE MAXIMUM AND
                                                     // MINIMUM HAVE BEEN COMPUTED, AND HAVE BEEN STORED
                                                     // INTO THIS LABEL RECORD (SEE FOLLOWING WORDS)
    /* 7 */  float fFmax;            // MAXIMUM VALUE
    /* 8 */  float fFmin;             // MINIMUM VALUE
    /* 9 */  float fAv;                // AVERAGE VALUE
    /* 10*/  float fSig;               // STANDARD DEVIATION. A VALUE OF -1. INDICATES
                                                    // THAT SIG HAS NOT BEEN COMPUTED PREVIOUSLY.

    Flag never used by Xmipp

    /* 11*/  float fIhist;             // FLAG INDICATING IF THE HISTOGRAM HAS BEEN
                                                    // COMPUTED. NOT USED IN 3D FILES!

    Dimension information (Also used by Xmipp)

    /* 12*/  float fNcol;            // NUMBER OF PIXELS PER LINE (Columns X)

    Internal information for a complicated calculation of file length.

    /* 13*/  float fLabrec;       // NUMBER OF LABEL RECORDS IN FILE HEADER

     Euler Angles (Also used by Xmipp)

    /* 14*/  float fIangle;         // FLAG THAT TILT ANGLES HAVE BEEN FILLED
    /* 15*/  float fPhi;               // EULER: ROTATIONAL ANGLE
    /* 16*/  float fTheta;         // EULER: TILT ANGLE
    /* 17*/  float fPsi;               // EULER: PSI  = TILT ANGLE

     Additional translational information (Never used by Xmipp).

    /* 18*/  float fXoff;            // X TRANSLATION
    /* 19*/  float fYoff;            // Y TRANSLATION
    /* 20*/  float fZoff;            // Z TRANSLATION

    Scale information (Never used by Xmipp).

    /* 21*/  float fScale;         // SCALE

     Internal information for a complicated calculation of file length.

    /* 22*/  float fLabbyt;      // TOTAL NUMBER OF BYTES IN LABEL
    /* 23*/  float fLenbyt;      // RECORD LENGTH IN BYTES

     Empty and unused space

     char  fNada[24];               // Empty Space (not used)

     Flag never used by Xmipp

    /* 30*/  float fFlag;            // FLAG THAT ANGLES ARE SET. 1 = ONE ADDITIONAL
                                                   // ROTATION IS PRESENT, 2 = ADDITIONAL ROTATION
                                                   // THAT PRECEDES THE ROTATION THAT WAS STORED IN 15
 

    Used to contain additional rotation information. Now it's not used (set to 0)

    /* 31*/  float fPhi1;
    /* 32*/  float fTheta1;
    /* 33*/  float fPsi1;
    /* 34*/  float fPhi2;
    /* 35*/  float fTheta2;
    /* 36*/  float fPsi2;
 

    Previously used by Xmipp.

    double fGeo_matrix[3][3];    // Geometric info
     float fAngle1;                            // angle info

     float fr1;
     float fr2;                                      // lift up cosine mask parameters

     Used in Xmipp for Radon Transform

     float RTflag;                              // 1=RT, 2=FFT(RT)
     float Astart;
     float Aend;
     float Ainc;
     float Rsigma;
     float Tstart;
     float Tend;
     float Tinc;
 
 Empty and unused space

     char  fNada2[584];

   Date/Time and title fields (Also used by Xmipp)

 /*212-214*/ char szIDat[12];      // LOGICAL * 1 ARRAY DIMENSIONED 10, CONTAINING
                                                                // THE DATE OF CREATION (10 CHARS)
 /*215-216*/ char szITim[8];        // LOGICAL * 1 ARRAY DIMENSIONED 8, CONTAINING
                                                                // THE TIME OF CREATION (8 CHARS)
 /*217-256*/ char szITit[160];      // LOGICAL * 1 ARRAY DIMENSIONED 160
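
As a cross-check of the listing above, the sizes of the listed fields can be added up at compile time. This is only a sketch: it assumes a 4-byte float and an 8-byte double, and the function name is hypothetical. It counts the fields as listed and does not compute the header length actually written to disk, which is what fLabbyt records.

```cpp
#include <cstddef>

// Assumed sizes of the C types used in the listing (typical, not guaranteed).
constexpr std::size_t kFloatBytes = 4;
constexpr std::size_t kDoubleBytes = 8;

// Sum of the sizes of all fields in the header listing, in listing order.
constexpr std::size_t listedHeaderBytes() {
    return 23 * kFloatBytes   // fields 1..23 (fNslice .. fLenbyt)
         + 24                 // fNada
         + 7 * kFloatBytes    // fFlag and fPhi1 .. fPsi2 (fields 30..36)
         + 9 * kDoubleBytes   // fGeo_matrix[3][3]
         + 3 * kFloatBytes    // fAngle1, fr1, fr2
         + 8 * kFloatBytes    // RTflag .. Tinc
         + 584                // fNada2
         + 12 + 8 + 160;      // szIDat, szITim, szITit
}

// 1024 bytes = 256 four-byte words, matching the 217-256 numbering of szITit.
static_assert(listedHeaderBytes() == 1024, "listed fields should total 1024 bytes");
```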
 
 

 


Byte Format

Each image pixel is stored as an unsigned char and no header is added to the file. The BYTE files are cols x rows bytes long, and as there's no header it is impossible to store any additional information about the image. In this package only the programs that handle micrographs employ these files as input and generally the user must supply the image dimensions at the command line.

As an example of how information is physically written to disk, imagine a 64x64 image (remember that in Xmipp images the top left corner defines the physical origin; see also Matrix2D). The pixels are stored in the following order:

    (0,0) (0,1) (0,2) ... (0,63) (1,0) (1,1) ... ... ... ... (63,0) ... (63,63)

    where the coordinates above are given as (Y,X).
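
The storage order above amounts to a simple row-by-row offset computation, sketched here in C++. The function name is an illustrative assumption and not an Xmipp routine.

```cpp
#include <cassert>
#include <cstddef>

// Offset in bytes, from the start of a headerless byte file, of pixel (y, x).
// Rows are stored one after another and each pixel takes exactly one byte.
std::size_t byteFileOffset(std::size_t y, std::size_t x, std::size_t cols) {
    assert(x < cols);  // x must be a valid column index
    return y * cols + x;
}
```

In the 64x64 example, pixel (1,0) is at offset 64 and the last pixel, (63,63), is at offset 4095.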

 


Selection Files

Selection files contain information about which images participate in a given process and which do not. So you may have a set of images on your disk, make a list of the ones participating, and give this list to any program that needs it. The list may even mix participating and non-participating images, if you want to keep them all together in the same list.

Hence, each entry in the list is formed by the image name (which may include a path) and the state (ACTIVE=1 or DISCARDED=-1) for this process. Comments can also be included, using either ';' or '#' in the first column. The spacing is not relevant at all. Notice that this file is not compatible with Spider selection files, which are based on document files that use the key as the number of the filename. Notice also that in Spider selection files you cannot give different image sets (defined by different root names), whereas in Xmipp you can, since you give the full image name.

       # This is a comment
       ; This is also a comment
       prj0001.dem  1
       prj0002.dem  1
       prj0003.dem  1
       prj0004.dem  1
       prj0005.dem  1
       prj0006.dem -1
       prj0007.dem  1
       prj0008.dem  1
       Test/prj0009.dem  1
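
A minimal sketch of how such a list could be read in C++ is given below. The SelEntry type and the readSelFile function are hypothetical names used only for illustration; they are not the Xmipp selection file classes. The sketch assumes that image names contain no blanks, tolerates leading blanks before a comment character, and silently skips lines it cannot parse.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct SelEntry {       // hypothetical type, for illustration only
    std::string name;   // image name, possibly including a path
    int state;          // 1 = ACTIVE, -1 = DISCARDED
};

// Reads all entries of a selection file, ignoring comments and blank lines.
std::vector<SelEntry> readSelFile(std::istream& in) {
    std::vector<SelEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;                 // blank line
        if (line[first] == '#' || line[first] == ';') continue;   // comment
        std::istringstream fields(line);
        SelEntry e;
        if (fields >> e.name >> e.state) entries.push_back(e);    // skip bad lines
    }
    return entries;
}
```

Since the spacing is not relevant, plain whitespace-separated extraction is enough to split each entry into name and state.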
 


Document Files

The document files are text files specially designed for numerical content, i.e., to keep lists of numbers whose meaning the reading program must know (for example, angles of projections, a Spider selection file, or any other kind of numerical information). They consist of a list of lines, each one containing a record. Records are identified by their key (an integer field starting at 1), so we have record #1, #2, #3, ... Records must be numbered consecutively; this is one of the reasons why a renumbering operation is performed so often in the Document File Module of the General Library. This numbering is required because many Spider routines read records until they find a "hole" in the keys, i.e., a key that is not present. This is not the case in Xmipp, but it is still much easier to work with consecutive numbers. In fact, the key is not important inside the Xmipp programs.

In each record you can write in Xmipp as many numerical fields as you like, and the number of fields need not be the same in all records. However, if you want to keep compatibility with Spider, then the number of fields is restricted to be less than 7, i.e., a maximum of 6 (there is a way of extending the record using -999 as key, but it has not been implemented in the Xmipp environment).

Between the record key and the data itself goes an integer number that says how many data fields compose this line. If you have 3 data fields, then this number is 3. In Xmipp you can specify as many fields as you like, with some limitation, as you only have a single character to specify the number of fields. The trick used is to indicate the number of fields using the ASCII code of the different characters. This way '1' means a single field, '2' two fields, ..., '9' nine fields, and ':', ';', '<' are respectively 10, 11, 12 fields, and so on. The idea is that the ASCII code of ':' is 58, that is 48+10, the ASCII code of ';' is 59, that is 48+11, ...
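
The field-count trick can be sketched in C++ with two small helpers. Both function names are illustrative assumptions, and the sketch assumes an ASCII character set.

```cpp
#include <cassert>

// Character written for a given number of fields: ASCII code 48 ('0') + count.
char encodeFieldCount(int count) {
    assert(count >= 0 && count <= 127 - 48);  // must fit in a 7-bit ASCII code
    return static_cast<char>(48 + count);
}

// Number of fields represented by a field-count character.
int decodeFieldCount(char c) {
    return static_cast<int>(static_cast<unsigned char>(c)) - 48;
}
```

With these helpers, 3 fields are written as '3', 10 fields as ':' and 12 fields as '<'.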

You can add comments to the document file either with a semicolon (';') in the second column of the line (Spider fashion) or with a ';' or '#' as the first non-blank character of the line, regardless of the column in which it appears (Xmipp fashion). Xmipp recognizes both ways of defining comments. Empty lines (with no characters at all) are also allowed in Xmipp.

Here is an example of a document file. The first line is the column number and the second the meaning of the field at that position; then comes a comment (typical in Spider) giving the extensions of data and code (dat in both cases), the name of the file and its creation time. The following lines are the different records; in this case there are only 5 records, with non-consecutive keys (notice that the 6th record is commented out). In all lines the second field is a 2, indicating that only the first 2 numerical fields are relevant in the line. This is a way to manage lines with many fields in Xmipp while maintaining compatibility with Spider, which only admits 6 numerical fields per line.
 

    123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
    KEY# REGS/  VALUE     VALUE       VALUE       VALUE       VALUE      VALUE
         LINE

     ; dat/dat   18-Jul-94  AT 14:29:48   jnk000.dat
        1 2  20.000      21.000      21.000      21.000      81.000      81.000
       12 2  20.000      56.000      21.000      25.000      21.000      81.000
      102 2  20.000      16.000      33.000      21.000      61.000      81.000
     8345 2  20.000      21.000      21.000      26.000      21.000      81.000
    28345 2  20.000      21.000      21.000      26.000      21.000      81.000
     ;  6 2  20.000      22.000      21.000      21.000      44.000      81.000
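
Reading one line of a document file like this one can be sketched in C++ as follows. The DocRecord type and the parseDocLine function are hypothetical names for illustration and are not the classes of the Document File Module. The sketch treats any line whose first non-blank character is ';' or '#' as a comment, decodes the field count as its ASCII code minus 48, and keeps only that many values.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct DocRecord {              // hypothetical type, for illustration only
    int key;
    std::vector<double> data;   // only the fields announced by the count
};

// Returns true and fills rec if line is a data line; returns false for
// comments, blank lines and lines that cannot be parsed.
bool parseDocLine(const std::string& line, DocRecord& rec) {
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) return false;                 // blank line
    if (line[first] == ';' || line[first] == '#') return false;   // comment
    std::istringstream in(line);
    char countChar;
    if (!(in >> rec.key >> countChar)) return false;
    const int count = static_cast<unsigned char>(countChar) - 48;
    if (count < 0) return false;
    rec.data.clear();
    for (int i = 0; i < count; ++i) {
        double value;
        if (!(in >> value)) return false;   // fewer values than announced
        rec.data.push_back(value);
    }
    return true;
}
```

Applied to the record with key 12 above, this yields two values, 20 and 56; the remaining numbers on the line are ignored.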
 

The spacing between numbers is very important in Spider programs (based on Fortran I/O routines) but not in Xmipp (programmed in C++). To keep compatibility between both systems, the exact structure used to write a line is

    KKKKK N +FFFF.FFFFF +FFFF.FFFFF +FFFF.FFFFF +FFFF.FFFFF...

 where K is a key digit, N is the number of fields, and F is a field digit. In Fortran the format string is I5, I1, 6G12.5, and in C++ the following code is used:

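    // Requires <cstdio>, <iostream> and <string>
    ostream& operator << (ostream& o, DocLine &DL) {
       char    aux[30];
       string  str;
       switch (DL.line_type) {
          case 1:
             // Data line: key, field-count character, then the fields.
             // snprintf keeps very large values from overflowing aux.
             snprintf(aux, sizeof(aux), "%5d ", DL.key);
             str += aux;
             // The field count is written as the character with ASCII code 48+count
             snprintf(aux, sizeof(aux), "%c ",
                      static_cast<char>('0' + DL.data.size()));
             str += aux;
             for (size_t i = 0; i < DL.data.size(); i++) {
                snprintf(aux, sizeof(aux), " % 10.5f", DL.data[i]);
                str += aux;
             }
             o << str << endl;
             break;

          case 2:
             // Comment line: written verbatim
             o << DL.text << endl;
             break;
       }
       return o;
    }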
 
