Scripps Research

DNA Array Core Facility

Data Analysis

Affymetrix GeneChip Data Analysis

To gain a better understanding of the Affymetrix Data Analysis process, please see our Experiment Design and Data Analysis Outline. The outline provides an easy-to-follow flow chart for analyzing your data based on the number of replicates used in your experiment. Understanding this chart is a good way to begin the data analysis process.

Associated Files

When you come to the lab to pick up your data, you will receive a CD containing the following files for each chip:

  • .CEL (a processed image file with calculated signal intensities assigned to each oligonucleotide spot on the GeneChip)
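  • .CHP (the analysis results file; it can be opened only within GCOS, the GeneChip Operating Software)
  • .TXT (the CHP file in text format, listing the signal value, detection call (Present, Marginal, or Absent), and detection p-value assigned to each probe set on the GeneChip)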
  • .RPT (a text-based file containing the quality control numbers for the experiment including percent present, presence of spiked controls, background, noise, etc.)
  • .EXP (a text-based file containing general experiment information such as sample name, date, etc.)
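Percent present, one of the QC numbers reported in the .RPT file, can be cross-checked from the detection calls in the .TXT file. The Python sketch below assumes the .TXT file is tab-delimited with a header row and a column named "Detection" holding P, M, or A; that column name, the other headers, and the example rows are assumptions, so check the header of your own file before using it.

```python
import csv
import io

# Assumption: the .TXT export is tab-delimited, one row per probe set, with a
# "Detection" column holding P (present), M (marginal) or A (absent).
DETECTION_COLUMN = "Detection"


def percent_present(txt_handle, column=DETECTION_COLUMN):
    """Return the percentage of probe sets whose detection call is P."""
    reader = csv.DictReader(txt_handle, delimiter="\t")
    calls = [row[column].strip().upper() for row in reader]
    if not calls:
        raise ValueError("no probe set rows found")
    return 100.0 * sum(call == "P" for call in calls) / len(calls)


if __name__ == "__main__":
    # Made-up rows with hypothetical probe set names; a real file has one row
    # per probe set on the chip.
    example = io.StringIO(
        "Probe Set Name\tSignal\tDetection\tDetection p-value\n"
        "probe_1\t812.4\tP\t0.0002\n"
        "probe_2\t35.1\tA\t0.4\n"
        "probe_3\t120.9\tM\t0.05\n"
        "probe_4\t640.0\tP\t0.001\n"
    )
    print(percent_present(example))  # 50.0
```

Marginal calls are counted as not present here; GCOS may apply its own conventions, so small differences from the .RPT value are possible.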

Data Analysis

The Core Facility performs analysis on Affymetrix 3’ expression arrays, exon arrays, and tiling arrays. A variety of methods for analyzing microarray datasets are freely available, and the appropriate methods depend on the design of the experiment. You can contact the Core Facility to discuss both the experimental design and the analysis procedures for your experiment.

  • GeneChip Quality Assessment - Each chip receives a full quality assessment with Bioconductor packages, based on quality metrics recommended by Affymetrix, intensity distributions, array-to-array comparisons, and RNA degradation plots. This assessment identifies arrays that are outliers relative to the rest of the arrays.
  • Normalization - The most frequently used normalization procedure is Robust Multichip Average (RMA). RMA is freely available, including through RMAExpress, a stand-alone desktop application for Windows. RMA takes the experimental Affymetrix CEL files and the chip's CDF file and performs background correction, quantile normalization, and median-polish summarization. The model produces a log2-scale expression value for each probe set on each chip, and these summary values are written to a single text file. RMA is also available in most microarray analysis software packages. Other normalization procedures include GCRMA, dChip, and MAS5.
  • Differential Expression
    • ANOVA - Whenever a microarray experiment includes two or more groups, ANOVA can be used to identify genes that are differentially expressed among the groups. A straightforward procedure for group comparisons is a t-test (two groups) or an F-test (two or more groups).
    • BRB-ArrayTools, a free Excel add-in, provides user-friendly access to ANOVA functions, class prediction, clustering, functional classification, and data visualization tools.
    • All data processing can also be performed in R using the Bioconductor project. For differential expression analysis, the Core uses the linear modeling approach and empirical Bayes statistics implemented in the limma package, which is also well suited to experiments with factorial designs.
    • Rank Products is a simple but powerful nonparametric technique for finding differentially expressed genes, particularly when the experiment has only a small number of samples.
  • Weighted co-expression network analysis is used to explore molecular interaction networks across tissue RNA samples in microarray datasets. The co-expression network can be organized into modules of coordinately expressed genes that reflect system-level functionality.
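  • Category analysis and Gene Set Enrichment Analysis (GSEA) provide pathway enrichment tools that help interpret a dataset in terms of pathways and functional categories rather than individual genes.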
  • Ingenuity Pathway Analysis software links microarray results to a literature-curated database to identify relevant functions and pathways.
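The outlier detection mentioned under GeneChip Quality Assessment can be illustrated with a simple array-comparison metric. This Python sketch is an assumption-laden illustration, not the Core's Bioconductor workflow: it scores each array by its summed mean absolute distance to the other arrays and flags scores above the upper quartile plus 1.5 times the interquartile range (the boxplot rule). The function name and the simulated data are hypothetical.

```python
import numpy as np


def flag_outlier_arrays(log_intensities):
    """Flag arrays that sit unusually far from the other arrays.

    log_intensities: 2-D array (rows = probes or probe sets, columns = arrays)
    on a log2 scale. Returns (flags, scores), one entry per array.
    """
    x = np.asarray(log_intensities, dtype=float)
    # dist[i, j] = mean absolute difference between array i and array j.
    dist = np.abs(x[:, :, None] - x[:, None, :]).mean(axis=0)
    # Score each array by its summed distance to all the other arrays.
    scores = dist.sum(axis=1)
    # Boxplot rule: flag scores above the upper quartile + 1.5 * IQR.
    q1, q3 = np.percentile(scores, [25, 75])
    flags = scores > q3 + 1.5 * (q3 - q1)
    return flags, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(8, 2, size=1000)  # simulated log2 intensities
    good = [base + rng.normal(0, 0.1, size=1000) for _ in range(5)]
    shifted = base + 3.0  # one array with a global intensity shift
    flags, scores = flag_outlier_arrays(np.column_stack(good + [shifted]))
    print(flags)  # expected: only the last (shifted) array is flagged
```

A flagged array is a candidate for closer inspection (intensity distributions, RNA degradation plots), not automatic removal.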
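Quantile normalization, one of the RMA steps described under Normalization, forces every array to share the same intensity distribution. The Python sketch below shows only that step on a small made-up matrix; it omits RMA's background correction and summarization, breaks ties by position (a simplification), and the function name is hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes-by-arrays matrix.

    The value at rank k in each array is replaced by the mean of the rank-k
    values across all arrays. Ties are broken by position (a simplification).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")  # row indices sorting each column
    reference = np.take_along_axis(x, order, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # The k-th smallest value of array j receives reference[k].
        out[order[:, j], j] = reference
    return out


if __name__ == "__main__":
    # Four probes (rows) measured on three arrays (columns); made-up values.
    data = np.array([[5, 4, 3],
                     [2, 1, 4],
                     [3, 4, 6],
                     [4, 2, 8]])
    print(quantile_normalize(data)[:, 0])  # approximately [5.667 2. 3. 4.667]
```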
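RMA's final step combines the probes of each probe set into one value per chip using Tukey's median polish. The Python sketch below follows the standard median polish iteration (the same sweep order and default tolerance as R's medpolish) on a single probe set; the function name is hypothetical, and in RMA the input would be the background-corrected, quantile-normalized log2 PM intensities for that probe set.

```python
import numpy as np


def median_polish_summary(log_pm, max_iter=10, eps=0.01):
    """Summarize one probe set with Tukey's median polish.

    log_pm: probes-by-arrays matrix of log2 intensities for one probe set.
    Fits log_pm[i, j] ~ overall + probe_effect[i] + array_effect[j] + residual
    and returns overall + array_effect, one expression value per array.
    """
    z = np.array(log_pm, dtype=float)  # residuals, updated in place
    overall = 0.0
    probe_eff = np.zeros(z.shape[0])
    array_eff = np.zeros(z.shape[1])
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_med = np.median(z, axis=1)
        z -= row_med[:, None]
        probe_eff += row_med
        delta = np.median(array_eff)
        array_eff -= delta
        overall += delta
        # Sweep out column (array) medians.
        col_med = np.median(z, axis=0)
        z -= col_med[None, :]
        array_eff += col_med
        delta = np.median(probe_eff)
        probe_eff -= delta
        overall += delta
        # Stop once the total absolute residual stops changing.
        new_sum = np.abs(z).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall + array_eff
```

Because it uses medians rather than means, a single aberrant probe has limited influence on the summary value.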
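The per-gene F-test mentioned under ANOVA can be written directly from the between-group and within-group sums of squares. This Python sketch assumes a genes-by-samples matrix of normalized log2 values and a group label per sample; the function name is hypothetical, and the p-value comes from scipy.stats.f. With two groups, the F statistic equals the square of the pooled-variance t statistic.

```python
import numpy as np
from scipy import stats


def anova_f_test(expr, groups):
    """One-way ANOVA F-test for every gene (row) of a genes-by-samples matrix."""
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, k = expr.shape[1], len(labels)
    grand_mean = expr.mean(axis=1, keepdims=True)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for g in labels:
        sub = expr[:, groups == g]  # samples belonging to group g
        m = sub.mean(axis=1, keepdims=True)
        ss_between += sub.shape[1] * ((m - grand_mean) ** 2).ravel()
        ss_within += ((sub - m) ** 2).sum(axis=1)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    p = stats.f.sf(f, df_between, df_within)  # upper-tail probability
    return f, p


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(5, 9))  # 5 simulated genes, 9 samples
    f, p = anova_f_test(expr, ["a"] * 3 + ["b"] * 3 + ["c"] * 3)
    print(f.round(3), p.round(3))
```

With thousands of genes tested at once, the resulting p-values are normally adjusted for multiple testing before genes are called significant.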
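Rank Products, listed under Differential Expression, ranks genes by fold change in every treated-versus-control sample pair and combines the ranks with a geometric mean. The Python sketch below computes the up-regulation rank product for an unpaired two-group design; the function name and example values are hypothetical, and the permutation procedure normally used to assign significance is not shown.

```python
import numpy as np
from scipy.stats import rankdata


def rank_products(log_expr, is_treated):
    """Rank product for up-regulation (treated vs control) of each gene.

    log_expr: genes-by-samples matrix on a log2 scale.
    is_treated: one boolean per sample.
    Small values mean a gene is consistently near the top of the
    fold-change ranking across all treated/control sample pairs.
    """
    x = np.asarray(log_expr, dtype=float)
    mask = np.asarray(is_treated, dtype=bool)
    treated, control = x[:, mask], x[:, ~mask]
    log_ranks = []
    for t in range(treated.shape[1]):
        for c in range(control.shape[1]):
            log_fc = treated[:, t] - control[:, c]
            # Rank 1 = largest fold change in this pairwise comparison.
            log_ranks.append(np.log(rankdata(-log_fc)))
    # Geometric mean of the ranks over all comparisons.
    return np.exp(np.mean(log_ranks, axis=0))


if __name__ == "__main__":
    # Made-up data: gene 0 is clearly higher in both treated samples.
    expr = np.array([[1, 1, 5, 6],
                     [2, 2, 2, 2],
                     [3, 3, 2, 3],
                     [4, 4, 4, 3]], dtype=float)
    print(rank_products(expr, [False, False, True, True]))
```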
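Weighted co-expression network analysis starts from a soft-thresholded correlation network. This Python sketch builds an unsigned adjacency matrix, |correlation| raised to a power beta, and the topological overlap measure (TOM) that is commonly used before clustering genes into modules. The soft-thresholding power beta = 6 is an assumed default (it is normally chosen from the data), the function name is hypothetical, and the module-detection step itself (clustering on 1 - TOM) is not shown.

```python
import numpy as np


def soft_threshold_tom(expr, beta=6):
    """Unsigned weighted co-expression adjacency and topological overlap.

    expr: genes-by-samples matrix. beta: soft-thresholding power (assumed 6).
    Returns (adjacency, tom), both genes-by-genes.
    """
    a = np.abs(np.corrcoef(expr)) ** beta  # a_ij = |cor(i, j)|^beta
    np.fill_diagonal(a, 0.0)               # exclude self-connections from sums
    k = a.sum(axis=1)                      # connectivity of each gene
    shared = a @ a                         # l_ij = sum over u of a_iu * a_uj
    denom = np.minimum.outer(k, k) + 1.0 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return a, tom


if __name__ == "__main__":
    g0 = np.arange(1.0, 7.0)
    # Genes 0 and 1 are perfectly correlated; gene 2 is only weakly related.
    expr = np.vstack([g0, 2 * g0, [1, -1, 1, -1, 1, -1]])
    adjacency, tom = soft_threshold_tom(expr)
    print(tom.round(3))
```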
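One standard way to test whether a category or pathway is over-represented among selected genes is a one-sided hypergeometric test. This Python sketch uses scipy.stats.hypergeom; the function name and the example counts are hypothetical, and tools such as GSEA use a different, rank-based statistic rather than this test.

```python
from scipy.stats import hypergeom


def category_enrichment_p(n_universe, n_category, n_selected, n_overlap):
    """P(at least n_overlap category genes among n_selected selected genes).

    n_universe: genes on the array (or analyzed); n_category: of those, genes
    in the category; n_selected: differentially expressed genes;
    n_overlap: selected genes that fall in the category.
    """
    # sf(k - 1) gives P(X >= k) for the hypergeometric distribution.
    return hypergeom.sf(n_overlap - 1, n_universe, n_category, n_selected)


if __name__ == "__main__":
    # Made-up counts: 3 of 4 selected genes belong to a 5-gene category
    # in a universe of 20 genes.
    print(category_enrichment_p(20, 5, 4, 3))
```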