Affymetrix GeneChip Data Analysis
To gain a better understanding of the Affymetrix Data Analysis process, please see our Experiment Design and Data Analysis Outline. The outline provides an easy-to-follow flow chart for analyzing your data based on the number of replicates used in your experiment. Understanding this chart is a good way to begin the data analysis process.
Associated Files
When you come to the lab to pick up your data, you will receive a CD containing the following files for each chip:
- .CEL (a processed image file with calculated signal intensities assigned to each oligonucleotide spot on the GeneChip)
- .CHP (only opened within GCOS)
- .TXT (CHP file in text format containing the relevant signal values, absolute calls and p-values assigned to each gene on the GeneChip)
- .RPT (a text-based file containing the quality control numbers for the experiment including percent present, presence of spiked controls, background, noise, etc.)
- .EXP (a text-based file containing general experiment information such as sample name, date, etc.)
Data Analysis
The Core Facility performs analysis on Affymetrix 3’ expression arrays, exon arrays, and tiling arrays. A variety of methods are freely available to analyze microarray datasets, and the appropriate methods depend on the design of the experiment. You can contact the Core Facility to discuss the experimental design and the analysis procedures for your experiment.
- GeneChip Quality Assessment - Standard Affymetrix measures are used for quality assessment, namely noise, background, scale factors, present/absent calls, and GAPDH 3’/5’ ratios. For Affymetrix custom arrays such as the GlycoV4 chip, these standard measures cannot be calculated because the mismatched probesets are excluded. For these custom arrays, the present/absent calls are replaced by expression calls. Expression calls are calculated using Fisher’s Combined P Method, implemented by Lana Schaffer in the R program software (R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License). The chi-square threshold for determining the calls has been adjusted so that they are compatible with the present and absent calls of the Affymetrix Microarray Suite version 5. See Fisher’s Combined P Method for further information. In addition, a full quality assessment of each chip is done with Bioconductor packages, using quality metrics recommended by Affymetrix, intensity distributions, array comparisons, and RNA degradation plots. This assessment can identify arrays that are outliers relative to the rest of the arrays.
- Normalization - The most frequently performed normalization procedure is Robust Multichip Average (RMA). RMA is freely available through RMAExpress, a stand-alone desktop application that runs on Windows machines. RMA uses the experimental Affymetrix CEL files and the chip CDF file to perform a group background correction and quantile normalization procedure. Signal intensities are generated for each probe set on each chip based on the model, and the summary values are output to a single text file. RMA is also available in most software packages for microarray analysis. Other normalization procedures are available in GCRMA, dChip, and MAS5.
- Differential Expression
- ANOVA - Whenever a microarray experiment is performed with two or more groups, ANOVA can be used to identify genes differentially expressed among the groups. A straightforward procedure for group comparisons is to run a t-test for two groups or an F-test for three or more groups.
- BRB Array Tools, an add-on to Excel, provides free, user-friendly access to ANOVA functions, predictive classifications, clustering, functional classification, and data visualization tools.
- All data processing can also be performed within the Bioconductor project of the R program software. Differential expression analysis employs a linear modeling approach and empirical Bayes statistics as implemented in the limma package. The limma package is useful for analyzing experiments with factorial designs.
- Rank Products is a simple and powerful technique for finding differentially expressed genes when the number of samples in the experiment is small.
- Weighted Co-expression Analysis is used to explore molecular interaction networks across RNA tissue samples in microarray datasets. The co-expression networks can be organized into modules of system level functionality for coordinated gene expression.
- Category analysis and GSEA provide pathway enrichment tools to help interpret datasets.
- Ingenuity Pathway Analysis software provides a link to a literature-curated database for finding functions and pathways in microarray analysis.
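The Fisher's Combined P Method mentioned under GeneChip Quality Assessment can be illustrated with a minimal Python sketch. It combines k independent p-values into the statistic -2 * sum(ln p), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. This is not the Core Facility's R implementation: the function names, the choice of which p-values are combined for a probe set, and the `alpha` cutoff are all assumptions made for illustration, and the facility's adjusted chi-square threshold is not reproduced here.

```python
import math


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


def expression_call(p_values, alpha=0.05):
    """Hypothetical call rule: 'expressed' if the combined p-value < alpha.

    alpha is an illustrative cutoff, not the facility's adjusted threshold.
    """
    _, combined_p = fisher_combined(p_values)
    return "expressed" if combined_p < alpha else "not expressed"


if __name__ == "__main__":
    # Made-up per-probe p-values for one probe set.
    probe_p_values = [0.04, 0.20, 0.01, 0.15]
    stat, p = fisher_combined(probe_p_values)
    print(f"chi-square = {stat:.3f}, combined p = {p:.5f}")
    print(expression_call(probe_p_values))
```

The closed-form survival function keeps the sketch free of external dependencies. In practice a statistics library would normally be used for the chi-square tail probability.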
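The quantile normalization step named under Normalization can be sketched in a few lines of Python. The sketch forces every array (column) to share the same intensity distribution by replacing each value with the mean of the values that hold the same rank across all arrays. It covers only this one step of RMA, not the background correction or the probe-set summarization. The function name and the toy data are illustrative assumptions, and ties are broken by row order rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows.

    Rows are probes and columns are arrays. Returns a new matrix in
    which every column has the same set of values.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same length")

    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]

    # Reference distribution: mean of the i-th smallest value across arrays.
    reference = [
        sum(col[i] for col in sorted_cols) / n_cols for i in range(n_rows)
    ]

    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices ordered by intensity; ties keep their row order.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


if __name__ == "__main__":
    # Toy intensities: 4 probes (rows) measured on 3 arrays (columns).
    data = [
        [5.0, 4.0, 3.0],
        [2.0, 1.0, 4.0],
        [3.0, 4.0, 6.0],
        [4.0, 2.0, 8.0],
    ]
    for row in quantile_normalize(data):
        print(row)
```

After normalization, sorting any column gives the same list of values, which is what makes intensities comparable across chips.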