Microarray Core Facility

Data Analysis - Fisher's Combined P Method

Fisher’s Combined P Method for detecting expressed genes with Affymetrix expression arrays

Implementation:

Fisher’s Combined P Method has been implemented in R (Schaffer, personal communication); R is available as free software under the terms of the Free Software Foundation's GNU General Public License. The method tests whether the average intensity of the probes in a probe set, within a sample group, is greater than the average intensity of background probes with similar GC content. The chi-square threshold for making the expressed/unexpressed calls has been adjusted so that the calls are compatible with the present and absent calls of the Affymetrix Microarray Suite version 5. The Affymetrix algorithm bases its present and absent calls on a Wilcoxon signed-rank test that compares perfect-match probes with their mismatch probes. The chi-square threshold was adjusted so that, with Fisher’s Combined P Method, 97% of the probe sets called absent in every sample of a sample group are called unexpressed, and 95% of the probe sets called present in every sample of a sample group are called expressed.
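
As a rough illustration of the probe-versus-background comparison described above, the following Python sketch computes a one-sided p-value for a single probe against background probes in the same GC-content bin. The facility's implementation is in R, and this page does not state which probe-level test it uses; the normal approximation, the function name `background_p_value`, and the example intensities are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def background_p_value(probe_log2, background_log2):
    """One-sided p-value that a probe is no brighter than its GC-bin background.

    Assumption: the background log2 intensities in a GC bin are treated as
    roughly normal. The source page does not specify the actual test.
    """
    bg_mean = mean(background_log2)
    bg_sd = stdev(background_log2)
    z = (probe_log2 - bg_mean) / bg_sd
    # Upper-tail probability of a standard normal: P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical log2 intensities of background probes sharing one GC count.
gc_bin_background = [6.1, 6.4, 5.9, 6.3, 6.0, 6.2]

p_bright = background_p_value(9.0, gc_bin_background)  # far above background
p_dim = background_p_value(6.15, gc_bin_background)    # at the background mean
print(p_bright, p_dim)
```

A probe far above its GC-matched background gets a p-value near zero, while a probe sitting at the background mean gets a p-value near 0.5. These probe-level p-values are the inputs that Fisher's method later combines.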

Background:

Fisher (1932) proposed a method for combining p-values from independent tests of significance. Others, among them Hess and Iyer (2007), have used this combined p method to detect differentially expressed genes in Affymetrix expression array data. Hess and Iyer used Fisher’s combined p method to combine p-values from probe-level tests of significance. Using three spike-in datasets, they demonstrated that the method successfully selected the differentially expressed genes identified by other current methods. The same algorithm is used by Affymetrix software (Affymetrix Power Tools) to test whether genes are detected above background (DABG).

Fisher’s Combined P Method

Fisher's method combines the p-values from the individual tests (each p-value being the probability of a result at least as extreme as the one observed, assuming the null hypothesis H0 that both groups are the same is true) into a single test statistic (χ²) that has a chi-square distribution, using the formula

$$\chi^2 = -2 \sum_{i=1}^{m} \ln(p_i)$$

where, for a particular probe set, m is the number of probes in the probe set and p_i is the p-value of the test for the i-th probe.
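
To make the statistic concrete, here is a minimal Python sketch that computes Fisher's statistic (minus two times the sum of the natural logarithms of the probe-level p-values) for one probe set. The function name `fisher_statistic` and the example p-values are hypothetical, and the facility's own implementation is in R, so this is an illustration rather than the production code.

```python
import math


def fisher_statistic(p_values):
    """Return Fisher's statistic, -2 * sum(ln p_i), for one probe set."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


# Hypothetical p-values for a probe set with m = 11 probes.
probe_p = [0.01] * 11
print(fisher_statistic(probe_p))  # about 101.3, with 2m = 22 degrees of freedom
```

Small probe-level p-values produce large values of the statistic, so a probe set whose probes are consistently above background ends up far in the upper tail of the chi-square distribution.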

The p-value for χ² itself can then be interpolated from a chi-square table with 2m degrees of freedom, where m is the number of tests being combined. As in any similar test, H0 is rejected at the α level of significance when this p-value is small, that is, if

$$P\left(\chi^2_{2m} \geq \chi^2\right) \leq \alpha$$
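
Instead of a table lookup, the upper-tail probability can be computed directly. The Python sketch below relies on the fact that, for an even number of degrees of freedom 2m, the chi-square tail has the closed form exp(-x/2) * Σ_{k=0}^{m-1} (x/2)^k / k!. The function names, the example p-values, and the choice α = 0.05 are assumptions for illustration only.

```python
import math


def chi2_upper_tail(x, m):
    """P(X >= x) for X ~ chi-square with 2m degrees of freedom.

    Uses the closed form valid for even degrees of freedom:
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    half = x / 2.0
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, m):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Return (statistic, combined p-value) for m independent p-values."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_upper_tail(stat, len(p_values))


alpha = 0.05  # illustrative significance level
stat, combined_p = fisher_combined_p([0.04, 0.20, 0.10])
reject_h0 = combined_p <= alpha
print(stat, combined_p, reject_h0)
```

With a single p-value the combined p-value equals the input, which is a convenient sanity check on the tail formula.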

Code details

  • Read in the CEL files as an AffyBatch object.
  • Quantile-normalize the probe-level intensities.
  • List all the sample names so that each is treated as a separate covariate.
  • Process all the GC probe bins and create a matrix of the average log2 normalized signals, probe indices, GC content, standard deviations, and number of probes.
  • Read in the previously calculated GC content of all the probes and their indices.
  • Bind together the probe information by probe set, including the log2 normalized intensities and GC content for both the array probes and their GC bin probes.
  • Calculate Fisher’s combined p-value for each probe set.
  • Determine the expression call using a calibrated chi-square threshold value of 80.
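
Two of the steps above, quantile normalization and the final expression call, can be sketched in a few lines of Python. This is not the facility's R code: the function names, the tie handling, the toy intensities, and the choice of a strict "greater than" comparison against the threshold of 80 are assumptions made for illustration.

```python
import math

CHI2_THRESHOLD = 80.0  # calibrated threshold quoted in the list above


def quantile_normalize(arrays):
    """Give every array the same intensity distribution (rank-wise mean).

    `arrays` is a list of equal-length lists, one list per array.
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    # Mean intensity across arrays at each rank.
    rank_means = [sum(col[r] for col in sorted_cols) / len(arrays)
                  for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def expression_call(probe_p_values, threshold=CHI2_THRESHOLD):
    """Call a probe set from its probe-level p-values (assumed strict '>')."""
    stat = -2.0 * sum(math.log(p) for p in probe_p_values)
    return "expressed" if stat > threshold else "unexpressed"


# Two toy arrays with three probes each (hypothetical intensities).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))

# Hypothetical probe-level p-values for two 11-probe probe sets.
print(expression_call([0.01] * 11))
print(expression_call([0.5] * 11))
```

After normalization each toy array contains the same set of values, only in a different order, which is the defining property of quantile normalization.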

Calibration

The chi-square threshold was adjusted using the quantile-normalized probe intensities so that 97% of the probe sets called absent on a single array are called unexpressed, and 95% of the probe sets called present are called expressed.
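
One way to check a candidate threshold against these two targets is sketched below in Python. The function names, the strict "greater than" rule for an expressed call, and the example statistics are hypothetical; the page does not describe how the facility searched for its threshold.

```python
def calibration_rates(absent_stats, present_stats, threshold):
    """Agreement rates between Fisher calls and absent/present calls.

    absent_stats:  chi-square statistics of probe sets called absent.
    present_stats: chi-square statistics of probe sets called present.
    Returns (fraction of absent called unexpressed,
             fraction of present called expressed).
    """
    unexpressed = sum(1 for s in absent_stats if s <= threshold) / len(absent_stats)
    expressed = sum(1 for s in present_stats if s > threshold) / len(present_stats)
    return unexpressed, expressed


def meets_targets(absent_stats, present_stats, threshold,
                  absent_target=0.97, present_target=0.95):
    """True if the threshold reaches both agreement targets."""
    unexpressed, expressed = calibration_rates(
        absent_stats, present_stats, threshold)
    return unexpressed >= absent_target and expressed >= present_target


# Hypothetical chi-square statistics for a handful of probe sets.
absent = [10.0, 20.0, 35.0, 60.0, 90.0]
present = [85.0, 120.0, 150.0, 200.0]

rates = calibration_rates(absent, present, threshold=80.0)
ok = meets_targets(absent, present, threshold=80.0)
print(rates, ok)
```

Raising the threshold increases agreement on absent probe sets and lowers agreement on present ones, so a calibration of this kind trades the two rates against each other.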

[Figure: Fisher's method graph]

References

Hess, A. and Iyer, H. (2007) Fisher’s combined p-value for detecting differentially expressed genes using Affymetrix expression arrays. BMC Genomics, 8:96.

Fisher, R. A. (1948) Combining independent tests of significance. American Statistician, 2(5):30.