Rank-Based Nonlinear Normalization of Oligonucleotide Arrays.
- Author: Peter J PARK (1); Isaac S KOHANE; Ju Han KIM
- Author Information: 1. Children's Hospital Informatics Program, Children's Hospital, Harvard Medical School, Boston, MA 02115, USA. peter-park@harvard.edu
- Publication Type: Original Article
- Keywords: gene expression; microarray normalization; rank statistic
- MeSH: DNA, Complementary; Gene Expression; Generalization (Psychology); Motivation; Oligonucleotide Array Sequence Analysis*
- From: Genomics & Informatics 2003;1(2):94-100
- Country: Republic of Korea
- Language: English
Abstract:
MOTIVATION: Many investigators have observed a nonlinear relationship between signal intensity and transcript abundance in microarray data. The first step in analyzing such data is to normalize it properly, and this should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem.

RESULTS: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify those genes whose within-chip ranks are relatively constant compared with other genes of similar intensity. For each gene, our statistic is the sum of the squared differences in its within-chip ranks over every pair of chips, and at each intensity level we select a small fraction of the genes with the smallest rank changes. These genes are the most likely to be nondifferentially expressed and are subsequently used in the normalization procedure. The method generalizes the rank-invariant normalization of Li and Wong (2001): it uses all available chips rather than two at a time, which gathers more information, and it takes as the reference the chip least likely to be affected by nonlinear effects. The method assumes that at least a small number of nondifferentially expressed genes exist across the intensity range. The normalized expression values can differ substantially from the unnormalized values and may alter downstream analyses.
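The gene-selection idea described in the abstract can be sketched in Python with NumPy. For a gene g with within-chip ranks r_g1, ..., r_gm on m chips, the statistic is S_g = sum over i<j of (r_gi - r_gj)^2. Everything beyond that statistic is an assumption of this sketch, not the authors' procedure: the input is a genes-by-chips intensity matrix, ties are broken by order, "intensity level" is taken as the mean intensity across chips split into quantile bins, and `n_bins` and `fraction` are hypothetical parameters. The final step maps each chip onto a reference chip by piecewise-linear interpolation (`np.interp`) through the selected genes, a stand-in for whatever curve fit the paper uses. The abstract does not say how the reference chip is chosen, so `ref` is simply passed in.

```python
import itertools

import numpy as np


def within_chip_ranks(x):
    """0-based rank of each gene within each chip (column); ties broken by order."""
    # argsort of argsort turns sort positions into ranks along each column
    return np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0, kind="stable")


def rank_change_statistic(x):
    """Per gene: sum over all chip pairs (i < j) of squared within-chip rank differences."""
    r = within_chip_ranks(x).astype(float)
    stat = np.zeros(r.shape[0])
    for i, j in itertools.combinations(range(r.shape[1]), 2):
        stat += (r[:, i] - r[:, j]) ** 2
    return stat


def select_invariant_genes(x, n_bins=10, fraction=0.05):
    """Indices of the `fraction` of genes with the smallest statistic in each intensity bin.

    Assumption: intensity level = mean across chips, binned by quantiles.
    `n_bins` and `fraction` are hypothetical tuning parameters.
    """
    stat = rank_change_statistic(x)
    level = x.mean(axis=1)
    edges = np.quantile(level, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each gene to a bin; the maximum is clipped into the last bin
    bins = np.clip(np.searchsorted(edges, level, side="right") - 1, 0, n_bins - 1)
    selected = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if idx.size == 0:
            continue
        k = max(1, int(round(fraction * idx.size)))
        selected.extend(idx[np.argsort(stat[idx], kind="stable")[:k]])
    return np.sort(np.array(selected, dtype=int))


def normalize_to_reference(x, ref, genes):
    """Map every chip onto chip `ref` through the selected genes.

    Assumption: piecewise-linear interpolation; np.interp clamps values outside
    the range of the selected genes to the nearest endpoint.
    """
    out = x.astype(float).copy()
    for c in range(x.shape[1]):
        if c == ref:
            continue
        order = np.argsort(x[genes, c])
        xs, ys = x[genes, c][order], x[genes, ref][order]
        out[:, c] = np.interp(x[:, c], xs, ys)
    return out
```

Any strictly increasing distortion of a chip leaves its within-chip ranks unchanged. A gene whose intensities differ across chips only through such a distortion therefore scores S_g = 0. This is why a rank statistic can find invariant genes even when the intensity scale itself is nonlinearly distorted.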