1. Rank-Based Nonlinear Normalization of Oligonucleotide Arrays.
Peter J PARK ; Isaac S KOHANE ; Ju Han KIM
Genomics & Informatics 2003;1(2):94-100
MOTIVATION: Many have observed a nonlinear relationship between signal intensity and transcript abundance in microarray data. The first step in analyzing the data is to normalize it properly, and this should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem. RESULTS: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify those genes whose within-chip ranks are relatively constant compared to other genes of similar intensity. For each gene, we compute as our statistic the sum of the squared differences in its within-chip ranks between every pair of chips, and we select a small fraction of the genes with the smallest changes in rank at each intensity level. These genes are the most likely to be non-differentially expressed and are subsequently used in the normalization procedure. This method is a generalization of the rank-invariant normalization (Li and Wong, 2001): it uses all available chips rather than two at a time to gather more information, and it uses the chip that is least likely to be affected by nonlinear effects as the reference chip. The assumption in our method is that there are at least a small number of non-differentially expressed genes across the intensity range. The normalized expression values can be substantially different from the unnormalized values and may alter downstream analysis.
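The rank statistic and per-intensity selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of equal-sized bins of mean intensity as "intensity levels", and the `n_bins` and `fraction` parameters are all assumptions made for the example.

```python
import numpy as np

def rank_stability(expr):
    """Sum over all chip pairs of squared within-chip rank differences.

    expr: array of shape (genes, chips). Smaller values mean steadier ranks.
    """
    # Within-chip ranks (0-based); ties are broken arbitrarily by argsort.
    ranks = expr.argsort(axis=0).argsort(axis=0).astype(float)
    n_chips = ranks.shape[1]
    # Identity: sum_{i<j} (r_i - r_j)^2 = n * sum_i r_i^2 - (sum_i r_i)^2
    return n_chips * (ranks ** 2).sum(axis=1) - ranks.sum(axis=1) ** 2

def select_invariant_genes(expr, n_bins=5, fraction=0.2):
    """Pick the most rank-stable genes within each intensity bin.

    Assumption: intensity levels are approximated by splitting genes,
    ordered by mean intensity, into n_bins groups of near-equal size.
    """
    stat = rank_stability(expr)
    order = np.argsort(expr.mean(axis=1))
    selected = []
    for bin_idx in np.array_split(order, n_bins):
        if len(bin_idx) == 0:
            continue
        k = max(1, int(round(fraction * len(bin_idx))))
        # Keep the k genes in this bin with the smallest statistic.
        selected.extend(bin_idx[np.argsort(stat[bin_idx])[:k]].tolist())
    return np.array(sorted(selected))

# Toy data: 4 genes on 3 chips; genes 1 and 2 swap ranks on the second chip.
toy = np.array([[1.0, 1.0, 1.0],
                [2.0, 3.0, 2.0],
                [3.0, 2.0, 3.0],
                [4.0, 4.0, 4.0]])
toy_stat = rank_stability(toy)
toy_selected = select_invariant_genes(toy, n_bins=2, fraction=0.5)
```

The selected genes would then serve as the anchor set for fitting a nonlinear normalization curve against the reference chip; that fitting step is not shown here because the abstract does not specify it.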
DNA, Complementary; Gene Expression; Generalization (Psychology); Motivation; Oligonucleotide Array Sequence Analysis*
2. Poor Correlation Between the New Statistical and the Old Empirical Algorithms for DNA Microarray Analysis.
Ju Han KIM ; Winston P KUO ; Sek Won KONG ; Lucila Ohno MACHADO ; Isaac S KOHANE
Genomics & Informatics 2003;1(2):87-93
The DNA microarray is currently the most prominent tool for investigating large-scale gene expression data. Different algorithms for measuring gene expression levels from scanned images of microarray experiments may significantly affect the subsequent steps of functional genomic analysis. Affymetrix(R) recently introduced high-density microarrays and new statistical algorithms in Microarray Suite (MAS) version 5.0(R). Very high correlations (0.92 - 0.97) between the new algorithms and the old algorithms (MAS 4.0) across several species and conditions were reported. We found that the column-wise array correlations tended to be much higher than the row-wise gene correlations, which may be much more meaningful in subsequent higher-order data analyses, including clustering and pattern analyses. In this paper, we not only present a detailed comparison of the two sets of algorithms but also describe the impact of introducing the new algorithms on subsequent clustering analysis of microarray data and the possible pitfalls of mixing the old and the new algorithms.
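The distinction between column-wise array correlations and row-wise gene correlations can be illustrated with a short sketch. It assumes two genes-by-arrays matrices holding expression values from two algorithms for the same samples; the function name `paired_correlations` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def paired_correlations(a, b, axis):
    """Pearson correlation between matching slices of a and b.

    axis=0 correlates over genes, giving one value per array (column-wise).
    axis=1 correlates over arrays, giving one value per gene (row-wise).
    """
    a_c = a - a.mean(axis=axis, keepdims=True)
    b_c = b - b.mean(axis=axis, keepdims=True)
    num = (a_c * b_c).sum(axis=axis)
    den = np.sqrt((a_c ** 2).sum(axis=axis) * (b_c ** 2).sum(axis=axis))
    return num / den

# Synthetic data (an assumption for illustration): 200 genes on 8 arrays.
# Each gene has its own baseline level; the two "algorithms" add
# independent noise around that baseline.
rng = np.random.default_rng(0)
baseline = rng.uniform(100, 10000, size=(200, 1))
old_values = baseline + rng.normal(0, 50, size=(200, 8))
new_values = baseline + rng.normal(0, 50, size=(200, 8))

array_corr = paired_correlations(old_values, new_values, axis=0)  # per array
gene_corr = paired_correlations(old_values, new_values, axis=1)   # per gene
```

In this toy setting the per-array correlations are dominated by the large spread of baseline levels across genes, so they come out high even though the two matrices agree poorly on how any single gene varies across arrays. This is one plausible reason the two kinds of correlation can diverge, offered as an illustration rather than as the paper's explanation.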
DNA*; Gene Expression; Oligonucleotide Array Sequence Analysis*; Statistics as Topic