Directory UMM :Data Elmu:jurnal:P:PlantScience:Plant Science_BioMedNet:061-080:

(1)

DNA microarray technology is poised to make an important contribution to the field of plant biology. Stimulated by recent funding programs, expressed sequence tag sequencing and microarray production either has begun or is being

contemplated for most economically important plant species. Although the DNA microarray technology is still being refined, the basic methods are well established. The real challenges lie in data analysis and data management. To fully realize the value of this technology, centralized databases that are capable of storing microarray expression data and managing information from a variety of sources will be needed. These information resources are under development and will help usher in a new era in plant functional genomics.

Addresses

Carnegie Institution of Washington, Department of Plant Biology, Stanford, California 94305, USA

*e-mail: [email protected] †_{e-mail: [email protected]}

Current Opinion in Plant Biology2000, 3:108–116

Microarrays are widely recognized as a significant techno-logical advance providing genome-scale information about gene expression patterns [1,2•_{,3–6]. Two types of} microar-rays currently exist: the DNA-fragment-based and the oligonucleotide (oligo)-based microarrays. These differ primarily in the nature of the DNA fixed to the solid sup-port. In this review, we focus on the DNA-fragment-based arrays, in which PCR-amplified inserts of partially sequenced cDNA clones are spotted onto glass microscope slides [7••_].

A number of reviews describe the methods and outline the advantages of DNA-fragment microarrays [2,3,7••_,8,9••_]. As stated in many of them, the major advantages of microarrays derive from the fact that data can be collected for large numbers of genes in one experiment permitting true genome-scale sampling of gene expression patterns. It is important to note that microarrays, like northern blots, provide a measure of steady-state RNA levels. Although such information can be very valuable, not all biological processes will be regulated at this level of gene expression and other methods are needed to characterize processes regulated at other levels.

Three general types of experiments can be performed with microarrays. In ‘marker discovery’ experiments, the goal is

to discover a limited number of highly specific marker genes for a cell type, a developmental stage or an environ-mental treatment. In such experiments, the researcher is often interested only in genes that show a dramatic and selective induction or repression of expression. The sec-ond type of experiment, an exploratory or ‘biology discovery’ experiment, is one in which expression infor-mation about all genes is used to provide an integrative view of a plant’s response to a treatment. Such experi-ments have the potential to provide new insights into biological processes. In an example drawn from human biology, Iyer et al.[10•_{] demonstrated that serum-starved} cells respond to the addition of fresh serum in part by inducing a suite of genes involved in wounding. The wound response had not previously been implicated in this widely studied biological model. The third type of experi-ments is the ‘gene-function discovery’ experiexperi-ments. As microarray data accumulates in centralized public databas-es, hypotheses as to the function of novel, uncharacterized genes can be generated by in silico analyses of when and where such genes exhibit altered expression. These kinds of experiments will be very beneficial to the field of plant biology in terms of both developing a broader view of how plants function and rapidly identifying novel genes most worthy of detailed characterization. In this review, we highlight some of our observations on the use of expressed sequence tag (EST)-based microarrays, and, in particular, on some of the plant-specific concerns, on the basis of our experience with Arabidopsismicroarrays.

Construction of microarrays

The advantage of constructing microarrays from EST clones rather than from anonymous clones is that the subsequent analysis of gene expression patterns is much more informa-tive. EST clones are the obvious choice for Arabidopsis, because of the availability of a large collection from a public stock center — the ArabidopsisBiological Resource Center. There are a few limitations associated with this collection that should be noted. Most of the ArabidopsisESTs consist of only 300–400 basepairs sequenced from the 5′end of a longer cDNA clone [11]. Thus, the occurrence of chimeric clones consisting of sequences from unrelated genes will go undetected. Genes specifically induced under specialized environmental conditions or in specialized cells types will be under-represented in the EST collection, which was pre-pared from RNA from only four plant tissues. Recent estimates suggest that only a third of Arabidopsis genes [12,13] are represented among the current set of ~45,000

ArabidopsisESTs.

The complete sequence of the Arabidopsis genome is scheduled to be available in 2000 [14], and the first com-prehensive annotation to identify candidate genes will be completed shortly afterwards [15,16]. Thus, for

(2)

Arabidopsis, genes not represented by an EST clone can be recovered from genomic sequence information. It should be noted that the computer programs used to identify can-didate genes are imperfect, and it may take some time to fully identify and describe all Arabidopsisgenes properly. Therefore, Arabidopsismicroarrays constructed in the near future may include sequences that are not expressed and there may be missing genes that are unusual enough to be missed by gene-finding software programs.

Newer EST projects have noted the problems with the

ArabidopsisEST collection and will sequence a larger num-ber of clones from larger collections of diverse libraries. For example, The Institute for Genomic Research will sequence 150,000 tomato clones [17], and 300,000 soybean clones will be sequenced as part of the Soybean EST project [18] (Tables 1 and 2). Extensive EST sequencing projects have also been initiated for sorghum, rice, cotton, Medicago trun-catula, maize and wheat (Tables 1 and 2), and these will provide a very useful foundation for future microarray pro-jects in agronomically and horticulturally important plants.

For organisms lacking extensive collections of sequenced cDNA clones, anonymous clones from normalized or sub-tracted libraries can be used [8]. Targets that exhibit expression patterns of interest to the user can be sequenced at the end of the expression experiment. One disadvantage of constructing arrays from anonymous clones is that gene identity will only be determined for sequenced clones, whereas information about genes with static expression patterns will be lost.

One additional consideration that is unique to DNA-frag-ment-based microarrays is that handling mistakes during

the production phase can lead to DNAs being mislabeled or contaminated before spotting on the slides. In most com-mercial microarray facilities, the DNAs are re-sequenced as part of the quality assurance process. In the absence of such quality control, microarray users should verify key microar-ray results (see section on Data verification).

Cross-hybridization between related sequences is another limitation of using EST clones to construct microarrays. Cross-hybridization to a weakly expressed gene by a probe for a highly expressed, related gene will confound data col-lected for the weakly expressed gene. Current estimates are that sequences with >70% sequence identity over >200 nucleotides are likely to exhibit some degree of cross-hybridization under standard conditions [19•_{]. As estimates} of Arabidopsisgenes occurring in gene families range from >

12% [13] to 50% [12], this problem is of substantial concern. Thus, in future projects, there is some advantage to devel-oping a non-redundant set of completely sequenced cDNA clones. For Arabidopsis, the availability of genomic sequence will help address some of the limitations associat-ed with the EST clones. Primer pairs can be designassociat-ed to amplify gene-specific fragments for each member of a gene family for spotting on microarrays.

Sampling and experimental design

The composite nature of plant organs adds additional con-straints to the interpretation of microarray data. Gene expression in plant leaves, for example, is a composite of the expression patterns of various cell types (e.g. meso-phyll, epidermal, vascular) and of various spatially distinct parts (e.g. upper cell layers versus lower cell layers). At pre-sent, our ability to sample single cells or a small number of uniform cells is restricted. Also, it is difficult or impossible Table 1

EST or microarray projects funded by the National Science Foundation Plant Genome Program in 1999 (see [1] for projects funded in 1998).

Project Principal investigator/Institution Resources to be developed

Tools for potato structural and B Baker/University of California 55,000 Potato EST sequences

functional genomics at Berkeley Potato cDNA microarrays

Global expression studies for the J Ecker/University of Pennsylvania 8000 Fully sequenced cDNA clones

Arabidopsisgenome Arabidopsisoligoarrays

Expression information from 100 conditions High-density DNA microarrays for global K Ketchem/TIGR DNA-Fragment-Based arrays of all sequences

gene expression studies in plants ofArabidopsischromosome 2

Functional and comparative genomics of R Michelmore/ Sequences of resistance genes from disease resistance gene homologs University of California at Davis Arabidopsis, rice and maize

Specialized microarrays for Arabidopsis, rice and maize

Structure and function of the expressed C Qualaset/University of California 10,000 Wheat ESTs portion of the wheat genomes

Genomics of wood formation in Loblolly pine R Sederoff/North Carolina State University EST sequencing from Loblolly pine Pine microarrays

High-throughput mapping tools for P Schnable/Iowa State University Sequence 3′end of 20,000 maize ESTs maize genomics

(3)

to obtain highly synchronized populations of Arabidopsis

cells. These limitations should be considered when inter-preting microarray expression data.

As with any experiment, experimental design and sample replication should be considered when designing microar-ray experiments. One of the strengths of the microarmicroar-ray technique is that relatively minor changes in gene expres-sion can be measured. However, the resolving power of each experiment will be determined by the biological vari-ation in the plant samples and the technical varivari-ation associated with the microarray technology (see Probe preparation and hybridization).

In designing an experiment, researchers should consider whether they wish to use the microarrays for ‘marker discov-ery’ or whether they wish to gain genome-scale knowledge of a biological process in ‘biology discovery’ experiments. In the first case, a limited number of genes are identified as of interest from a microarray experiment and then used in sub-sequent experiments to characterize a specific treatment or developmental stage. Such an experiment might consist of several replicates of two samples harvested at one time point under a single condition. A limited number of microarrays are required and the microarray data serves as the starting point for a project rather than the endpoint. In the second case, in which a detailed profile of gene expression is desired, a large number of samples are generated during the course of analyzing a variety of time points, treatment levels, develop-mental stages or genetic lines. Such experiments are extremely rich in information, although costly to perform.

Probe preparation and hybridization

The design of experiments and collection of sample mate-rials is only the first step in microarray experimentation. There is a whole range of technical issues involved with microarray hybridization and the collection of hybridiza-tion signals [7••_{]. Only a few of these points are specific to} plant EST microarrays, and thus we will address them only briefly here.

Typically, two samples (e.g. from treated and control plants) are labeled with two distinct fluorochromes, mixed and hybridized simultaneously to the DNA-fragment-based microarrays. Thus, DNA-fragment microarrays are best suited to side-by-side comparisons of the two samples hybridized to the same slide (Type I experiments [7••_{] or} ‘marker discovery’ experiments). The data values are often expressed as a ratio of the signals collected for the two samples. Comparisons of multiple samples can be accom-plished by using one sample as a common reference (Type II experiments [7••_{] or ‘biology discovery’} experi-ments). Thus, for each given gene, the ratio of the fluorescent signals derived from the two samples provides a relative measure of gene expression (i.e. whether mRNA accumulation is higher, lower or unchanged in one sample relative to the second sample).

The original methods for preparing a probe required >1µg of poly(A)+ RNA [7••_{]; however, it can be difficult to} obtain this much mRNA from specialized tissues or cell types. In the future, methods of probe preparation that are based on total RNA or adapted for use with very small US Department of Agriculture funded EST or microarrays projects active in the year 2000.

Project Principal investigator/Institution Resources to be developed

A public soybean EST project R Shoemaker/ARS, Ames, Iowa 250,000—300,000 EST Soybean sequences

Identification of genes involved in J Mullet/Texas A&M University Specialized sorghum cDNA

plant response to water deficit microarrays

Genetic engineering of oilseed crops J Ohlrogge/Michigan State University Specialized ArabidopsiscDNA microarrays

The effects of external stimuli on plant responses E Davies/North Carolina State University Arabidopsismicroarrays at the molecular and cellular levels

Universal chip for genotyping single B Lemieux/University of Delaware Microarray for mapping

nucleotide polymorphisms SNPs in Brassica oleracea

Genomic analysis of seed quality traits in corn B Lemieux/University of Delaware Specialized maize cDNA microarrays Regulation of metabolism in developing C Benning/Michigan State University ArabidopsiscDNA microarrays

seeds of Arabidopsis

Molecular biology of plant development relating S van Nocker/Michigan State University Sour cherry cDNA microarrays to the needs of Michigan horticulture

The effect of environment on grain development W Hurkman/USDA, Albany, California Wheat cDNA microarrays productivity and quality in wheat

Nodulation in V. unguiculatawith Rhizobium C Cousin/University of the District of Columbia Nodule ESTs for Vigna unguiculata

or Bradyrhizobiumafter treatment with biosolids

Novel methods for sequence analysis of the W McCombie/Cold Spring Harbor Laboratory ESTs for maize maize genome

(4)

amounts of mRNA, and that use more efficient methods of label incorporation, will extend the utility of microarray analysis to difficult samples.

One of the chief technical specifications of microarrays is sensitivity. Sensitivity, determined by the minimal signal that can be reliably detected above background, is depen-dent in part on background levels and on the specific incorporation of label into the probes. Claims have been made that EST microarrays can detect 1 part in 100,000 [20•_{] to 500,000 [21]. Incyte Pharmaceuticals indicates that} with their GEM microarrays [22•_{], 2 pg out of 600 ng of} poly(A) RNA (1:300,000) can be detected [23•_{]. These} lev-els are generally considered sufficient to detect a mRNA present at a few copies per cell [20•_].

Another measure of sensitivity is the ability to reliably determine minor changes in mRNA levels between two samples. Incyte suggests the technical variation associated with their GEM arrays, as measured by the coefficient of variation, is 15% [23•_{]. Our own experiments, using} repli-cate 2500-element Arabidopsis arrays and experimental samples, show an average coefficient of variation that is closer to 37%. Incyte GEM arrays can be used to determine

a differential expression greater than 1.75-fold. In our labo-ratory, we have established standards for minimal expression levels of more than three standard deviations above the background of the slide and for differential expression of more than three-fold when analyzing three independent replicates.

Data collection

Microarray hybridization signals are collected by special-ized scanners, which are essentially computer-controlled inverted scanning fluorescent microscopes, with most mod-els providing confocal imaging. The output of these readers is typically a 16-bit TIFF image file, limiting the dynamic range of the data to less than five orders of magnitude.

The adoption of charge-coupled device (CCD)-based detectors [24] may increase the range of signal detection as improved methods and slide coatings [25] reduce back-ground noise. The TIFF files are processed with image analysis software (often supplied with the microarray read-er) to produce a measure of the hybridization signal for each DNA sample on the array. The ideal image analysis software package would be able to detect automatically each valid spot on the array, flag bad spots, subtract the Table 3

Microarray data analysis software.

Name Type Reference/URL

QuantArray Data visualization GSI Lumonics

http://www.genscan.com/lifescience_frame/software.htm Stingray Data visualization, clustering Molecular Applications Group

http://www.mag.com Array Vision Data extraction, visualization Imaging Research, Inc.

http://www.imagingresearch.com Spotfire Data visualization, clustering http://www.spotfire.com

GeneCluster Clustering using self-organizing maps http://www.genome.wi.mit.edu/MPR/GeneCluster/GeneCluster.html [34]

Clustering Tools Clustering Pangea Systems, Inc.

http://www.pangeasystems.com Cluster Hierarchical clustering http://rana.stanford.edu/clustering/ [30] GeneSight Data visualization; hierarchical and Biodiscovery

neural network clustering http://www.biodiscovery.com/genesight.html GeneSpring Data visualization; clustering; database links Silicon Genetics

http://www.sigenetics.com/GeneSpring/

GenEx Public web database; data storage; National Center for Genome Resources; Silicon Genetics data visualization http://www.ncgr.org/research/genex/; http://genex.sigenetics.com/ arraySCOUT Data visualization; clustering, database links Lion Biosciences

http://www.lionbioscience.com

ArrayDB LIMS National Human Genome Research Institute

http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

LifeTools LIMS Incyte

http://www.incyte.com/products/lifetools/lifetools.html

GeneChip Information LIMS Affymetrix

Systems http://www.affymetrix.com/products/bioinformatics.html

Microarray Database LIMS Stanford University

(5)

local background, and output a mean signal-intensity above background for each spot [26,27].

Data analysis

The real heart of microarray experimentation and, ironical-ly, the most underdeveloped part, is data analysis. When an experiment is finished, the researcher is left with a series of signal intensities from thousands of clones. The true potential of microarrays as tools for gaining a global, high-ly integrated view of plant biology cannot be realized without large-scale relational databases to house data from multiple experiments and sophisticated data mining and pattern-recognition software tools for extracting multi-dimensional information from microarray data sets [26].

For simple comparisons, as is typical for marker-discovery experiments, there are a variety of data analysis methods. For small data sets, spreadsheets and statistical software

can be used to find differentially expressed genes. Our lab-oratory uses a collection of custom Perl scripts [28] to process and analyze microarray data (T Richmond, unpub-lished data). Companies that provide microarrays or microarray services often provide the necessary software and information to examine the experimental data (Table 3). For large-scale microarray projects, laboratory information management systems (LIMS) are essential. There are a number of microarray projects developing database tools for storing, processing and retrieving microarray data [27,29–33]. Although often freely avail-able, the software tools being developed by these groups are not amenable to inexperienced researchers, requiring computer skills for their installation and maintenance that are beyond those of the typical biologist.

Regardless of the software program used, the first step in data analysis involves the normalization of data. In order to

Small number of changes Medium number of changes Large number of changes

Channel 1 (log₁₀)

Clones 100

10

1

0.1 6

5

4

3

2

0 1

0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6

500 1000 1500 2000 2500 0

0 500 1000 1500 2000 2500

House 200 House 20 Mean Student

Channel 1/Channel 2 ratio

Channel 2 (log

10

)

Current Opinion in Plant Biology

Comparison of normalization methods. Upper panels, plot of normalized signal intensities (average of three replicates) from three microarray experiments, demonstrating the range of differential expression that can be observed. Signals were normalized using a Student normalization and readjusted to original values using an overall mean value. Dashed lines indicate two-fold and three-fold differential expression cutoffs. Lower panels, channel 1/channel 2 ratio for experiments depicted in upper panels. Data was normalized using

(6)

compare separate fluorescent channels, or the results from separate experiments, the raw data must be transformed into normalized values. Several different methods of data normalization can be used. One method identifies a stan-dard set of ‘housekeeping’ genes with expression levels that remain relatively constant under a variety of different conditions and uses their aggregate expression level to adjust the fluorescent signal of each target to a normalized value. The challenge is to identify a significant number (20–200) of medium to highly expressed genes with con-stant expression over a wide range of experimental conditions for each plant species. This requires a relative-ly large initial set of data in order to choose such genes.

Another method of normalization, which is not dependent on a predetermined set of housekeeping genes, makes the assumption that under the conditions being tested, most genes will not change in expression, and uses the mean, mode or median fluorescence for each channel as a means of normalization. Although this may work in most cases, there are instances where this assumption can break down. For specialized cDNA microarrays with a limited number of genes, many of the genes may change in expression and thus the addition of a large number of controls may be nec-essary. Figure 1 gives an example of the data collected from three different experiments, showing the range of dif-ferential expression that can be seen.

Another simple method of normalization is to convert the fluorescence of each target spot into a normalized value

µ= mean of the channel data, and σ= standard deviation of the channel data.) We carried out a series of experiments in triplicate to compare the methods of normalization described above. Figure 1 shows the comparison of meth-ods for three experiments, demonstrating that there is little difference among the normalization methods. The chief differences appear at the cutoff thresholds for differ-ential expression (typically three-fold); small changes in the threshold (0.1–0.5-fold) can result in large changes in the number of differentially expressed genes. We have found the Student normalization to be the simplest to use and typically use it for our experiments.

Side-by-side comparisons quickly inspire larger experi-ments that are based on time-courses, dose-response series or multiple treatments. For examining multiple data sets, cluster analysis is the most common method [30,32,34,35]. Genes can be grouped together according to their similari-ty in pattern of gene expression using a variesimilari-ty of different computer algorithms. From such analysis, the patterns of expression of thousands of genes across a wide number of conditions can be reduced to a smaller number of distinct patterns. Depending of the resolving power of the data set

and software programs, the number of clusters and their associated patterns may reflect the number of distinct responses associated with a treatment or developmental sequence — responses that may not always be reflected in cytological or physiological parameters that can be readily measured. As an example of such methods, cluster analysis of the expression data for the cell cycle in yeast revealed the expected stages of G0, G1, G2, S and M in this well-characterized experimental system [36].

Clustering is also a powerful tool for predicting the func-tion of previously uncharacterized genes. The utility of this approach has been clearly demonstrated in a number of systems [10•_{,36,37]. Iyer}_{et al}_{. [10}•_{] identified over} 200 previously unknown genes whose expression was reg-ulated during the response of fibroblasts to serum. The coordinate expression of uncharacterized genes with genes of known function provides clues to the function of the unknown genes (the ‘guilt-by-association’ rule for gene-function discovery [20•_{]). Plant biologists will be able to} conduct similar experiments, classifying genes on the basis of their coordinated expression in response to environ-mental stresses or developenviron-mental stages. For plants with fully sequenced genomes, another outcome of such cluster analysis is the discovery of candidate cis-acting regulatory elements from the identification of sequence elements shared in common among all genes of a cluster [36,38]. Such discoveries are likely to accelerate future discoveries of transcription factors and other DNA-binding proteins.

Microarray databases and information

management

The uniform goal of microarray data analysis software is to provide an easy way for the researcher to identify poten-tially interesting candidate genes and to provide convenient, detailed information about these genes. What readily becomes apparent when studying thousands of genes simultaneously is that one has personal knowledge of only a small fraction of genes. Information that is readi-ly accessible at the point of data anareadi-lysis is therefore an extremely valuable aid in interpreting results from microarray experiments. Towards this goal, microarray databases should ideally contain links to a multitude of external information sources [39•_{], including citation} data-bases [40], sequence datadata-bases [41], metabolic pathway databases [42,43] and protein domain databases [44]. Where possible, microarray databases should also contain links to information regarding expression data collected from other sources, map position, predicted protein motifs, functional classifications, related genes in both plants and other organisms, known alleles and mutations, interacting protein partners and availability of DNA stocks, seed stocks and so on. For most plant species, including

Arabidopsis, such information is now being collected in a systematic fashion.

(7)

groups working on establishing public repositories for microarray expression data [45,46]. The recently funded TAIR (The ArabidopsisInformation Resource) project [47], with its aim of providing a link from Arabidopsisto all other plants and vice versa (S Rhee, C Somerville, personal commu-nication), is the first step in this direction for plant biologists.

Data verification

Once candidate gene(s) of interest have been identified, researchers must decide how to proceed. As mentioned above, the gene identities of key genes should be verified by sequencing the spotted EST. In addition, many review-ers will not yet accept microarray data on its own, requiring secondary confirmation of differential expression.

There are a number of standard and newly emerging tech-niques that can be used to obtain secondary confirmation: RNA gel blots, quantitative or competitive PCR, and quantitative ribonuclease protection assays. Each has its own advantages, depending on the tissue and/or conditions being evaluated. RNA gel blots are simple and straightfor-ward if sufficient amounts of tissue are available. Quantitative PCR, especially using real-time instruments such as the LightCycler [48] or the TaqMan [49] systems, is the method of choice if source material is limiting. Difficulties with primer design, genomic DNA contamina-tion, and choice of internal controls, however, can make setting up these systems tedious, especially if the expres-sion pattern of a large number of genes must be evaluated.

Future prospects

For Arabidopsisand plants of significant commercial inter-est (e.g. maize, soybean, rice), commercial vendors will probably produce microarrays for sale. There are many advantages to commercializing the production of microar-rays. As long as the product is commercially viable, the companies can invest significantly more resources in pro-ducing large numbers of high-quality microarrays than can an academic laboratory. Ideally, all but the most specialized microarrays will be produced commercially in the future.

The future of Arabidopsismicroarrays is likely to diverge from those to be developed for other plant species. With the completion of the genome sequence, full genome microarrays can be constructed. In addition, the availabil-ity of high-qualavailabil-ity, error-free genomic sequence will permit the construction of oligo-based microarrays. The National Science Foundation has recently funded a group headed by J Ecker to produce an Arabidopsisgene expres-sion oligo-array in collaboration with Affymetrix (Sunnyvale, California; Table 1). In addition to gene-expression studies, oligo-based microarrays have been designed for use in scoring single nucleotide polymor-phisms molecular markers in population surveys and in genetic crosses [50]. M Mindrinos in R Davis’ group (Stanford University) is currently developing such a ‘map-ping chip’ for Arabidopsis.

experiments to monitor changes in gene-copy number [51]. The application of these methods can potentially be very helpful in the analysis of the various kinds of chromo-somal mutants that exist in plants. In addition, such an analysis could be extended to population genetic studies of gene evolution through duplication and deletions.

Further in the future, a method of generating ‘virtual masks’ using optical devices [52•_,53•_{] and inkjet-based} methods (Packard Instruments Company, Meriden, Connecticut) have been proposed as alternative methods to the established photolithography-based method for pro-ducing oligo-based microarrays or the spotting method for DNA-fragment microarrays. These technologies may lead to reductions in production costs and make the technology more accessible to all plant biologists. There are various other technical innovations being introduced that will enhance microarray experimentation, including multi-laser scanners that permit the use of more than two fluo-rochromes at a time (GSI Lumonics, Watertown, Massachusetts; Virtek, Woburn, Massachusetts), and new dyes with improved properties (Molecular Probes, Eugene, Oregon; New England Nuclear, Boston, Massachusetts).

Conclusions

DNA-fragment-based microarrays offer the promise of dra-matically enhancing our understanding of all aspects of plant biology. Microarrays are proving to be very valuable for the initial functional characterization of unknown genes as well as enabling detailed analyses of already well-known processes. This technology will greatly facilitate the analy-sis of mutants and other genetic resources that are available for plants. The full potential of this technology depends on the establishment of public databases to house microarray expression data so that a maximum number of researchers can bring their expertise and intuition to bear on the interpretation of the expression data.

Acknowledgements

Supported in part by the US Department of Agriculture, the US Department of Energy and the Carnegie Institution of Washington. We also thank Robin Shealy (University of Illinois at Urbana-Champagne) for his helpful advice about data normalization. We thank Gert de Boer, Greg Galloway, Wolf Scheible and John Vogel for their comments and suggestions on the manuscript. TAR thanks C Somerville for his support of his research. Publication # 1427 of the Carnegie Institution of Washington.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

• of special interest

••of outstanding interest

1. Baldwin D, Crane V, Rice D: A comparison of gel-based, nylon filter and microarray techniques to detect differential RNA expression in plants.Curr Opin Plant Biol1999, 2:96-103.

2. Brown PO, Botstein D: Exploring the new world of the genome

• with DNA microarrays.Nat Genet1999, 21:33-37. A thoughtful commentary on the impact of DNA microarrays.

(8)

4. Granjeaud S, Bertucci F, Jordan BR: Expression profiling: DNA arrays in many guises.Bioessays1999, 21:781-790.

5. Lemieux B, Aharoni A, Schena M: DNA chip technology.Mol Breed

1998, 4:277-289.

6. Lipshutz RJ, Fodor SPA, Gingeras TR, Lockhart DJ: High density synthetic oligonucleotide arrays.Nat Genet1999, 21:20-24. 7. Eisen MB, Brown PO: DNA arrays for analysis of gene expression.

•• Methods Enzymol 1999, 303:179-205.

This paper provides a thorough description of the DNA-fragment-based microarray technology.

8. Kehoe D, Villand P, Somerville S: DNA microarrays for studies of higher plants and other photosynthetic organisms. Trends Plant Sci1999, 4:38-41.

9. Schena M (ed): DNA Microarrays: A Practical Approach. New York:

•• Oxford University Press; 1999.

A collection of chapters covering all aspects of DNA microarray technology, including a number of useful protocols. Includes a broad overview of the field and detailed experimental advice.

10. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM,

• Staudt LM, Hudson J Jnr, Boguski MSet al.: The transcriptional program in the response of human fibroblasts to serum.Science

1999, 283:83-87

Demonstrates the power of cluster analysis for identifying the function of unknown genes by clustering with genes of known identity and function.

11. Newman T, de Bruijn FJ, Green P, Keegstra K, Kende H, McIntosh L, Ohlrogge J, Raikhel N, Somerville S, Thomashaw Met al.: Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones.

Plant Physiol1994, 106:1241-1255.

12. Lin, X, Kaul, S, Rounsley, S, Shea, TP, Benito, M-I, Town, CDet al.: Sequence and analysis of chromosome 2 of Arabidopsis thaliana.

Nature1999, 402:761-768.

13. Mayer K, Schüller C, Wambutt R, Murphy G, Volckaert G, Pohl T, Dusterhoft A, Stiekema W, Entian KD, Terryn N et al.: Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.

Nature1999, 402:769-777.

14. The ArabidopsisInformation Resource (TAIR): ArabidopsisGenome Initiative. URL http://www.arabidopsis.org/agi.html

15. The Institute for Genomic Research (TIGR):ArabidopsisGenome Annotation Database (AGAD). URL http://www.tigr.org/tdb/at/agad 16. Munich Information Center for Protein Sequences (MIPS):

Arabidopsis thalianadatabase (MATDB). URL http://websvr.mips.biochem.mpg.de/proj/thal/

17. The Institute for Genomic Research (TIGR):TIGR Tomato Gene Index.URL http://www.tigr.org/tdb/lgi/index.html

18. Soybean EST Project: The Soybean EST Project Home Page. URL http://macgrant.agron.iastate.edu/soybeanest.html

19. Richmond CS, Glasner JD, Mau R, Jin H, Blattner FR:Genome-wide

• expression profiling in Escherichia coliK-12.Nucleic Acids Res

1999, 27:3821-3835.

A detailed analysis comparing nylon-based arrays using 32_{P and glass slides}

using fluorescent dyes. The authors show that radioactive- and fluorescent-based methods produce nearly equivalent results.

20. Ruan Y, Gilmore J, Conner T: Towards Arabidopsisgenome

• analysis: monitoring expression profiles of 1400 genes using cDNA microarrays.Plant J1998 15:821-833.

This is the first article that provides an example of the use of large DNA microarrays in plant biology.

21. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome analysis: microarray-based expression

monitoring of 1 000 genes.Proc Natl Acad Sci USA

1996,93:10614-10619.

22. Incyte Pharmaceuticals: Gene Expression Microarrays.

• URL http://gem.incyte.com/gem/index.shtml

Incyte produces the first commercially available plant microarray, an

ArabidopsisGEM array containing 7 942 sequence verified elements rep-resenting 6 068 genes/clusters.

23. Incyte Pharmaceuticals: Incyte GEM Array Reproducibility Study.

• URL http://gem.incyte.com/gem/GEM-reproducibility.pdf

This study, albeit unrefereed, is one of the few that examines the sensitivity and reproducibility of DNA microarrays.

24. Che D, Bao P, Li R, Daywalt M, Ruffalo T, Shi J, Prokhorova A, Lublinsky A, Lermer N, Müller UR: A novel surface, attachment chemistry and CCD-based multicolor imaging system for analysis of genomic DNA arrays.Proc Scanning 1999, 21:63-64.

25. Beier M, Hoheisel JD: Versatile derivatisation of solid support media for covalent bonding on DNA-microchips.Nucleic Acids Res

1999, 27:1970-1977.

26. Bassett DE, Eisen MB, Boguski MS: Gene expression informatics: it’s all in your mine.Nat Genet 1999, 21:51-55.

27. Ermolaeva O, Rastogi M, Pruitt KD, Schuler GD, Bittner ML, Chen Y, Simon R, Meltzer P, Trent JM, Boguski MS: Data management for gene expression arrays.Nat Genet1998, 20:19-23.

28. Perl: Perl Mongers: The Perl Advocacy People. URL http://www.perl.org

29. Bittner M, Meltzer P, Trent J: Data analysis and integration: of steps and arrows.Nat Genet 1999, 22:213-215.

30. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns.Proc Natl Acad Sci USA1998, 95:14863-14868.

31. Imbert MC, Nguyen VK, Granjeaud S, Nguyen C, Jordan BR: LABNOTE, a laboratory notebook system designed for academic genomics groups. Nucleic Acids Res1999, 27:601-607.

32. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation.Proc Natl Acad Sci USA

199996:2907-2912.

33. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture.Nat Genet 1999, 22:281-285.

34. Toronen P, Kolehmainen M, Wong G, Castren E: Analysis of gene expression data using self-organizing maps.FEBS Lett

1999,451:142-146.

35. Wen X, Fuhrman S, Michaels GS, Carr DB, Smith Set al.: Large-scale temporal gene expression mapping of central nervous system development.Proc Natl Acad Sci USA1998, 95:334-339. 36. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB,

Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.Mol Biol Cell 1998, 9:3273-3297. 37. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic

control of gene expression on a genomic scale.Science

1997,278:680-686.

38. Zhang MQ: Promoter analysis of co-regulated genes in the yeast genome. Comput Chem1999, 23:233-250.

39. Burks C: Molecular biology database list.Nucleic Acids Res

• 1999,27:1-9.

A comprehensive list of brief descriptions and pointers to internet sites for various molecular biology databases.

40. National Library of Medicine: PubMed. URL http://www.ncbi.nlm.nih.gov/ PubMed/

41. National Center for Biotechnology Information: GenBank. URL http://www.ncbi.nlm.nih.gov/

42. Japanese Human Genome Project: Kyoto Encyclopedia of Genes and Genomes (KEGG).URL http://www.genome.ad.jp/kegg/kegg2.html

43. Kanehisa M: A database for post-genome analysis.Trends Genet

1997, 13:375-376.

44. Pfam Consortium: The Pfam database of protein domains and HMMs.URL http://pfam.wustl.edu/

45. European Bioinformatics Institute: The ArrayExpress Database. URL http://www.ebi.ac.uk/arrayexpress/

46. National Center for Genome Resources: GeneX: a Collaborative Internet Database and Toolset for Gene Expression Data. URL http://www.ncgr.org/research/genex/

(9)

49. PE Biosystems: Applied Biosystems.URL http://www.pebio.com/ab/ 50. Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G,

Perkins N, Winchester E, Spencer J et al.: Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science1998,280:1077-1082.

51. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botsein D, Brown PO: Genome-wide analysis of DNA copy-number changes using cDNA microarrays.

Nat Genet 1999, 23:41-46.

microarrays using a digital micromirror array.Nat Biotechnol 1999, 17:974-978.

Both this paper and [53•_{] outline an innovative use of new technology that}

may revolutionize oligonucleotide array construction.

53. Gardner S: Digital Optical Chemistry System.URL

• http://pompous.swmed.edu/expbio/doc/

This website, like [52•_{], describes a system that uses a unique UV light}