of nucleotides either adenine or thymine (Dowton and Austin, 1995, 1997).
Mitochondrial ribosomal genes have the same base compositional bias, but in either genome they are more difficult to align. Ribosomal genes have a secondary structure that can make them relatively easy to align within the conserved stem or loop regions, but difficult or nearly impossible to align without bias in the variable loop regions, which may have both numerous substitutions and multiple insertion and deletion events. These regions can be hypervariable, with unique long insertions that make them difficult to align, especially between divergent taxa. They are either aligned subjectively by eye or, where the alignment is ambiguous, excluded from the analysis (Cameron et al., 1992; Unruh and Woolley, 1999).
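Excluding ambiguously aligned regions is usually done by eye or with dedicated software. As a minimal sketch, assuming that ambiguity can be approximated by the proportion of gaps in an alignment column (a simplification; the 0.5 threshold, the function name and the toy alignment are hypothetical), columns can be screened before analysis:

```python
def exclude_gappy_columns(alignment, max_gap_fraction=0.5):
    """Return a copy of the alignment without columns whose gap fraction
    exceeds max_gap_fraction.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Gaps are coded as '-'. The threshold is an illustrative assumption,
    not a published criterion.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        gaps = sum(1 for s in seqs if s[i] == "-")
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(i)
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


# Hypothetical toy alignment: a conserved stem followed by a gappy loop.
toy = {
    "taxon_A": "GCATCGA--TTAG",
    "taxon_B": "GCATCGAAGTT-G",
    "taxon_C": "GCATCG---TTAG",
    "taxon_D": "GCGTCG----TAG",
}
print(exclude_gappy_columns(toy))
```

Gap content is only a proxy; columns that are ungapped but still ambiguous (for example, within a hypervariable loop) would need to be excluded by other criteria.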
The variable loop regions tolerate numerous insertions or deletions of nucleotides. Such insertions or deletions can make sequences extremely difficult to align, to the point where these regions are often excluded from phylogenetic analyses.
Only four bases are possible in DNA sequences, so multiple substitutions at individual sites can be a problem. If random substitutions at novel sites are common after the initial divergence of taxa, then all changes can be considered homologous. However, enough substitutions must be accumulated and fixed within the taxon of interest to be detected in the regions of interest and to provide phylogenetic signal. If too few changes are fixed between lineages, random non-homologous changes may easily confuse any phylogenetic signal. Within the 'zone of optimal divergence', the numbers of expected and observed base changes are approximately equal (Fig. 3.2). With increasing divergence time and increasing numbers of substitutions, nucleotides may change more than once, so that fewer changes are observed than expected (Fig. 3.2, points fall below the line of equality). The zone of optimal divergence in Fig. 3.2 refers to a balance in which there is sufficient divergence to establish relationships among closely related taxa, but not so much saturation that the relationships of more divergent taxa are obscured. As observed and expected changes deviate more dramatically, the number of unobserved changes increases, eventually to the point where the sequence cannot be used to analyse a particular set of relationships.
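The gap between observed and expected changes can be illustrated with the simplest substitution model. Fig. 3.2 uses a general time reversible (GTR) model; as a minimal sketch, the Jukes–Cantor correction is swapped in here as an assumption because it has a closed form, d = −(3/4) ln(1 − 4p/3), where p is the observed (uncorrected) proportion of differing sites and d is the estimated number of substitutions per site. The function names are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites, skipping gapped positions."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from an observed p.

    Returns infinity once p reaches 0.75, the level at which four equally
    frequent bases would differ by chance alone (complete saturation).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed divergence falls progressively further below the corrected
# (expected) divergence as p increases, as in a saturation plot.
for p in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    d = jukes_cantor(p)
    print(f"observed {p:.2f}  expected {d:.3f}  observed/expected {p / d:.2f}")
```

The ratio in the last column plays the role of the slope in Fig. 3.2: close to 1 in the zone of optimal divergence, dropping as multiple hits accumulate.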
Owing to functional constraints, not all gene regions accept mutations at the same rate (Fig. 3.2). Within the same taxonomic comparisons, ribosomal genes may change very quickly (28S-D2), at an intermediate rate (28S-D3), or very slowly (18S). Because the genetic code is redundant, so that many third-position changes are synonymous, protein-coding genes diverge at different rates at the three codon positions, with the third position the most free to change and the second position the most conserved (cf. Encarsia COI, Fig. 3.2). In this latter example, the divergence at first codon positions of COI between species of Encarsia is intermediate between that of 28S-D2 and D3. Although divergence rates appear to be stable within taxonomic groups (cf. D2 and D3 for the eucharitid/perilampid and Encarsia comparisons, Fig. 3.2), they may differ between groups. Divergence in 28S-D2 of Eulophidae is comparable to that in the more conserved 28S-D3 region of Eucharitidae + Perilampidae, and divergence between species of Encarsia for 28S-D2 and D3 is comparable to, or faster than, family-level divergence in other Chalcidoidea (Table 3.1, Fig. 3.2).
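Separating changes by codon position, as done for COI in Fig. 3.2, only requires splitting an in-frame alignment into every third site. A minimal sketch, assuming the sequences are aligned, start at codon position 1 and contain no frameshifting gaps; the function name and toy sequences are hypothetical:

```python
def divergence_by_codon_position(seq1, seq2):
    """Uncorrected divergence at codon positions 1, 2 and 3.

    Assumes both sequences are aligned in frame, starting at the first
    position of a codon. Gapped sites are skipped.
    """
    result = {}
    for pos in (1, 2, 3):
        sites = [
            (a, b)
            for a, b in zip(seq1[pos - 1::3], seq2[pos - 1::3])
            if a != "-" and b != "-"
        ]
        diffs = sum(1 for a, b in sites if a != b)
        result[pos] = diffs / len(sites) if sites else float("nan")
    return result


# Hypothetical in-frame fragments of four codons each: most differences
# fall at third positions, one at a first position, none at second.
a = "ATGTTAGGAACT"
b = "ATATTGGGTGCC"
print(divergence_by_codon_position(a, b))
```

Each of the three values can then be plotted against a model-corrected distance to give one saturation curve per codon position.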
The diversification of Chalcidoidea is coincident with the late Cretaceous explosion of angiosperms, with representation by Mymaridae, Trichogrammatidae and Tetracampidae in Canadian amber (Yoshimoto, 1975) and what appears to be a species of Torymidae from late Cretaceous (Turonian) compression fossils in Botswana (D. Brothers, South Africa, 2002, personal communication). Given a similar potential age of origin, striking differences are found in the amount of sequence divergence between taxonomic groups in Chalcidoidea. Maximum divergence ranges from 14.8% for Signiphoridae to 36.2% for Pteromalidae (Table 3.1). It should be noted, however, that comparisons of divergence in Aphelinidae (28.8%) and Pteromalidae are a bit unfair, since these families are probably not monophyletic, which indicates a problem of taxonomy rather than of divergence. However, Chalcididae and Encyrtidae are demonstrably monophyletic groups, with divergences of 25.9–28.6%. Trichogrammatidae have a paltry 17.6% divergence between 33 genera. There is also little correspondence of rates at the generic level. Divergence in Encarsia (29.0%) and Aphytis (23.1%) is equivalent to family-level divergence in other groups, and vastly exceeds the low divergence in Eucharitini (9.6%), which, although based on only 25 genera, is a representative sample of the entire tribe, with 47 morphologically distinct genera! Clearly, no single gene region can be used for universal comparison. Each group and gene will need to be evaluated for its utility to answer particular questions.

Molecular Systematics 45
46 J. Heraty

Fig. 3.2. DNA saturation curves for divergence of nucleotides between species for various gene regions and different taxonomic groups within Chalcidoidea. Pairwise uncorrected and general time reversible models of character state change were estimated in PAUP* 4.0b9. Changes within COI are separated into the three codon positions. In the region of optimal divergence, observed and estimated changes are roughly equal and phylogenetic relationships are estimated relatively easily using various phylogenetic methods. As regions become saturated, procedures that estimate multiple substitutions per site may better resolve relationships between taxa.
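Table 3.1 reports, for each group, the largest pairwise divergence among the taxa sampled. A minimal sketch of that summary statistic, assuming one aligned 28S-D2 sequence per taxon and uncorrected p-distances with gapped sites skipped (how gaps were treated for the published values is not stated here); the names and toy sequences are hypothetical:

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites, skipping gapped positions."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def max_percent_divergence(alignment):
    """Largest pairwise divergence (as a percentage) and the pair responsible."""
    best = (0.0, None)
    for (t1, s1), (t2, s2) in combinations(alignment.items(), 2):
        d = 100.0 * p_distance(s1, s2)
        if d > best[0]:
            best = (d, (t1, t2))
    return best


# Hypothetical toy alignment of three genera.
group = {
    "genus_A": "ACGTACGTAC",
    "genus_B": "ACGTACGTTC",
    "genus_C": "ACCTAGGTTA",
}
print(max_percent_divergence(group))
```

Because only the single most divergent pair is reported, the value grows with the breadth of taxon sampling, which is one reason the table gives the number of genera or species compared.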
Rates of divergence have a direct effect on the methods employed to analyse the data (Brower and DeSalle, 1994; Swofford et al., 1996; Unruh and Woolley, 1999). Within the zone of optimal divergence, parsimony and likelihood approaches (the latter compensating for estimated rates of change) should provide similar, if not identical, results. This has been the case for almost all studies within Chalcidoidea: the results from parsimony, distance (neighbour-joining) and likelihood methods of analysis are virtually the same (Rasplus et al., 1998; Kerdelhue et al., 1999; Babcock et al., 2001; Lopez-Vaamonde et al., 2001).
In only one case was maximum likelihood the only method employed (Machado et al., 2001), for an analysis of 20 genera of fig wasps using COI, but these results were not compared with a parsimony analysis of the same data. When saturation is a problem, one option is to analyse the data in combination with more conserved sequences that stabilize the deeper nodes. For example, relationships of eucharitids and perilampids, which are reaching high levels of saturation (> 25%) for interfamilial divergences (Fig. 3.2), are relatively stable when the data are analysed in combination with 28S-D3 and 18S-E23 (Fig. 3.4). As yet, none of the genes provides enough information on its own to indicate a single comprehensive set of relationships.

Table 3.1. Maximum percentage sequence divergence for the 28S-D2 region within various taxonomic groups of Chalcidoidea. Numbers in parentheses after family, subfamily or tribe names are the number of genera compared; those after generic names are the number of species compared. Indented names are subgroups of the taxon above them.

Taxon (no. compared)        Source  Max. divergence (%)
Aphelinidae (11)            a       28.8*
  Coccophaginae (4)         b       29.0
    Encarsia (30)           b       29.0
  Aphelininae (6)           a       25.8
    Aphytis (30)            a       23.1
    Aphelinus (7)           a        3.0
  Azotinae (1)              c        8.0
Chalcididae (7)             d       25.9
Encyrtidae (12)             c       28.6
Eucharitidae (31)           b       21.5
  Oraseminae (3)            b        5.8
    Orasema (21)            b        5.0
  Psilocharitini (2)        b       12.0
  Eucharitini (25)          b        9.6
Eulophidae (51)             e       17.4
Eupelmidae (13)             a       18.4
Eurytomidae (3)             d       16.1
Mymaridae (2)               d       18.2
Perilampidae (7)            b       14.2
  Chrysolampinae (3)        b        5.7
  Perilampinae (4)          b       14.0
Pteromalidae (11)           d       36.2
  Cleonyminae (3)           d        7.9
  Pteromalinae (3)          b        3.3
Signiphoridae (3)           c       14.8
Trichogrammatidae (33)      g       17.6
  Paracentrobia (6)         g        4.9
  Trichogramma (34)         g        8.1
Chalcidoidea (73)           d       38.5

a = J.-W. Kim, UCR, unpublished; b = J. Heraty and D. Hawks, UCR, unpublished; c = J. Munro, UCR, unpublished; d = Babcock et al., 2001; e = Gauthier et al., 2000; g = J. George and A. Owen, UCR, unpublished.
* Not all species of Encarsia cited below are included in this estimate.
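The combined analyses described above for eucharitids and perilampids start by concatenating the separately aligned gene regions into one matrix, coding taxa that lack a region as missing data. A minimal sketch, assuming each region is supplied as a dict of aligned sequences keyed by taxon and that '?' codes missing data; the function name and toy data are hypothetical. The partition boundaries are returned so that regions can still be weighted or modelled separately:

```python
def concatenate_regions(regions, missing="?"):
    """Concatenate aligned gene regions into a single matrix.

    regions: list of (name, {taxon: aligned_sequence}) pairs.
    Returns (matrix, partitions), where partitions maps each region name
    to its (start, end) columns, 1-based and inclusive.
    """
    taxa = sorted({t for _, aln in regions for t in aln})
    matrix = {t: [] for t in taxa}
    partitions = {}
    start = 1
    for name, aln in regions:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"region {name} is not aligned")
        width = lengths.pop()
        for t in taxa:
            # Taxa not sequenced for this region are filled with missing data.
            matrix[t].append(aln.get(t, missing * width))
        partitions[name] = (start, start + width - 1)
        start += width
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


# Hypothetical toy data: taxon_C was not sequenced for 18S-E23.
d2 = {"taxon_A": "ACGTTA", "taxon_B": "ACGATA", "taxon_C": "TCGATA"}
d3 = {"taxon_A": "GGCA", "taxon_B": "GGCA", "taxon_C": "GGTA"}
e23 = {"taxon_A": "CCTA", "taxon_B": "CCTG"}
matrix, parts = concatenate_regions([("28S-D2", d2), ("28S-D3", d3), ("18S-E23", e23)])
print(parts)
print(matrix["taxon_C"])
```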
The stability of phylogenetic trees, and hence the confidence of predictions using those hypothesized relationships, is based on: (i) resolution of the resulting trees; (ii) statistical support; and (iii) comparison with previous taxonomic hypotheses. Resolution is determined by the number of trees generated and the similarity of branch points across all of these trees, as determined by a consensus of all the resulting trees (Swofford, 1991). In some cases, there may be very different sets of solutions or 'islands' of tree topologies, with each island possibly representing a different evolutionary scenario for a trait of interest (Maddison, 1991). The goal of any phylogenetic study is a single island of one or more trees with the highest possible resolution. Statistical support is measured in a variety of ways, but the most common are bootstrap analyses, decay indices and successive approximations character weighting (cf. Carpenter, 1988; Swofford et al., 1996). Bootstrap analyses are based on iterative resampling of the characters (with replacement) and reanalysis of the data, with the percentage value indicating the proportion of iterations in which a particular node is supported. Decay or Bremer indices refer to the number of extra steps required before a branch is no longer present in the consensus of the resulting trees. Both are directly dependent on the number of synapomorphic changes of a character state (nucleotide change) on a given branch. Successive approximations involves reweighting the character data based on the fit of each character to a particular set of most-parsimonious tree topologies, followed by successive reanalyses of the trees and reweighted data until the final tree or trees stabilize at the same weighted length. Tree branches supported in both weighted and unweighted analyses are considered to be well supported. If one of the most-parsimonious trees is retrieved after reweighting, it is considered to be the most favoured hypothesis of relationships.
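The bootstrap and the decay index can be made concrete for the smallest interesting case. A minimal sketch, assuming four taxa (so there are only three unrooted topologies, each with exactly one internal branch), equally weighted characters, no missing data and Fitch parsimony; the function names, toy matrix and number of replicates are hypothetical, and real analyses use heuristic tree searches in programs such as PAUP*. Successive approximations weighting is not shown.

```python
import random

# The three unrooted topologies for taxa 0-3, each given as the two pairs
# separated by its single internal branch.
TOPOLOGIES = {
    "01|23": ((0, 1), (2, 3)),
    "02|13": ((0, 2), (1, 3)),
    "03|12": ((0, 3), (1, 2)),
}


def fitch_steps(states, topology):
    """Fitch parsimony steps for one character (no missing data assumed)."""
    steps = 0
    node_sets = []
    for x, y in topology:
        left, right = {states[x]}, {states[y]}
        if left & right:
            node_sets.append(left & right)
        else:
            node_sets.append(left | right)
            steps += 1
    # Joining the two pairs across the internal branch.
    if not node_sets[0] & node_sets[1]:
        steps += 1
    return steps


def tree_lengths(matrix, sites):
    """Total Fitch length of each topology over the given columns."""
    return {
        name: sum(fitch_steps([row[i] for row in matrix], topo) for i in sites)
        for name, topo in TOPOLOGIES.items()
    }


def decay_index(matrix):
    """Shortest topology and the extra steps needed to lose its branch.

    With four taxa, losing the internal branch means accepting the
    next-shortest of the other topologies.
    """
    lengths = tree_lengths(matrix, range(len(matrix[0])))
    ranked = sorted(lengths.items(), key=lambda item: item[1])
    (best_name, best_len), (_, next_len) = ranked[0], ranked[1]
    return best_name, next_len - best_len


def bootstrap_support(matrix, replicates=1000, seed=1):
    """Percentage of replicates in which each topology is most parsimonious.

    Columns are resampled with replacement; when topologies tie for the
    shortest length, that replicate's vote is split equally among them.
    """
    rng = random.Random(seed)
    n_sites = len(matrix[0])
    support = {name: 0.0 for name in TOPOLOGIES}
    for _ in range(replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        lengths = tree_lengths(matrix, sites)
        best = min(lengths.values())
        winners = [name for name, length in lengths.items() if length == best]
        for name in winners:
            support[name] += 1.0 / len(winners)
    return {name: 100.0 * s / replicates for name, s in support.items()}


# Hypothetical aligned sequences for taxa 0-3.
matrix = ["AAGTCCGT", "AAGTCCAA", "GCATCAGT", "GCATTAAA"]
print(tree_lengths(matrix, range(len(matrix[0]))))
print(decay_index(matrix))
print(bootstrap_support(matrix))
```

In the toy matrix, four characters are synapomorphies for taxa 0+1 and two conflict with them, so the 01|23 branch has a decay index of 2; both measures rise with the number of synapomorphies on a branch, as described above.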
The last measure of stability, comparison with previous taxonomic hypotheses, is often not considered when evaluating trees, although it is probably the most important. In its simplest form: do species of the same genus, or higher-level taxon, cluster together in the final set of relationships? If not, why not? Admittedly, the taxonomy could be wrong, but then the conflicting result should be backed by overwhelming support, or should at least prompt a re-evaluation of the taxa in question.
In the following discussion, I focus on various techniques of analysis, and on the evaluation and interpretation of sequence data, that have been applied within Chalcidoidea. There has been a significant change in the types of analyses being conducted since the recent review by Unruh and Woolley (1999). These sections are meant to build on their foundation, which focused largely on techniques other than sequencing. New examples have come to light within the parasitic wasps in the past few years.
Identification
To specialists in biological control, the recognition of units of interest (demes, races, populations, species) is the first concern. Systematics and taxonomy have always been regarded as important for providing names of both native and introduced species. An initial assessment, based solely upon morphological distinctness, is often later corroborated by information on the degree of reproductive isolation or other behavioural data, often in collaboration with biological control specialists working with these populations. The differentiation of these units has benefited from various molecular methods of analysis, ranging from allozyme profiles to the more recent use of molecular markers such as random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP) and amplified fragment length polymorphism (AFLP) analyses (Vos et al., 1995; Unruh and Woolley, 1999). However, because of numerous analytical problems, these methods are useful only for the differentiation of known populations or species. They have little utility for achieving the second objective of systematics: understanding the relationships and phylogenetic history of different groups. Decreasing costs and ease of use have enabled molecular sequencing to be more readily applied both to the diagnosis of species and to understanding their relationships.
Aphelinidae and Trichogrammatidae are minute, often less than 2 mm long, with some species less than 0.5 mm in overall body length. Morphological features in these tiny wasps ('flying bacteria', a term coined by Bruce Campbell) are often very reduced, and among closely related species the features used to recognize reproductively isolated species can be very minor and difficult to measure, or even absent (Pinto et al., 2002a,b). Given the need for accurate identification and a lack of available expertise, both for precision mounting of specimens and for assignment to the correct species, alternative methods using molecular techniques have been developed (cf. Unruh and Woolley, 1999).
Fragments of DNA, once isolated, identified and sequenced, can be applied to the differentiation of species in three ways. The first is the difference in length of amplified gene fragments. Length differences are typical of the ITS1 and ITS2 fragments, which can differ even among closely related species (Sappal et al., 1995; Stouthamer et al., 1998). However, fragment length cannot be used as a measure of homology between different taxa, as fragments of similar length can be derived through different evolutionary pathways; thus taxa with vastly different sequences can have a similar length. Secondly, sequences compared from different species may exhibit fixed differences, either base changes (mutations) or insertion/deletion events (indels), that serve to differentiate groups (Fig. 3.3). It is essential for these differences to be assessed for different individuals within a population or, if the comparison is to be made at the species level, for different individuals within geographically isolated populations. Thirdly, once sequence differences that are fixed in the groups (populations or species) of interest have been identified, subsequent identification can be made using specific restriction enzymes that digest a fragment after polymerase chain reaction (PCR) amplification. For example, Fig. 3.3 shows partial sequences of 28S-D2 for eight populations (strains) of Encarsia luteola Howard and Encarsia formosa Gahan.
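Screening an alignment for fixed differences amounts to finding columns that are invariant within each group but differ between the groups. A minimal sketch, assuming aligned sequences for several individuals (or strains) per species, with gaps treated as a fifth state so that fixed indels are also reported; the function name and toy sequences are hypothetical:

```python
def fixed_differences(group1, group2):
    """Columns at which each group is invariant and the two groups differ.

    group1, group2: lists of aligned sequences (equal lengths) for the
    individuals sampled from each population or species. A gap ('-') is
    treated as a state, so fixed insertion/deletion events are included.
    Returns a list of (1-based column, state in group1, state in group2).
    """
    length = len(group1[0])
    if any(len(s) != length for s in group1 + group2):
        raise ValueError("sequences must be aligned to the same length")
    fixed = []
    for i in range(length):
        states1 = {s[i] for s in group1}
        states2 = {s[i] for s in group2}
        if len(states1) == 1 and len(states2) == 1 and states1 != states2:
            fixed.append((i + 1, states1.pop(), states2.pop()))
    return fixed


# Hypothetical strains: one fixed substitution, one fixed indel, and one
# site that is polymorphic within species_1 and therefore not diagnostic.
species_1 = ["ACGTTGCA", "ACGTTGCA", "ACCTTGCA"]
species_2 = ["ACGATG-A", "ACGATG-A"]
print(fixed_differences(species_1, species_2))
```

A column only counts as fixed for the individuals actually sampled, so adding strains from other localities can remove a candidate difference, never add one.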
The sequences for E. formosa are 592 bases long, whereas those of E. luteola are 590, with a sequence divergence of 2.8–3.3% between species (Babcock and Heraty, 2000). Both E. luteola and E. formosa had a within-species sequence diversity of 0.2% (three bases each between strains); however, each is fixed for the changes illustrated. Of eight populations of these two species that were examined, two restriction enzymes were found to cut six-base recognition sites unique to each of the species: E. luteola (PvuI) and E. formosa (SalI). When used in combination, the sites discriminate the two species accurately (Fig. 3.3). While sequencing is more accurate for species recognition, the use of restriction enzymes on targeted PCR products can be a rapid and cost-effective means of assessing species under certain conditions, especially if all of the expected species are known. Either approach is considered useful for the accurate identification of these two species of Encarsia, which differ only by minor morphological characters that are difficult to observe even on slide-mounted specimens (Babcock and Heraty, 2000). Studies using restriction enzymes to separate species focus on Chalcidoidea that are traditionally very difficult to identify, such as Encarsia (Babcock and Heraty, 2000; Schmidt et al., 2001), Aphelinus (Zhu and Greenstone, 1999; Zhu et al., 2000; Prinsloo et al., 2002; K. Hopper, Delaware, 2002, personal communication) and various species of Trichogramma (Stouthamer et al., 1998).

Fig. 3.3. Restriction enzyme recognition sites and digests for recognizing two closely related species of Encarsia, E. formosa and E. luteola, using the 635 base pair region of 28S-D2 rDNA (Babcock and Heraty, 2001). PvuI cuts the fragment at the six-base recognition site (CGATCG), resulting in two fragments of unequal size, whereas SalI cuts at another site (GTCGAC), resulting in two fragments of equal size. (.) indicates sequence identity with E. luteola.
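Predicting a diagnostic digest from sequence data is a simple string search. A minimal sketch, assuming linear PCR products and the standard recognition sites and top-strand cut positions of the two enzymes (PvuI, CGAT^CG; SalI, G^TCGAC, with ^ marking the cut); the toy sequences, function names and the two-enzyme classification rule are hypothetical and are not the Encarsia sequences of Fig. 3.3:

```python
import re

# Recognition site and top-strand cut offset (bases after the site start).
# Both sites are palindromic, so searching one strand finds every site.
ENZYMES = {
    "PvuI": ("CGATCG", 4),  # CGAT^CG
    "SalI": ("GTCGAC", 1),  # G^TCGAC
}


def cut_positions(sequence, enzyme):
    """Top-strand cut positions (0-based, between bases) for one enzyme."""
    site, offset = ENZYMES[enzyme]
    # The zero-width lookahead also finds overlapping occurrences.
    return [m.start() + offset for m in re.finditer(f"(?={site})", sequence.upper())]


def digest(sequence, enzyme):
    """Approximate fragment lengths from cutting a linear PCR product
    (top-strand positions only; sticky-end overhangs are ignored)."""
    bounds = [0] + cut_positions(sequence, enzyme) + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def classify(sequence):
    """Hypothetical two-enzyme rule following the logic of Fig. 3.3."""
    pvu = bool(cut_positions(sequence, "PvuI"))
    sal = bool(cut_positions(sequence, "SalI"))
    if pvu and not sal:
        return "E. luteola pattern"
    if sal and not pvu:
        return "E. formosa pattern"
    return "unresolved (sequence the fragment)"


luteola_like = "TTTT" + "CGATCG" + "A" * 20   # hypothetical 30 bp product
formosa_like = "A" * 12 + "GTCGAC" + "T" * 12  # hypothetical 30 bp product
print(digest(luteola_like, "PvuI"), classify(luteola_like))
print(digest(formosa_like, "SalI"), classify(formosa_like))
```

Using both enzymes gives a positive band pattern for each species, so a failed digest is not mistaken for the other species.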
De Barro et al. (2000) analysed the sequences of different populations of three species of Eretmocerus for differences in 28S-D2 and D3, COII, ITS1 and ITS2. Of these five gene regions, 28S-D3 was fixed at the species level, 28S-D2 was fixed for species with a few changes that differed between but not within populations, and both COII and the ITS regions showed nucleotide variation between individuals within a population. At least for ITS1, an analysis of relationships grouped the various populations into the three distinct groups (100% bootstrap) that were recognized as morphologically distinct species. ITS1 had low levels of polymorphism among gene copies from a single specimen, at about the same levels as between individuals from different localities, but even with this demonstrated paralogy, the species grouped appropriately. Similar variation occurs in species of Trichogramma, which can result in multiple ITS2 digest bands for a single individual; however, all individuals possess at least some copies with the species-specific restriction enzyme cut sites (R. Stouthamer, California, 2002, personal communication). For purposes of identification, these examples emphasize the need to sample multiple individuals and populations to ensure that enzyme recognition sites will be consistently diagnostic.
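The sampling requirement can be expressed as a simple check. A minimal sketch, assuming each sampled individual is represented by the set of ITS copy sequences recovered from it (for example, cloned copies), and adopting the criterion described above: a site is diagnostic for a target species if every sampled individual of that species has at least one copy carrying it and no individual of any other species does. The function name, data layout and toy data are hypothetical:

```python
def site_is_diagnostic(site, samples, target):
    """Check whether a recognition site consistently marks one species.

    samples: dict mapping species name -> list of individuals, where each
    individual is a list of gene-copy sequences from that specimen.
    Returns (is_diagnostic, list of problem descriptions).
    """
    problems = []
    for species, individuals in samples.items():
        for n, copies in enumerate(individuals, start=1):
            carries_site = any(site in copy.upper() for copy in copies)
            if species == target and not carries_site:
                problems.append(f"{species} individual {n} lacks {site}")
            if species != target and carries_site:
                problems.append(f"{species} individual {n} has {site}")
    return (not problems, problems)


samples = {
    "species_X": [["AAGTCGACAA", "AAGTCCACAA"],  # mixed copies, one with site
                  ["TTGTCGACTT"]],
    "species_Y": [["AAGTCCACAA"], ["AAGTCCACTT"]],
}
print(site_is_diagnostic("GTCGAC", samples, "species_X"))
```

The first species_X individual carries a mixture of copies, as in the multi-band ITS2 digests of Trichogramma, yet still passes because at least one copy has the site.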
Sequencing was first applied as an identification tool for separating populations and species of Trichogramma, using ITS1, by Orrego and Agudelo-Silva (1993). The authors identified two strains within one Californian culture of Trichogramma pretiosum that differed by three base substitutions and eight insertion/deletion events. These results were based on the comparison of only two individuals, making it difficult to determine whether the variation was real or a sequencing artefact. However, they did find a 1.1–4.1% sequence divergence among four Californian populations (= individuals), compared with 27% divergence from Trichogramma dendrolimi Matsumura from China. RFLP analyses of amplified ITS1, ITS2, 28S and 18S were used to differentiate Trichogramma minutum Riley, Trichogramma brassicae Bezdenko and Trichogramma near sibiricum Sorokina (Sappal et al., 1995). The ITS fragments showed substantial differences in both cut sites and length, whereas the 28S fragments were identical.
The 18S fragment had a single cut site for BamHI, which served to differentiate only one of the three species. Subsequent studies of Trichogramma focused entirely on sequencing the ITS2 region and developing restriction enzyme assays (van Kan et al., 1996, 1997; Pinto et al., 1997, 2002a; Stouthamer et al., 1998; Silva et al., 1999; Ciociola et al., 2001a,b). These studies have culminated in dichotomous molecular keys that differentiate species on the basis of fragment length and cuts by specific restriction enzymes (Ciociola et al., 2001a; Pinto et al., 2002a). Beyond the identification of adults, length differences and restriction digests of ITS2 have been applied to recognize parasitism of Helicoverpa eggs by Trichogramma australicum (Amornsak et al., 1998), and of aphid hosts by species of Aphelinus (Zhu et al., 2000).
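A dichotomous molecular key of this kind is a binary decision tree whose couplets test fragment length or the presence of a cut site. A minimal sketch of the data structure, assuming a nested-tuple representation; the couplets, length threshold, choice of enzymes and species labels below are entirely hypothetical placeholders and do not reproduce the published keys:

```python
# Each couplet is (question, test, branch_if_true, branch_if_false);
# a leaf is simply a string naming the identification.
KEY = (
    "ITS2 product longer than 500 bp?",
    lambda seq: len(seq) > 500,
    (
        "Cut by EcoRI (GAATTC)?",
        lambda seq: "GAATTC" in seq,
        "species_A (hypothetical)",
        "species_B (hypothetical)",
    ),
    (
        "Cut by SalI (GTCGAC)?",
        lambda seq: "GTCGAC" in seq,
        "species_C (hypothetical)",
        "species_D (hypothetical)",
    ),
)


def run_key(key, sequence):
    """Walk the key, returning the identification and the path taken."""
    path = []
    node = key
    while not isinstance(node, str):
        question, test, yes, no = node
        answer = test(sequence.upper())
        path.append(f"{question} {'yes' if answer else 'no'}")
        node = yes if answer else no
    return node, path


short_product = "A" * 100 + "GTCGAC" + "T" * 100  # hypothetical 206 bp product
identification, steps = run_key(KEY, short_product)
print(identification)
for step in steps:
    print(" ", step)
```

In laboratory use the tests would be answered from gel band sizes rather than from a sequence; the tree structure of the key is the same either way.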
In Trichogramma, not all species can be differentiated using a single universal gene region. Two species pairs, T. minutum/T. platneri and T. sibericum/T. alpha, could not be differentiated using ITS2 alone (Pinto et al., 2002; Stouthamer et al., 2000a,b). Some base differences and indels were found in disjunct populations of T. minutum and T. platneri, but these were not fixed for either species. T. minutum and T. platneri are reproductively isolated, are sympatric in the north-western USA, possess distinct non-overlapping sets of alleles at the phosphoglucomutase (PGM) enzyme locus, and have very minor morphological differences (Nagarkati, 1975; Pinto et al., 1991, 2002a,b; Pinto, 1999; Stouthamer et al., 2000b; Burks and Heraty, 2002). A subsequent study found two fixed differences in the COI region that discriminate the two species unequivocally (R. Stouthamer, California, 2002, personal communication).
Not all species that express fixed behavioural differences have been shown to have detectable genetic differences. The differential host choice and isolation of populations of Encarsia formosa attacking Bemisia on poinsettia suggest that they should be genetically distinct from other populations, and yet no fixed differences were found for ITS or in a broader survey of AFLPs in several populations (Y. Gai and R. Stouthamer, unpublished). A similar case occurs in Encarsia sophia, which exhibits no fixed genetic differences for 28S-D2 between widely separated geographic localities, although populations from Spain and Pakistan exhibit mating incompatibilities and slight morphometric differences (Heraty and Polaszek, 2000; Babcock et al., 2001; Hernández-Suárez et al., 2003). Whether the appropriate genetic region has yet to be discovered, or whether behavioural differences can accrue at a faster rate than molecular differences, remains to be tested. Furthermore, no general rule relates the amount of genetic divergence to speciation. Some reproductively isolated and partially sympatric species of Trichogramma can be recognized discretely by only a few bases of COI, and yet populations of Megastigmus differing by as much as 4.0% were not interpreted as different species (Fig. 6 of Scheffer and Grissell, 2003). For two sister species of Encarsia, E. luteola and E. formosa, sequence divergence ranged from 3.0 to 6.1% for the more conserved 28S-D2 gene (Babcock and Heraty, 2000). As with all taxonomic information, species boundaries must be determined by a summation of evidence from all sources of data, including geographic, morphological, behavioural and genetic.