
Distributions on bicoloured evolutionary trees : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics at Massey University


Academic year: 2024


(1)

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

(2)

DISTRIBUTIONS ON BICOLOURED EVOLUTIONARY TREES

A thesis presented in partial fulfilment of the requirements for the degree of

Doctor of Philosophy in Mathematics at

Massey University

Michael Anthony Steel
February, 1989

(3)

Massey University Library

Thesis Copyright Form

Title of thesis: Distributions on Bicoloured Evolutionary Trees

(1) (a) I give permission for my thesis to be made available to readers in the Massey University Library under conditions determined by the Librarian.

(2) (a) I agree that my thesis, or a copy, may be sent to another institution under conditions determined by the Librarian.

(3) (a) I agree that my thesis may be copied for Library use.

Signed Date

The copyright of this thesis belongs to the author. Readers must sign their name in the space below to show that they recognise this. They are asked to add their permanent address.


(4)

ABSTRACT

A central and challenging problem in contemporary biology is how to accurately reconstruct evolutionary trees from DNA sequence data.

This thesis addresses three themes from this endeavour -- comparison, consistency and confidence intervals -- by analysing distributions arising from phylogenetic trees.

Toward the first theme, the distribution of the symmetric difference metric on pairs of binary and phylogenetic trees is studied, and a number of new results obtained. These theorems, as well as a result on another tree metric, answer previous conjectures in this area. Also under the theme of comparison, we analyse distributions on bicoloured trees arising from the principle of parsimony. A streamlined proof is given of an elegant theorem which allows an efficient comparison of how much better a maximum parsimony tree fits given data than a randomly-chosen tree. A dual distribution, where the tree is fixed and the data varies, is also analysed, answering a recent unsolved problem.

We then consider the theoretical accuracy of tree-building methods, concentrating on the statistical property of consistency. Under a simple stochastic model on bicoloured trees, conditions for the consistency of frequently-used methods based on parsimony and compatibility are examined. It is shown that even in "best possible" conditions both methods can be inconsistent, though a strong sufficient condition for compatibility is given. The analysis is extended for a molecular clock.

Finally, procedures are described for placing confidence intervals around phylogenies, and limitations on the sort of confidence intervals possible are given. Ways to efficiently implement these procedures are then considered -- in particular, approximate methods, applications to sets of taxa of size four, and simplifications under a molecular clock.

(5)

The rate at which sequence data must grow as a function of the number of taxa for confidence intervals to converge to a single tree is also considered.

The arguments in this thesis are primarily combinatorial and stochastic. In the hope that their implications will also interest biologists, some space has been given to motivating and explaining the biological relevance of the results presented.

Chart illustrating the flow of results from one section to another (regions labelled Combinatorial and Stochastic)

(6)

Introduction

Notation/Table of Symbols

--trees
--resolved subtrees
--forests
--bicoloured trees

Section Two: Distribution of the Symmetric Difference Metric

--distribution generating functions
--distribution on pairs of trees
--absolute (non-asymptotic) inequalities
--asymptotic range of the distribution
--monotonicity
--description of the metric from below
--distribution on PT(n)
--comparison with other metrics

--induced subtrees and minimal similarity
--spanning sets
--consensus trees
--efficiency

--constraints with two colours
--lower bounds on information loss

--vector spaces of edge sets
--(weakly-) connecting trees and forests
--distributions arising from parsimony
--path/edge duality

(7)

--stochastic preliminaries
--the model
--central observations
--invariants
--partition probabilities

Section Seven: Consistency

--selection procedures
--convergence and consistency
--consistent recovery of trees from dissimilarities
--consistency of parsimony and compatibility
--sufficient conditions
--consistency under a molecular clock

--confidence intervals
--approximate methods
--exact solutions
--efficiency (I)
--efficiency (II)
--a χ² test (molecular clock)

Appendix

References
