• Tidak ada hasil yang ditemukan

Quantitating Peaks

The final phas<' of this process is to quantitate th<' J)('aks r<'presC'nt ing the call<>d bases. What I am afl<'r is a measure of the dcoxy /didc·oxy incorporation ratio. This is proportional to the munb<>r of fragment molecules of a given size generated by the sequencing reaction. This in turn is equal to the number of dye moieties in a band, which is presumably proportional to the fluoresn•ncc signal picked up by the photomultiplier Lube, integrated over the <'ntire hand. lclcally, on<' would want io m<'asur<' thc peak ar<'a OV<'r the entir<' band. In practin' this is complicated by small spurious p<'aks of unknown origin. It is not uncommon for tlwre to be three peaks, a real one and two spurious ones, within 12 scans. At.tc•mpting to fit :3 gaussians, C'ach with :3 free parameters to 12 noisy points turned out to be a mislake. I discovNcd that trying to fit gaussians to this data is not a robust procc•dure, the r<'s11lis were often ciC'arly wrong.

I only need the peak areas relative to the surrounding peaks, sincf' I'm gorng to normalize with respect to them anyway to get rid of gradual changes in signal

1.10 1.05 1.00 0.95 0.90 0.85 0.80 0.75 0.70 0.65 0.60 0.55

0.50 0.45 0.40

0.35 0.30 0.25

0.20 0.15

0.10 0.05

0.00 -0.05

..I.

.

7 ·~

,_

I

1--· ~ ~

\

~ - \. -

-

-

\ -

Ytitt . 1

II

.II .,,,1' -

~l,.

- - - -

- - - -

- - -

- - -

- ~ ll' '

.

.

.

:

' :

..

. . . .

. . . .

:

..

.

M~::

. .

; ~I~ :I:

~---~

. .

~,

.. 'l~~ r:~

' : . ~ t~~:

·

::5::.. -

\ Jt.o,4~,A~.I' ..,.., ·' :;

~

·".

~, ~.r I . , , ·.s

!Wl

~ ... ~ .. (14""' ... !i

I ::

•!

_ . _

- -

- ' - - - · -

0.00 200.00 400.00 600.00 800.00

Avg

siiioc~··

Base PosJIJOn

Figure 6.:3: Graph of peak-to-peak spacing vs. base position. The spacing is normal- ized Lo l.O at base 100. This data was averaged ovC'r 629 fragments from the 2-47 cosmic!. C'orrecLions for variations in electrophoresis voltage and tcmp<'ratur<' have bc><'n made>. Base counts arc relative' Lo the Smal siLe. Data aftN base 600 is suspcd due to the> high numbc>r of errors that occur in sequencing that far out.

strength. So 1 mak<' the assumption that the peak width changes slowly enough that I can usc the peak height as a surrogate for the area. ln practice, I averag<' th<' h<'ight owr the 3 scans c<'ntN<'d on the ])('ak.

One of the criteria used in the base calling is peak h<•ight; peaks that are too low ar<' rej<?ctcd as real base peaks. 'rhis pot<'ntially introduces a bias in the data, low peaks will be culled out. In order to corr<'ci for this

r

use my knowledge of the true sequence. If my base calling has left a gap (according to the consensus a.lignm<'nt) and if tbc spacing of th<' gap is consistent with the idea that a bas<' is missing and if thC' 2 bases on hoth sid<'s of t.he gap were called correctly, I assume there' r<'ally is a bas<' in the middle of th<:' gap.

f*

a new structure for storing all the info about quantitated peaks

*f

typedef struct _pkinfo{

char base;

f*

the consensus base for this peak

*f

int called;

f*

bool. 1 if correctly called in fragment

*f

int fcnt;

f*

base pos in fragment. Rel to fragd->true_base

*f

int conpos;

f*

pos in consensus, neg implies comp strand

*f

}Pkinfo;

Pkval pkvals[1000] ;

f*

place to put the 3-point sum stats

*f

int pkvalcnt;

Gfit_cmd(fragd) Fragdata *fragd;

{

int i,j,k,l,ent,seqlen,useit,color,evalcnt,called,vnum;

float x,y,z;

float **bldata,*bldata_[4];

char entname[100],seq[2000],con[2000];

int sptr,cptr;

Fragdata *fragd;

Basedata *fbases,tbases[10];

Pkinfo pks[2000];

int pkcnt;

vnum=fragd->vnum;

fbases=fragd->bases1;

decode_align(fragd,seq,con,&seqlen);

bldata_[O]=(float *)Mymalloc(4*vnwn*sizeof(float),"BLDATA_TMP");

for(i=1; i<4; i++){

bldata_[i]=bldata[O]+(i*vnum);

}

bldata=fragd->dspdata;

f* now that we have the alignment decoded, we can chose peaks to quantitate.

The selection rules are:

1) if a correct peak has 2 correct peaks on both sides, use it.

2) if an uncalled peak has 2 good peaks on both sides & the spacing is reasonable, use it.

*I

sptr=fragd->true_base+fragd->fstart; f* sptr keeps track of where in the fragment sequence i-2 is *f

cptr=fragd->cstart; f* cptr keeps track of the consensus (i-2) *f pkcnt=O; pkvalcnt=O;

for(i=2; i<MIN(fragd->alignlen-2,600); i++){

useit=O;

if(i>=2 && seq[i-2]==con[i-2] && seq[i-l]==con[i-1] &&

seq[i+1]==con[i+1] && seq[i+2]==con[i+2] &&

(seq[i]==con[i] II seq[i]=='-')H/* a possibility *f for(j=O; j<2; j++){

}

tbases[j] .base=fbases[sptr+j] .base;

tbases[j] .scan=fbases[sptr+j] .scan;

if(seq[i]==con[i]){/* the easy case *I called=!;

for(j=O; j<3; j++){

tbases[j+2].base=fbases[sptr+j+2].base;

tbases[j+2] .scan=fbases[sptr+j+2] .scan;

}

useit=1;

I*

a winner

*I

}else{/* we know seq[i]=='-'. Here we have to look at the spacing *I

for(j=O; j<2; j++){

}

I*

remember the offset going across seq[i]

*I

tbases[j+3] .base=fbases[sptr+j+2].base;

tbases[j+3] .scan=fbases[sptr+j+2] .scan;

x=(tbases[4] .scan-tbases[O] .scan)/4.0;

I*

x=avg spacing over all 5

*I

}

}

}

}

y=(tbases[3] .scan-tbases[1] .scan)l2.0;

I*

y=avg gap spacing

*I

if(xly<1.25 && xly>0.75){

I*

if the avg gap spacing is within range use it

*I

tbases[2] .scan=(tbases[1] .scan+tbases[3].scan)l2.0;

tbases[2].base=base_from_char(con[i]);

called=O;

useit=1;

}

cptr++; sptr++;

if(seq[i-2]=='-'){sptr--;}

if(con[i-2]=='-'){cptr--;}

if(useit){l* enter its stats into the list

*I

pkvals[pkvalcnt].base=tbases[2].base;

}

pkvals[pkvalcnt] .scan=tbases[2] .scan;

pkvals[pkvalcnt] .basenum=cptr+1;

pkvals[pkvalcnt] .called=called;

k=tbases[2] .scan;

color=tbases[2] .base;

pkvals[pkvalcnt].val=(bldata[color][k-1]+bldata[color][k]+

bldata[color] [k+1])13.0;

pkvalcnt++;

fragd->pkvalO=pkvals;

fragd->pkvalO_cnt=pkvalcnt;

Chapter 7

Dokumen terkait