This final pass at base calling requires that the data is normalized to an average peak height of 100, that the position of the first insert base is known and that a reference spacing graph exists. It also requires the results of the second-pass base calling. The expected spacing is initially determined by calling best_spacing() on the second-pass base calling in the region from base 50 to base 150. The position returned by best_spacing() is called fbase. nbcnt is the position of the currently called base. When nbcnt gets to be 50 bases ahead of fbase, I call best_spacing() on the third-pass base calling in the region from nbcnt-150 to nbcnt. comp_nom_spacing() returns the expected spacing according to extrapolations based on the reference spacing graph, the spacing returned by best_spacing() and the recorded temperature and voltage of this particular fragment.
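The re-estimation schedule described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name recal_window and the two constants are hypothetical and do not appear in the original program, and both positions are assumed to be counted from the first insert base. The upper bound follows the listing, which passes the position just before the current base.

```c
#include <assert.h>

/* Hypothetical constants mirroring the values quoted in the text. */
#define RECAL_LEAD   50   /* recompute once this far past fbase */
#define RECAL_WINDOW 150  /* number of trailing bases to examine */

/* Hypothetical helper: decide whether the spacing should be re-estimated.
   j and fbase are both counted from the first insert base.
   Returns 1 and stores in *lo and *hi the base range to hand to
   best_spacing(); returns 0 and leaves *lo and *hi untouched otherwise. */
int recal_window(int j, int fbase, int *lo, int *hi)
{
    assert(j >= 0 && lo != 0 && hi != 0);
    if (j <= fbase + RECAL_LEAD)
        return 0;                      /* still close enough to fbase */
    *lo = (j - RECAL_WINDOW > 0) ? j - RECAL_WINDOW : 0;
    *hi = j - 1;                       /* last base already called */
    return 1;
}
```

With fbase at 100, a call at j = 150 leaves the current estimate in place, while a call at j = 151 asks for a new estimate over bases 1 to 150.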
/* find_bases3 tries to fix up the preliminary base calling by studying the spacing. The following are required inputs:
fragd->bases is filled in with the result of the second-pass base calling.
fragd->insert_base is filled in with the location of the first insert base.
fragd->dspdata is the data normalized to a mean peak height of 100.
fragd->fildata is the normalized filtered data.
Results returned in fragd->bases1.
*/
#define INSERT_THRESH (1.60)

find_bases3(fragd)
Fragdata *fragd;
{
    int i,j,k,bcnt,bpos,excess,color,tscan,maxcolor,fbase;
    float x,y,z,scan1,scan2,nomsp,avgval,maxval,fspacing;
    float *dspdata[4],*fildata[4];
    Basedata *bases,*nbases,*abases,tmpbases[MAX_BASES];
    int nbcnt,*abcnt;

    abases=fragd->fraux->bases2;
    nbases=malloc(MAX_BASES*sizeof(Basedata));
    abcnt=&(fragd->fraux->baselen2);
    /* load_nom_spacings() loads a global array with spacing information
       derived from the universal spacing graph adjusted according to
       temperature and voltage information recorded for this particular
       fragment. This array will be used later by the routine
       comp_nom_spacing(). */
    load_nom_spacings(nomspacings);
    bases=&(fragd->bases[0]);
    bcnt=fragd->basecnt;
    for(i=0; i<4; i++){
        dspdata[i]=fragd->dspdata+(i*fragd->vnum);
        fildata[i]=fragd->fildata+(i*fragd->vnum);
    }
    /* first, remove double peaks, i.e. peaks of the same color separated
       by less than 0.5*nomsp, and replace them by one peak between the
       two. Bases before the insert are copied unchanged. */
    for(i=0; i<fragd->insert_base; i++){
        tmpbases[i].base=bases[i].base;
        tmpbases[i].scan=bases[i].scan;
    }
    nbcnt=i;
    /* find the best spot between bases 50 and 150.
       NOTE THAT FBASE IS RELATIVE TO FRAGD->INSERT_BASE.
       THIS MAKES FBASE COMPATIBLE WITH COMP_NOM_SPACING */
    best_spacing(bases+fragd->insert_base,50,150,&fspacing,&fbase);
    /* record the computed spacing at base 100 as fragd->nom_sp */
    nomsp=comp_nom_spacing(nomspacings,fragd,fbase,fspacing,
                           bases[100+i].scan,100);
    fragd->nom_sp=nomsp;
    for(bpos=i; bpos<bcnt-1; bpos++){
        if(nbcnt%10==0){
            x=comp_nom_spacing(nomspacings,fragd,fbase,fspacing,
                               bases[bpos].scan,nbcnt-fragd->insert_base);
            if(x>1.0){nomsp=x;} /* a reality check, should be an error */
        }
        /* this gets rid of double-called bases,
           i.e. less than 0.5*nomsp apart */
        if(bases[bpos].base==bases[bpos+1].base &&
           ((bases[bpos+1].scan-bases[bpos].scan)<(0.5*nomsp))){
            /* a winner! */
            tmpbases[nbcnt].base=bases[bpos].base;
            tmpbases[nbcnt].scan=
                (0.5*(bases[bpos+1].scan+bases[bpos].scan))+0.5;
            nbcnt++;
            bpos++;
        }else{
            tmpbases[nbcnt].base=bases[bpos].base;
            tmpbases[nbcnt].scan=bases[bpos].scan;
            nbcnt++;
        }
    }
    /* copy the last base unless it was merged into a double peak */
    if(bpos<bcnt){
        tmpbases[nbcnt].base=bases[bpos].base;
        tmpbases[nbcnt].scan=bases[bpos].scan;
        nbcnt++;
    }
    bcnt=nbcnt;
    /* ignore the first 100 bases for now, the peak information is
       probably a better set of criteria than spacing information there,
       anyway. */
    for(i=0; i<100+fragd->insert_base; i++){
        nbases[i].base=tmpbases[i].base;
        nbases[i].scan=tmpbases[i].scan;
    }
    nbcnt=i;
    for(bpos=i; bpos<bcnt; bpos++){
        /* if we're getting far from the last fbase, recompute fspacing
           and fbase from the last 150 bases */
        j=nbcnt-fragd->insert_base;
        /* j is now relative to insert */
        if(j>fbase+50){
            x=fspacing;
            best_spacing(nbases+fragd->insert_base, MAX(j-150,0),
                         j-1,&fspacing,&fbase);
        }
        /* compute the expected spacing here */
        scan1=tmpbases[bpos-1].scan;
        scan2=tmpbases[bpos].scan;
        z=scan2-scan1;
        x=comp_nom_spacing(nomspacings,fragd,fbase,fspacing,
                           (int)scan2,nbcnt-fragd->insert_base);
        if(x<1.0){ /* this is an error condition */
            printf("NOMSP=%f bpos=%d nbcnt=%d fbase=%d fspacing=%f\n",
                   nomsp,bpos,nbcnt,fbase,fspacing);
        }else{
            nomsp=x;
            excess=(z/nomsp)+1.0-INSERT_THRESH;
            if(excess>=1){ /* candidate for multiple insertions */
                y=z/(excess+1.0);
                /* fill in the interval with the base with the highest
                   value in dspdata */
                for(i=1; i<=excess; i++){
                    tscan=scan1+(i*y)+0.5;
                    maxval= -10000.0;
                    for(color=0; color<4; color++){
                        avgval=(dspdata[color][tscan-1]+
                                dspdata[color][tscan]+
                                dspdata[color][tscan+1])/3.0;
                        if(avgval>maxval){
                            maxval=avgval;
                            maxcolor=color;
                        }
                    }
                    nbases[nbcnt].base=maxcolor;
                    nbases[nbcnt].scan=tscan;
                    nbcnt++;
                }
            }
        }
        nbases[nbcnt].base=tmpbases[bpos].base;
        nbases[nbcnt].scan=tmpbases[bpos].scan;
        nbcnt++;
    }
    fragd->bases1=nbases;
    fragd->basecnt1=nbcnt;
}
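
The gap-filling arithmetic in the last loop is easier to follow in isolation. The sketch below is not part of the original program: the helper names gap_insert_count and gap_insert_scan are hypothetical, and double is used in place of float for simplicity. It assumes, as the listing does, that the count is truncated toward zero when stored in an int, and that inserted bases are spread evenly across the gap and rounded to the nearest scan.

```c
#include <assert.h>

#define INSERT_THRESH 1.60   /* same value as in the listing */

/* Hypothetical helper: number of bases to insert into a gap of z scans
   when the expected spacing is nomsp scans. The cast truncates toward
   zero, as the assignment to the int variable excess does. */
int gap_insert_count(double z, double nomsp)
{
    int excess;
    assert(nomsp > 0.0);
    excess = (int)(z / nomsp + 1.0 - INSERT_THRESH);
    return excess >= 1 ? excess : 0;
}

/* Hypothetical helper: scan position of the i-th inserted base
   (i = 1..excess), evenly spaced between scan1 and scan1 + z and
   rounded to the nearest scan by adding 0.5 before truncation. */
int gap_insert_scan(double scan1, double z, int excess, int i)
{
    double y;
    assert(excess >= 1 && i >= 1 && i <= excess);
    y = z / (excess + 1.0);          /* spacing after insertion */
    return (int)(scan1 + i * y + 0.5);
}
```

With an expected spacing of 10 scans, a gap of 15 scans inserts nothing, a gap of 20 scans inserts one base at the midpoint, and a gap of 30 scans inserts two bases 10 scans apart. Under these assumptions a gap has to reach roughly 1.6 times the expected spacing before a base is inserted, which is the role INSERT_THRESH plays.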