
Ultrafast String matching algorithm - KOCw


Academic year: 2024


(1)

Ultrafast String matching algorithm

Lecture 11

(2)

Complexity of Trivial Algorithm

 3,000,000,000 length genome (N)

 300,000,000 reads to map (M)

 Reads are of length 30 (L)

 Number of mismatches allowed is 2 (D).

 Each comparison of match vs. mismatch takes 1/1,000,000 seconds (t).

 Important: Trivial algorithm only solves problem under assumptions.

• Total Time = N*M*L*t = 27,000,000,000,000 seconds (about 856,164 years)
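The trivial algorithm and its cost can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (mismatches only, one comparison per read base per genome position); the function names `count_mismatches`, `trivial_map`, and `trivial_time_seconds` are hypothetical and not from the lecture.

```python
def count_mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trivial_map(genome, read, max_mismatches):
    """Try every genome position; keep those within the mismatch budget."""
    length = len(read)
    return [
        pos
        for pos in range(len(genome) - length + 1)
        if count_mismatches(genome[pos:pos + length], read) <= max_mismatches
    ]


def trivial_time_seconds(n, m, read_len, comparisons_per_second=1_000_000):
    """Total time N*M*L*t, with t written as 1 / comparisons_per_second."""
    return n * m * read_len // comparisons_per_second


seconds = trivial_time_seconds(3_000_000_000, 300_000_000, 30)
years = seconds / (365 * 24 * 3600)  # 365-day years, an assumption
print(seconds, "seconds, about", int(years), "years")

# Toy example: where does "aca" occur in "acaacg" with at most 1 mismatch?
print(trivial_map("acaacg", "aca", 1))  # [0, 3]
```

The estimate uses integer arithmetic so the product of N, M, and L stays exact before it is scaled by t.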

(3)

Complexity of Indexing Algorithm

• We need to look up each third of the read in the index.

• For L=30, our index will contain entries of length 10.

• Each entry will contain on average N/4^(L/3), or about 3,000, positions.

• For each position, we need to compute the number of mismatches.

• Our running time is L*M*3*N/4^(L/3)*t = 81,000,000 seconds (937 days).

• If L=45, then the time is 81,000 seconds or 22.5 hours.
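The arithmetic behind the indexing estimate can be checked with a short Python sketch. It assumes the slide's values for N, M, L, and t, and it uses the rounded figure of 3,000 positions per index entry; the function name `indexed_time_seconds` is hypothetical.

```python
def indexed_time_seconds(m, read_len, positions_per_entry,
                         comparisons_per_second=1_000_000):
    """Running time L * M * 3 * (positions per entry) * t."""
    return read_len * m * 3 * positions_per_entry // comparisons_per_second


N = 3_000_000_000   # genome length
M = 300_000_000     # number of reads
L = 30              # read length

# Each index entry is a string of length L/3 over a 4-letter alphabet,
# so the N genome positions spread over 4^(L/3) possible entries.
exact_positions = N / 4 ** (L // 3)   # roughly 2,861
rounded_positions = 3_000             # the rounded figure used on the slide

seconds = indexed_time_seconds(M, L, rounded_positions)
days = seconds / 86_400
print(seconds, "seconds, about", days, "days")
```

For L = 45 the same formula gives a figure of the same order as the 81,000 seconds quoted above; the exact value depends on how the positions-per-entry estimate (about 2.8) is rounded.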

(4)

Indexing a genome

• Perfect match – indexing takes time and space

• The sequence is very long: 3 billion bases
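A minimal sketch of index-based mapping, assuming a plain hash-table index of all substrings of length L/3. With at most 2 mismatches spread over the three thirds of a read, at least one third must match the genome exactly, so every valid alignment is found through one of the three lookups. The names `build_index` and `map_read` are hypothetical, and the read length is assumed to be exactly 3*k.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every length-k substring of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def map_read(genome, index, read, k, max_mismatches=2):
    """Look up each third of the read, then verify every candidate position."""
    hits = set()
    for part in range(3):
        offset = part * k
        seed = read[offset:offset + k]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(genome):
                continue  # the read would hang over the genome's edge
            window = genome[start:start + len(read)]
            mismatches = sum(x != y for x, y in zip(window, read))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


genome = "ttacaacgtt"
index = build_index(genome, 2)
# "acatcg" differs from genome[2:8] == "acaacg" in one position.
print(map_read(genome, index, "acatcg", 2))  # [2]
```

The index stores one entry per genome position, which is the time and space cost the slide refers to.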

(5)

Burrows-Wheeler transform

• data compression

• T = a c a a c g

T$

a c a a c g $

Generate all suffixes

$
g $
c g $
a c g $
a a c g $
c a a c g $
a c a a c g $

Fill the rest

$ a c a a c g
g $ a c a a c
c g $ a c a a
a c g $ a c a
a a c g $ a c
c a a c g $ a
a c a a c g $

Sort

$ a c a a c g
a a c g $ a c
a c a a c g $
a c g $ a c a
c a a c g $ a
c g $ a c a a
g $ a c a a c

Burrows-Wheeler Matrix

BWT(T): the last column, g c $ a a a c
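The construction above (append $, form all rotations, sort, read off the last column) can be sketched directly in Python. This is a naive illustration that materializes the whole matrix, which is only practical for short strings; the function names `bwt_matrix` and `bwt` are hypothetical. It relies on `$` sorting before the letters, which holds for ASCII.

```python
def bwt_matrix(text, terminator="$"):
    """Return the sorted rotations of text + terminator."""
    s = text + terminator
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    return sorted(rotations)


def bwt(text, terminator="$"):
    """The transform is the last column of the sorted rotation matrix."""
    return "".join(row[-1] for row in bwt_matrix(text, terminator))


for row in bwt_matrix("acaacg"):
    print(" ".join(row))

print(bwt("acaacg"))  # gc$aaac
```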

(6)

Burrows-Wheeler transform

$ a c a a c g
a a c g $ a c
a c a a c g $
a c g $ a c a
c a a c g $ a
c g $ a c a a
g $ a c a a c

BWT(T)

• Can we reconstruct the first column with BWT(T)?
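One way to answer the question: every column of the matrix contains exactly the characters of T$, and the rows are sorted, so the first column is simply the characters of BWT(T) in sorted order. A minimal Python sketch, with the hypothetical helper name `first_column`:

```python
def first_column(bwt_text):
    """Sorting the last column yields the first column of the matrix."""
    return "".join(sorted(bwt_text))


print(first_column("gc$aaac"))  # $aaaccg
```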

(8)

Burrows-Wheeler transform

First column: $ a1 a2 a3 c1 c2 g

Last column, BWT(T): g c1 $ a1 a2 a3 c2

T = g

T = c2 g

T = a3 c2 g

T = a1 a3 c2 g

T = c1 a1 a3 c2 g

T = a2 c1 a1 a3 c2 g

T = a c a a c g
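The reconstruction steps can be sketched in Python. The sketch numbers each character by its occurrence (a1, a2, a3, ...) in both columns, relies on the fact that the i-th occurrence of a character in the last column is the same text character as its i-th occurrence in the first column, and walks backwards from the row that starts with $. The function names `rank_characters` and `inverse_bwt` are hypothetical, and `$` is assumed to sort before every other character.

```python
def rank_characters(column):
    """Tag each character with its occurrence number, e.g. ('a', 2) for a2."""
    seen = {}
    ranked = []
    for ch in column:
        seen[ch] = seen.get(ch, 0) + 1
        ranked.append((ch, seen[ch]))
    return ranked


def inverse_bwt(last, terminator="$"):
    """Rebuild T from BWT(T) by following last-column to first-column links."""
    last_ranked = rank_characters(last)
    first_ranked = rank_characters(sorted(last))
    row_of = {tagged: row for row, tagged in enumerate(first_ranked)}

    row = 0       # the row beginning with the terminator
    chars = []    # characters of T, collected from last to first
    while True:
        ch, _ = last_ranked[row]
        if ch == terminator:
            break
        chars.append(ch)
        row = row_of[last_ranked[row]]  # jump to the row starting with this character
    return "".join(reversed(chars))


print(inverse_bwt("gc$aaac"))  # acaacg
```

On this input the walk visits g, c2, a3, a1, c1, a2 in that order, matching the steps listed above.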
