Struktur Data & Algoritme
(
Data Structures & Algorithms
)
Denny ([email protected])
Suryana Setiawan ([email protected])
Fakultas Ilmu Komputer Universitas Indonesia Semester Genap - 2004/2005
Version 2.0 - Internal Use Only
Analisa Algoritme
Algoritme
 al•go•rithm n.
1 Math. a) any systematic method of solving a certain kind of problem b) the repetitive calculations used in finding the greatest common divisor of two numbers (called in full Euclidean algorithm)
2 Comput. a predetermined set of instructions for solving a specific problem in a limited number of steps  Suatu set instruksi yang harus diikuti oleh computer
untuk memecahkan suatu masalah.
 Program harus berhenti dalam batas waktu yang wajar (reasonable)
 Tidak terikat pada programming language atau
SDA/ALG/V2.0/3
Analisa Algoritme: motivasi
 perlu diketahui berapa banyak resource (time& space) yang diperlukan oleh sebuah algoritme
 Menggunakan teknik-teknik untuk mengurangi waktu yang dibutuhkan oleh sebuah algoritme
Expected Outcome
 mahasiswa dapat memperkirakan waktu yang dibutuhkan sebuah algoritme
 mahasiswa memahami teknik-teknik yang secara drastis menurunkan running time
 mahasiswa memahami kerangka matematik yang menggambarkan running time
SDA/ALG/V2.0/5
Outline
 Apa itu analisa algoritme?- what
 Bagaimana cara untuk analisa/mengukur? - how  Notasi Big-Oh
 Case Study: The Maximum Contiguous Subsequence Sum Problem
 Algorithm 1: A cubic algorithm
 Algorithm 2: A quadratic algorithm
 Algorithm 3: An N log N algorithm
 Algorithm 4: A linear algorithm
Analisa Algoritme: what?
 Mengukur jumlah sumber daya (timedanspace) yang diperlukan oleh sebuah algoritme
 Waktu yang diperlukan (running time) oleh sebuah algoritme cenderungtergantungpadajumlah input
yang diproses.
ÎRunning timedari sebuah algoritme adalahfungsi dari jumlah inputnya
 Selalutidak terikatpadaplatform (mesin + OS),
bahasa pemrograman,kualitas kompilatoratau
bahkanparadigma pemrograman(mis. Procedural vs Object-Oriented)
SDA/ALG/V2.0/7
Analisa Algoritme: how?
 Bagaimana jika kita menggunakan jam?
 Jumlah waktu yang digunakanbervariasitergantung pada beberapa faktor lain: kecepatan mesin, sistem operasi (multi-tasking), kualitas kompiler, dan bahasa pemrograman.
 Sehingga kurang memberikan gambaran yang tepat tentang algoritme Wall-Clock time CPU time Process 1: Process 2: Process 3: Idle: Wall-Clock time
Analisa Algoritme: how?
 Notasi O (sering disebut sebagai “notasi big-Oh”)  Digunakan sebagai bahasa untuk membahas efisiensi
dari sebuah algoritme: log n, linier, n log n, n2, n3, ...  Dari hasil run-time, dapat kita buat grafik dari waktu
SDA/ALG/V2.0/9
Analisa Algoritme: how?
 Contoh: Mencari elemen terkecil dalam sebuah array  Algoritme: sequential scan / linear search
 Orde-nya: O(n) – linear
public static int smallest (int a[]) { // assert (length of array > 0);
int elemenTerkecil = a[0];
for (ii = 1; ii < a.length; ii++) { if (a[ii] < elemenTerkecil) { elemenTerkecil = a[ii]; } } return elemenTerkecil ; }
Analisa Algoritme: how?
average case input size running time n best case worst case
SDA/ALG/V2.0/11
Big Oh: contoh
 Sebuah fungsi kubic adalah sebuah fungsi yang suku dominannya (dominant term) adalah sebuah konstan dikalikan dengann3.  Contoh:  10 n3+ n2+ 40 n + 80  n3+ 1000 n2+ 40 n + 80  n3+ 10000
Dominant Term
 Mengapa hanya suku yang memiliki pangkat tertinggi/dominan saja yang diperhatikan?  Untuknyang besar, suku dominan lebih
mengindikasikan perilaku dari algoritme.  Untuknyang kecil, suku dominan tidak selalu
mengindikasikan perilakunya, tetapiprogram dengan input kecil umumnya berjalan sangat cepat sehingga kita tidak perlu perhatikan.
SDA/ALG/V2.0/13
Big Oh: issues
 Apakah fungsi linier selalu lebih kecil dari fungsi kubic?
 Untuk n yang kecil, bisa saja fungsi linier > fungsi kubic
 Tetapi untuk n yang besar, fungsi kubic > fungsi linier
 Contoh: • f(n) = 10 n3+ 20 n + 10 • g(n) = 10000 n + 10000 • n = 10, f(n) = 10.210, g(n) = 110.000 Î f(n) < g(n) • n = 100, f(n) = 10.002.010, g(n) = 1.010.000 Î f(n) > g(n)
 Mengapa nilai konstan/koefesien pada setiap suku tidak diperhatikan?
 Nilai konstan/koefesien tidak berarti pada mesin yang berbeda ∴ Big Oh digunakan untuk merepresentasikanlaju pertumbuhan
(growth rate)
Orde Fungsi Running-Time
Function Name
c Constant
log N
Logarithmic
log
2N
Log-squared
N Linear
N log N
N log N
N
2Quadratic
N
3Cubic
2
NExponential
N! Factorial
SDA/ALG/V2.0/15
Fungsi Running-Time
 Untuk input yang sedikit, beberapa fungsi lebih cepat dibandingkan dengan yang lain.
Fungsi Running-Time (2)
 Untuk input yang besar, beberapa fungsi running-time sangat lambat - tidak berguna.
SDA/ALG/V2.0/17
Contoh Algoritme
 Mencari dua titik yang memiliki jarak terpendek dalam sebuah bidang (koordinat X-Y)
 Masalah dalam komputer grafis.
 Algoritme brute force:
•hitung jarak dari semua pasangan titik •cari jarak yang terpendek
 Jika jumlah titik adalah n, maka jumlah semua pasangan adalah n * (n - 1) / 2
 Orde-nya: O(n2) - kuadratik
 Ada solusi yang O(n log n) bahkan O(n) dengan cara algoritme lain.
Contoh Algoritme
 Tentukan apakah ada tiga titik dalam sebuah bidang yang segaris (colinier).
 Algoritme brute force:
•periksa semua pasangan titik yang terdiri dari 3 titik.
 Jumlah pasangan: n * (n - 1) * (n - 2) / 6
 Orde-nya: O(n3) - kubik. Sangat tidak berguna untuk
10.000 titik.
SDA/ALG/V2.0/19
Studi Kasus
 Mengamati sebuah masalah dengan beberapa solusi.  Masalah Maximum Contiguous Subsequence Sum
 Diberikan (angka integer negatif dimungkinkan) A1, A2, …, AN, cari nilai maksimum dari (Ai+ Ai+1 + …+ Aj).
 maximum contiguous subsequence sum adalah nol jika semua integer adalah negatif.
 Contoh (maximum subsequences digarisbawahi)  -2, 11, -4, 13, -4, 2
 1, -3, 4, -2, -1, 6
Brute Force Algorithm (1)
 Algoritme: Hitung jumlah dari semua sub-sequence yang mungkin
 Cari nilai maksimumnya
 Contoh:
 jumlah subsequence (start, end)
•(0, 0), (0,1), (0,2), …, (0,5) •(1,1), (1,2), …, (1, 5) •… •(5,5) 2 -4 13 -4 11 -2 5 4 3 2 1 0
SDA/ALG/V2.0/21 public static int maxSubSum1( int [] A )
{
int maxSum = 0;
for(int ii = 0; ii < A.length; ii++) { for( int jj = ii; jj < A.length; jj++) {
int thisSum= 0;
for(int kk = ii; kk <= jj; kk++) thisSum += A[kk];
if( thisSum > maxSum ) maxSum = thisSum; }
}
return maxSum ; }
Brute Force Algorithm (2)
2 -4 13 -4 11 -2 ii jj
Analysis
 Loop of size N inside of loop of size N inside of loop of size N means O(N3), or cubic algorithm.
 Slight over-estimate that results from some loops being of size less than N is not important.
SDA/ALG/V2.0/23
Actual Running Time
 For N = 100, actual time is 0.47 seconds on a particular computer.
 Can use this to estimate time for larger inputs: T(N) = cN3
T(10N) = c(10N)3= 1000cN3= 1000T(N)
 Inputs size increases by a factor of 10 means that running time increases by a factor of 1,000.
 For N = 1000, estimate an actual time of 470 seconds. (Actual was 449 seconds).
 For N = 10,000, estimate 449000 seconds (6 days).
How To Improve?
 Remove a loop; not always possible.
 Here it is: innermost loop is unnecessary because it throws away information.
 thisSumfor next jj is easily obtained from old value of thisSum:
 Need Aii+ A ii+1+ … + A jj-1+ Ajj
 Just computed Aii+A ii+1+ …+ A jj-1
 What we need is what we just computed + Ajj
 In other words:
SDA/ALG/V2.0/25
The Better Algorithm
public static int maxSubSum2( int [ ] A ) {
int maxSum = 0;
for( int ii = 0; ii < A.length; ii++ ) {
int thisSum = 0;
for( int jj = ii; jj < A.length; jj++ ) {
thisSum += A[jj];
if( thisSum > maxSum ) maxSum = thisSum; } } return maxSum ; }
Analysis
 Same logic as before: now the running time is quadratic, or O(N2)
 As we will see, this algorithm is still usable for inputs in the tens of thousands.
 Recall that the cubic algorithm was not practical for this amount of input.
SDA/ALG/V2.0/27
Actual running time
 For N = 100, actual time is 0.011 seconds on the same particular computer.
 Can use this to estimate time for larger inputs: T(N) = cN2
T(10N) = c(10N)2= 100cN2= 100T(N)
 Inputs size increases by a factor of 10 means that running time increases by a factor of 100.
 For N = 1000, estimate a running time of 1.11 seconds. (Actual was 1.12 seconds).
 For N = 10,000, estimate 111 seconds (= actual).
Recursive Algorithm
 Use a divide-and-conquerapproach.
 Membagi (divide) permasalahan ke dalam bagian yang lebih kecil
 Menyelesaikan (conquer) masalah per bagian secara recursive
 Menggabung penyelesaian per bagian menjadi solusi masalah awal
SDA/ALG/V2.0/29
Recursive Algorithm (2)
 The maximum subsequence either  lies entirely in the first half
 lies entirely in the second half
 starts somewhere in the first half, goes to the last element in the first half, continues at the first element in the second half, ends somewhere in the second half.
 Compute all three possibilities, and use the maximum.
 First two possibilities easily computed recursively.
Computing the Third Case
 Easily done with two loops; see the code For maximum sum that starts in the first half and extends to the last element in the first half, use a right-to-left scan starting at the last element in the first half.
 For the other maximum sum, do a left-to-right scan, starting at the first element in the first half.
4 -3 5 -2 -1 2 6 -2
SDA/ALG/V2.0/31
Recursion version
static private int
maxSumRec (int[] a, int left, int right) {
int center = (left + right) / 2;
if(left == right) { // Base case return a[left] > 0 ? a[left] : 0; }
int maxLeftSum = maxSumRec (a, left, center); int maxRightSum = maxSumRec (a, center+1, right);
for(int ii = center; ii >= left; ii--) { leftBorderSum += a[ii]; if(leftBorderSum > maxLeftBorderSum) maxLeftBorderSum = leftBorderSum; } ...
Recursion Version
...for(int jj = center + 1; jj <= right; jj++) { rightBorderSum += a[jj];
if(rightBorderSum > maxRightBorderSum) maxRightBorderSum = rightBorderSum; }
return max3 (maxLeftSum, maxRightSum,
maxLeftBorderSum + maxRightBoderSum); }
// publicly visible routine (a.k.a driver function) static public int maxSubSum (int [] a)
{
return maxSumRec (a, 0, a.length-1); }
SDA/ALG/V2.0/33
Coding Details
 The code is more involved
 Make sure you have a base case that handles zero-element arrays.
 Use a public static driver with a private recursive routine.
 Recursion rules:  Have a base case
 Make progress to the base case
 Assume the recursive calls work
 Avoid computing the same solution twice
Analysis
 Let T( N ) = the time for an algorithm to solve a problem of size N.
 Then T( 1 ) = 1 (1 will be the quantum time unit; remember that constants don't matter).
 T( N ) = 2 T( N / 2 ) + N
 Two recursive calls, each of size N / 2. The time to solve each recursive call is T( N / 2 ) by the above definition
 Case three takes O( N ) time; we use N, because we will throw out the constants eventually.
SDA/ALG/V2.0/35
Bottom Line
T(1) = 1 = 1 * 1 T(2) = 2 * T(1) + 2 = 4 = 2 * 2 T(4) = 2 * T(2) + 4 = 12 = 4 * 3 T(8) = 2 * T(4) + 8 = 32 = 8 * 4 T(16) = 2 * T(8) + 16 = 80 = 16 * 5 T(32) = 2 * T(16) + 32 = 192 = 32 * 6 T(64) = 2 * T(32) + 64 = 448 = 64 * 7T(N) = N(1 + log N) = N + N log N = O(N log N)
N log N
 Any recursive algorithm that solves two half-sized problems and does linear non-recursive work to combine/split these solutions will always take O( N log N ) time because the above analysis will always hold.
 This is a very significant improvement over quadratic.  It is still not as good as O( N ), but is not that far away
either. There is a linear-time algorithm for this problem. The running time is clear, but the correctness is non-trivial.
SDA/ALG/V2.0/37
Linear Algorithms
 Linear algorithm would be best.  Remember: linear means O( N ).
 Running time is proportional to amount of input. Hard to do better for an algorithm.
 If input increases by a factor of ten, then so does running time.
Idea
 Let Ai,jbe any sequence with Si,j < 0. If q > j,
then Ai,qis not the maximum contiguous
subsequence.  Proof:
 Si,q = Si,j + Sj+1,q  Si,j < 0Î Si,q < Sj+1,q
SDA/ALG/V2.0/39
The Code - version 1
static public int
maximumSubSequenceSum3 (int a[]) {
int maxSum = 0; int thisSum = 0;
for (int jj = 0; jj < a.length; jj++) { thisSum += a[jj]; if (thisSum > maxSum) { maxSum = thisSum; } else if (thisSum < 0) { thisSum = 0; } } return maxSum; }
The Code - version 2
static public intmaximumSubSequenceSum3b (int data[]) {
int thisSum = 0, maxSum = 0;
for (int ii = 0; ii < data.length; ii++) { thisSum = Max(0,
data[ii] + thisSum);
maxSum = Max(thisSum, maxSum); }
return maxSum; }
SDA/ALG/V2.0/41
Running Time
Running Time: on different machines
 CubicAlgorithm on Alpha 21164 at 533 Mhzusing C compiler
 LinearAlgorithm on Radio Shack TRS-80 Model III
(a 1980 personal computer with a Z-80 processor running at 2.03 Mhz) using interpreted Basic
n Alpha 21164A, C compiled, Cubic Algorithm TRS-80, Basic interpreted, Linear Algorithm 10 0.6 microsecs 200 milisecs 100 0.6 milisecs 2.0 secs 1,000 0.6 secs 20 secs 10,000 10 mins 3.2 mins 100,000 7 days 32 mins 1,000,000 19 yrs 5.4 hrs
SDA/ALG/V2.0/43
Running Time: moral story
 Even the most clever programming tricks cannot make an inefficient algorithm fast.
 Before we waste effort attempting to optimize code, we need to optimize the algorithm.
The Logarithm Algorithm
 Formal Definition of Logarithm For any B, N > 0, logBN = K, if BK= N.
 If (the base) B is omitted, it defaults to 2 in computer science.
 Examples:
 log 32 = 5 (because 25= 32)  log 1024 = 10
 log 1048576 = 20
 log 1 billion = about 30
 The logarithm grows much more slowly than N, and slower than the square root of N.
SDA/ALG/V2.0/45
Examples of the Logarithm
 BITS IN A BINARY NUMBER
 How many bits are required to represent N consecutive integers?
 REPEATED DOUBLING
 Starting from X = 1, how many times should X be doubled before it is at least as large as N?
 REPEATED HALVING
 Starting from X = N, if N is repeatedly halved, how many iterations must be applied to make N smaller than or equal to 1 (Halving rounds up).
 Answer to all of the above is log N (rounded up).
Why log N?
 B bits represents 2Bintegers. Thus 2Bis at least as
big as N, so B is at least log N. Since B must be an integer, round up if needed.
SDA/ALG/V2.0/47
Repeated Halving Principle
 An algorithm is O( log N ) if it takes constant time to reduce the problem size by a constant fraction (which is usually 1/2).
 Reason: there will be log Niterations of constant work.
Linear Search
 Given an integer X and an array A, return the position of X in A or an indication that it is not present. If X occurs more than once, return any occurrence. The array A is not altered.
 If input array is not sorted, solution is to use a linear search. Running times:
 Unsuccessful search: O( N ); every item is examined
 Successful search:
•Worst case: O( N ); every item is examined
•Average case: O( N ); half the items are examined
SDA/ALG/V2.0/49
Binary Search
 Yes! Use a binary search.  Look in the middle
 Case 1: If X is less than the item in the middle, then look in the sub-array to the left of the middle
 Case 2: If X is greater than the item in the middle, then look in the sub-array to the right of the middle
 Case 3: If X is equal to the item in the middle, then we have a match
 Base Case: If the sub-array is empty, X is not found.
 This is logarithmic by the repeated halving principle.  The first binary searchwas published in 1946. The
first published binary search without bugsdid not appear until 1962.
/**
Performs the standard binary search using two comparisons per level.
@param a the array @param x the key
@exception ItemNotFound if appropriate. @return index where item is found. */
public static int binarySearch (Comparable [ ] a, Comparable x ) throws ItemNotFound
{
int low = 0;
int high = a.length - 1; int mid;
while( low <= high ) {
mid = (low + high) / 2;
if (a[mid].compareTo (x) < 0) { low = mid + 1; } else if (a[mid].compareTo (x) > 0) { high = mid - 1; } else { return mid; } }
SDA/ALG/V2.0/51
Binary Search
 Can do one comparison per iteration instead of two by changing the base case.
 Save the value returned by a[mid].compareTo (x)
 Average case and worst case in revised algorithm are identical. 1 + log N comparisons (rounded down to the nearest integer) are used. Example: If N = 1,000,000, then 20 element comparisons are used. Sequential search would be 25,000 times more costly on average.
Big-Oh Rules (1)
 Mathematical expression relative rates of growth  DEFINITION:(Big-Oh) T(N) = O(F(N)) if there are
positive constants c and N0such that T(N) ≤ c F(N)
when N ≥ N0.
 DEFINITION:(Big-Omega) T(N) = Ω(F(N)) if there are
constants c and N0such that T(N) ≥ c F(N) when
N ≥ N0.
 DEFINITION:(Big-Theta) T(N) = Θ(F(N)) if and only if
T(N) = O(F(N)) and T(N) = Ω(F(N)).
 DEFINITION:(Little-Oh) T(N) = o(F(N)) if and only if T(N) = O(F(N)) and T(N) ≠ Θ (F(N)).
SDA/ALG/V2.0/53
Big-Oh Rules (2)
 Meanings of the various growth functions
 Jika ada lebih dari satu parameter, maka aturan tersebut berlaku untuk setiap parameter.
 4 nlog(m) + 50 n2+ 500 m+ 1853 Î O(nlog(m) + n2 + m)
 4 mlog(m) + 50 n2+ 500 m+ 853 Î O(mlog(m) + n2)
T(N) = O(F(N)) Growth of T(N) is ≤ growth of F(N) T(N) = Ω(F(N)) Growth of T(N) is ≥ growth of F(N) T(N) = Θ(F(N)) Growth of T(N) is = growth of F(N) T(N) = o(F(N)) Growth of T(N) is < growth of F(N)
Big-Oh Rules (3)
 Jika ada lebih dari satu parameter, maka aturan tersebut berlaku untuk setiap parameter.
 4 nlog(m) + 50 n2+ 500 m+ 1853 Î O(nlog(m) + n2+ m)
 4 mlog(m) + 50 n2+ 500 m+ 853 Î O(mlog(m) + n2)
SDA/ALG/V2.0/55
Summary
 more data means the program takes more time  Big-Oh tidak dapat digunakan untuk N yang kecil
Exercises
 Urutkan fungsi berikut berdasarkan laju pertumbuhan (growth rate)
 N, √N, N1.5, N2, N log N, N log log N, N log2N, N log
(N2), 2/N, 2N, 2N/2, 37, N3, N2log N
 A5+ B5+ C5+ D5+ E5= F5  0 < A ≤ B ≤ C ≤ D ≤ E ≤ F ≤ 75  hanya memiliki satu solusi. berapa?
SDA/ALG/V2.0/57