UNIVERSITI TEKNOLOGI MARA
PERFORMANCE EVALUATION OF INVERTED FILES, B-TREE AND B+ TREE INDEXING
ALGORITHM ON MALAY TEXT
SUHANAH BINTIROSNAN
Dissertation submitted in fulfillment of the requirements for the degree of
Master of Computer Science
Faculty of Computer and Mathematical Science
January 2016
AUTHORS DECLARATION
I declare that the work in this dissertation was carried out in accordance with the regulations of Universiti Teknologi MARA. It is original and is the results of my own work, unless otherwise indicated or acknowledged as referenced work. This topic has not been submitted to any other academic institution or non-academic institution for any degree or qualification.
In the event that my dissertation be found to violate the conditions mentioned above, I voluntarily waive the right of conferment of my degree and agree be subjected to the disciplinary rules and regulations of Universiti Teknologi MARA.
Name of Student Student I.D. No.
Programme Faculty
Dissertation Tittle
Suhanah binti Rosnan 2014204182
Master of Computer Science
Faculty of Computer and Mathematical Sciences Performance Evaluation of Inverted Files, B-tree and B+tree Indexing Algorithm on Malay Text
ABSTRACT
These worlds face a challenge of big data. Indexing is a part of Information Retrieval that helps to overcome the results yield on certain information. Like nowadays, people tend to have everything is seconds and faster. Thus, a study on indexing techniques that tend to have a better performance in retrieving documents in terms of time had been done. The indexing techniques used in this project by comparing their performance are Inverted Files Indexing, B-Tree Indexing and B+ Tree Indexing. This research uses 500 Malay text documents as the test collection. The indexing technique chosen has a slight difference in its data structure. Thus the ranking process uses the same method. The performance of the indexing technique was measured and compared in its retrieval time taken. The behaviors of each indexing technique are different and the searching methods are different too.
There are five experiments done in this paper that helps to determine which of the techniques are said have a good performance. The experiments done are on the time taken to index the terms or keywords to its data structure which are either linked list as in Inverted Files or B-Tree as implemented in B-Tree and B+ Tree indexing technique. In addition, experiments on the time taken to retrieve the documents were done by having a variable as the condition. All the results and discussion are explained and discuss deeper in this paper.
n
ACKNOWLEDGEMENTS
Bismillahirahmanirrahim, In the name of Allah, The most Gracious and the most Merciful
Firstly, I would like to express my gratitude to Allah with His graciousness for giving me a time to complete this project paper on time. The completion of this report is through the relentless cooperation of several persons besides me in giving guidance.
My utmost appreciation goes to my project supervisor Associate Professor Dr Nurazzah Abd. Rahman. She has given me support in completing this project. She also gives ideas that have opened my minds on the important components which is crucial in this project.
Without her endless advice and support, I would have not completed this project.
Secondly, a big thanks also goes to Prof Dr. Zainab Abu Bakar and former senior lecturer from Faculty of Pharmacy, Dr Wan Iryani binti Wan Ismail for their helps in making me have an enough data to be tested.
My thanks also extend to my husband, Ahmad Shahir bin Jantan who are very helpful in giving me support and guide me in completing the program.
TABLE OF CONTENTS
AUTHOR'S DECLARATION i
ABSTRACT ii ACKNOWLEDGEMENTS iii
TABLE OF CONTENTS iv LIST OF TABLES vii LIST OF FIGURES viii
CHAPTER l:INTRODUCTION 1
1.1 Introduction 1 1.2 Problem Background 2
1.3 Problem Statement 3 1.4 Research Objective 4 1.5 Project Scope 4 1.6 Research Significance 5
1.7 Summary 5
CHAPTER 2 : LITERATURE REVIEW 6
2.1 Introduction 6 2.2 Indexing By Term 6
2.3 Indexing Techniques 7 2.3.1 Overview Of Inverted Files Indexing 7
2.3.2 Overview Of B-Tree Indexing Technique 9 2.3.3 Overview Of B+Tree Indexing Technique 11 2.3.4 A Summary Of Indexing Technique 12
2.4 Summary 13 iv