The Overview of Algorithms Implementation in File Search Applications or Digital Archives
Hanifah Permatasari
Faculty of Computer Science, University of Duta Bangsa, Indonesia
Eko Purwanto
Faculty of Computer Science, University of Duta Bangsa, Indonesia
Triyono
Faculty of Computer Science, University of Duta Bangsa, Indonesia
ABSTRACT
As time goes by, the number of documents, files, or archives in a company continues to increase.
These files take the form of data collections, activity reports, accountability reports, work proposals, letters, decrees, regulations, and so on. At first, companies converted paper-based files into digital files and then stored them on several free platforms. However, they later found these digital files difficult to control because storage was not centralized. Companies therefore began building information systems that allow digital files to be managed centrally. The large number of digital files in these systems makes searching slower, so the search feature needs to be optimized. The purpose of this study is to review algorithms that previous studies have implemented to optimize file search. The study draws on Indonesian scientific articles indexed by Google Scholar from 2015 to 2022. Its result is an explanation of how algorithms were used in previous studies, intended as a reference and a source of suggestions for further research. This study found that every information system or application that manages company files or archives requires a search algorithm to improve system performance.
Keywords: File search information system, file search system algorithm, archive search application
1. INTRODUCTION
Every company produces administrative documents, such as data collections, activity reports, accountability reports, work proposals, letters, decrees, regulations, and so on. These documents are produced by one or several divisions, usually regularly and continuously. They are usually printed on paper, which raises problems of physical durability, storage space, and time. These problems must be addressed promptly, because work results and supporting work files need to be managed properly. This has led several agencies to start paying attention to digital filing or archiving (Sholeh & Hartono, 2018).
In the digital age, printed documents are usually digitized to make storage and distribution to authorized parties easier. Companies already use online information technology to store these digital documents. This makes digital documents easier to distribute to related divisions, which matters because some documents are not final: several are drafts that must be coordinated before they become final documents (Permatasari & Nofikasari, 2021).
Digital file management consists of two stages: file storage and file retrieval (Muhidin et al., 2016). Storing digital files does not require a large office space, but it does require additional tools, such as a scanner to convert printed files into digital files, as well as substantial storage capacity. Nonetheless, companies have accepted these consequences. Many choose to store digital files on free online platforms such as Google Drive, Dropbox, and similar services (Astuti & Lestiariningsih, 2021). The problem with these free platforms is that storage and distribution are not managed centrally: digital files are stored under several staff accounts, exist in several versions, and are distributed through several communication channels (email, chat applications, or social media). This happens because many staff members create files, many parties receive them, and files often need to be processed immediately into new files. As a result, digital files become uncontrolled and difficult to retrieve.
Controlling digital files is very important, given how quickly their number grows over time. Several companies have realized this and have built information systems to manage their digital files. These systems were initially built to centralize digital file storage. Once files are stored centrally, the next challenge is the large number of files they hold, so the search feature needs to be optimized.
Implementing an algorithm is one way to optimize the digital file search feature. The purpose of this study is to review algorithms that previous studies have implemented to optimize file search, so that this research can serve as a reference and a source of suggestions for further research. Storing digital files centrally is important, but speeding up their retrieval also requires attention to how the information system application itself is built.
2. RESEARCH METHODOLOGY
The method used is a literature review. The data collected in this study are secondary data, that is, supporting data sourced from various libraries and existing references. This study draws on Google Scholar, using Indonesian-language articles published from 2015 to 2022. The objects of this study are articles matching the keywords 1) File Search Information Systems and 2) File Search Algorithms. These articles are reviewed to identify 1) the characteristics of the managed files, 2) the reasons for choosing the implemented algorithm, and 3) the results obtained from implementing it. The research stages are shown in Figure 1.
Figure 1. Research Stage
3. RESULTS AND DISCUSSION
3.1. Data Collection
Table 1. Results of Data Collection
No  Name of Researcher, Year
1   (Priyanti et al., 2014)
2   (Fiarni et al., 2015)
3   (Ketut et al., 2015)
4   (Rossaria & Susilo, 2015)
5   (Danuri, 2016)
6   (Sa’diah, 2017)
7   (Khasanah, 2018)
8   (Sonita & Sari, 2018)
9   (Sholikhah & Kumalaeni, 2018)
10  (Hermawan & Rahayu, 2019)
11  (Cahyo Nugroho, 2019)
12  (Harahap, 2019)
13  (Khoerun et al., 2020)
14  (Pratama, 2020)
15  (Bahri, 2020)
16  (Simanjorang & Damanik, 2020)
17  (Fauzi et al., 2020)
18  (Djamen & Pratasik, 2020)
19  (Syahroni & Subairi, 2020)
20  (Roza et al., 2020)
21  (Ilham & Mirza, 2020)
22  (Permatasari & Nofikasari, 2021)
23  (Nasruddin et al., 2021)
24  (Yuniar & Amin, 2021)
25  (Adhimullah et al., 2021)
26  (Aryasa et al., 2022)
27  (Markuci & Prianto, 2022)
28  (Widiyastuti et al., 2022)
3.2. Data Selection
The data in Table 1 show that not all file or archive management information systems use a specific algorithm in their search features, although every one of these systems has a search feature. Developers usually create an index to identify files, for example by document name or document number. The results of selecting the data in Table 1 are shown in Table 2, which lists the articles that use specific algorithms to optimize the search feature in file or archive information systems.
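As a rough illustration of such an index, the sketch below maps document numbers and lowercase document names to file paths using Python dictionaries. The function name `build_indexes`, the record fields (`number`, `name`, `path`), and the sample values are hypothetical and are not taken from any of the reviewed systems.

```python
from collections import defaultdict


def build_indexes(records):
    """Build lookup tables keyed by document number and by document name."""
    by_number = {}               # one document number identifies one file
    by_name = defaultdict(list)  # several files may share the same name
    for rec in records:
        by_number[rec["number"]] = rec["path"]
        by_name[rec["name"].lower()].append(rec["path"])
    return by_number, by_name


records = [
    {"number": "001/SK/2021", "name": "Decree", "path": "/archive/sk-001.pdf"},
    {"number": "002/ST/2021", "name": "Assignment Letter", "path": "/archive/st-002.pdf"},
]
by_number, by_name = build_indexes(records)
print(by_number["002/ST/2021"])  # /archive/st-002.pdf
```

An exact-key lookup is then a single dictionary access, but this kind of index only helps when the user enters the exact number or name; partial or content-based queries still need a string-matching or search algorithm such as those discussed in Section 3.3.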
Table 2. Results of Data Selection
No  Name of Researcher, Year        File Search Feature   Special Method and Algorithm in Search Feature
1   (Priyanti et al., 2014)         √                     √
2   (Danuri, 2016)                  √                     √
    (Adhimullah et al., 2021)       √                     √
3   (Rossaria & Susilo, 2015)       √                     √
    (Sa’diah, 2017)                 √                     √
    (Khasanah, 2018)                √                     √
    (Ilham & Mirza, 2020)           √                     √
    (Aryasa et al., 2022)           √                     √
4   (Sonita & Sari, 2018)           √                     √
    (Yuniar & Amin, 2021)           √                     √
5   (Harahap, 2019)                 √                     √
    (Bahri, 2020)                   √                     √
6   (Nasruddin et al., 2021)        √                     √
7   (Markuci & Prianto, 2022)       √                     √
3.3. Data Description
The first study examines the management of debtor loan files, which are identified by locker number. The researchers use hashing to optimize the system. This method stores data in a hash table: letters are first converted to their ASCII values, and an integer modulus is then applied to the result. As a result, the system can identify the status of each locker (empty or used), which helps the admin view locker status, and it automatically assigns a locker number to each newly stored file. This makes it easier to search debtor archive data and to update the borrowing status or process (Priyanti et al., 2014).
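The hashing scheme described above can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: the class name `LockerTable`, the sum-of-ASCII-codes hash, and linear probing for collisions are assumptions, since the source only states that characters are converted to ASCII values and reduced with an integer modulus.

```python
from typing import List, Optional


class LockerTable:
    """Hash table mapping a debtor key to a locker number (hypothetical design)."""

    def __init__(self, size: int) -> None:
        self.size = size
        # None marks an empty locker; otherwise the slot holds the stored key.
        self.slots: List[Optional[str]] = [None] * size

    def _hash(self, key: str) -> int:
        # Convert every character to its ASCII code, add them up, apply the modulus.
        return sum(ord(ch) for ch in key) % self.size

    def assign(self, key: str) -> int:
        """Store key in the first free locker of its probe sequence; return that locker."""
        start = self._hash(key)
        for step in range(self.size):
            idx = (start + step) % self.size  # linear probing when a locker is taken
            if self.slots[idx] is None:
                self.slots[idx] = key
                return idx
        raise RuntimeError("all lockers are in use")

    def find(self, key: str) -> Optional[int]:
        """Return the locker holding key, or None if it is not stored."""
        start = self._hash(key)
        for step in range(self.size):
            idx = (start + step) % self.size
            if self.slots[idx] is None:
                return None  # an empty locker ends the probe sequence
            if self.slots[idx] == key:
                return idx
        return None

    def status(self, locker: int) -> str:
        """Report whether a locker is empty or used."""
        return "empty" if self.slots[locker] is None else "used"
```

For example, with five lockers the key "AB" hashes to (65 + 66) mod 5 = 1; "BA" produces the same value, so linear probing places it in locker 2. Deletion is deliberately left out, because removing an entry under linear probing needs extra bookkeeping (for example, tombstone markers).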
The second study examines content-based search in text files. It chose the Brute Force algorithm to optimize search. The algorithm was extended with local and global searches, so that all files can be read and checked against the search criteria. The content in question is produced by standard office software, in the txt, doc, docx, xls, xlsx, ppt, and pptx formats (Danuri, 2016).
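A naive (brute-force) string match checks every possible alignment of the keyword against the text. The sketch below assumes that the text of each file (txt, docx, xlsx, and so on) has already been extracted into a Python string; the extraction step and the names `brute_force_find` and `search_files` are hypothetical.

```python
def brute_force_find(text, pattern):
    """Return every index where pattern occurs in text (naive matching)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):  # try every possible alignment
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1  # compare character by character
        if j == m:
            hits.append(i)
    return hits


def search_files(contents, pattern):
    """Check every file's text and keep the names whose content matches."""
    return [name for name, text in contents.items()
            if brute_force_find(text.lower(), pattern.lower())]


files = {"memo.txt": "Assignment letter 2021", "report.txt": "Annual report"}
print(search_files(files, "Letter"))  # ['memo.txt']
```

The worst-case cost is O(nm) character comparisons for a text of length n and a keyword of length m, which is the cost that the more elaborate matching algorithms try to reduce by skipping alignments.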
The third group of studies examines finding files whose number grows rapidly over time, such as transaction files for export and import shipments (Khasanah, 2018), final assignment files in a repository (Sa'diah, 2017), all administrative files at a tourism office (Aryasa et al., 2022), all digital files on Android devices (Rossaria & Susilo, 2015), and correspondence in a school environment (Ilham & Mirza, 2020). These studies chose the Knuth-Morris-Pratt (KMP) algorithm because it performs well when finding matches in large files. KMP compares the pattern with the text from left to right, starting at the beginning of the text, and shifts the pattern toward the end of the text; on a mismatch, it uses a precomputed prefix table to shift the pattern without comparing again the characters that have already matched. As a result, the search feature of each information system returned results faster than before the algorithm was implemented.
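The following Python sketch shows the standard KMP technique: a prefix (failure) table is computed once for the keyword, and the text is then scanned from left to right without moving backwards. The function names are illustrative; the reviewed studies may have implemented KMP differently.

```python
def prefix_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every match; the text is read left to right once."""
    if not pattern:
        return []
    fail = prefix_table(pattern)
    hits, k = [], 0  # k = number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # shift the pattern using the table, never back up in the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # [15]
```

Because the text pointer never moves backwards, the search runs in O(n + m) time, which is consistent with the studies' reason for choosing KMP for large and growing file collections.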
The fourth group of studies examines searching for official document files at the police (Yuniar & Amin, 2021) and correspondence files at universities (Sonita & Sari, 2018). The files in question are in digital form. These studies chose the Sequential Searching algorithm, which searches data sequentially by comparing elements one by one, starting from the first element, until the target element is found or all elements have been examined.
In these systems, file data was already stored in a table with an index. When searching, the algorithm loads the table data into an array and compares each element with the search key. If the current element matches, the record is displayed; otherwise, the comparison continues through the array until the target is found or the array is exhausted.
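A minimal sketch of the sequential search described above, assuming each row of the file table has been loaded into a list of dictionaries with a hypothetical `number` field (for example, a letter number). The function name `sequential_search` is illustrative.

```python
def sequential_search(records, key, field="number"):
    """Return the index of the first record whose field equals key, or -1."""
    for i, record in enumerate(records):  # examine elements one by one, from the first
        if record[field] == key:
            return i  # target found: stop early
    return -1  # every element examined, target not found


letters = [{"number": "001/ST/2021"}, {"number": "002/ST/2021"}]
print(sequential_search(letters, "002/ST/2021"))  # 1
```

In the worst case every record is compared once, so the cost grows linearly with the number of files, but the data does not need to be sorted first.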
The fifth group of studies examines searching for medical record files (Harahap, 2019) and championship registration data files (Bahri, 2020). These studies apply the Turbo Boyer-Moore algorithm, which matches a pattern against a sentence or paragraph from right to left. The algorithm remembers the length of the text factor that matched in the previous attempt (the memory factor), which allows it to perform a turbo shift: when the suffix matched in the current attempt is shorter than that remembered factor, the pattern can be shifted by at least the difference between the two lengths. As a result, the developed systems have a shorter search process.
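Below is a Python sketch of the textbook Turbo Boyer-Moore technique: the usual Boyer-Moore bad-character and good-suffix tables, plus the memory variable that enables turbo shifts. It follows the commonly published formulation of the algorithm; the function names are illustrative and are not taken from the reviewed studies.

```python
def _bad_char_table(pattern):
    """Shift per character, based on its last position before the final one."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):
        table[pattern[i]] = m - i - 1
    return table  # characters not in the table shift by m


def _suffixes(pattern):
    """suff[i] = length of the longest substring ending at i that is a suffix of pattern."""
    m = len(pattern)
    suff = [0] * m
    suff[m - 1] = m
    g, f = m - 1, 0
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            suff[i] = suff[i + m - 1 - f]
        else:
            if i < g:
                g = i
            f = i
            while g >= 0 and pattern[g] == pattern[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff


def _good_suffix_table(pattern):
    """Good-suffix shifts, as in the classic Boyer-Moore preprocessing."""
    m = len(pattern)
    suff = _suffixes(pattern)
    gs = [m] * m
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:  # pattern[:i+1] is also a suffix of the pattern
            while j < m - 1 - i:
                if gs[j] == m:
                    gs[j] = m - 1 - i
                j += 1
    for i in range(m - 1):
        gs[m - 1 - suff[i]] = m - 1 - i
    return gs


def turbo_boyer_moore(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    bc, gs = _bad_char_table(pattern), _good_suffix_table(pattern)
    hits = []
    j = 0  # current alignment of the pattern in the text
    u = 0  # memory: length of the factor matched in the previous attempt
    shift = m
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[i + j]:  # compare right to left
            i -= 1
            if u != 0 and i == m - 1 - shift:
                i -= u  # jump over the part already known to match
        if i < 0:
            hits.append(j)
            shift = gs[0]
            u = m - shift
        else:
            v = m - 1 - i  # length of the suffix matched in this attempt
            turbo_shift = u - v
            bc_shift = bc.get(text[i + j], m) - m + 1 + i
            shift = max(turbo_shift, bc_shift, gs[i])
            if shift == gs[i]:
                u = min(m - shift, v)  # remember the matched factor
            else:
                if turbo_shift < bc_shift:
                    shift = max(shift, u + 1)
                u = 0
        j += shift
    return hits


print(turbo_boyer_moore("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # [5]
```

The variable `u` plays the role of the memory factor mentioned above: when it is non-zero, the inner loop can skip the part of the text that already matched in the previous attempt, so not every text character has to be compared.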
The sixth study examines searching for files related to villagers (Nasruddin et al., 2021). It uses the Binary Search algorithm, which minimizes the number of comparisons between the search key and the data in the table by repeatedly halving the search space until the target file is found. This algorithm was chosen because it has a lower computational load than other search algorithms. As a result, the information system displays the requested data more quickly, whether or not it is found. Applying this algorithm produces a faster search process than using no search algorithm at all.
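A minimal binary search sketch, assuming the file records are already sorted in ascending order by a key such as a hypothetical resident ID number; binary search is only correct on sorted data. The function name and sample keys are illustrative.

```python
def binary_search(sorted_keys, target):
    """Return the index of target in an ascending list, or -1 if absent."""
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid
        if target < sorted_keys[mid]:
            hi = mid - 1  # continue in the left half
        else:
            lo = mid + 1  # continue in the right half
    return -1  # the search space is empty: not found


ids = ["3301010001", "3301010005", "3301010009", "3301010012"]
print(binary_search(ids, "3301010009"))  # 2
```

Each step halves the remaining range, so at most about log2(n) + 1 comparisons are needed; Python's standard `bisect` module provides the same halving logic for sorted lists.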
The seventh study compares the effects of applying sequential search and binary search to employee data in an official travel document application (Markuci & Prianto, 2022). Binary search performed better in search speed, because its search times were more stable and shorter. The authors therefore consider binary search more suitable for official travel letter applications and, more generally, for searching applications with large amounts of data.
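To make the speed difference concrete, the sketch below counts how many elements each method inspects when looking for the last item in a sorted list of 1,024 keys. This is an illustrative count on synthetic data, not a reproduction of the measurements in Markuci & Prianto (2022); the function names are hypothetical.

```python
def count_sequential(data, target):
    """Number of elements inspected by a sequential search."""
    inspected = 0
    for value in data:
        inspected += 1
        if value == target:
            break
    return inspected


def count_binary(data, target):
    """Number of middle elements inspected by a binary search (data sorted ascending)."""
    lo, hi, inspected = 0, len(data) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        inspected += 1
        if data[mid] == target:
            break
        if data[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return inspected


keys = list(range(1, 1025))  # synthetic sorted keys 1..1024
print(count_sequential(keys, 1024))  # 1024
print(count_binary(keys, 1024))  # 11
```

The sequential count grows in step with the number of records, while the binary count grows only logarithmically, which is why the gap widens as the data grows.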
3.4. Conclusion of the Data
This study identified the following characteristics of the algorithms used in the studies above:
a. The hashing method relies on a hash table, which makes insertion, search, editing, and deletion easy and allows fast lookup.
b. The Brute Force algorithm reads the contents of each file, so every file can be read and checked against the search criteria.
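c. The Knuth-Morris-Pratt (KMP) algorithm compares the pattern with the text from left to right, starting at the beginning of the text, and shifts the pattern toward the end of the text, using a precomputed prefix table so that characters that have already matched are not compared again.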
d. Sequential Searching loads the data from the table into an array and then reads the array element by element until the desired file is found.
e. The Turbo Boyer-Moore algorithm keeps a memory factor, which makes turbo shifts possible, so not every character of the text needs to be read. It matches strings from right to left; when a comparison fails, the pattern jumps further ahead, skipping alignments that are bound to fail.
f. The Binary Search algorithm requires the data to be sorted first (in ascending or descending order) and repeatedly divides the search space. For data in ascending order, if the target is smaller than the middle element, the search continues in the left half; if it is greater, the search continues in the right half. This process repeats until the target is found or the search space is empty.
4. CONCLUSION
Based on the results above, the Knuth-Morris-Pratt (KMP) algorithm is the most popular of the algorithms reviewed, used in five of the selected studies. The main reason is its ability to search large data, although it is not the only algorithm capable of handling large data. Each algorithm has advantages and disadvantages, and further research is needed to examine them.
Most of the previous studies applied file search algorithms to information systems built to handle administrative paperwork (including assignment letters, travel documents, and official documents). The rest addressed enterprise performance file management and user data files. The results consistently show that information system performance improves after the search algorithm is implemented. This reflects a basic principle of digital files: they must be easy to find again. It shows that every information system or application that manages company files or archives requires a search algorithm to improve system performance.
REFERENCES
Adhimullah, Husaini, & Arhami, M. (2021). Rancang Bangun Sistem Informasi Manajemen Material Logistik pada PT. PLN (PERSERO) Area Lhokseumawe Berbasis Web Menggunakan Metode Brute Force. EProceeding of TIK, 1(1), 1–10. http://e-jurnal.pnl.ac.id/eProTIK/article/view/2245/1913
Aryasa, K., Likliwatil, R. D., Yosep, & Prierendi, R. (2022). Implementasi Algoritma Knuth Morris Pratt Dalam Pencarian Berkas Berbasis Web (Studi Kasus: Dinas Pariwisata Kota Makassar). Jurnal Sistem Informasi dan Teknologi Informasi, 11(1), 1–12. https://doi.org/10.36774/jusiti.v11i1.906
Bahri, S. (2020). Penerapan Algoritma Turbo Boyer Moore untuk Pencarian Data pada Sistem Informasi Pendaftaran Kejuaraan Bintang Trisula Cup [Universitas Islam Negeri Maulana Malik Ibrahim]. http://etheses.uin-malang.ac.id/20386/
Cahyo Nugroho, A. (2019). Rancang Bangun Sistem Informasi Manajemen Surat Tugas Berbasis Web Menggunakan Waterfall Model. Jurnal Informatika: Jurnal Pengembangan IT, 4(2), 146–151. https://doi.org/10.30591/jpit.v4i2.1382
Danuri, D. (2016). Pencarian File Teks Berbasis Content dengan Pencocokan String Menggunakan Algoritma Brute force. Scientific Journal of Informatics, 3(1), 68–75. https://doi.org/10.15294/sji.v3i1.6515
Djamen, A. C., & Pratasik, S. (2020). Pembangunan Aplikasi Arsip Pegawai PT. PLN Persero Wilayah Suluttenggo. CogITo Smart Journal, 6(1), 60–72. https://doi.org/10.31154/cogito.v6i1.225.60-72
Fauzi, R., Dwanoko, Y. S., & Priana, A. J. (2020). Rancang Bangun Sistem Informasi Sistem Arsip Digital dalam Permohonan Kehilangan Dokumen di Dinas Kependudukan dan Pencatatan Sipil. RAINSTEK Jurnal Terapan Sains & Teknologi, 2(3), 253–259. https://ejournal.unikama.ac.id/index.php/jtst/article/view/4913/2868
Fiarni, C., Sipayung, E. M., & Martiana, Y. (2015). Perancangan Aplikasi Pembuatan Berkas Perkara Pidana Dan Pengelolaan Berkas Pada Sistem Informasi Direktorat Reserse Kriminal Umum. Seminar Nasional Sistem Informasi Indonesia (SESINDO), 7, 429–434. http://is.its.ac.id/pubs/oajis/index.php/home/detail/1592/PERANCANGAN-APLIKASI-PEMBUATAN-BERKAS-PERKARA-PIDANA-DAN-PENGELOLAAN-BERKAS-PADA-SISTEM-INFORMASI-DIREKTORAT-RESERSE-KRIMINAL-UMUM
Harahap, F. H. (2019). Penerapan Algoritma Turbo Booyermoore dalam pencarian rekam medis pasien pada RS. Bunda Thamrin. Pelita Informatika, 7(3), 58–61.
Hermawan, A., & Rahayu, S. (2019). Sistem Informasi Manajemen dan Tracking Berkas (Studi Kasus: PTSP Kecamatan Kebon Jeruk). Jurnal Sistem Informasi Dan E-Bisnis, 1(2), 49–58. https://jurnal.ikhafi.or.id/index.php/jusibi/article/view/73
Ilham, M., & Mirza, A. H. (2020). Pengarsipan Dokumen Pada Sma Plus Negeri 17 Palembang. Bina Darma Conference on Computer Science. https://conference.binadarma.ac.id/index.php/BDCCS/article/view/1526/772
Ketut, I., Sudiartha, G., Ngurah, G., & Caturbawa, B. (2015). Perancangan Dan Implementasi Aplikasi Tata Arsip Pribadi Dosen Menggunakan Manajemen Folder Di Politeknik Negeri Bali. Jurnal Matrix, 5(2), 35–40.
Khasanah, N. (2018). Penerapan Algoritma Knuth Morris Pratt pada Aplikasi Pencarian Berkas Shipment Berbasis Web (Studi Kasus di PT YEC Semarang). E-Bisnis, 11, 14–22. https://journal.stekom.ac.id/index.php/E-Bisnis/article/view/93
Khoerun, Sarono, J., & Wijaya, F. W. (2020). Sistem Informasi Manajemen Berkas Tilang pada Kejaksaan Negeri Jakarta Timur. Jurnal Visualika STMIK Muhammadiyah Jakarta, 6(2), 104–118. https://jurnas.saintekmu.ac.id/index.php/visualika/article/view/83
Markuci, D., & Prianto, C. (2022). Analisis Perbandingan Penggunaan Algoritma Sequential Search Dan Binary Search Pada Aplikasi Surat Perjalanan Dinas. JATI (Jurnal Mahasiswa Teknik Informatika), 6(1), 110–119. https://doi.org/10.36040/jati.v6i1.4569
Nasruddin, H., Mashuri, C., & Wiratsongko, R. (2021). Penerapan Algoritma Binary Searching untuk Pencarian Berkas pada Sistem Pengarsipan (Study Kasus: Pemerintahan Desa Kedungbetik). INOVATE, 5(2), 1–8. http://ejournal.unhasy.ac.id/index.php/inovate/article/view/3121
Permatasari, H., & Nofikasari, I. (2021). Konsep Desain Sistem Informasi Manajemen Berkas Terpusat di Lembaga Amil Zakat Menggunakan Perspektif Nirlaba. JUSIFO (Jurnal Sistem Informasi), 7(2), 65–80. https://doi.org/10.19109/jusifo.v7i2.9390
Pratama, G. (2020). Perancangan Sistem Informasi Manajemen Berkas Putusan Berbasis Web di Pengadilan Pajak Republik Indonesia. Senamika, 1(1), 326–343.
Priyanti, A., S, D. I., & R, W. E. Y. (2014). Rancang Bangun Sistem Informasi Pemberkasan Arsip Debitur Menggunakan Metode Hashing (Studi Kasus: PT. Bank Mandiri (Persero) Tbk Jember). https://repository.unej.ac.id/handle/123456789/68474
Rossaria, M., & Susilo, B. (2015). Implementasi Algoritma Pencocokan String Knuth-Morris-Pratt Dalam Aplikasi Pencarian Dokumen Digital Berbasis Android. Jurnal Rekursif, 3(2), 183–195.
Roza, Y., Hanum, G. K., & Amiasputra, A. (2020). Perancangan Sistem Iinformasi Pengarsipan Berkas Berbasis Web Pada PT. DUTA ABADI PRIMANTARA. CORE, 191–203. https://core.ac.uk/download/pdf/336861067.pdf
Sa’diah, T. H. (2017). Implementasi Algoritma Knuth-Morris-Pratt Pada Fungsi Pencarian Judul Tugas Akhir Repository. Jurnal Komputasi, 14(1), 115–124.
Setiawan, I., & Effiyaldi. (2022). Sistem Informasi Manajemen Berkas Perkara Berbasis Web Pada Kejaksaan Negeri Merangin. Manajemen Sistem Informasi, 7(3). https://ejournal.unama.ac.id/index.php/jurnalmsi/article/view/186/84
Sholikhah, F., & Kumalaeni, D. (2018). Sistem Informasi Penelusuran Perkara (Sipp): Penelusuran Arsip Berkas Perkara Di Pengadilan Agama Temanggung. Diplomatika: Jurnal Kearsipan Terapan, 1(1), 38. https://doi.org/10.22146/diplomatika.28300
Simanjorang, R. M., & Damanik, M. J. (2020). Sistem Informasi E-Direktori dalam Peningkatan Kualitas Layanan Perguruan Tinggi. MEANS (Media Informasi Analisa Dan Sistem), 5(2), 94–98. http://103.76.21.184/index.php/Jurnal_Means/article/view/919/pdf2
Sonita, A., & Sari, M. (2018). Implementasi Algoritma Sequential Searching Untuk Pencarian Nomor Surat Pada Sistem Arsip Elektronik. Pseudocode, 5(1), 1–9. https://doi.org/10.33369/pseudocode.5.1.1-9
Syahroni, A. W., & Subairi, I. (2020). Sistem Informasi Manajemen Arsip Pernikahan Pada Kantor Urusan Agama. Respati, 15(3), 92. https://doi.org/10.35842/jtir.v15i3.377
Widiyastuti, A., Sulasminarti, & Amiqoh, U. (2022). Sistem Informasi Manajemen Berkas pada Kelurahan Pringsewu Barat. JISN (Jurnal Informatika Software Dan Network), 3(2), 17–27. http://jurnal.dccpringsewu.ac.id/index.php/ji/article/view/41/33
Yuniar, W. L., & Amin, F. (2021). Sistem Pencarian Naskah Dinas pada Polres Kendal dengan Algoritma Sequential Search. Jurnal Manajemen Informatika Dan Sistem Informasi, 4(2), 92–100. http://e-journal.stmiklombok.ac.id/index.php/misi/article/view/359