1027 Jurnal Teknik Informatika dan Sistem Informasi ISSN 2407-4322 Vol. 10, No. 1, Maret 2023, Hal.1027-1040 E-ISSN 2503-2933
Crawling And Indexing Analysis System Based On Breadth First Search For Comparison Of Goods Value
Hendi Sama*1
1Universitas Internasional Batam; Jl. Gajah Mada, Baloi – Sei Ladi, Batam 29426 Phone : (0778) 743 7111
3Program Studi Sistem Informasi, Fakultas Ilmu Komputer e-mail: *1hendi@uib.ac.id
Abstrak
This research aims to address the problem users face when searching for the price of goods, which is inconsistent across different sources. The research was carried out over a period of 6 months. The methods used were observation and analysis, together with the waterfall method. The system was built from scratch from the data obtained; its design covers the display design, diagrams of how the system and the user work, source code writing, and system implementation. The search method used in the system is breadth first search, a technique widely adopted by other systems. The result is a system built in line with the goals and benefits expected by users; the web-based price comparison is ready to be implemented and has been tried by the users themselves. The information search algorithm affects the information obtained, and not all methods produce the same information; the results of the breadth first search technique also affect the logic given to the data parsing algorithm, so the quality of the information is affected as well.
Keywords: Price Search, Breadth First Search, Goods
Abstract
This study aims to overcome the problems faced by users in conducting price searches for goods that are inconsistent due to the influence of various sources. Research activities were carried out over a period of 6 months. The method used in the implementation of this research is observation and analysis; and waterfall method. The system that was built started from the beginning through the data obtained, and the system design includes display design, diagrams
1. INTRODUCTION
Inun et al. [1] state that, as technology develops ever more rapidly, the need for information grows among the parties who depend on it. Another study [2] states that information is the result of processing data gathered from various sources into a form that is easy to understand and that becomes knowledge for people who need facts to support the achievement of their goals. Technology is used in many fields [3] [4] [5], ranging from economics, business and education to processing systems and information management; it also strongly influences the effectiveness and efficiency of online processing [6].
Against this background, this research was carried out to help users who have difficulty choosing among the prices offered for an item. The problem formulations are: 1) how to design a price comparison system architecture using the breadth first search method; and 2) how to implement price comparison in a web environment. The scope is limited as follows: the system only compares prices using the data provided by the source websites and does not act as an intermediary for buying and selling goods; the descriptions and prices of goods returned by the search are not changed; and the source websites are Tokopedia, Lazada and Bhinneka. The aims of this research are: 1) to design a web-based price comparison system architecture using the breadth first search method; and 2) to implement the price comparison of goods in a web environment. The expected benefits are that the system helps users compare the prices of the goods they want and provides information for making decisions based on price comparisons from various references.
2. LITERATURE REVIEW AND HYPOTHESIS DEVELOPMENT
Amudha & Phil completed a study titled "Web Crawler For Mining Web Data" [7]. This study explains what a web crawler is, the terms associated with it, how its architecture works from application through retrieval to data processing, the different kinds of web crawlers that can be used to retrieve data from the target more effectively according to user needs, and the policies that web crawlers should take into account to improve crawling. Pranav & Chauhan's study, "Efficient Focused Web Crawling Approach For Search Engine" [8], describes the reasons for adopting web crawlers and the search techniques that can be used to retrieve the required data from the target more effectively.
A study of web crawlers conducted by a team of researchers [9] covers the implementation of a web crawler, its operation from initial setup to data retrieval and processing, the different kinds of web crawlers that can be used to retrieve data from the target, and the factors that must be taken into account when implementing a web crawler in the system to be built. The study "A Survey On Web Crawler Approaches" [10] discusses web crawler projects completed by others, calculates the information data obtained, and examines how those systems function; it serves as an additional source of information that others can learn from and apply.
Additionally, the article by other researchers [11] provides an overview of crawling and its use, and of processing crawled data stored in a database to gain more information in line with user requirements; it also covers the development and analysis stages, the research methodology, and comparisons of data search techniques. Another supporting study, by Adnyana and Bagus, is titled "Rancang Bangun Sistem Informasi Geografis Persebaran Lokasi Obyek Pariwisata Berbasis Web Dan Mobile Android (Studi Kasus Di Dinas Pariwisata Kabupaten Gianyar)" [12]. The goal of that study is to create a traveler-friendly application that provides information about potential tourism destinations. The system was created using the waterfall approach, whose four main steps are analysis, design, execution, and testing. At PT INTI, this system is in use.
Currently, many goods are traded online through official websites, known as e-commerce [13]. E-commerce is the process of buying and selling goods and services through electronic media. On the World Wide Web, everyone has the opportunity to sell goods for profit [14] [15], for example on Tokopedia, Lazada and Bhinneka. These three websites may sell the same goods at different prices, and because the goods are not offered in one place, buyers find it confusing to locate a reasonably cheap price.
According to Almazaydeh et al. [16], the System Development Life Cycle (SDLC) is one of the classic methods most often used in system development. Using the SDLC helps produce a more structured system and a clearer division of tasks. The SDLC is divided into 4 (four) stages:
1. Planning: the initial stage of system development, in which the needs of the project are written down, defined, shared, and estimated, for example human resources, the tools and materials needed for the work, the system architecture, and system financing.
2. Analysis: the second stage, in which the system to be built is analyzed and examined, including the problems experienced and their solutions.
3. Design: the third stage, in which the system is worked out according to the main points of the analysis from the second stage. The design has two sides, namely display design and system code design.
4. Implementation: the last stage, in which the system that has been developed is put to work according to the objectives set in the planning stage.
The line of thought used to design and build this research is as follows: identify the problems that occur, analyze the boundaries of the problem to be addressed by the system, collect data about web crawling projects, and describe the application design. Once both parties agree on the design, the system is built. The designed application is then tested: if it is appropriate, it is implemented; if there are still discrepancies in the design, it is revised.
The research flow is shown in the following figure:
Figure 1. Research Flow
Problem analysis is carried out to learn more about the problems experienced by the user so that the system can be implemented optimally. It is also used to choose the method to apply to the system, by first comparing several candidate methods.
Assessing a study requires criteria or supporting data. The following table lists the criteria used to assess the effectiveness of the work.
Table 1. Criteria Assessment Information
Factor Name | Criteria | Rating Scale
Core Factor | Page Corresponding | 1-5
Core Factor | Performance & Efficiency | 1-5
Core Factor | Quality | 1-5
Core Factor | Freshness | 1-5
3. RESEARCH METHODOLOGY
3.1 Research Method
The method applied in this research is Breadth First Search (BFS). In brief, it is a search method that finds key pages and uses them as the source of links to all other pages; the more relevant information it retrieves, the better the method performs. The precision of the search is calculated with the following formula:
Precision Rate = Relevant Pages / Total Download Pages, where:
1. Precision Rate is the accuracy of the data obtained from search results on the various websites;
2. Relevant Pages is the number of accessed pages that contain the desired keywords; and
3. Total Download Pages is the number of web pages visited to obtain information for the desired keywords.
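As a minimal sketch, the Precision Rate formula above can be written as a small Python function. The function name, the argument names, and the counts in the usage line are illustrative assumptions and do not come from the system described in this paper.

```python
def precision_rate(relevant_pages: int, total_download_pages: int) -> float:
    """Return Relevant Pages / Total Download Pages."""
    if total_download_pages <= 0:
        raise ValueError("total_download_pages must be positive")
    if not 0 <= relevant_pages <= total_download_pages:
        raise ValueError("relevant_pages must be between 0 and total_download_pages")
    return relevant_pages / total_download_pages


# Hypothetical counts, for illustration only: 3 relevant pages out of 300 visited.
print(precision_rate(3, 300))  # prints 0.01
```

A value of 1 means every visited page was relevant; values close to 0 mean that most of the visited pages did not contain the desired keywords.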
In the user view, the user opens the crawling web page and fills in the category, the name of the item, the total results, and whether the process is shown. The user then searches with the options provided and repeats the search until the results match what the user wants, at which point the process ends.
In the system view, the first step is to accept the POST containing the category name, TOTAL RESULT and PROCESS APPEARANCE. The system validates the data against its rules. If the data is invalid, the system requests that another POST be sent with the category, TOTAL RESULT and PROCESS APPEARANCE; if the data is valid, the process continues by searching according to the requested category. When data matching the category has been retrieved, the system checks its suitability. If the data does not match, the system requests a search based on a new category; if the data is appropriate, the process continues with PARSING LINK, which stores the results in an existing VARIABLE LIST.
The next process is a search based on the NAME of the GOODS, which first checks that the input meets the requirements of the ITEM NAME search. If it does, PARSING is carried out to retrieve the data required by the USER.
After the retrieved information has been stored, the system checks it against the desired total results, including the page access limit. If the check is not met, the system returns to the search by category. Once it is met, the process is complete from the system's point of view.
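The system-view flow can be illustrated with a short breadth first search sketch. It is not the author's implementation: the in-memory `SITE` dictionary stands in for real marketplace pages (no network access is made), and the names `bfs_crawl`, `total_results` and `page_limit` are hypothetical. The queue plays the role of the link list, and the loop stops once the desired total results or the page access limit is reached.

```python
from collections import deque

# Hypothetical in-memory "site": page -> (links found on the page, item names on the page).
SITE = {
    "category": (["page1", "page2"], []),
    "page1": (["page3"], ["RAM 8GB"]),
    "page2": (["page3", "page4"], ["Processor i5"]),
    "page3": ([], ["RAM 16GB"]),
    "page4": ([], ["VGA card"]),
}


def bfs_crawl(start, keyword, total_results, page_limit):
    """Visit pages level by level and collect items whose name contains keyword."""
    queue = deque([start])  # pages waiting to be visited (the link list)
    seen = {start}          # pages already queued, so none is visited twice
    results = []
    visited = 0
    while queue and visited < page_limit and len(results) < total_results:
        page = queue.popleft()
        visited += 1
        links, items = SITE.get(page, ([], []))
        # Item parsing step: keep only the items that match the searched name.
        for item in items:
            if keyword.lower() in item.lower() and len(results) < total_results:
                results.append(item)
        # Link parsing step: enqueue unseen links so they are visited after this level.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results, visited


found, visited = bfs_crawl("category", "ram", total_results=2, page_limit=10)
print(found, visited)
```

Because a first-in, first-out queue is used, every page linked from the category page is visited before any page one level deeper, which is what distinguishes breadth first search from a depth first traversal.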
3.2 User Interface
1. Homepage
This page is the initial view shown when the web page is accessed. It provides fields for the category name, the name of the item being searched for, the total number of search results, the search technique, and whether the system work process is shown. See the following image for the initial page display.
Figure 2. Initial Page Display
2. Information Page
This page contains the processed results that are shown to users. It has a SORT PRICE module to filter the results by the desired price, a Top 4 Price section containing the cheapest prices for each search, and an OTHER RESULT section that continues the listing after the 4 (four) top prices. See the following picture for the information page view.
Figure 3. Page View Information
3. Feedback Page
This page lets users tell the system maker what needs to be repaired or revised, with the aim of creating a more optimal WEB CRAWLER. The fields to be filled in are Name, Address, Education, Note, Rate Quality, Rate Fresh, Rate Performance, and Rate Page Corresponding. After filling them in, the user must press the save button so that the feedback is delivered to the system maker. See the following image for the feedback page form.
Figure 4. Feedback Form
4. RESULTS AND DISCUSSIONS
4.1 System Design
1. Homepage
This page is shown when the user first opens the system. Its function is to collect the category name, keywords, search technique and desired total results from the user, as shown in the following picture.
Figure 5. Homepage
2. Information Page
This page is shown after the user presses the search button on the home page. It contains the search results from all specified website sources, as shown in the following image.
Figure 6. Information Page
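The SORT PRICE filter and the split into Top 4 Price and OTHER RESULT described for the Information Page in Section 3.2 could be sketched as follows. This is one possible reading of that description, not the system's actual code: the record fields, the function names, and all item names and prices are made up for illustration.

```python
def filter_by_price(results, min_price, max_price):
    """Keep results whose price lies in the closed range [min_price, max_price]."""
    return [r for r in results if min_price <= r["price"] <= max_price]


def split_top_cheapest(results, top_n=4):
    """Sort by ascending price and return (top_n cheapest, remaining results)."""
    ordered = sorted(results, key=lambda r: r["price"])
    return ordered[:top_n], ordered[top_n:]


# Hypothetical records; names, sources and prices are illustrative only.
results = [
    {"name": "item A", "source": "Tokopedia", "price": 150_000},
    {"name": "item B", "source": "Lazada", "price": 120_000},
    {"name": "item C", "source": "Bhinneka", "price": 180_000},
    {"name": "item D", "source": "Tokopedia", "price": 110_000},
    {"name": "item E", "source": "Lazada", "price": 200_000},
    {"name": "item F", "source": "Bhinneka", "price": 90_000},
]

top4, others = split_top_cheapest(results)
print([r["name"] for r in top4])    # the four cheapest records
print([r["name"] for r in others])  # the remaining, higher-priced records
print([r["name"] for r in filter_by_price(results, 100_000, 190_000)])
```

Sorting once and slicing keeps the two sections consistent with each other: every record in OTHER RESULT is at least as expensive as every record in Top 4 Price.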
3. Feedback page
This page is intended for users who want to tell the system maker whether the system is as desired or needs further improvement. To submit feedback, users fill in Name, Address, Education, Note, Rate Quality, Rate Fresh, Rate Perform, and Rate Page Corresponding, as shown in the following image of the feedback page.
4.2 Discussion
System testing is carried out to check whether the system runs as expected. This study uses the black box testing method; the results of testing the system are as follows:
Table 2. Homepage Process
Name | Data Input | Expectation | Output | Result
Transactions on the search start page | The category column is not filled | The message "Please fill out this field" appears in that column | The message failed and cannot go to the next page | Accepted
Transactions on the search start page | The name column is not filled | The message "Please fill out this field" appears in that column | The message failed and cannot go to the next page | Accepted
Transactions on the search start page | The search type is not filled | The message "Please fill out this field" appears in that column | The message failed and cannot go to the next page | Accepted
Transactions on the search start page | The Total Result is not filled | The message "Please fill out this field" appears in that column | The message failed and cannot go to the next page | Accepted
Transactions on the search start page | The Show Progress column is not filled | If it is unchecked, the process will appear; if it is checked, the search process will be hidden | In accordance with the purpose | Accepted
Transactions on the search start page | The Total Result is filled out of range | The message "value must be less or than 99" appears in that column | The message failed and cannot go to the next page | Accepted
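The checks in Table 2 can be mirrored by a small server-side validator, sketched below under stated assumptions. The field keys (`category`, `name`, `search_type`, `total_result`), the function name, and the lower bound of 1 are hypothetical; only the required-field message and the upper bound of 99 are taken from Table 2. Show Progress is a checkbox, so it is treated as optional.

```python
REQUIRED_FIELDS = ("category", "name", "search_type", "total_result")  # hypothetical keys
MAX_TOTAL_RESULT = 99  # upper bound reported in Table 2
MIN_TOTAL_RESULT = 1   # assumption: at least one result must be requested


def validate_search_form(form):
    """Return a dict of field -> error message; an empty dict means the form is valid."""
    errors = {}
    for field in REQUIRED_FIELDS:
        value = form.get(field)
        if value is None or str(value).strip() == "":
            errors[field] = "Please fill out this field"
    if "total_result" not in errors:
        try:
            total = int(form["total_result"])
        except (TypeError, ValueError):
            errors["total_result"] = "value must be a whole number"
        else:
            if not MIN_TOTAL_RESULT <= total <= MAX_TOTAL_RESULT:
                errors["total_result"] = (
                    f"value must be between {MIN_TOTAL_RESULT} and {MAX_TOTAL_RESULT}"
                )
    return errors


print(validate_search_form(
    {"category": "Computer", "name": "RAM", "search_type": "bfs", "total_result": "10"}
))
print(validate_search_form({"category": "", "name": "RAM", "total_result": "100"}))
```

Each row of Table 2 then corresponds to one input dictionary and one expected set of error keys, which is convenient for automating the black box tests.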
Table 3. Sorting Price Page Process
Name | Data Input | Expectation | Output | Result
Sorting price transaction | The min price column is not filled | The message "Please fill out this field" appears in that column | The message failed and cannot go to the next page | Accepted
Sorting price transaction | The max price column is not filled | The message "Please fill out this field" appears in that column | The message failed and cannot go to the next page | Accepted
Sorting price transaction | The min price is not filled in accordance with the requirements | The message "value must be less or than ..." appears in that column | The message failed and cannot go to the next page | Accepted
Sorting price transaction | The max price is not filled in accordance with the requirements | The message "value must be less or than ..." appears in that column | The message failed and cannot go to the next page | Accepted
Table 4. Feedback Page
Process Name | Data Input | Expectation | Output | Result
Users provide feedback | The Name column is not filled | The message "Please fill out this field" appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Education column is not filled | The message "Please fill out this field" appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Category Search column is not filled | The message "Please fill out this field" appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Keyword Search column is not filled | The message "Please fill out this field" appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Searching Technique column is not filled | The message "Please fill out this field" appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Total Searching column is not filled | The message "Please fill out this field" appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Rate Quality column is filled not in accordance with the requirements | The message "value must be less or than ..." appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Rate Fresh column is filled not in accordance with the requirements | The message "value must be less or than ..." appears in that column | Message failed and can't save record | Accepted
Users provide feedback | The Rate Perform column is filled not in accordance with the requirements | The message "value must be less or than ..." appears in that column | Message failed and can't save record | Accepted
To analyze the system's capabilities, users were asked to give feedback on whether the goals and benefits provided by the system met their expectations. Six users took part in the system experiment. The following are the results of the crawler data from 3 marketplace websites (Tokopedia, Bhinneka, Lazada).
The author then used the application to crawl the desired information with the following search criteria:
1. Category
2. Name
3. Search Technique: Breadth First Search / Page Rank
4. Number of Outputs
5. Process Appears
The assessment criteria used are Page Corresponding, Performance & Efficiency, Quality and Freshness, in order to calculate the Precision Rate of each user input. The Precision Rate is calculated with the following formula: Precision Rate = Relevant Pages / Total Download Pages. The results are as follows:
Table 5. Precision Rate Result
Name | Education Level | Category | Search Keyword | Searching Technique | Precision Rate (Relevant Pages / Total Download Pages)
Jimmy | Undergraduate | Bag | Woman Bag | Page Rank | 1
Ardiansyah Johan Ariwibowo | High School | Car | Honda | Breadth First Search | 0.003
Kendri | High School | Computer | Processor | Breadth First Search | 0.01
Kendy | Undergraduate | Computer | VGA | Page Rank | 0.1
Ratna Niagawati | High School | Computer | RAM | Breadth First Search | 0.002
Wilyanto | High School | Dress | T-Shirt | Page Rank | 1
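To make the Table 5 values easier to compare, the short sketch below groups the six reported Precision Rates by searching technique and averages them. The grouping and averaging are an illustrative summary added here and are not part of the original analysis; with only six users, who searched different categories and keywords, the averages should be read as descriptive rather than as a controlled comparison of the two techniques.

```python
from collections import defaultdict
from statistics import mean

# (searching technique, precision rate) pairs copied from Table 5.
TABLE_5 = [
    ("Page Rank", 1.0),
    ("Breadth First Search", 0.003),
    ("Breadth First Search", 0.01),
    ("Page Rank", 0.1),
    ("Breadth First Search", 0.002),
    ("Page Rank", 1.0),
]

by_technique = defaultdict(list)
for technique, rate in TABLE_5:
    by_technique[technique].append(rate)

mean_rate = {technique: mean(rates) for technique, rates in by_technique.items()}
for technique, value in mean_rate.items():
    print(f"{technique}: {value:.3f}")
# prints:
# Page Rank: 0.700
# Breadth First Search: 0.005
```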
Figure 8. Search Mapping Results
5. CONCLUSION
The architectural design of the price comparison system based on the breadth first search method is ready for implementation. The users themselves have tested the web-based price comparison of goods, and it is ready to be implemented. Not all information search algorithms produce the same information, and the results of the breadth first search strategy affect the logic supplied to the data parsing algorithm, which in turn affects the quality of the information. Because each website has a different developer and framework and therefore a different page structure, future studies should make the breadth first search logic more dynamic so that it can search diverse websites flexibly. The issues that arise during the search process should be studied before a search strategy is selected, as each situation requires a distinct approach to problem solving.
Limitation and study forward
The speed of a web crawler depends on the user's internet speed. If a developer can eliminate the elements of web pages that are not needed, the crawler will save time and can reach information sources that it could not reach before.
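One way to drop page elements that are not needed is to skip them while parsing, as in the sketch below, which uses Python's standard `html.parser` module. It is an assumption-laden illustration rather than the method used in this study: the class name, the `SKIP` set of tags, and the sample HTML are hypothetical, and which elements are safe to ignore depends on each target website.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text while skipping the contents of the tags listed in SKIP."""

    SKIP = {"script", "style", "noscript"}  # assumption: not needed for price data

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text chunks of html that lie outside skipped elements."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page fragment, for illustration only.
sample = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><p>RAM 8GB</p><span>Rp 450.000</span></body></html>"
)
print(visible_text(sample))
```

Skipping such elements reduces the amount of text handed to the data parsing step; it does not by itself reduce download time, which still depends on the connection speed.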
Acknowledgment
The researcher would like to thank those who have played a major role in the completion of this research, especially to fellow lecturers of computer science at Batam International University for their support for the smooth implementation of this research.
REFERENCES
[1] N. Inun, Yulianto and E. R. Putra, "Development Boneythings Store E-Commerce Application For Selling Muslimah Clothes," TEPIAN, Vol. 3, No. 1, pp. 1-6, 2022.
[2] M. S. Shaxzodovna, "Prospects For Using Artificial Intelligence Technologies In Document Automation Systems," Academicia Globe: Inderscience Research, pp. 293-302, 2021.
[3] K. S. Shaydilloevna, "Application of Innovative Technologies In Learning Foreign Languages," Academicia Globe: Inderscience Research, pp. 22-25, 2022.
[8] A. Pranav and S. Chauhan, "Efficient Focused Web Crawling Approach for Search Engine," International Journal of Computer Science and Mobile Computing, Vol. 4, No.5, pp. 545-551, 2015.
[9] P. R. Yunelfi, A. S. Popalia, F. Fahrani, Y. Purwanto and M. F. Ruriawan, "Dark Web Crawling Using Focused and Classified Algorithm," CEPAT Journal of Computer Engineering: Progress, Application and Technology, Vol. 1, No. 2, pp. 1-6, 2022.
[10] A. Bharambe, R. Dey, V. Bahadurge, B. Tanpure and R. Ramteke, "A Survey on Web Crawler Approaches," International Journal of Innovative Research in Computer and Communication Engineering, pp. 1902-1907, 2017.
[11] A. Josi, L. A. Abdillah and Suryayusra, "Penerapan Teknik Web Scraping pada Mesin Pencari Artikel Ilmiah," ARXIV, pp. 159-164, 2014.
[12] Y. Adnyana and I. Bagus, "Rancang Bangun Sistem Informasi Geografis Persebaran Lokasi Obyek Pariwisata Berbasis Web dan Mobile Android (Studi Kasus di Dinas Pariwisata Kabupaten Gianyar)," Jurnal Teknologi Informasi dan Komunikasi, Vol. 5, No. 1, 2014.
[13] A. R. M. Wahyu, H. Irawan, S. Permata and W. A. Anwar, "Imam Syafi'i's E-Commerce Concept's Relevance," Jurnal Ilmiah Ekonomi Islam, pp. 538-544, 2022.
[14] E. A. Amelia, "Business Model Analysis In Kartinipedia Application Using Business Model Canvas (BMC) Approach," International Journal of Economics, Business and Accounting Research, pp. 400-412, 2022.
[15] N. P. Saputeri, E. T. Nurulia, Warsiyah and N. R. Wulandari, "MSME Marketing Strategy In The COVID-19 Pandemic Outbreak (Case Study In Andalas Steak Bandar Lampung)," International Journal of Economics, Business and Accounting Research, pp. 125-131, 2022.
[16] L. Almazaydeh, M. Alsafasfeh, R. Alsalameen and S. Alsharari, "Formalization of The Prediction and Ranking of Software Development Life Cycle Models," International Journal of Electrical and Computer Engineering, pp. 534-540, 2022.