

Academic year: 2023



Jurnal Teknik Informatika dan Sistem Informasi Vol. 10, No. 1, March 2023, pp. 1027-1040

Crawling And Indexing Analysis System Based On Breadth First Search For Comparison Of Goods Value

Hendi Sama*1

1Universitas Internasional Batam; Jl. Gajah Mada, Baloi – Sei Ladi, Batam 29426 Phone : (0778) 743 7111

Information Systems Study Program, Faculty of Computer Science; e-mail: *1hendi@uib.ac.id


This study aims to overcome the problems users face when searching for the prices of goods, which are inconsistent because they come from various sources. The research was carried out over a period of 6 months. The methods used were observation and analysis, together with the waterfall method. The system was built from scratch on the basis of the data obtained, and the system design covers the display design, diagrams of how the system and the user work, the writing of the source code, and the implementation of the system. The search method used in the system is breadth first search, a technique widely adopted by such systems. The result is a system that meets the goals and benefits users expect: the price comparison of goods in a web environment is ready for deployment and has been tried by the users themselves. The information search algorithm affects the information obtained, since not every method yields the same information; and the results obtained from the breadth first search technique affect the logic given to the data parsing algorithm, so the quality of the information is affected as well.

Keywords: Price Search, Breadth First Search, Goods


Jatisi ISSN 2407- 4322

Vol. 10, No. 1, Maret 2023, Hal. 1027-1040 E- ISSN 2503 -2933 1028


Inun et al. [1] state that, as technology develops ever more rapidly, the need for information among the parties who depend on it will keep growing. Another study [2] likewise states that information is the result of processing data gathered from various sources into a form that is easy to understand, so that it becomes knowledge that adds to existing facts for those who need it and supports the achievement of their goals. Technology is used in many fields [3] [4] [5], ranging from economics, business, and education to processing systems and information management, and it strongly influences the effectiveness and efficiency of online processing [6].

Against this background, and to help users who find it difficult to choose among the prices offered for the same goods, the author conducted this research. The problem formulations are: 1) how to design a price comparison system architecture using the breadth first search method; and 2) how to implement price comparison in a web environment. The scope is limited as follows: the system only compares prices according to the information provided by the source websites and does not act as an intermediary for buying and selling goods; the descriptions and prices of goods received from the search results are not modified; and the source websites for the goods search are Tokopedia, Lazada, and Bhinneka. The aims of this research are: 1) to design a web-based price comparison system architecture using the breadth first search method; and 2) to implement the price comparison of goods in a web environment. The expected benefits are that the system helps users search for price comparisons of the goods they want and provides information for making decisions based on price comparisons from various references.


Amudha & Phil completed a study titled "Web Crawler For Mining Web Data" [7].

This study explains what a web crawler is, where the term originated, how its architecture works from the application through retrieval and data processing, the kinds of web crawlers that can help retrieve data from a target more effectively according to user needs, and the policies a web crawler should follow to improve crawling. Pranav and Chauhan's study "Efficient Focused Web Crawling Approach For Search Engine" [8] describes the main reasons for adopting web crawlers and the search techniques that can make retrieving the required data from a target more effective.

A web crawler study by a team of researchers [9] covers how a web crawler is implemented, how it operates from initial setup to data retrieval and processing, the kinds of web crawlers that can help retrieve data from a target, and the factors that must be considered when a web crawler is built into a system. The study "A Survey On Web Crawler Approaches" by other researchers [10] reviews web crawler projects completed by others, measures the information they obtained, and examines how those systems function; it can serve as an additional source for others to learn from.

Additionally, an article by other researchers [11] gives an overview of crawling and its use: processing crawled data stored in a database to obtain as much information as possible in line with user requirements, the development and analysis stages, the research methodology, and comparisons of data search techniques. Another supporting study, by Adnyana and Bagus, is titled "Rancang Bangun Sistem Informasi Geografis Persebaran Lokasi Obyek Pariwisata Berbasis Web Dan Mobile Android (Studi Kasus Di Dinas Pariwisata Kabupaten Gianyar)" [12] ("Design and Development of a Web and Android Mobile Geographic Information System for the Distribution of Tourist Attraction Locations, a Case Study at the Gianyar Regency Tourism Office"). Its goal was to create a traveler-friendly application that provides information about potential tourist destinations. The system was built with the waterfall approach, whose four main steps are analysis, design, execution, and testing. The system is in use at PT INTI.

Many goods are currently traded online through official websites, a practice known as e-commerce [13]: the buying and selling of goods and services by the parties concerned through electronic media. On the World Wide Web, anyone has the opportunity to sell goods for profit [14] [15], for example on Tokopedia, Lazada, and Bhinneka. These websites can sell the same goods at different prices, and because the goods are not offered in one place, the price differences leave buyers confused when trying to find a reasonably cheap price.

According to Almazaydeh et al. [16], the System Development Life Cycle (SDLC) is one of the classic methods most often used in system development; following the SDLC helps produce a more structured system and a clearer division of tasks. The SDLC is divided into four stages: 1) Planning, the initial stage, in which the needs of the project are written down, defined, shared, and estimated, for example human resources, the tools and materials needed for the work, the system architecture, and the system's financing; 2) Analysis, the second stage, in which the system to be built is analyzed and examined, including the problems being experienced and their solutions; 3) Design, the third stage, in which the system is worked out according to the main points of the analysis from the second stage, divided into display design and system code design; and 4) Implementation, the last stage, in which the system built by the developer is put into use according to the objectives set in the planning stage.

The author's line of thought in designing and building this research is as follows: identify the problems that occur, analyze the problem boundaries the system will handle, collect data about web crawling projects, and describe the application design. Once both parties agree on the design, the system is built. The application is then tested: if it matches the design, it is implemented; if discrepancies remain, it is revised.

The research flow is shown in the following figure:



Figure 1. Research Flow

The author carries out the problem analysis to learn more about the problems users face, so that the system can be implemented optimally. The analysis also serves to choose the right method for the system by first comparing several candidate methods.

Assessing a study requires criteria or supporting data. Table 1 lists the criteria used to assess the effectiveness of the work.

Table 1. Criteria Assessment Information

Factor Name: Core Factor

Criteria | Rating
Page Corresponding | 1-5
Performance & Efficiency | 1-5
Quality | 1-5
Freshness | 1-5


3.1 Research Method

The method applied in this research is Breadth First Search (BFS). In brief, BFS visits pages level by level: starting from a seed page, it downloads every page linked from that page before following links one level deeper, so the pages nearest the starting point, which act as sources of information leading to all other pages, are found first. The more relevant information a method retrieves, the better it is, and the results are measured with the following formula.

Precision Rate = Relevant Pages / Total Downloaded Pages, where:

1. Precision Rate is the accuracy of the data obtained from the search results on the various websites;

2. Relevant Pages is the number of accessed pages that contain the desired keywords; and

3. Total Downloaded Pages is the number of web pages visited to obtain information for the desired keywords.
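The BFS crawl and the Precision Rate formula above can be illustrated with a minimal Python sketch. It is not the author's implementation: the link graph and page texts are hypothetical in-memory stand-ins for downloaded and parsed pages, `max_pages` is an assumed page-access limit, and a page is assumed to count as relevant when its text contains the keyword.

```python
from collections import deque


def bfs_crawl(seed, links, pages, keyword, max_pages):
    """Breadth-first crawl over an in-memory link graph (hypothetical data).

    links: dict mapping a URL to the URLs it links to.
    pages: dict mapping a URL to its text content.
    Returns the visit order and the Precision Rate
    (relevant pages / total downloaded pages).
    """
    visited = {seed}          # pages already queued, so none is fetched twice
    queue = deque([seed])     # FIFO frontier: shallower pages come out first
    order = []
    relevant = 0
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)     # each visited page counts as one downloaded page
        if keyword.lower() in pages.get(url, "").lower():
            relevant += 1     # page contains the desired keyword
        for nxt in links.get(url, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    precision = relevant / len(order) if order else 0.0
    return order, precision


links = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
pages = {"A": "home", "B": "laptop bag", "C": "car parts",
         "D": "Laptop sleeve", "E": "RAM 8 GB"}
order, precision = bfs_crawl("A", links, pages, "laptop", max_pages=10)
print(order, precision)  # ['A', 'B', 'C', 'D', 'E'] 0.4
```

Stopping the same crawl after three pages gives 1 relevant page out of 3 downloaded, so the Precision Rate depends on how far the crawl is allowed to go as well as on the keyword.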



In the user view of the system design, the process starts on the user side: the user opens the crawling webpage and fills in the category, the item name, the total number of results, and whether the process is displayed. The user then searches using the options provided until the search results match what the user wants, at which point the process ends.

In the system view, the first step is to accept a POST request containing the category name, TOTAL RESULT, and PROCESS APPEARANCE. The system validates whether the data meet the rules it requires. If the data are invalid, the system requests that the category, TOTAL RESULT, and PROCESS APPEARANCE data be sent again in a new POST; once the data are valid, the system searches according to the requested category. When data matching the category search have been produced, the system checks whether they are suitable. If they are not, the system requests a search based on a new category; if they are, the PARSING LINK process follows, and its results are stored in a VARIABLE LIST.
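The validation step can be sketched as follows. This is an illustration rather than the system's code: the field names are hypothetical, the upper bound of 99 follows the message quoted in the homepage test results (Table 2), and the lower bound of 1 is an assumption because the source does not state one.

```python
ALLOWED_TECHNIQUES = {"Breadth First Search", "Page Rank"}  # options named in the text
MAX_TOTAL_RESULT = 99  # upper bound taken from the homepage test message
MIN_TOTAL_RESULT = 1   # assumption: lower bound not stated in the source


def validate_search_request(form):
    """Return a list of error messages; an empty list means the POST is valid.

    form: dict of submitted fields (hypothetical field names).
    """
    errors = []
    for field in ("category", "name", "technique", "total_result"):
        if not str(form.get(field, "")).strip():
            errors.append(f"{field}: Please fill out this field")
    technique = form.get("technique")
    if technique and technique not in ALLOWED_TECHNIQUES:
        errors.append("technique: unknown search technique")
    raw_total = str(form.get("total_result", "")).strip()
    if raw_total:
        try:
            total = int(raw_total)
        except ValueError:
            errors.append("total_result: must be a whole number")
        else:
            if not MIN_TOTAL_RESULT <= total <= MAX_TOTAL_RESULT:
                errors.append(
                    f"total_result: value must be between "
                    f"{MIN_TOTAL_RESULT} and {MAX_TOTAL_RESULT}"
                )
    # "show_progress" is an optional checkbox, so it is not checked here
    return errors


print(validate_search_request({"category": "Computer", "name": "RAM",
                               "technique": "Breadth First Search",
                               "total_result": "10"}))
```

Collecting every error instead of stopping at the first one matches the behaviour described above, where an invalid POST leads the system to request all of the fields again.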

The next process is a search by the NAME of the GOODS, after first making sure that the input data meet the requirements of the ITEM NAME search. If the data are appropriate, PARSING is carried out to retrieve the data required by the USER.

After the retrieved information is stored, the system checks it against the desired total number of results, including the limit on pages accessed. If the target is not met, the process returns to the category-based search. Once it is met, the process is complete from the system's point of view.
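The loop that keeps collecting items until the desired total is reached or the page-access limit runs out can be sketched as below. It is a simplification under stated assumptions: the category result pages are hypothetical in-memory lists of `(title, price)` listings produced by the parsing step, and an item is assumed to match when its title contains the item name.

```python
def collect_results(category_pages, item_name, total_wanted, page_limit):
    """Gather matching listings page by page (hypothetical parsed data).

    category_pages: list of result pages for one category; each page is a
        list of (title, price) tuples produced by the parsing step.
    Stops when total_wanted matches are found or page_limit pages are used.
    """
    results = []
    needle = item_name.lower()
    for page_no, listings in enumerate(category_pages, start=1):
        if page_no > page_limit:
            break  # page-access limit reached before the total was met
        for title, price in listings:
            if needle in title.lower():
                results.append({"title": title, "price": price, "page": page_no})
                if len(results) >= total_wanted:
                    return results  # desired total reached: process complete
    return results


pages = [
    [("RAM DDR4 8GB", 450000), ("SSD 256GB", 380000)],
    [("RAM DDR4 16GB", 820000), ("RAM DDR3 4GB", 150000)],
    [("RAM DDR5 16GB", 1250000)],
]
matches = collect_results(pages, "ram", total_wanted=2, page_limit=3)
print(matches)
```

Returning as soon as the total is met avoids fetching further pages, while the `page_limit` check keeps the crawl bounded when the category has fewer matches than requested.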

3.2 User Interface

1. Homepage

This page is the initial view shown when the webpage is accessed. It contains fields for the category name, the name of the item being searched for, the total number of search results, the search technique, and whether the system's work process is shown. The initial page display is shown in the following image.



Figure 2. Initial Page Display

2. Information Page

This page contains the results that are ready to be shown to users. A SORT PRICE module filters the results by the desired price range, the Top 4 Price section contains the cheapest prices among the search records, and OTHER RESULT lists the remaining results after those four. The information page view is shown in the following image.
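The SORT PRICE filter and the split into Top 4 Price and OTHER RESULT can be sketched in a few lines. This is a hypothetical illustration, not the page's actual code: each result is assumed to be a `(store, title, price)` tuple with integer prices in rupiah, the price range is assumed to be inclusive, and the listings are made up.

```python
def sort_price(results, min_price, max_price, top_n=4):
    """Filter results to [min_price, max_price], sort by price ascending,
    and split them into the cheapest top_n and the remaining results."""
    if min_price > max_price:
        raise ValueError("min price must not exceed max price")
    in_range = [r for r in results if min_price <= r[2] <= max_price]
    in_range.sort(key=lambda r: r[2])  # cheapest first; sort is stable for ties
    return in_range[:top_n], in_range[top_n:]


results = [
    ("Tokopedia", "RAM DDR4 8GB", 450000),
    ("Lazada", "RAM DDR4 8GB", 430000),
    ("Bhinneka", "RAM DDR4 8GB", 475000),
    ("Tokopedia", "RAM DDR4 8GB (bundle)", 610000),
    ("Lazada", "RAM DDR4 8GB used", 300000),
    ("Bhinneka", "RAM DDR4 16GB", 820000),
]
top4, others = sort_price(results, 350000, 900000)
print(top4)
print(others)
```

Here the 300000 listing falls below the minimum and is dropped, the four cheapest remaining listings form the top group, and the 820000 listing ends up in the other results.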



Figure 3. Page View Information

3. Feedback Page

This page lets users tell the system's developer what needs to be repaired or revised, with the aim of building a more optimal WEB CRAWLER. Users fill in several fields: Name, Address, Education, Note, Rate Quality, Rate Fresh, Rate Performance, and Rate Page Corresponding. After completing the form, the user presses the save button so that the feedback is delivered to the developer. The feedback form is shown in the following image.



Figure 4. Feedback Form
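A minimal sketch of checking the feedback form before a record is saved. It assumes the four ratings use the 1 to 5 scale listed in Table 1 and that Name and Education are required, as the feedback page tests suggest; the field names, the choice to treat Address and Note as optional, and the in-memory list standing in for the database are all hypothetical.

```python
RATING_FIELDS = ("rate_quality", "rate_fresh", "rate_performance",
                 "rate_page_corresponding")
REQUIRED_TEXT = ("name", "education")  # assumption: address and note optional


def save_feedback(form, store):
    """Validate a feedback form and append it to store if valid.

    Returns a list of error messages; the record is saved only when it is empty.
    """
    errors = [f"{f}: Please fill out this field"
              for f in REQUIRED_TEXT if not str(form.get(f, "")).strip()]
    for f in RATING_FIELDS:
        try:
            value = int(form.get(f, ""))
        except (TypeError, ValueError):
            errors.append(f"{f}: Please fill out this field")
            continue
        if not 1 <= value <= 5:
            errors.append(f"{f}: value must be between 1 and 5")
    if not errors:
        store.append(dict(form))  # stand-in for inserting a database record
    return errors


db = []
print(save_feedback({"name": "User A", "education": "High School",
                     "rate_quality": "4", "rate_fresh": "5",
                     "rate_performance": "3",
                     "rate_page_corresponding": "4"}, db))
```

Saving only when the error list is empty mirrors the "can't save record" outcome reported for invalid feedback input.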


4.1 System Design

1. Homepage

This page is shown when the user first opens the system. Its function is to collect the category name, keywords, search technique, and desired total number of results from the user, as shown in the following picture.

Figure 5. Homepage



2. Information Page

This page is shown after the user presses the search button on the homepage. It contains the search results from all of the specified source websites, as shown in the following image of the information page.

Figure 6. Information Page

3. Feedback Page

This page is intended for users who want to give the system's developer feedback on whether the system works as desired or needs further improvement. To submit feedback, users fill in Name, Address, Education, Note, Rate Quality, Rate Fresh, Rate Perform, and Rate Page Corresponding, as shown in the image of the feedback page.



4.2 Discussion

System testing checks whether the system runs as expected. This study uses the black box testing method; the test results are as follows:

Table 2. Homepage Process

Name: Transactions on the search start page

Data Input | Expectation | Output | Result
The Category column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the next page cannot be opened | Accepted
The Name column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the next page cannot be opened | Accepted
The Search Type is not filled | The message "Please fill out this field" appears in that column | Submission fails and the next page cannot be opened | Accepted
The Total Result is not filled | The message "Please fill out this field" appears in that column | Submission fails and the next page cannot be opened | Accepted
The Show Progress column is not filled | If it is unchecked, the process appears; if it is checked, the search process is hidden | In accordance with the purpose | Accepted
The Total Result is filled out of range | The message "Value must be less than or equal to 99" appears in that column | Submission fails and the next page cannot be opened |


Table 3. Sorting Price Page Process

Name: Price sorting transaction

Data Input | Expectation | Output | Result
The Min Price column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the next page cannot be opened | Accepted
The Max Price column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the next page cannot be opened | Accepted
The Min Price is filled with a value that does not meet the requirements | The message "Value must be less than or equal to ..." appears in that column | Submission fails and the next page cannot be opened |
The Max Price is filled with a value that does not meet the requirements | The message "Value must be less than or equal to ..." appears in that column | Submission fails and the next page cannot be opened |




Table 4. Feedback Page

Process Name: Users provide feedback

Data Input | Expectation | Output | Result
The Name column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the record cannot be saved | Accepted
The Education column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the record cannot be saved | Accepted
The Category Search column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the record cannot be saved | Accepted
The Keyword Search column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the record cannot be saved | Accepted
The Searching Technique column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the record cannot be saved | Accepted
The Total Searching column is not filled | The message "Please fill out this field" appears in that column | Submission fails and the record cannot be saved | Accepted
The Rate Quality column is filled with a value that does not meet the requirements | The message "Value must be less than or equal to ..." appears in that column | Submission fails and the record cannot be saved | Accepted
The Rate Fresh column is filled with a value that does not meet the requirements | The message "Value must be less than or equal to ..." appears in that column | Submission fails and the record cannot be saved | Accepted
The Rate Perform column is filled with a value that does not meet the requirements | The message "Value must be less than or equal to ..." appears in that column | Submission fails and the record cannot be saved | Accepted



To analyze the system's capabilities, the system was put into use and users were asked to give feedback on whether the goals and benefits it provides meet their expectations. Six users took part in the experiment, and the crawler collected data from three marketplace websites (Tokopedia, Bhinneka, and Lazada).

The author then used the application built in this study to crawl the desired information with the following search criteria:

1. Category

2. Name

3. Search Technique: Breadth First Search / Page Rank

4. Number of Outputs

5. Show Process

The criteria assessment uses Page Corresponding, Performance & Efficiency, Quality, and Freshness, and the Precision Rate of each user's input is calculated with the formula Precision Rate = Relevant Pages / Total Downloaded Pages. The results are as follows:

Table 5. Precision Rate Result

Name | Education Level | Search Category | Keyword | Searching Technique | Precision Rate (Relevant Pages / Total Pages)
Jimmy | Undergraduate | Bag | Woman Bag | Page Rank | 1
Ardiansyah Johan Ariwibowo | High School | Car | Honda | Breadth First Search | 0.003
Kendri | High School | Computer | Processor | Breadth First Search | 0.01
Kendy | Undergraduate | Computer | VGA | Page Rank | 0.1
Niagawati | High School | Computer | RAM | Breadth First Search | 0.002
Wilyanto | High School | Dress | T-Shirt | Page Rank | 1

Figure 8. Search Mapping Results




The architectural design of the price comparison system based on the breadth first search method is ready for implementation. The users themselves have tested the price comparison of goods in a web environment, and it is ready to be deployed. Not all information search strategies produce the same information: the results of the breadth first search strategy affect the logic supplied to the data parsing algorithm, which in turn affects the quality of the information. Because each website has a different developer and uses a different framework, producing a unique page structure, future work should make the breadth first search logic more dynamic so that it can search diverse websites flexibly. Finally, the issues that arise during the search process should be studied before a search strategy is selected for the circumstances at hand, since each situation requires its own approach to problem solving.

Limitations and Future Study

The speed of a web crawler depends on the user's internet speed. If a developer can eliminate the elements of web pages that are not needed, the crawler will save time and can reach information sources that were not previously reached.
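One way to drop unneeded page elements is to skip them while parsing, sketched here with Python's standard `html.parser`. Which tags count as unneeded is an assumption for illustration, and this only trims what the crawler parses and stores; it does not reduce the bytes downloaded for the HTML itself.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects visible text while skipping elements a price crawler does not need.

    The SKIP set is an illustrative assumption, not a rule from the source.
    """
    SKIP = {"script", "style", "noscript", "svg", "iframe"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def strip_unneeded(html):
    """Return the page text with skipped elements removed."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


sample = ("<html><head><style>.p{color:red}</style><script>var x = 1;</script>"
          "</head><body><h1>RAM DDR4 8GB</h1>"
          "<span class=\"price\">Rp450.000</span></body></html>")
print(strip_unneeded(sample))  # RAM DDR4 8GB Rp450.000
```

Keeping only the text that later parsing needs (item names and prices) shrinks the data each page contributes, which is where the time saving would come from.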


The researcher would like to thank everyone who played a major role in completing this research, especially fellow computer science lecturers at Batam International University for their support in the smooth implementation of this research.


[1] N. Inun, Yulianto and E. R. Putra, "Development Boneythings Store E-Commerce Application For Selling Muslimah Clothes," TEPIAN, Vol. 3, No. 1, pp. 1-6, 2022.

[2] M. S. Shaxzodovna, "Prospects For Using Artificial Intelligence Technologies In Document Automation Systems," Academicia Globe: Inderscience Research, pp. 293-302, 2021.

[3] K. S. Shaydilloevna, "Application of Innovative Technologies In Learning Foreign Languages," Academia Globe: Inderscience Research, pp. 22-25, 2022.



[8] A. Pranav and S. Chauhan, "Efficient Focused Web Crawling Approach for Search Engine," International Journal of Computer Science and Mobile Computing, Vol. 4, No. 5, pp. 545-551, 2015.

[9] P. R. Yunelfi, A. S. Popalia, F. Fahrani, Y. Purwanto and M. F. Ruriawan, "Dark Web Crawling Using Focused and Classified Algorithm," CEPAT Journal of Computer Engineering: Progress, Application and Technology, Vol. 1, No. 2, pp. 1-6, 2022.

[10] A. Bharambe, R. Dey, V. Bahadurge, B. Tanpure and R. Ramteke, "A Survey on Web Crawler Approaches," International Journal of Innovative Research in Computer and Communication Engineering, pp. 1902-1907, 2017.

[11] A. Josi, L. A. Abdillah and Suryayusra, "Penerapan Teknik Web Scraping pada Mesin Pencari Artikel Ilmiah," arXiv, pp. 159-164, 2014.

[12] Y. Adnyana and I. Bagus, "Rancang Bangun Sistem Informasi Geografis Persebaran Lokasi Obyek Pariwisata Berbasis Web dan Mobile Android (Studi Kasus di Dinas Pariwisata Kabupaten Gianyar)," Jurnal Teknologi Informasi dan Komunikasi, Vol. 5, No. 1, 2014.

[13] A. R. M. Wahyu, H. Irawan, S. Permata and W. A. Anwar, "Imam Syafi'i's E-Commerce Concept's Relevance," Jurnal Ilmiah Ekonomi Islam, pp. 538-544, 2022.

[14] E. A. Amelia, "Business Model Analysis In Kartinipedia Application Using Business Model Canvas (BMC) Approach," International Journal of Economics, Business and Accounting Research, pp. 400-412, 2022.

[15] N. P. Saputeri, E. T. Nurulia, Warsiyah and N. R. Wulandari, "MSME Marketing Strategy In The COVID-19 Pandemic Outbreak (Case Study In Andalas Steak Bandar Lampung)," International Journal Of Economics, Business And Accounting Research, pp. 125-131, 2022.

[16] L. Almazaydeh, M. Alsafasfeh, R. Alsalameen and S. Alsharari, "Formalization of The Prediction and Ranking of Software Development Life Cycle Models," International Journal of Electrical and Computer Engineering, pp. 534-540, 2022.


