I certify that this report entitled "MALAYSIA'S PRIVATE UNIVERSITY APPLICATION PLATFORM" is my own work, except as noted in the references. Lai Siew Cheng gave me the opportunity to participate in a web application development project using web scraping techniques and guided me firmly throughout the project. The platform is a web-based application that uses web scraping and data analytics to retrieve the required data from the universities' websites in real time.
INTRODUCTION
- Background Information
- Problem Statement and Motivation
- Project Objectives
- Project Scope
- Impact, Significance and Contribution
- Report Organization
The online application will be the one-stop information center to access information about the university's programs. The data analysis algorithm will be activated based on the user's interaction with the web application. The web application will be treated as a platform to display the final result of the data analysis on the scraped data.
LITERATURE REVIEW
Review on Similar Platforms
- Common App
- University Admissions (Universityadmissions.se)
- UCAS (Universities and Colleges Admissions Service)
- Comparison of Similar Platforms with the Proposed System
Moreover, it would be better if the platform provides the details of the course content. In the admission process, it will be better if the platform enables the applicants to select their desired courses so that the applicants can know if they are eligible for the chosen course of the educational institution. Most of the educational institutions in the UK require individuals to apply for a full-time undergraduate course with UCAS [6]. The figure below is the main page of UCAS.
Method and Technology Used
- Web Scraping Tool – Scrapy
- Web Scraping Tool – Beautiful Soup
- Web Scraping Tool – Puppeteer
- Comparison among Web Scraping Tools
- Rule-based AI
- Learning-based System
- Application Programming Interface (API)
- Programming Language – Python
- Web Framework (Backend) – Flask
- Database – MySQL
- Frontend – AngularJS
Python is a versatile, general-purpose language that can be used for a wide variety of programs, from web application development to machine learning. Furthermore, it is an open source programming language with a large number of modules and libraries. MySQL is an open source relational database management system (RDBMS) based on Structured Query Language (SQL).
Proposed Solution
Next, Beautiful Soup, a web scraping tool, will be used to extract data from a web page in real time. Based on the comparison in chapter 2.2.4, Beautiful Soup is suitable for small projects and web scraping beginners. The vital part of web scraping is having a link that tells the tool where to scrape the data.
PROPOSED SYSTEM METHOD/APPROACH
Design Specification
- Methodology
- Software Architecture Pattern
- Working Principle of Data Analysis on Scraped Data
- Working Principle of Rule-based AI
- Working Principle of API
- User Requirements
The rule-based model consists of rules encoded as if-then-else statements. The rule-based AI will be applied to determine the field of study suggested to the user. As shown in the figure above, the responses to the questionnaire are the input of the rule-based AI system.
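The idea of encoding rules as if-then-else statements can be illustrated with a minimal sketch. The question names, answers, and fields of study below are hypothetical, since the actual questionnaire and rule set are not listed in this section; the sketch also assumes that the first matching rule decides the result.

```python
# Hypothetical rules: IF the answer to `question` equals `expected`
# THEN suggest `field`. The real rule set is not given in this section.
RULES = [
    ("enjoys_programming", "yes", "Computer Science"),
    ("enjoys_mathematics", "yes", "Engineering"),
    ("enjoys_managing_money", "yes", "Business"),
]

# Assumed fallback when no rule matches (the final "else" branch).
DEFAULT_FIELD = "No recommendation"


def recommend_field(responses):
    """Return the field of the first rule whose condition is satisfied."""
    for question, expected, field in RULES:
        if responses.get(question) == expected:
            return field
    return DEFAULT_FIELD


if __name__ == "__main__":
    answers = {"enjoys_programming": "no", "enjoys_mathematics": "yes"}
    print(recommend_field(answers))
```

Keeping the rules in a table rather than in nested statements makes it possible to add or reorder rules without changing the evaluation logic.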
System Design Diagram
- System Architecture Diagram
- Use Case Diagram
- Use Case Description
- Activity Diagram
1 The system updates the viewed history list by adding the program's basic information to the list. 1 The system updates the viewed history list by modifying the search time of the program in the list. Prerequisite: A user has viewed the program information and the program does not exist in the favorites list.
4 The system updates the favorites list by adding the program's basic information to the list. Trigger: A user has selected a program to view its detailed information and the program is in the favorites list. The system will check the viewed history list to see whether the program has been viewed before.
If so, the system will update the viewed history list by modifying the program's search time in the list. The system will then retrieve the program information and add the program to the user's favorites list. The system will then retrieve the program information and search the favorites list for the program.
The system will retrieve the program information and, if the retrieval fails, receive an error message describing the cause of the error.
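The history and favorites updates described in these use cases can be sketched as follows. This is only an illustration under assumptions: the field names `program_id` and `search_time`, and the use of in-memory lists instead of database tables, are hypothetical and not taken from the report.

```python
from datetime import datetime


def record_view(history, program, viewed_at):
    """Add the program to the history, or refresh its search time if present."""
    for entry in history:
        if entry["program_id"] == program["program_id"]:
            # Seen before: only the search time is modified.
            entry["search_time"] = viewed_at
            break
    else:
        # First view: add the program's basic information to the list.
        history.append({**program, "search_time": viewed_at})
    # Keep the most recently viewed program at the front of the list.
    history.sort(key=lambda e: e["search_time"], reverse=True)
    return history


def add_favourite(favourites, program):
    """Add the program unless it is already a favourite; report if added."""
    if any(f["program_id"] == program["program_id"] for f in favourites):
        return False
    favourites.append(dict(program))
    return True


if __name__ == "__main__":
    history = []
    prog = {"program_id": 1, "name": "Bachelor of Computer Science (Honours)"}
    record_view(history, prog, datetime(2023, 1, 1, 10, 0))
    record_view(history, prog, datetime(2023, 1, 1, 12, 0))
    print(len(history), history[0]["search_time"])
```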
Timeline
SYSTEM DESIGN
- Project Flow Diagram
- Web Scraping and Data Analysis Flow
- Regular Checking Flow
- API Flow
- Web Application Flow
- Application System Flow
- Home Page - Get Location API Workflow
- Search Programme Flow
- View Programme Flow
- Recommendation System – Rule Based AI System
The result will be shown on a web application, so web application development is the next step. If the current day is a Monday, the tasks from the scheduler will be executed. The API function is invoked according to the route defined in the request link.
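The weekly trigger can be sketched with the standard library alone. The function name and the task list are hypothetical; the report's own scheduler configuration is not shown in this section.

```python
from datetime import date

MONDAY = 0  # date.weekday() numbers Monday as 0 and Sunday as 6


def run_weekly_tasks(today, tasks):
    """Run every task only when `today` is a Monday; return their results."""
    if today.weekday() != MONDAY:
        return []
    return [task() for task in tasks]


if __name__ == "__main__":
    # Hypothetical task standing in for the regular checking job.
    print(run_weekly_tasks(date.today(), [lambda: "links checked"]))
```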
The functions in the API will be activated and the program will execute the functions. If an error exists within the function, the error will be logged in the database and the function's response will be set as an error message with the error status code. In addition, the detailed flow of searching and viewing the program will be described in the following sections.
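The error path described above can be sketched as a small wrapper around an API function. Several details are assumptions made for illustration: `sqlite3` stands in for the MySQL database so that the sketch runs without a server, the `error_log` table and its columns are hypothetical, and 500 is an assumed error status code.

```python
import sqlite3

ERROR_STATUS = 500  # assumed status code for a failed API function


def create_error_log(conn):
    """Create the hypothetical error_log table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS error_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "function_name TEXT NOT NULL, "
        "message TEXT NOT NULL)"
    )


def call_api_function(conn, func, **params):
    """Run an API function; on failure, log the error and return it."""
    try:
        return {"status": 200, "data": func(**params)}
    except Exception as exc:  # any failure is logged and reported
        conn.execute(
            "INSERT INTO error_log (function_name, message) VALUES (?, ?)",
            (func.__name__, str(exc)),
        )
        conn.commit()
        return {"status": ERROR_STATUS, "error": str(exc)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_error_log(conn)

    def get_prog(name):
        raise ValueError(f"programme not found: {name}")

    print(call_api_function(conn, get_prog, name="unknown"))
    print(conn.execute("SELECT function_name, message FROM error_log").fetchall())
```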
Based on the user's answers, the system will determine a suitable field of study and display the result to the user. The functions defined in the search engine API will then be called, and the API will return the result obtained from these functions to the front end. When users choose to view detailed information about a specific program, the system will call the API to get the program information, passing the requested parameters as the request body; the system processes these API parameters automatically.
If no error exists, program information will be displayed to users in the default template.
SYSTEM IMPLEMENTATION
Hardware Setup
Software Setup
In this project, the Python programming language is chosen to develop the backend of the web application, while the Beautiful Soup library is chosen to perform web scraping. Beautiful Soup is a Python library for extracting data from HTML and XML files. It allows the system to retrieve the latest information from the website and display it to the users without storing a large volume of data in the database.
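To illustrate what extracting data from an HTML file involves, the sketch below pulls programme titles out of a small HTML string. It deliberately uses the standard library's `html.parser` as a stand-in for Beautiful Soup so that it runs without additional installation; the `programme-title` class name and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ProgrammeTitleParser(HTMLParser):
    """Collect the text of every <h2 class="programme-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "programme-title") in attrs:
            self._inside = True
            self.titles.append("")

    def handle_data(self, data):
        if self._inside:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2":
            self._inside = False


def extract_titles(html):
    """Return the stripped text of all matching programme headings."""
    parser = ProgrammeTitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


if __name__ == "__main__":
    page = '<h2 class="programme-title"> Bachelor of Computer Science (Honours) </h2>'
    print(extract_titles(page))
```

With Beautiful Soup, the same lookup would typically be a single `find_all("h2", class_="programme-title")` call on the parsed document, which is part of what makes the library beginner friendly.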
This web scraping tool was chosen because it is beginner friendly and the size of the proposed project is small, so there is no need for an extensive library to support it. It allows users to parse arguments to their resources, format output, and manage the routing setting with a clean interface [32]. The Requests library allows users to make HTTP requests in Python in a simpler, more human-friendly way.
Selenium is an open source automated testing framework used to validate web applications across different browsers and platforms. It is a heavy library, so it is only used to scrape websites that require cookies to be accepted and JavaScript to be enabled, in cases where the Requests library cannot access the website because a suitable header is missing while rendering [37]. It is an easy-to-use API to schedule jobs, including periodic jobs without any extra processes needed.
- Flask-CORS: pip install flask-cors (Flask extension for CORS)
- Flask-RESTful: pip install flask-restful (Flask extension for RESTful APIs)
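The choice between the lightweight and the heavy scraping path can be sketched as a simple fallback. The two fetchers are passed in as plain callables so that the sketch runs without either library installed; in the real system they would presumably wrap a Requests call and a Selenium browser session. The function and label names are hypothetical.

```python
def fetch_page(url, light_fetch, heavy_fetch):
    """Try the lightweight fetcher first; fall back to the heavy one."""
    try:
        html = light_fetch(url)
        if html:  # an empty body is treated as a failed attempt
            return html, "light"
    except Exception:
        # Any failure of the lightweight path triggers the fallback.
        pass
    return heavy_fetch(url), "heavy"


if __name__ == "__main__":
    def blocked(url):
        raise ConnectionError("request rejected")

    def browser(url):
        return "<html>rendered page</html>"

    print(fetch_page("https://example.com", blocked, browser))
```

Returning the label alongside the page makes it possible to count how often the heavy path is needed, which matters because that path is the slower one.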
Setting and Configuration
- Flask Configuration
- API Configurations
- Database Setup
The figure above shows the SQL queries run against the database and the entities required by the system implementation.
Implementation of System
System Operation
- Web Scraping and Data Analysis Operation
- RESTful API Operation
- Database Operation
- Recheck Operation
- Programme Searching and Viewing Interface Operation
- Recommendation Operation
- Enquiry Operation
- Favourite List Operation
- Viewed History List Operation
The figure above shows the API code and a successful execution of the /getProg route of the RESTful API with an HTTP GET request. When a user accesses the link, the API returns the response based on the request parameters specified in the link. The data after the "?" in the link is the parameter, which is sent to the API along with the request.
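How the part of the link after the "?" becomes request parameters can be shown with the standard library. The host, port, and parameter names in the example link are hypothetical; only the /getProg route is taken from the report.

```python
from urllib.parse import parse_qs, urlsplit


def get_request_params(link):
    """Split a link into its route and a dict of query parameters."""
    parts = urlsplit(link)
    # parse_qs returns a list of values per key; keep the first of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params


if __name__ == "__main__":
    link = "http://localhost:5000/getProg?uni=UTAR&prog=computer-science"
    print(get_request_params(link))
```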
As an example, the link passed to the API call shown in Figure 5.7 is unreachable; the correct link should be "https://study.utar.edu.my/cpmputer-engineering.php". The checking report or result is recorded in the database as shown in the figure below. In addition, each page displays at most 10 results, while the remaining results are displayed on other pages, which can be reached by clicking the page number as shown in the image below.
The program information will be displayed on the program viewing page as in the figure above. As shown in the figure above, there is a list of questions to answer in order to use the recommendation system. The recommendation results will be displayed in a Bootstrap modal as shown in the figure below.
After the user sends an inquiry, the system will receive an e-mail about the inquiry as shown in the figure below.
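The limit of 10 results per page can be sketched as a slicing function. The function name and the clamping of out-of-range page numbers are assumptions; the report does not state how the page split is implemented.

```python
import math

PAGE_SIZE = 10  # results shown per page, as stated in the text


def paginate(results, page, page_size=PAGE_SIZE):
    """Return the results for a 1-based page and the total page count."""
    total_pages = max(1, math.ceil(len(results) / page_size))
    # Clamp the requested page into the valid range (an assumption).
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return results[start:start + page_size], total_pages


if __name__ == "__main__":
    programmes = [f"Programme {i}" for i in range(1, 26)]
    items, pages = paginate(programmes, 3)
    print(pages, items)
```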
SYSTEM EVALUATION AND DISCUSSION
- System Testing Evaluation
- Testing Setup and Result
- Web Scraping and Data Analysis Algorithm Testing Result
- API Testing Result
- Regular Checking Testing Result
- Web Application Testing Result
- Project Challenges
- Resolved Project Challenges
- Unresolved Project Challenges
- Objectives Evaluation
- Legal and Ethical Issues
A few APIs are implemented in the system; their details are described in chapter 5.3.2. After the API for updating the error has been called, the error is logged in the database as shown in the figure above. The list of search results is then used to render the user interface as depicted in Figure 6.21.
The circled star icon in figure 6.24 and figure 6.25 indicates whether the program has been added to the user's favorites list. If the program does not exist in the user's favorites list, the star icon is not filled, as in figure 6.25; if the star icon is filled, as in figure 6.24, the program is in the user's favorites list. Programs in the favorites list can be found in "favourtie.html" as shown in the circled part of the figure below.
The figure shows that UTAR's Bachelor of Computer Science (Honours) program, which was selected in the search results list of figure 6.11 and viewed in figure 6.13, is the program most recently viewed by the user. When there is an error while retrieving or displaying the program information, the web application will redirect the client to the error page as shown in figure 6.30. Some programs in the search result list may need to be scraped with the Selenium library, which is a heavy library.
However, the system will still run slowly if most of the program information in the search result list has to be retrieved by Selenium.
CONCLUSION AND RECOMMENDATION
Conclusion on Project Achievements
If the user is interested in the program, the user can download the program information and apply for the program directly by being redirected to the university's admission webpage. In addition, if there is an error in displaying the program information, a log record will also be created. The system can also record and display the users' favorites list and viewed history list.
Nevertheless, many improvements to the functional and non-functional features of the system are possible, which will be discussed later.
Recommendation
Two more universities not in the QS rankings were scraped and analyzed to increase the range of universities of the web application.
- Search and filter function is done for both frontend and backend parts
- Able to record if any errors exist
SELF-ASSESSMENT OF THE PROGRESS
- Implementation of regular check function done
- Refinement on algorithm done
Note: Supervisor/candidate(s) are expected to provide a soft copy of the complete set of originality report to the Faculty/Institute. Based on the above results, I hereby declare that I am satisfied with the originality of the Final Year Project Report submitted by my student(s) as mentioned above.