PDF Sentence Compressor

Academic year: 2023

This is to certify that I am responsible for the work presented in this project, that the original work is my own except as specified in the references and acknowledgements, and that the original work contained herein was not undertaken or done by unspecified sources or persons.

People trying to complete a task must search for the right articles, magazines, or web pages related to it. As a solution, a desktop application called Sentence Compressor has been developed to compress long articles.

The BLEU scores of articles compressed by Sentence Compressor and articles compressed by humans are then compared.

I would like to thank Foong Oi Mean for her exemplary guidance, monitoring, and constant encouragement throughout this Final Year Project. I also want to thank my classmates for the valuable information and assistance they provided.

CHAPTER 1: INTRODUCTION

Background of Study

Problem Statements

Significance of the Project

Objectives of the Project

The first objective of this project is to develop a desktop application that reduces long sentences to shorter ones without changing their original meaning. The second objective is to shorten long articles by reducing the length of their long sentences. This application will make the task of selecting journals and research papers more convenient.

Scope of Study

Relevancy of the Project

Feasibility of the Project

Technical Feasibility

Operational Feasibility

Schedule Feasibility

Research Gap

CHAPTER 2: LITERATURE REVIEW AND THEORY

Summarized Texts and Readability

Natural Language Processing

Sentence Compression

A phrase is removed only if it is not grammatically obligatory, is not in focus in the local context, and has a reasonable probability of deletion, estimated from a parallel corpus. Much research on sentence compression relies on corpus data to model compression. Noisy-channel approaches consist of a language model, used to guarantee that the compression output is grammatically correct; a channel model, which captures the probability that the source sentence is an expansion of the target compression; and a decoder, which searches for the compression that maximizes both models.

For example, [28] used a decision-tree model in which the compression is performed deterministically by a tree-rewriting process. Such approaches search for the compression with the highest score using a dynamic programming algorithm. However, finding the best compression for each sentence by enumerating all possible compressions becomes intractable when there are too many constraints or the sentences are too long [26].

The sentence compression approach proposed by [26] takes constraints on the compression output into account and still finds the optimal solution. Instead of using dynamic programming to decode the best solution, [26] uses the Integer Linear Programming (ILP) technique. ILP makes a final decision that is both consistent with the constraints and likely according to the classifier.

CHAPTER 3: METHODOLOGY

Linear Programming (LP)

The production of tables and chairs must not exceed 6 labour hours and must not use more than 45 square feet of wood. The numbers of chairs and tables must be non-negative, because it is impossible to produce a negative number of tables or chairs. From the above calculation, to achieve the maximum profit of 41.25 GBP, Telfa should produce 3.75 tables and 2.25 chairs.

However, this is impossible as the company cannot expect people to buy fractional parts of tables and chairs.
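The LP optimum quoted above can be checked by enumerating the corner points of the feasible region, since a linear objective attains its maximum at a vertex. The per-unit coefficients below (8 GBP profit per table, 5 per chair; 1 labour hour each; 9 and 5 square feet of wood) are assumptions, chosen because they reproduce the figures stated in the text; the report does not list them explicitly.

```python
from itertools import combinations

# Assumed coefficients (not stated explicitly in the text): profit of 8 GBP per
# table and 5 GBP per chair; 1 labour hour per unit; 9 and 5 sq ft of wood.
PROFIT = (8, 5)
CONSTRAINTS = [((1, 1), 6),    # labour hours:  1*x + 1*y <= 6
               ((9, 5), 45)]   # wood (sq ft):  9*x + 5*y <= 45

def feasible(x, y):
    return x >= -1e-9 and y >= -1e-9 and all(
        a * x + b * y <= rhs + 1e-9 for (a, b), rhs in CONSTRAINTS)

def vertices():
    """Corner points of the feasible region: pairwise intersections of the
    constraint boundaries and the axes x = 0, y = 0."""
    lines = list(CONSTRAINTS) + [((1, 0), 0), ((0, 1), 0)]
    pts = []
    for ((a1, b1), c1), ((a2, b2), c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                       # parallel boundaries never intersect
        x = (c1 * b2 - c2 * b1) / det      # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y):
            pts.append((x, y))
    return pts

best = max(vertices(), key=lambda p: PROFIT[0] * p[0] + PROFIT[1] * p[1])
profit = PROFIT[0] * best[0] + PROFIT[1] * best[1]
print(best, profit)   # fractional optimum: 3.75 tables, 2.25 chairs, 41.25 GBP
```

Running this recovers the fractional solution (3.75, 2.25) with profit 41.25, matching the values in the text.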

Integer Linear Programming (ILP)

In this project, the ILP variables take only the values 0 and 1. More generally, instead of returning non-integer values, ILP returns integer values, which resolves the example problem mentioned earlier: the numbers of tables and chairs to produce to maximize profit will be whole numbers instead of fractions.
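Because the example is tiny, the integer version can be solved by brute force, using the same assumed coefficients as before (8 GBP per table, 5 per chair; 1 labour hour each; 9 and 5 square feet of wood, which reproduce the LP figures in the text):

```python
# Assumed coefficients (not stated explicitly in the text): profit of 8 GBP per
# table and 5 GBP per chair; 1 labour hour per unit; 9 and 5 sq ft of wood.
PROFIT = (8, 5)
LABOUR_LIMIT, WOOD_LIMIT = 6, 45

best, best_profit = None, -1
# Labour allows at most 6 units in total, so exhaustive search is cheap here.
for tables in range(LABOUR_LIMIT + 1):
    for chairs in range(LABOUR_LIMIT + 1):
        if tables + chairs <= LABOUR_LIMIT and 9 * tables + 5 * chairs <= WOOD_LIMIT:
            profit = PROFIT[0] * tables + PROFIT[1] * chairs
            if profit > best_profit:
                best, best_profit = (tables, chairs), profit

print(best, best_profit)   # integer optimum: 5 tables, 0 chairs, 40 GBP
```

Note that the integer optimum (5 tables, 0 chairs, profit 40) is not obtained by rounding the fractional solution 3.75/2.25 (rounding gives an infeasible or inferior plan), which is exactly why a dedicated ILP solver such as CPLEX is used rather than rounding an LP answer.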

Sentence Compressor Framework

  • Keywords Extraction
  • Calculate Objective Function
  • Add Constraints
  • Compress Sentence

If a sentence contains an irrelevant word, the clause containing that word will be deleted. The frequency of occurrence of each word is calculated to determine whether the word is important. Each word is then assigned a binary value, 1 or 0, based on the defined condition, so that the objective function can be calculated in the next phase.
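A minimal sketch of this keyword-extraction phase is shown below. The exact importance condition is not spelled out in the report, so the "top-k most frequent words" rule here is an assumption for illustration only:

```python
import re
from collections import Counter

def keyword_flags(text, top_k=5):
    """Count word frequencies and flag each word 1 (important) or 0 (not).
    The top-k frequency rule is an assumed stand-in for the report's
    unspecified importance condition."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    important = {w for w, _ in counts.most_common(top_k)}
    # One (word, 0/1) pair per token, ready for the objective-function phase.
    return [(w, 1 if w in important else 0) for w in words]

flags = keyword_flags("the cat sat on the mat because the cat was tired")
print(flags)
```

Each token ends up with a binary value, which is the form the next phase consumes.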

The objective function sums, over the words of the sentence, the probability (according to the language model) of each word starting the compression, plus the probability of the word sequences appearing within the compression, plus the probability of each word ending the compression. This phase uses IBM ILOG CPLEX Optimization Studio and the CPLEX library to provide analytical decision support. The trial version limits the calculation to a certain number of variables and constraints.
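Up to notation, the trigram-based objective of [26] can be sketched as follows; the variable names here are approximate, not copied from the report:

```latex
% Schematic ILP objective (notation approximate): binary variables mark which
% word starts the compression (alpha), which trigrams occur inside it (gamma),
% and which bigram ends it (beta), weighted by language-model log-probabilities.
\max \; z =
      \sum_{i=1}^{n} \alpha_i \,\log P(x_i \mid \mathrm{start})
   + \sum_{i<j<k} \gamma_{ijk} \,\log P(x_k \mid x_i, x_j)
   + \sum_{i<j} \beta_{ij} \,\log P(\mathrm{end} \mid x_i, x_j)
```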

In this phase, constraints are added to the calculation made in IBM ILOG CPLEX Optimization Studio. CPLEX then returns a binary decision for each word: a word with a value of 0 is deleted from the sentence, while a word with a value of 1 remains.
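The final step, applying the solver's 0/1 decisions to the sentence, can be sketched as follows; the decision vector here is hypothetical, standing in for real CPLEX output:

```python
def apply_decisions(words, decisions):
    """Keep the words the solver marked 1, drop the words marked 0."""
    return " ".join(w for w, keep in zip(words, decisions) if keep == 1)

sentence = "The woman is reading a book at the park near her house".split()
# Hypothetical solver output; a real run would come from the CPLEX model.
decisions = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(apply_decisions(sentence, decisions))   # "The woman is reading a book"
```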

Project Method and Activities

Project Activities

Current problems and limitations in searching for information from websites, journals and scientific articles are identified.

Key Milestones

Gantt Chart

CHAPTER 4: RESULTS AND DISCUSSION

Human Summary Procedure

To achieve the objective of shortening sentences, this project refers to the basic sentence construction of English grammar. A compound sentence consists of two or more simple sentences combined using a conjunction such as 'and', 'or', or 'but'. Such a sentence therefore consists of more than one independent clause connected by a conjunction.

For example, in "The woman is reading a book while the man is drinking a cup of coffee", the clause "The woman is reading a book" can stand alone. In this project, we will trim the unimportant clause while keeping track of the important keywords of the sentence, so that a sentence such as "The woman is reading a book while drinking a cup of coffee at the park near her house" is reduced to its main clause.

This is the most challenging part: we need to come up with specific rules that reduce such sentences without introducing grammatical errors.
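A deliberately naive version of such a rule can be sketched as splitting at the first conjunction and keeping the leading clause. The conjunction list is illustrative, and a real implementation would need the keyword tracking described above to decide which clause to keep:

```python
import re

# Illustrative conjunction set; the report does not enumerate the actual rules.
CONJUNCTIONS = r"\b(?:and|but|or|while|although|because)\b"

def main_clause(sentence):
    """Naive clause trimmer: split a compound/complex sentence at its first
    conjunction and keep the first independent clause."""
    parts = re.split(CONJUNCTIONS, sentence, maxsplit=1)
    return parts[0].strip().rstrip(",") + "."

print(main_clause(
    "The woman is reading a book while the man is drinking a cup of coffee"))
# "The woman is reading a book."
```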

Project Prototype

After clicking the "Import File" option, a file selector will appear allowing the user to select a text file containing the sentence to be compressed from any folder. Sentence Compressor will automatically extract the sentences from the selected text file and display the contents in the Original text area column. The compressed sentences are displayed in the Compressed text area column as shown in figure 4.5.

Users can read the information for each parsed sentence in the article by clicking on it. The user can also view the information about each compressed sentence separately, with additional details.

FIGURE 4.2: User Clicks Import File Option

Experiment

In this experiment, Bilingual Evaluation Understudy (BLEU) is used to measure the similarity of candidate and reference texts. First, the similarity of the human-compressed articles is measured against the gold standard. Then, the similarity of the articles compressed by Sentence Compressor is measured against the gold standard.

The higher the BLEU score, the more similar the text is to the reference text (the gold standard). From the results produced, the articles compressed by Sentence Compressor are less similar to the gold standard than the human-compressed articles. It can therefore be assumed that the output of Sentence Compressor is less accurate than the output of human compression.

A lot of improvements need to be made to reduce the score differences between these two types of output.
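The scoring idea can be illustrated with a minimal single-reference BLEU: modified n-gram precision with a brevity penalty. This sketch uses add-one smoothing and uniform weights, which is a simplification; it is not necessarily the exact BLEU variant used in the experiment:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: modified n-gram precision (add-one
    smoothed) combined by geometric mean, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing so one empty n-gram order doesn't zero the score.
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

An identical candidate and reference score 1.0, while an over-aggressive compression is penalized, which is how the gap between human and system output shows up as a score difference.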

FIGURE 4.8: BLEU Scores

System Performance Evaluation

Respondents were given three statements to evaluate on five response options: strongly agree, agree, neutral, disagree, and strongly disagree. 70% of respondents strongly agree or agree that the compressed sentences increase the readability of the text. Only two respondents strongly disagree that Sentence Compressor increases the readability of the text.

65% of users strongly agree or agree that it is easy to find the important keywords in the shortened sentences, and 65% strongly agree or agree that this application is useful when searching for information. Based on these evaluation results, it is concluded that Sentence Compressor is useful for information search because it shortens the time needed to find the most important information.

Most of the users agreed that this application increases the readability of the text and makes the task of keyword searching easier.

Figure 4.13 shows the results of the survey for 20 respondents.

CHAPTER 5: CONCLUSION AND RECOMMENDATION

REFERENCES

  • Hema, "Natural Language Processing," in Proceedings of the National Seminar on Artificial Intelligence, Khentawas, Farrukh Nagar, 2009, p.
  • In 2012 IEEE Fourth International Conference on Technology for Education, pp.
  • "Semantic Graph Reduction Approach for Abstractive Text Summarization," in Computer Engineering & Systems (ICCES), 2012 Seventh International Conference, pp.
  • "Swarm Based Text Summarization," in 2009 International Association of Computer Science and Information Technology – Spring Conference, p.
  • "Automatic Text Summarization Based On Sentences Clustering And Extraction," in Computer Science and Information Technology, 2009.
  • "Automatic Text Categorization and Summarization using Rule Reduction," in International Conference on Advances in Engineering, Science and Management (ICAESM-2012), p.
  • "Fuzzy Genetic Semantic Based Text Summarization," in 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, p.
  • "Automatic Text Summarization Based On Rhetorical Structure Theory," in 2010 International Conference on Computer Applications and System Modeling (ICCASM 2010), p.
  • "Keyphrase Extraction for N-best Reranking and Multi-Sentence Compression," in 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), p.
  • "Global Inference for Sentence Compression: An Integer Linear Programming Approach," in Journal of Artificial Intelligence Research, vol.
  • "Summarization beyond sentence extraction: a probabilistic approach to sentence compression," in Artificial Intelligence, vol.
  • "Sentence reduction for automatic text summarization," in Proceedings of the 6th Applied Natural Language Processing Conference, Seattle, WA, USA, pp.
  • "Supervised and unsupervised learning for sentence compression," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, USA, pp.

FIGURE 3.1: Sentence Compressor System Framework (each phase is explained in the following subsections)
FIGURE 3.2: Prototyping Methodology
TABLE 3.1: Key Milestone
Figure 4.1 shows the main section of the Sentence Compressor. The top text area displays the sentences extracted from the text file.
