
Lexicon Classes


Academic year: 2023


This paper takes a deep dive into a particular area of the interdisciplinary field of computational linguistics: part-of-speech tagging algorithms. The author draws primarily on scientific papers in computer science and linguistics to describe previous approaches to this task and the oft-hypothesized asymptotic accuracy ceiling of about 98% with which the task is assumed to be associated.

An Introduction to Part of Speech Tagging

In the Natural Language Processing (NLP) domain, an efficient PoS tagging algorithm is equally important as any other component, if not more so. This is precisely the argument the author hopes to prove with the PoS tagging algorithm he hopes to construct.

A Soft Introduction to Statistical and Machine Learning

More technically, a statistical learning model may have a number of parameters, some of which serve as references to the raw input itself, while others are mathematically optimized around the input during training. In other words, the statistician or mathematician first trains the model on data whose desired class is known to the model, so that the model can gradually adjust its other parameters in a way that slowly converges on correct classifications.
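The training process described above can be sketched with a deliberately tiny model. This is a generic illustration, not the paper's own model: a single-parameter, perceptron-style classifier is nudged toward labeled examples until its predictions match their known classes.

```python
# A minimal sketch of supervised statistical learning (illustrative only;
# the model and update rule here are assumptions, not the author's design).

def train(examples, lr=0.1, epochs=100):
    """Fit a single weight w so that sign(w * x) matches each label (+1 / -1)."""
    w = 0.0
    for _ in range(epochs):
        for x, label in examples:
            prediction = 1 if w * x >= 0 else -1
            if prediction != label:      # misclassified: adjust the parameter
                w += lr * label * x      # perceptron-style update
    return w

# Training data whose desired class is known to the model in advance.
data = [(2.0, 1), (1.5, 1), (-1.0, -1), (-2.5, -1)]
w = train(data)
```

Each misclassification nudges the weight toward the correct side, which is the "gradual adjustment" the text refers to.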

Common Threads in Existing Methodologies

The reason many machine-learning-based PoS taggers use the same tag set is that the number of tags a tagger can emit must equal the number of tags in its training data. Even the best and most accurate PoS taggers in existence, since they rely on statistical learning models, use either the Brown Corpus or the Penn Treebank (PTB).
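This dependence is easy to see in code: the tag inventory a statistical tagger can output is exactly the set of tags observed in its training corpus. The toy data below stands in for the Brown Corpus or PTB.

```python
# Sketch: a statistical tagger's tag inventory is determined entirely by
# its training data (toy corpus; tag names are illustrative).

def tag_inventory(tagged_sentences):
    """Collect every distinct tag appearing in the training data."""
    return {tag for sentence in tagged_sentences for _, tag in sentence}

training = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("loud", "ADJ"), ("bark", "NOUN")],
]
tags = tag_inventory(training)   # the tagger cannot output anything else
```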

Various Machine Learning and Mixed Approaches

He explains that the "hidden" part of the HMM in the context of PoS tagging refers to the hidden states: the true tag sequence the model wants to recover. The initial state probabilities form a vector quantifying the probability of each tag appearing first in a sentence, usually simply the statistically most likely tag when it has no predecessor. Training the model on labeled data amounts to parameter estimation over a visible Markov process, because both the words and the underlying PoS sequences are observed at training time.
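The parameter estimation step can be sketched as simple counting over a tagged corpus. This is a generic maximum-likelihood illustration with no smoothing, using an invented toy corpus rather than the paper's data.

```python
from collections import Counter, defaultdict

# Sketch of HMM parameter estimation from tagged data: a "visible" Markov
# process, since words and tags are both observed at training time.

def estimate_hmm(tagged_sentences):
    initial, transition, emission = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        initial[sentence[0][1]] += 1                  # first tag in the sentence
        for word, tag in sentence:
            emission[tag][word] += 1                  # counts for P(word | tag)
        for (_, t1), (_, t2) in zip(sentence, sentence[1:]):
            transition[t1][t2] += 1                   # counts for P(tag2 | tag1)
    norm = lambda c: {k: v / sum(c.values()) for k, v in c.items()}
    return (norm(initial),
            {t: norm(c) for t, c in transition.items()},
            {t: norm(c) for t, c in emission.items()})

corpus = [[("the", "DET"), ("dog", "NOUN")], [("dogs", "NOUN"), ("bark", "VERB")]]
init_p, trans_p, emit_p = estimate_hmm(corpus)
```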

This will be almost as high as the accuracy of the PoS tagger designed by Stanford's Christopher Manning. Researcher Jun Wu of Johns Hopkins University even built a complex mixed-model HMM for PoS tagging, then used various other taggers to learn the errors of the original model and apply corrections via rules of their own [Wu 1997]. While such statistical approaches are admittedly complicated for the nontechnical reader, the Viterbi algorithm can fortunately be broken down in plain English.
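Indeed, the algorithm fits in a few lines of plain Python: at each word, keep the best-scoring tag sequence ending in each possible tag, then read off the overall winner. The parameter values below are invented for illustration, not drawn from any corpus.

```python
# A plain-Python Viterbi sketch: given HMM parameters, recover the most
# probable hidden tag sequence for an observed sentence.

def viterbi(words, tags, init_p, trans_p, emit_p):
    # best[t] = (probability, path) of the best tag sequence ending in tag t
    best = {t: (init_p.get(t, 0) * emit_p[t].get(words[0], 0), [t]) for t in tags}
    for word in words[1:]:
        best = {
            t: max(
                ((p * trans_p[prev].get(t, 0) * emit_p[t].get(word, 0), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda pair: pair[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda pair: pair[0])[1]

tags = ["DET", "NOUN", "VERB"]
init_p = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
trans_p = {"DET": {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
           "NOUN": {"VERB": 0.6, "NOUN": 0.3, "DET": 0.1},
           "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}}
emit_p = {"DET": {"the": 0.9},
          "NOUN": {"dog": 0.4, "barks": 0.1},
          "VERB": {"barks": 0.5}}
path = viterbi(["the", "dog", "barks"], tags, init_p, trans_p, emit_p)
```

With these probabilities the winning path is DET, NOUN, VERB: "barks" could emit as a noun, but the NOUN-to-VERB transition outweighs it.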

Weight updates can then be performed based on the weights of previous layers of nodes and gradients calculated via the partial derivative with respect to each node's contribution to the overall error. Over many such iterations, whose pace is governed by the learning rate, the error becomes smaller and smaller until the weights are optimized for classifying the training data.
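The update rule can be made concrete with the smallest possible network: one sigmoid neuron and one training example. This is a generic gradient-descent illustration, not the architecture the surveyed papers use.

```python
import math

# Sketch of gradient-descent weight updates: the gradient is the partial
# derivative of the squared error with respect to the single weight.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(x, target, lr=0.5, iterations=2000):
    w = 0.0
    for _ in range(iterations):
        out = sigmoid(w * x)
        error = out - target                 # the node's contribution to error
        grad = error * out * (1 - out) * x   # d/dw of (error^2 / 2)
        w -= lr * grad                       # step opposite the gradient
    return w

w = train_neuron(x=1.0, target=0.9)
final = sigmoid(w * 1.0)                     # approaches the target over time
```

Each iteration shrinks the error slightly; over many iterations the output converges on the target, mirroring the text's description.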

Critique of Former Methodologies: The Need for Linguistics

Even an adult with the sharpest eye in the world will have trouble correctly labeling each of the 128 colors. Again, assume that a word x exists, but that this word is not contained in any of the training data fed as initial input to the statistical learning algorithm. This means that, in addition to the many other errors that plague PoS tagging algorithms relying on statistical learning models, such algorithms fail on any word absent from the training data used to obtain their initial probabilities.

It is hoped to achieve PoS tagging that more closely resembles human intuition and reflects deep knowledge of the English language. As explained in Chapter 1, one of the first things a programmer who wants to build a PoS tagger must do is define his or her tag set. As explained in Chapter 3, this is one of the few common threads running through all PoS tagging algorithms.

For now, the focus will be on the size of the tag set. This completes the set of tags that will be used by the rule-based PoS tagger in this document, along with the corresponding definitions and abbreviations that will appear in the actual program's output. The algorithm will take a sentence as input and output a sequence of the above abbreviations, which will be used in the rest of this paper to refer to their corresponding parts of speech.

This point of view reveals that the size of the tag set is not only inversely proportional to the accuracy of the PoS tagger, but that it bears a similar relationship to other aspects of the tagger's performance.

Establishing Methodology

For example, if shouldn't appears in the sentence, should and n't are separated into their own tokens and tagged [MV.] and [NEG.] respectively. The second step requires a simple string comparison of each word in the input sentence against preprocessed, imported lists of words with only one known part of speech. Still, one should watch here for "the crux of ambiguity" in English, described by Dennison [2012].
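These first two steps can be sketched as follows. The tag names come from the paper's bracketed abbreviations, but the contraction table and lexicon contents are small illustrative assumptions, not the paper's full lists.

```python
# Sketch of steps one and two: contraction splitting, then lookup against
# a lexicon of words with only one known part of speech.

CONTRACTIONS = {"shouldn't": [("should", "[MV.]"), ("n't", "[NEG.]")]}

# Only unambiguous words belong here; illustrative entries.
UNAMBIGUOUS = {"the": "[DET.]", "very": "[ADV.]"}

def pre_tag(sentence):
    tagged = []
    for word in sentence.lower().split():
        if word in CONTRACTIONS:
            tagged.extend(CONTRACTIONS[word])      # step one: split and tag
        else:
            # step two: lexicon lookup; None marks a still-unresolved word
            tagged.append((word, UNAMBIGUOUS.get(word)))
    return tagged

result = pre_tag("The dog shouldn't bark")
```

Words the lexicon cannot resolve stay untagged for the later, rule-based phases.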

Therefore, any noun or verb included in the lexicon for comparison must be unambiguous, that is, incapable of being used as any other word class regardless of sentence context. The third step in the process is almost identical to the previous step in that it compares strings against an already built and simply imported lexicon. However, what is compared this time are the derivational affixes (the prefixes and suffixes, if any, attached to each word in the input sentence) against another lexicon filled with hundreds of these affixes, each marked with the part of speech it forces the stem it attaches to into.

By using lists of derivational affixes that force their words into specific word classes, scanning each word in the input sentence for these affixes becomes incredibly useful in PoS tagging. These rules are built around attempting to label the unknown word class of a word x in the input sentence, using the already-resolved parts of speech of the lexemes surrounding x, and sometimes those lexemes themselves, as rule parameters.
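The affix-comparison step amounts to a suffix (or prefix) scan against such a lexicon. The handful of suffixes below is an illustrative assumption; the paper's lexicon contains hundreds.

```python
# Sketch of step three: derivational suffixes that force a word class.

SUFFIX_CLASSES = {"tion": "[N.]", "ness": "[N.]", "ize": "[V.]", "ous": "[ADJ.]"}

def tag_by_affix(word):
    """Return the word class forced by a known derivational suffix, if any."""
    for suffix, tag in SUFFIX_CLASSES.items():
        if word.endswith(suffix):
            return tag
    return None     # no known affix: leave the word for the contextual rules

guess = tag_by_affix("hesitation")
```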

Design Specifications and Implementation

Perhaps the single most important data member in tuning the algorithm is the state of each word, initially set to "unchecked" upon instantiation. A full explanation of the variety, complexity, and scope of these rules would warrant another theoretical discussion, one far outside the scope of this paper. Another simple example, taken from Payne [2010], is that if a word is preceded by the lexeme many, there is nearly a 99% probability that the part of speech of the word in question is an adjective, since many is an intensifier.

Instead, the rules will be implemented as nested if-else statements inside a while loop that controls the repeated application of the rules to the sentence in an attempt to resolve all parts of speech. Since there is no feasible way to prioritize and order the application of all rules in a manner that suits every sentence structure, multiple passes applying all rules may be necessary before every word is resolved. For this reason, the termination condition for the outer while loop, and the trigger for the final phase of the algorithm on the input sentence, is not only (1) that the state of all words is "known", but also (2) that all rules have been tried twenty times, a loose estimate.
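The control flow just described can be sketched directly. The word representation, the [QUANT.] tag, and the single rule shown are illustrative assumptions; only the loop structure, the "unchecked"/"known" states, and the twenty-pass limit come from the text.

```python
# Sketch of the rule-application loop: repeat until every word's state is
# "known" or the loose twenty-pass limit is reached.

MAX_PASSES = 20

def apply_rules(words):
    """words: list of dicts with keys 'text', 'tag', 'state'."""
    passes = 0
    while any(w["state"] != "known" for w in words) and passes < MAX_PASSES:
        for i, w in enumerate(words):
            if w["state"] == "known":
                continue
            # Example contextual rule (after Payne [2010], cited above):
            # a word preceded by the lexeme "many" is tagged as an adjective.
            if i > 0 and words[i - 1]["text"] == "many":
                w["tag"], w["state"] = "[ADJ.]", "known"
            elif w["text"] == "many":
                w["tag"], w["state"] = "[QUANT.]", "known"
        passes += 1
    return words, passes

sent = [{"text": t, "tag": None, "state": "unchecked"} for t in ["many", "happy"]]
tagged, n = apply_rules(sent)
```

If the pass limit is hit with words still unresolved, the caller can emit the error message the text describes.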

If the algorithm terminates under condition (2), there will be words whose parts of speech remain unknown, and an appropriate error message will be displayed to aid debugging. Such continuous refinement of the proposed algorithm should substantially improve its accuracy, eventually yielding a fully functional and reliable PoS tagging algorithm over time.

Measuring Accuracy and Success

Perhaps most importantly, the first reason why the proposed, implemented algorithm cannot be tested at the moment relates to the nature of the available corpora, which has already been pointed out numerous times; these corpora have even been described as the gold standard. The first problem is that the tag sets of these corpora and the tag set of the algorithm proposed in this paper do not match in size. That is, due to the method of increasing accuracy by reducing the size of the tag set, the number of tags in the corpora available for testing accuracy is at least three times the number of tags that this paper's algorithm assigns.
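One conceivable way to bridge this mismatch for evaluation, not proposed by the paper itself, would be to collapse a corpus's fine-grained tags into the coarser classes before comparing outputs. The mapping below is an illustrative assumption over a few PTB-style tags.

```python
# Sketch: collapsing fine-grained corpus tags onto a coarse tag set so the
# two taggers' outputs become comparable (mapping is illustrative).

COARSE = {"NN": "[N.]", "NNS": "[N.]", "VB": "[V.]", "VBD": "[V.]",
          "JJ": "[ADJ.]", "DT": "[DET.]"}

def coarsen(tagged):
    """Map fine-grained (e.g. PTB-style) tags onto the coarse tag set."""
    return [(word, COARSE.get(tag, tag)) for word, tag in tagged]

gold = [("the", "DT"), ("dogs", "NNS"), ("barked", "VBD")]
coarse_gold = coarsen(gold)
```

Even so, as the text notes, annotation errors in the corpora themselves would still depress the measured accuracy.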

This artificial decrease in the algorithm's accuracy would also be exacerbated by the previously discussed fact that corpora such as the Brown Corpus are full of errors and incorrect annotations. Such a task would not only be incredibly tedious, but would also take so long that it would conflict with the author's academic timeline governing the completion of this thesis and a functional implementation of the algorithm described therein. The second reason why the algorithm should not currently be tested for accuracy as a means of confirming or refuting the argument of this paper lies in the nature of the algorithm itself.

In the relatively young study of linguistics, little attention has been paid to the fact that the abstraction of word class, or part of speech, in English is deeply rooted in the complex syntax and semantics of the language. While Firth's article is almost ancient relative to the timeline of linguistics studies, Terence Parsons [1990] of MIT would later revive the idea of the nature of lexical categories in his study of "subatomic semantics", described in an article in which he successfully links the parts of speech of any given sentence with his semantic framework.
