Semantic role labeling takes the initial steps in extracting meaning from text by assigning generic labels, or roles, to the words of the text. To aid in semantic extraction, the relationships between the words of a text must first be understood at the syntactic level. Semantic role labeling is thus the task of assigning semantic roles to the constituents of a sentence.
This chapter then discusses the different lexical resources that can be used for semantic role labeling.
Semantic Roles
This has become possible due to the development of large annotated corpora and advances in machine learning algorithms. Semantic role labelers are commonly built in a supervised learning paradigm, where a classifier learns to predict role labels based on features extracted from annotated training data. The creation of resources that document the realization of semantic roles in example sentences, such as FrameNet [BFL98b] and PropBank [KP02], has greatly facilitated the development of learning algorithms capable of automatically analyzing the role-semantic structure of input sentences.
Here, the semantic roles Judge, Evaluee, and Reason are specific to the domain of cognition in which a judgment is made.
Lexical Resources
- FrameNet
- PropBank
- VerbNet
- WordNet
We can see that the frame elements are inherited and become more and more specific. PropBank is a bank of propositions in which sentences are annotated with verbal propositions and their arguments. Each verb class has different thematic roles and certain syntactic constraints that describe its surface behavior.
VerbNet's hierarchical verb classes each define a set of possible thematic roles and a set of possible syntactic realizations.
Link Parser based on Link Grammar
Link Grammar
- Links and Linking Requirements
- Connectors and Formulae
- Disjunctive Form
- Grammar Rules
A formula such as C & D means that the linking requirement of the word is that both connectors C and D must be connected to their complementary connectors. Link grammar contains rules that place constraints on how links may be formed between the words of a given sentence. Satisfaction: the linking requirement of each word is met by the links drawn. Ordering: as the left connectors of a formula are traversed from left to right, the words they connect to proceed from near to far; as the right connectors of a formula are traversed from left to right, the words they connect to proceed from far to near.
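As a concrete illustration, here is a minimal sketch of the Satisfaction rule over a toy lexicon; the connector names D and S, the disjuncts, and the example sentence are illustrative and not taken from any actual link grammar dictionary.

```python
from typing import List, Tuple

# toy lexicon: each word has a list of disjuncts, each disjunct a pair of
# (left connectors, right connectors) that must all be used exactly once
LEXICON = {
    "the":    [([], ["D"])],
    "cat":    [(["D"], ["S"])],
    "sleeps": [(["S"], [])],
}

def satisfied(words: List[str], links: List[Tuple[int, int, str]]) -> bool:
    """Satisfaction rule: some disjunct of every word is exactly met."""
    for i, w in enumerate(words):
        left  = sorted(lbl for a, b, lbl in links if b == i)
        right = sorted(lbl for a, b, lbl in links if a == i)
        if not any(left == sorted(l) and right == sorted(r)
                   for l, r in LEXICON[w]):
            return False
    return True

words = ["the", "cat", "sleeps"]
links = [(0, 1, "D"), (1, 2, "S")]    # (left word, right word, link type)
print(satisfied(words, links))        # True
```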
MiniPar
Principle-based Parser
Generating Principles
Filtering Principles
Automatic Semantic Role Labeling
- Features for frame element labeling
- Features for frame element boundary identification
- Probability estimation of a single role
- Probability estimation of all the roles in the sentence
- Generalizing lexical semantics
Phrase type: For each constituent of the sentence, its phrase type can be determined from the constituent parse. Governing category: This feature is used only for NPs, providing a strong indication of whether the NP is the subject or the object of the verb. Parse tree path: The parse tree path of a constituent is the path from the target word to the constituent in the constituent parse tree, including the intermediate nodes and the arrow directions.
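A minimal sketch of computing the parse tree path feature, assuming a simple tree representation with parent pointers; the node labels and the example tree are illustrative.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def path_to_root(node):
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def tree_path(target, constituent):
    """Path feature: from the target word up to the lowest common ancestor,
    then down to the constituent, e.g. VB↑VP↑S↓NP."""
    up, down = path_to_root(target), path_to_root(constituent)
    common = next(n for n in up if n in down)
    ups = [n.label for n in up[: up.index(common) + 1]]
    downs = [n.label for n in reversed(down[: down.index(common)])]
    return "↑".join(ups) + "↓" + "↓".join(downs)

# toy parse of (He ate rice): S -> NP VP, VP -> VB NP
vb, np_obj, np_subj = Node("VB"), Node("NP"), Node("NP")
vp = Node("VP", [vb, np_obj])
s = Node("S", [np_subj, vp])
print(tree_path(vb, np_subj))    # VB↑VP↑S↓NP
```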
Position: The position feature indicates whether the constituent appears before or after the predicate that evokes the frame. Similar features are also used for frame element boundary identification, namely the head word, the parse tree path, and the target word. However, there was an average of only 34 sentences per target word in the FrameNet dataset used.
This is because a specific combination of the above six features together with the target word rarely occurs in the dataset. To overcome this, conditional probabilities of the frame element labels given subsets of the above features, such as P(role | target word) and P(role | path, target word), are combined. This strategy gives a significant improvement in performance over the baseline approach of directly estimating the conditional probability of the labels given all six features in the conditioning set.
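A minimal sketch of this backoff idea: estimates conditioned on smaller, less sparse feature subsets are linearly combined. The feature subsets, probability tables, and interpolation weights below are illustrative, not those of the cited work.

```python
def interpolated_role_prob(role, feats, tables, lambdas):
    """P(role | feats) ~ sum_i lambda_i * P(role | subset_i of the features)."""
    total = 0.0
    for (subset, table), lam in zip(tables, lambdas):
        key = tuple(feats[f] for f in subset)
        dist = table.get(key)
        if dist:                               # skip subsets unseen in training
            total += lam * dist.get(role, 0.0)
    return total

feats = {"target": "blame", "path": "VB↑VP↑S↓NP", "position": "before"}
tables = [
    (("target",), {("blame",): {"Judge": 0.6, "Evaluee": 0.4}}),
    (("path", "target"), {("VB↑VP↑S↓NP", "blame"): {"Judge": 0.9, "Evaluee": 0.1}}),
]
lambdas = [0.3, 0.7]
print(interpolated_role_prob("Judge", feats, tables, lambdas))   # ~0.81
```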
If we assume that the roles of the different constituents are independent of each other, then the probability estimate from the previous section is sufficient to label the roles. Due to the large vocabulary of English, however, training on all possible head words is infeasible, which motivates generalizing the lexical statistics across head words.
Extensions to Automatic SRL
Other work
Using an unsupervised parser, which generates unlabeled parse trees and POS tag annotations, the algorithm is able to achieve 56% precision on the argument identification task.
Semi-supervised Semantic Role Labeling
Learning Method
In particular, the direct dependents of the predicate flow (e.g., blood or weather in Figure 1.8) and their grammatical roles (e.g., SUBJ, MOD) are recorded. Preposition nodes are collapsed, i.e., the object of the preposition is attached directly and a compound grammatical role is created (such as IOBJ THROUGH, where IOBJ stands for prepositional object and THROUGH for the preposition itself). An example of the argument structure information obtained for the predicate flow in Figure 1.8 is shown in Table 1.2.
For each frame-evoking verb in the seed corpus, a labeled predicate-argument representation similar to Table 1.2 is created. Since a verb can evoke different frames with different roles and predicate-argument structures, not all of the extracted sentences are suitable examples to add to the training set. The idea used for selection is that verbs that appear in similar syntactic and semantic contexts behave similarly in how they relate to their arguments.
Estimating the similarity between two predicate-argument structures is equivalent to finding the highest-scoring alignment between them: given a labeled predicate-argument structure pl with m arguments and an unlabeled predicate-argument structure pu with n arguments, find and score all possible alignments between these arguments. Thus, we select a subset of arguments from the labeled predicate-argument structure and map each of them to some argument of the unlabeled predicate-argument structure on a one-to-one basis. The goal is then to find an alignment that maximizes the similarity function.
The predicate flow has three arguments in the labeled sentence and four in the unlabeled sentence. Each alignment is scored by taking the sum of the similarity scores of the individual aligned pairs (e.g., between blood and orphan, vein and silent, and so on).
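A minimal brute-force sketch of this alignment search, assuming for brevity that the labeled structure has no more arguments than the unlabeled one; the similarity function and argument head words are illustrative stand-ins for the actual lexical and syntactic similarity used.

```python
from itertools import permutations

def best_alignment(labeled_args, unlabeled_args, sim):
    """Highest-scoring one-to-one mapping of labeled onto unlabeled arguments
    (assumes len(labeled_args) <= len(unlabeled_args))."""
    m, n = len(labeled_args), len(unlabeled_args)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n), m):      # all injective mappings
        score = sum(sim(labeled_args[i], unlabeled_args[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = list(enumerate(perm)), score
    return best, best_score

# toy similarity over argument head words (illustrative only)
sim = lambda a, b: 1.0 if a == b else 0.1
print(best_alignment(["blood", "vein"], ["blood", "body", "vein"], sim))
# ([(0, 0), (1, 2)], 2.0)
```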
Projecting Annotations
Statistical Method for UNL Relation Label Generation
Feature Generation
Training
Testing
Overview of UNL System at GETA
- Universal Word Resources
- The Enconversion and Deconversion process
- Graph to Tree Conversion
- Deployment
Generation: The generation step builds the target multilevel tree and finally the target text. Multi-parent nodes: In this case, the child's connection to one of the parents is reversed. If the corresponding undirected graph contains a closed circuit, a node is duplicated to break the cycle and obtain a tree.
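A minimal sketch of the tree-construction idea, simplified in one respect: a node reachable from two parents is duplicated outright here, rather than having one of its links reversed as described above, while a node closing a circuit is duplicated with a prime. The relation labels and the example graph are illustrative.

```python
def graph_to_tree(root, edges, path=None):
    """edges: {parent: [(relation, child), ...]}; returns (node, children)."""
    path = (path or set()) | {root}
    children = []
    for rel, child in edges.get(root, []):
        if child in path:                     # closed circuit: duplicate node
            children.append((rel, (child + "'", [])))
        else:
            children.append((rel, graph_to_tree(child, edges, path)))
    return (root, children)

# "blood flows through veins": blood is the agent of flow and also modifies
# vein, so its subtree is simply duplicated under both parents
edges = {"flow": [("agt", "blood"), ("plc", "vein")],
         "vein": [("mod", "blood")]}
print(graph_to_tree("flow", edges))
```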
In the interactive mode for lexical transfer, the possible French equivalents for a UW are displayed and the user can choose among the options. If no equivalent is found, the user can enter one, and the dictionary is automatically populated. The author also discusses the possibility of applying UNL to speech MT in the future.
Summary
Dependency grammar is a theory that defines how the words in a sentence are connected. The basic idea is that in a sentence every word but one depends on some other word. In a sentence such as "John sleeps", the main verb sleeps is independent of all other words and conveys the central idea of the sentence.
If the sentence is about sleeping, there must be an agent or subject of this action. So the word John is the agent or subject of sleeping, and it depends on the word sleeps. Robinson (1970) [Rob70] described four basic axioms for dependency grammar that govern the well-formedness of a dependency structure.
If A directly depends on B and some element C intervenes between them (in the linear order of the sentence), then C directly depends on A or on B or on some other intervening element. The third axiom is also called the one-head property, which states that each element can have at most one parent. Thus, a dependency structure can be a tree or a graph, and there are different formalisms for both.
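The fourth axiom is what yields projectivity, discussed next. Here is a minimal sketch that checks it for a dependency structure given as an array of head indices; the representation (1-based words, head 0 for the root) and the examples are illustrative.

```python
def is_projective(heads):
    """heads[i - 1] = index of word i's head (1-based), 0 for the root.
    The structure is projective iff no two dependency arcs cross."""
    n = len(heads)
    arcs = [(heads[d - 1], d) for d in range(1, n + 1)]
    for h1, d1 in arcs:
        lo1, hi1 = sorted((h1, d1))
        for h2, d2 in arcs:
            lo2, hi2 = sorted((h2, d2))
            if lo1 < lo2 < hi1 < hi2:          # arcs cross
                return False
    return True

print(is_projective([2, 0, 2]))        # "John sleeps soundly" -> True
print(is_projective([3, 0, 2, 2]))     # crossing arcs -> False
```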
Projective and Non-projective dependency structures
Dependency Parsing Techniques
Data-driven Dependency Parsing
Transition-based dependency parsing
Summary
Description
The initial-state annotator can vary from a naive system that randomly annotates the text to a manually designed annotation system. For example, the initial-state annotator for part-of-speech (POS) tagging may assign the same or a random POS tag to every word, or it may assign each word its most frequent tag. Similarly, for syntactic parsing, the initial-state annotation can vary from the output of a sophisticated parser to a random tree structure with random non-terminal labels.
In addition, the transformations are arranged in the order in which they are to be applied. For example, a transformation of the kind used in the POS tagging example earlier changes an annotation in a triggering context; in parsing, such a transformation can be applied to the initial-state parse tree ((John eats) rice) to obtain (John (eats rice)).
The above technique is essentially a greedy algorithm: at each step we choose the transformation that reduces the error the most. The transformation T2 achieves the greatest reduction in error, so it is chosen as the first transformation, followed by T3. It is also important to specify whether, when applying a transformation, one must first identify all triggering environments and then apply the rewrite rule everywhere, or whether triggering and rewriting are executed one site at a time.
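A minimal sketch of this greedy loop, with the corpus, the error function, and the transformation set left abstract; the function names are illustrative.

```python
def tbl_learn(corpus, gold, transformations, errors):
    """Greedy transformation-based learning: repeatedly pick the
    transformation that most reduces errors against the gold annotation;
    stop when no transformation helps any more."""
    learned = []
    while True:
        current = errors(corpus, gold)
        best, best_after = None, current
        for t in transformations:
            after = errors(t(corpus), gold)
            if after < best_after:
                best, best_after = t, after
        if best is None:
            return learned
        corpus = best(corpus)        # apply the winner and keep going
        learned.append(best)
```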
Observations
Statistical Dependency Parsing
Parsing Actions
Right: A dependency is built between the two target nodes, with the left node becoming a child of the right node. Left: A dependency is built between the two target nodes, with the right node becoming a child of the left node. Shift: No dependency is built and the point of focus moves to the right. The Right and Left actions can only be used when all dependencies of the node that becomes a child have already been resolved.
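A minimal sketch of how these actions drive parsing, with a hand-written action chooser standing in for the classifier; in the actual system an SVM predicts the action from context features.

```python
def parse(words, choose_action):
    """Deterministic parsing with Right/Left/Shift over a list of tokens;
    returns (child, head) dependency pairs."""
    nodes = list(words)
    deps = []
    while len(nodes) > 1:
        progressed = False
        i = 0
        while i < len(nodes) - 1:
            action = choose_action(nodes, i)
            if action == "RIGHT":          # left node becomes child of right
                deps.append((nodes[i], nodes[i + 1]))
                del nodes[i]
                progressed = True
            elif action == "LEFT":         # right node becomes child of left
                deps.append((nodes[i + 1], nodes[i]))
                del nodes[i + 1]
                progressed = True
            else:                          # SHIFT: move the focus rightwards
                i += 1
        if not progressed:                 # no action applied in a full pass
            break
    return deps

def chooser(nodes, i):                     # hand-written stand-in for the SVM
    if nodes[i] == "the":
        return "RIGHT"
    if nodes[i] == "John" and nodes[i + 1] == "eats":
        return "RIGHT"
    if nodes[i] == "eats" and nodes[i + 1] == "rice":
        return "LEFT"
    return "SHIFT"

print(parse(["John", "eats", "the", "rice"], chooser))
# [('John', 'eats'), ('the', 'rice'), ('rice', 'eats')]
```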
Algorithm
Classification
Learning feature combinations using polynomial kernel functions: Kernel functions enable SVMs to handle nonlinear classification. The left context is defined as the nodes to the left of the target nodes, and the right context as the nodes to the right of the target nodes. The features used are of the form (p, k, v), where p indicates the position relative to the target nodes (negative values indicate the left context, positive values the right context), k indicates the type of the feature, and v is its value.
The authors note that training an SVM is quadratic to cubic in the number of training samples. To reduce the training cost, the dataset is therefore divided into groups based on the POS tag of the left target node: an SVM is trained for each POS tag of the left node, and the appropriate SVM is chosen during dependency construction.
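A minimal sketch of this grouping strategy, assuming scikit-learn's SVC for the polynomial-kernel SVM; the training-data format and feature vectors are illustrative.

```python
from collections import defaultdict
from sklearn.svm import SVC

def train_grouped_svms(samples):
    """samples: iterable of (left_pos, feature_vector, action) triples.
    Trains one polynomial-kernel SVM per POS tag of the left target node."""
    groups = defaultdict(lambda: ([], []))
    for left_pos, x, y in samples:
        groups[left_pos][0].append(x)
        groups[left_pos][1].append(y)
    return {pos: SVC(kernel="poly", degree=2).fit(X, y)
            for pos, (X, y) in groups.items()}

# at parsing time, the SVM is selected by the left target node's POS tag:
# action = models[left_pos].predict([feature_vector])[0]
```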
Summary