Text Mining Team
What is Part-of-Speech (POS)?
• Generally speaking, word classes (= POS):
• Verb, Noun, Adjective, Adverb, Article, …
• We can also include inflection:
• Verbs: Tense, number, …
Parts of Speech
• 8 (ish) traditional parts of speech
• Noun, verb, adjective, preposition, adverb, article, interjection,
pronoun, conjunction, etc.
• Called: parts-of-speech, lexical categories, word classes,
morphological classes, lexical tags...
• Lots of debate within linguistics about the number, nature, and
universality of these
7 Traditional POS Categories
• N    noun         chair, bandwidth, pacing
• V    verb         study, debate, munch
• ADJ  adjective    purple, tall, ridiculous
• ADV  adverb       unfortunately, slowly
• P    preposition  of, by, to
• PRO  pronoun      I, me, mine
• DET  determiner   the, a, that, those
POS Tagging
• The process of assigning a part-of-speech or lexical
class marker to each word in a collection.
WORD    TAG
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
Penn TreeBank POS Tag Set
• Penn Treebank: hand-annotated corpus of Wall Street
Journal, 1M words
• 45 tags
• Some particularities:
• to/TO is not disambiguated: the same TO tag covers both infinitival and prepositional “to”
Why is POS tagging useful?
• Speech synthesis:
• How do you pronounce “lead”?
• INsult vs. inSULT
• OBject vs. obJECT
• OVERflow vs. overFLOW
• DIScount vs. disCOUNT
• CONtent vs. conTENT
• Stemming for information retrieval:
• A search for “aardvarks” can also match “aardvark”
• Parsing, speech recognition, etc.:
• Possessive pronouns (my, your, her) are likely to be followed by nouns
• Personal pronouns (I, you, he) are likely to be followed by verbs
• You need to know whether a word is an N or a V before you can parse
• Information extraction:
• Finding names, relations, etc.
Open and Closed Classes
• Closed class: a small, fixed membership
• Prepositions: of, in, by, …
• Auxiliaries: may, can, will, had, been, …
• Pronouns: I, you, she, mine, his, them, …
• Usually function words (short, common words which play a grammatical role)
Open Class Words
• Nouns
• Proper nouns (Boulder, Granby, Eli Manning); English capitalizes these
• Common nouns (the rest)
• Count nouns and mass nouns
• Count: have plurals, get counted: goat/goats, one goat, two goats
• Mass: don’t get counted (snow, salt, communism) (*two snows)
• Adverbs: tend to modify things
• Unfortunately, John walked home extremely slowly yesterday
• Directional/locative adverbs (here, home, downhill)
• Degree adverbs (extremely, very, somewhat)
• Manner adverbs (slowly, slinkily, delicately)
• Verbs
Closed Class Words
Examples:
• Prepositions: on, under, over, …
• Particles: up, down, on, off, …
• Determiners: a, an, the, …
• Pronouns: she, who, I, …
• Conjunctions: and, but, or, …
POS Tagging
Choosing a Tagset
• There are many parts of speech and many potential distinctions we could draw
• To do POS tagging, we need to choose a standard set of tags to work with
• We could pick a very coarse tagset:
• N, V, Adj, Adv
• More commonly used sets are finer grained, e.g. the “Penn TreeBank tagset” (45 tags):
• PRP$, WRB, WP$, VBG, …
Using the Penn Tagset
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions are marked IN (“although/IN I/PRP …”)
• Except the preposition/complementizer “to”, which is just marked TO
POS Tagging
• Words often have more than one POS: back
• The back door: back = JJ
• On my back: back = NN
• Win the voters back: back = RB
• Promised to back the bill: back = VB
• The POS tagging problem is to determine the POS tag for
a particular instance of a word.
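To see this ambiguity resolved in practice, here is a minimal sketch using NLTK’s off-the-shelf tagger. NLTK is an assumption of this example, not a tool these slides prescribe; it needs its tokenizer and tagger resources downloaded once, and resource names can vary by NLTK version.

import nltk

# One-time downloads (names may differ slightly across NLTK versions):
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

sentences = [
    "The back door was open.",
    "Promised to back the bill.",
]

for sent in sentences:
    tokens = nltk.word_tokenize(sent)
    # pos_tag returns a list of (word, Penn-Treebank-tag) pairs
    print(nltk.pos_tag(tokens))

# “back” should come out as JJ (adjective) in the first sentence
# and VB (verb) in the second.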
Current Performance
• How many tags are correct?
• About 97% currently
• But the baseline is already 90%
• Baseline algorithm:
• Tag every word with its most frequent tag
• Tag unknown words as nouns
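A minimal sketch of this baseline in Python. The function names (train, baseline_tag) and the tiny corpus are made up for illustration:

from collections import Counter, defaultdict

def train(tagged_sentences):
    """Learn each word's most frequent tag from [(word, tag), ...] sentences."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def baseline_tag(words, most_frequent_tag, unknown_tag="NN"):
    """Tag each word with its most frequent training tag; unknown words get NN."""
    return [(w, most_frequent_tag.get(w.lower(), unknown_tag)) for w in words]

# Toy usage:
corpus = [[("the", "DT"), ("back", "NN")], [("back", "NN"), ("the", "DT"), ("bill", "NN")]]
model = train(corpus)
print(baseline_tag(["back", "the", "aardvark"], model))
# [('back', 'NN'), ('the', 'DT'), ('aardvark', 'NN')]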
Quick Test: Agreement?
• the students went to class
• plays well with others
• fruit flies like a banana
Key: DT = determiner (the, this, that); NN = noun; VB = verb
Quick Test
• the/DT students/NN went/VB to/P class/NN
• plays/VB well/ADV with/P others/NN
• fruit/NN flies/NN like/P a/DT banana/NN
POS Tagging Timeline, 1960–2000
• Brown Corpus created (EN-US): 1 million words
• Brown Corpus tagged; rule-based tagging (Greene and Rubin): ~70%
• LOB Corpus created (EN-UK): 1 million words
• HMM tagging (CLAWS): 93%–95%
• Efficient HMM handling sparse data (DeRose/Church): 95%+
• British National Corpus tagged by CLAWS; POS tagging becomes separated from other NLP
• Penn Treebank Corpus created (WSJ, 4.5M words)
• Transformation-based tagging (Eric Brill), rule-based: 95%+
• Tree-based statistics (Helmut Schmid): 96%+
• Neural network taggers: 96%+
• Trigram tagger (Kempe): 96%+
• Combined methods: 98%+
Two Methods for POS Tagging
1. Rule-based tagging
• e.g., ENGTWOL
2. Stochastic tagging
• Probabilistic sequence models
• HMM (Hidden Markov Model) tagging
Rule-Based Tagging
• Start with a dictionary
• Assign all possible tags to words from the dictionary
• Write rules by hand to selectively remove tags
Rule-based taggers
• Early POS taggers were all hand-coded
• Most of these (Harris, 1962; Greene and Rubin, 1971), and the best of the recent ones, ENGTWOL (Voutilainen, 1995), are based on a two-stage architecture:
• Stage 1: look the word up in a lexicon to get a list of potential POSs
• Stage 2: apply rules which certify or disallow tag sequences
• Rules were originally handwritten; more recently they have been machine-learned
Start With a Dictionary
• she: PRP
• promised: VBN, VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
Assign Every Possible Tag
                          NN
                          RB
      VBN                 JJ          VB
PRP   VBD         TO      VB     DT   NN
She   promised    to      back   the  bill
Write Rules to Eliminate Tags
Rule: Eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP”
                          NN
                          RB
                          JJ          VB
PRP   VBD         TO      VB     DT   NN
She   promised    to      back   the  bill
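A toy sketch of these two stages in Python. The lexicon and the single rule come straight from the slides; the function names are illustrative, not from any real tagger:

# Stage 1: the hand-built lexicon from the slides.
LEXICON = {
    "she": ["PRP"],
    "promised": ["VBN", "VBD"],
    "to": ["TO"],
    "back": ["VB", "JJ", "RB", "NN"],
    "the": ["DT"],
    "bill": ["NN", "VB"],
}

def assign_all_tags(words):
    """Stage 1: attach every dictionary tag to each word."""
    return [list(LEXICON[w.lower()]) for w in words]

def apply_rules(words, candidates):
    """Stage 2: eliminate VBN when VBD is an option and the word follows <start> PRP."""
    for i in range(1, len(words)):
        follows_start_prp = i == 1 and candidates[0] == ["PRP"]
        if follows_start_prp and "VBN" in candidates[i] and "VBD" in candidates[i]:
            candidates[i].remove("VBN")
    return candidates

words = "She promised to back the bill".split()
print(apply_rules(words, assign_all_tags(words)))
# [['PRP'], ['VBD'], ['TO'], ['VB', 'JJ', 'RB', 'NN'], ['DT'], ['NN', 'VB']]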
POS tagging
The/DT involvement/NN of/IN ion/NN channels/NNS in/IN B/NN and/CC T/NN lymphocyte/NN activation/NN is/VBZ supported/VBN by/IN many/JJ reports/NNS of/IN changes/NNS in/IN ion/NN fluxes/NNS and/CC membrane/NN …
Machine Learning Algorithm
• Training: the tagger is trained on hand-tagged text (“We demonstrate that …”)
• Tagging: the trained model is then applied to unseen text, e.g.:
• We/PRP demonstrate/VBP …
We want the best set of tags for a sequence of words (a sentence):
W — a sequence of words
T — a sequence of tags
Goal: T̂ = argmax_T P(T | W)
But, the Sparse Data Problem …
• Rich models often require vast amounts of data
• Naive idea: count up instances of the string “heat oil in a large pot” in the training corpus, and pick the most common tag assignment to the string. But most sentences never occur in any training corpus, so such counts are almost always zero.
POS Tagging as Sequence Classification
• We are given a sentence (an “observation” or “sequence of
observations”)
• Secretariat is expected to race tomorrow
• What is the best sequence of tags that corresponds to this
sequence of observations?
• Probabilistic view:
• Consider all possible sequences of tags
• Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn
Getting to HMMs
• We want, out of all sequences of n tags t1…tn, the single tag sequence such that P(t1…tn | w1…wn) is highest:
• t̂1…t̂n = argmax P(t1…tn | w1…wn), maximizing over all tag sequences t1…tn
• The hat ^ means “our estimate of the best one”
Getting to HMMs
• This equation is guaranteed to give us the best tag
sequence
• But how do we make it operational? How do we compute this value?
• Intuition of Bayesian classification:
• Use Bayes’ rule to transform this equation into a set of other probabilities that are easier to compute
Bayes’ Theorem (1763)
P(T | W) = P(W | T) · P(T) / P(W)
posterior = likelihood × prior / marginal likelihood
Reverend Thomas Bayes, Presbyterian minister (1702–1761)
• Since P(W) is the same for every candidate tag sequence, we can ignore it:
• T̂ = argmax_T P(W | T) · P(T)
• P(W | T) and P(T) can be counted from a large hand-tagged corpus
• Assume each word in the sequence depends only on its corresponding tag:
• P(W | T) ≈ ∏ P(wi | ti)
• Make a Markov assumption and use N-grams over tags:
• P(T) is a product of the probabilities of the N-grams that make it up:
• P(T) ≈ P(t1) · ∏ P(ti | ti−1)
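Chaining these two assumptions with Bayes’ rule gives the standard bigram HMM tagging objective (the textbook formulation these slides are building toward):

T̂ = argmax_T P(W | T) · P(T) ≈ argmax over t1…tn of ∏ P(wi | ti) · P(ti | ti−1)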
Count P(T)
• Tag N-gram probabilities are estimated by counting in a hand-tagged corpus:
• P(ti | ti−1) = C(ti−1, ti) / C(ti−1)
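A minimal sketch of computing these counts in Python. estimate_probs and the toy corpus are illustrative names, not from any particular toolkit:

from collections import Counter

def estimate_probs(tagged_sentences):
    """MLE estimates of transition P(t_i | t_{i-1}) and emission P(w_i | t_i)."""
    bigram_counts, context_counts = Counter(), Counter()  # C(t_{i-1}, t_i), C(t_{i-1})
    emit_counts, tag_counts = Counter(), Counter()        # C(w_i, t_i), C(t_i)
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            bigram_counts[(prev, tag)] += 1
            context_counts[prev] += 1
            emit_counts[(word.lower(), tag)] += 1
            tag_counts[tag] += 1
            prev = tag
    trans_p = {bg: c / context_counts[bg[0]] for bg, c in bigram_counts.items()}
    emit_p = {we: c / tag_counts[we[1]] for we, c in emit_counts.items()}
    return trans_p, emit_p

corpus = [[("fish", "noun"), ("sleep", "verb")], [("fish", "verb")]]
trans_p, emit_p = estimate_probs(corpus)
print(trans_p[("<s>", "noun")])  # 0.5
print(emit_p[("fish", "noun")])  # 1.0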
Part-of-speech tagging with Hidden Markov Models
• The tags are the hidden states; the words are the observed outputs
• Transition probabilities P(ti | ti−1) link tag to tag
• Output (emission) probabilities P(wi | ti) link each tag to the words it can produce
A Simple POS HMM
Transition probabilities:
• start → noun: 0.8; start → verb: 0.2
• noun → verb: 0.8; noun → noun: 0.1; noun → end: 0.1
• verb → noun: 0.2; verb → verb: 0.1; verb → end: 0.7
P(word | state)
• A two-word language: “fish” and “sleep”
• Suppose in our training corpus:
• “fish” appears 8 times as a noun and 5 times as a verb
• “sleep” appears twice as a noun and 5 times as a verb
• Emission probabilities:
• Noun: P(fish | noun) = 8/10 = 0.8; P(sleep | noun) = 2/10 = 0.2
• Verb: P(fish | verb) = 5/10 = 0.5; P(sleep | verb) = 5/10 = 0.5
Viterbi Probabilities
Trellis of best path probabilities for the input “fish sleep”:

         t=0    t=1 (fish)   t=2 (sleep)   t=3 (end)
start    1      0            0             0
noun     0      .64          .0128         -
verb     0      .1           .256          -

Token 1: fish
• noun: .8 (start → noun) × .8 P(fish | noun) = .64
• verb: .2 (start → verb) × .5 P(fish | verb) = .1

Token 2: sleep (take the maximum over predecessors, set back pointers)
• If “fish” is a verb: verb = .1 × .1 × .5 = .005; noun = .1 × .2 × .2 = .004
• If “fish” is a noun: verb = .64 × .8 × .5 = .256; noun = .64 × .1 × .2 = .0128
• Maxima: verb = .256 and noun = .0128, both with a back pointer to noun

Token 3: end (take the maximum, set back pointers)
• end: max(.256 × .7 via verb → end, .0128 × .1 via noun → end) = .1792, back pointer to verb

Decode (follow the back pointers from end): fish = noun, sleep = verb
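The same computation in code: a compact Viterbi sketch in Python, hard-coding the toy model above. The dictionary layout and the viterbi function are illustrative, not from a library:

# The toy HMM from the slides.
TRANS = {("start", "noun"): 0.8, ("start", "verb"): 0.2,
         ("noun", "noun"): 0.1, ("noun", "verb"): 0.8, ("noun", "end"): 0.1,
         ("verb", "noun"): 0.2, ("verb", "verb"): 0.1, ("verb", "end"): 0.7}
EMIT = {("noun", "fish"): 0.8, ("noun", "sleep"): 0.2,
        ("verb", "fish"): 0.5, ("verb", "sleep"): 0.5}
STATES = ["noun", "verb"]

def viterbi(words):
    """Return the most probable tag sequence and its probability."""
    # delta[t][s]: best probability of any path ending in state s after t words.
    delta = [{"start": 1.0}]
    backptr = [{}]
    for t, word in enumerate(words, start=1):
        delta.append({})
        backptr.append({})
        for s in STATES:
            # Best predecessor for state s, scored by path prob * transition prob.
            prev = max(delta[t - 1], key=lambda p: delta[t - 1][p] * TRANS.get((p, s), 0.0))
            delta[t][s] = delta[t - 1][prev] * TRANS.get((prev, s), 0.0) * EMIT[(s, word)]
            backptr[t][s] = prev
    # Transition into the end state, then follow the back pointers.
    last = max(STATES, key=lambda s: delta[-1][s] * TRANS.get((s, "end"), 0.0))
    best_prob = delta[-1][last] * TRANS.get((last, "end"), 0.0)
    tags = [last]
    for t in range(len(words), 1, -1):
        tags.append(backptr[t][tags[-1]])
    return list(reversed(tags)), best_prob

print(viterbi(["fish", "sleep"]))
# (['noun', 'verb'], 0.1792) up to float rounding, matching the trellis above.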
Exercise
POS taggers
• Brill’s tagger: http://www.cs.jhu.edu/~brill/
• TnT tagger: http://www.coli.uni-saarland.de/~thorsten/tnt/
• Stanford tagger: http://nlp.stanford.edu/software/tagger.shtml
• SVMTool: http://www.lsi.upc.es/~nlp/SVMTool/
• GENIA tagger: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
• More complete list at: