Ranking
Whither Machine Learning?
Machine Learning Magic Box: a list of objects goes in, a ranking of the objects comes out.
E.g., input: Coke, Pepsi, Sprite; output: Coke > Sprite > Pepsi.
Ranking
Given a request s and a list of offerings O = {o_1, o_2, ..., o_N}, the goal is to produce a ranking of the offerings, e.g., by sorting them with a scoring function f(s, o).
Ranking via Machine Learning
Learning to Rank: a Learning System learns a ranking model from data; a Ranking System then applies the model to rank new objects.
Is learning to rank necessary for IR?
• Has been successfully used in web-document search
• Has been the focus of increasing research at the intersection of machine learning and information retrieval
Example: Machine Translation
A sentence f in the source language goes into a Generative Model, which produces ranked sentence candidates e_1, e_2, ... in the target language; a Re-Ranking Model then re-ranks the top candidates to select the output translation ẽ.
Example: Collaborative Filtering

           Item1   Item2   Item3   ...
User1        5       4       ?
User2        1       2       2
...          ?       ?       ?
User M       4       3       ?

Input: user ratings on some items.
Any user is a query; based on this query, the items are ranked for that user.
Example: Key Phrase Extraction
• Input: Document
• Output: Key phrases in document
• Pre-processing step: extraction of possible phrases in document
Other IR applications of learning to rank
• Question Answering
• Document Summarization
• Opinion Mining
• Sentiment Analysis
• Machine Translation
• Sentence Parsing
Our focus: ranking documents
Given a query q and a set of documents D = {d_1, d_2, ..., d_N}, produce a ranking of the documents based on relevance, importance, ..., e.g., by sorting them with a scoring function f(q, d).
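A minimal sketch of this formulation in Python; the scoring function below is a toy term-overlap score, used only to illustrate "sort documents by f(q, d)":

```python
def score(query, doc):
    # Toy scoring function f(q, d): number of query terms that appear in the document.
    # A real system would use BM25, a learned model, etc.
    q_terms = set(query.lower().split())
    return sum(1 for t in doc.lower().split() if t in q_terms)

def rank(query, docs):
    # Produce a ranking of the documents by descending score f(q, d).
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

docs = ["learning to rank for information retrieval",
        "a cooking recipe for soup",
        "machine learning and ranking models"]
print(rank("learning to rank", docs))
```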
Traditional approach to “learning to rank”
Probabilistic Model: given a query q and documents d_1, d_2, ..., d_N, each document is scored by its probability of relevance,
    d_i ~ P(r | q, d_i),   r ∈ {1, 0},
and the documents are ranked by P(r = 1 | q, d).
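A short sketch of ranking by estimated relevance probability; the estimator here is an invented toy (a logistic function of term overlap), standing in for any probabilistic relevance model:

```python
import math

def p_relevant(query, doc):
    # Toy stand-in for P(r = 1 | q, d): logistic function of query-term overlap.
    overlap = sum(1 for t in set(query.split()) if t in doc.split())
    return 1.0 / (1.0 + math.exp(-(overlap - 1)))

def rank_by_relevance(query, docs):
    # Rank documents by descending estimated probability of relevance.
    return sorted(docs, key=lambda d: p_relevant(query, d), reverse=True)

docs = ["rank documents by relevance probability",
        "unrelated text about gardening"]
print(rank_by_relevance("rank documents", docs))
```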
Modern approach to learning to rank
Learning to Rank: a Learning System learns a ranking model from labeled training data; a Ranking System applies the model to rank documents for new queries.
Learning to Rank: Training Process
1. Data Labeling (rank)
2. Feature Extraction
3. Learning f(x)
Learning to Rank: Testing Process
1. Data Labeling (rank)
2. Feature Extraction
3. Ranking with f(x)
4. Evaluation → Evaluation Result
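A compact end-to-end sketch of the training and testing steps above (labels → features → learn f(x) → rank → evaluate), using a pointwise linear model fit by least squares; all data and feature values are invented for illustration:

```python
import numpy as np

# Training data: one feature vector x per query-document pair, with a relevance label.
X_train = np.array([[3.2, 0.8], [1.1, 0.3], [0.2, 0.1],   # query 1 documents
                    [2.5, 0.9], [0.4, 0.2]])              # query 2 documents
y_train = np.array([2, 1, 0, 2, 0])                        # graded relevance labels

# 3. Learning: fit a linear scoring function f(x) = w . x by least squares.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Testing: extract features for a new query's documents and rank with f(x).
X_test = np.array([[2.8, 0.7], [0.5, 0.4], [1.6, 0.6]])
scores = X_test @ w
ranking = np.argsort(-scores)           # document indices, best first
print("predicted ranking:", ranking)

# 4. Evaluation: compare with ground-truth labels, e.g., check the top result.
y_test = np.array([2, 0, 1])
print("top-ranked doc is relevant:", y_test[ranking[0]] > 0)
```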
Issues in learning to rank
• Data Labeling
• Feature Extraction
• Evaluation Measure
Data Labeling
E.g., relevance of documents (Doc A, Doc B, Doc C) w.r.t. a query.
Data Labeling: Supervision takes different forms
• Multiple levels (e.g., relevant, partially relevant, irrelevant)
‣ Widely used in IR
• Ordered Pairs
‣ e.g., ordered pairs between documents (A > B, B > C)
‣ e.g. implicit relevance judgments derived from click-through data
• Fully ordered list (i.e. a permutation) of documents is given
Data Labeling: Implicit Relevance Judgement
E.g., the search system ranks Doc A above Doc B, but users often clicked on Doc B, so we derive the preference B > A.
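A small sketch of deriving such implicit pairwise preferences from click-through data: if a clicked document was ranked below an unclicked one, infer clicked > unclicked (the log format here is invented for illustration):

```python
def preferences_from_clicks(ranking, clicked):
    """Infer pairwise preferences: a clicked document is preferred over
    every unclicked document ranked above it (a common heuristic)."""
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for higher in ranking[:pos]:
                if higher not in clicked:
                    prefs.append((doc, higher))   # doc > higher
    return prefs

# Search system showed A, B, C in that order; users often clicked only B.
print(preferences_from_clicks(["Doc A", "Doc B", "Doc C"], {"Doc B"}))
# -> [('Doc B', 'Doc A')], i.e., B > A
```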
Feature Extraction
E.g., for a query and documents Doc A, Doc B, Doc C, each query-document pair is represented by a feature vector containing query-document features (e.g., BM25) and document features (e.g., PageRank).
Feature Extraction: Example Features
• BM25 (a classical ranking score function)
• Proximity/distance measures between query and document
• Whether query exactly occurs in document
• PageRank
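A sketch of building a feature vector for one query-document pair from features like those listed above; the BM25 and PageRank values are placeholders (a real system computes BM25 from corpus statistics and takes PageRank from the link graph):

```python
def extract_features(query, doc_text, bm25_score, pagerank):
    """Build a feature vector x for a (query, document) pair.
    bm25_score and pagerank are assumed to be precomputed elsewhere."""
    q_terms = query.lower().split()
    d_terms = doc_text.lower().split()
    overlap = sum(1 for t in set(q_terms) if t in d_terms)
    exact_match = 1.0 if query.lower() in doc_text.lower() else 0.0
    return [
        bm25_score,                      # query-document feature
        overlap / max(len(q_terms), 1),  # proximity-style overlap feature
        exact_match,                     # whether the query occurs verbatim
        pagerank,                        # document-only feature
    ]

x = extract_features("learning to rank",
                     "a tutorial on learning to rank documents",
                     bm25_score=7.3, pagerank=0.12)
print(x)
```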
Evaluation Measures
• Allow us to quantify how good the predicted ranking is compared with the ground-truth ranking
• Typical evaluation metrics care more about ranking top results correctly
• Popular Measures
‣ NDCG (Normalized Discounted Cumulative Gain)
‣ MAP (Mean Average Precision)
‣ MRR (Mean Reciprocal Rank)
‣ WTA (Winners Take All)
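As a concrete example of a top-weighted measure, a minimal NDCG@k sketch (using the common 2^rel - 1 gain and log2 discount; conventions vary):

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain over the top k positions.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (perfectly sorted) ranking.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance of documents in the order the system ranked them.
print(ndcg_at_k([2, 0, 1, 0], k=3))
```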
Kendall’s Tau
Compares the agreement between two permutations.
Let P be the number of concordant pairs and Q the number of discordant pairs, out of n(n-1)/2 pairs in total:

    τ = (P - Q) / (n(n-1)/2)

τ = 1: the permutations completely agree; τ = -1: they completely disagree.
Kendall’s Tau
Example: permutations A B C and C A B.
Of the 3 pairs, 1 is concordant and 2 are discordant, so τ = (1 - 2) / 3 = -1/3.
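The same computation as a small Python sketch:

```python
from itertools import combinations

def kendall_tau(perm_a, perm_b):
    # tau = (P - Q) / (n(n-1)/2), where P/Q count concordant/discordant pairs.
    pos_a = {item: i for i, item in enumerate(perm_a)}
    pos_b = {item: i for i, item in enumerate(perm_b)}
    p = q = 0
    for x, y in combinations(perm_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            p += 1   # concordant: same relative order in both permutations
        else:
            q += 1   # discordant: relative order differs
    n = len(perm_a)
    return (p - q) / (n * (n - 1) / 2)

print(kendall_tau(["A", "B", "C"], ["C", "A", "B"]))   # -0.333...
```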
Learning to Rank Methods
• Three major approaches
‣ Pointwise approach
‣ Pairwise approach
‣ Listwise approach
Pointwise Approach
• Transforming ranking to regression, classification, or ordinal regression
Pointwise Approach (Learning)
Input:  feature vector x
Model:  ranking function f(x)

                     Output             Prediction            Loss Function
Regression           Real number        y = f(x)              Regression loss
Classification       Category           y = sign(f(x))        Classification loss
Ordinal Regression   Ordered category   y = threshold(f(x))   Ordinal regression loss
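A sketch of the three pointwise prediction rules from the table, sharing one scoring function f(x) = w · x; the weights, thresholds, and feature values are invented for illustration:

```python
import numpy as np

def f(x, w):
    # Shared ranking function f(x) = w . x used by all three pointwise variants.
    return float(np.dot(w, x))

def regression_predict(x, w):
    return f(x, w)                        # y = f(x): real-valued relevance

def classification_predict(x, w):
    return 1 if f(x, w) >= 0 else -1      # y = sign(f(x)): relevant vs. not

def ordinal_predict(x, w, thresholds=(0.5, 1.5)):
    # y = threshold(f(x)): map the score into ordered categories 0 < 1 < 2.
    return sum(f(x, w) > t for t in thresholds)

w = np.array([0.8, 0.3])                  # toy weights, e.g., for [BM25, PageRank]
x = np.array([2.0, 0.4])
print(regression_predict(x, w), classification_predict(x, w), ordinal_predict(x, w))
```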
Pairwise Approach
• Transforming ranking to pairwise classification
Pairwise Approach (Learning)

Input          Ordered feature vector pair (x_i, x_j)
Output         Classification on the order of the pair: y_{i,j} = sign(f(x_i) - f(x_j))
Model          Ranking function f(x)
Loss Function  Pairwise classification loss
Pairwise Ranking
Converting a document list to document pairs: given a query with ranked documents Doc A, Doc B, Doc C, form ordered pairs between documents at different rank levels (e.g., rank 1 vs. rank 2).
Transformed Pairwise Classification Problem
Learn a linear scoring function f(x; w) on difference vectors:
Positive examples (+1): x_1 - x_3, x_2 - x_3, x_1 - x_2
Negative examples (-1): x_2 - x_1, x_3 - x_2, x_3 - x_1
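A sketch of this transformation, assuming toy feature vectors x1, x2, x3 with preference x1 > x2 > x3: build the difference vectors with labels ±1 and train a linear classifier on them (RankSVM-style; a simple perceptron is used here only to stay dependency-free):

```python
import numpy as np

# Toy feature vectors, with preference x1 > x2 > x3.
x1, x2, x3 = np.array([3.0, 1.0]), np.array([2.0, 0.5]), np.array([0.5, 0.2])

# Difference vectors: +1 for correctly ordered pairs, -1 for the reversed pairs.
pairs = [(x1 - x3, +1), (x2 - x3, +1), (x1 - x2, +1),
         (x3 - x1, -1), (x3 - x2, -1), (x2 - x1, -1)]

# Train a linear model f(x; w) = w . x with a simple perceptron update rule.
w = np.zeros(2)
for _ in range(100):
    for x, y in pairs:
        if y * np.dot(w, x) <= 0:
            w += y * x

# f(x; w) now scores documents; a higher score means ranked higher.
for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, float(np.dot(w, x)))
```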
Listwise Approach
• The entire list of documents for a query is one input instance
• The query-document group structure is used
• Straightforwardly represents the learning-to-rank problem
Listwise Approach (Learning & Ranking)

Input          Feature vectors {x_i}, i = 1, ..., n
Output         Permutation of the feature vectors: sort({f(x_i)}, i = 1, ..., n)
Model          Ranking function f(x)
Loss Function  Listwise loss function (based on a ranking evaluation measure)
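One possible listwise loss, sketched below in the spirit of ListNet: the cross-entropy between the top-one probability distributions induced by the ground-truth labels and by the model scores (other listwise losses optimize the evaluation measure more directly):

```python
import numpy as np

def top_one_prob(scores):
    # Softmax over the whole list: probability each item would be ranked first.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def listnet_loss(labels, scores):
    # Cross-entropy between top-one distributions from labels and from f(x).
    p_true = top_one_prob(np.asarray(labels, dtype=float))
    p_pred = top_one_prob(np.asarray(scores, dtype=float))
    return float(-np.sum(p_true * np.log(p_pred + 1e-12)))

labels = [2, 1, 0]            # ground-truth relevance of the documents in the list
good_scores = [2.1, 0.9, -0.5]
bad_scores  = [-0.5, 0.9, 2.1]
print(listnet_loss(labels, good_scores), listnet_loss(labels, bad_scores))
```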