Ranking
Whither Machine Learning?
Machine Learning Magic Box: a list of objects goes in, a ranking of the objects comes out.
E.g., input: Coke, Pepsi, Sprite; output: Coke > Sprite > Pepsi.
Ranking
Given a request s and a list of offerings O = {o_1, o_2, ..., o_N}, the goal is to produce a ranking of the offerings, e.g., by sorting them with a scoring function f(s, o).
Ranking via Machine Learning
Learning to Rank: a Learning System learns a ranking model from data; a Ranking System then applies the model to rank new objects.
Is learning to rank necessary for IR?
• Has been successfully used in web-document search
• Has been the focus of increasing research at the intersection of machine learning and information retrieval
Example: Machine Translation
A sentence f in the source language goes into a Generative Model, which produces ranked sentence candidates e_1, e_2, ... in the target language; a Re-Ranking Model then re-ranks the top candidates to select the output translation ẽ.
Example: Collaborative Filtering

           Item1   Item2   Item3   ...
User1        5       4       ?
User2        1       2       2
...          ?       ?       ?
User M       4       3       ?

Input: user ratings on some items.
Any user is a query; based on this query, the items are ranked for that user.
Example: Key Phrase Extraction
• Input: Document
• Output: Key phrases in document
• Pre-processing step: extraction of possible phrases in document
Other IR applications of learning to rank
• Question Answering
• Document Summarization
• Opinion Mining
• Sentiment Analysis
• Machine Translation
• Sentence Parsing
Our focus: ranking documents
Given a query q and a set of documents D = {d_1, d_2, ..., d_N}, produce a ranking of the documents based on relevance, importance, ..., e.g., by sorting them with a scoring function f(q, d).
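A minimal sketch of this formulation in Python; the scoring function below is a toy term-overlap score, used only to illustrate "sort documents by f(q, d)":

```python
def score(query, doc):
    # Toy scoring function f(q, d): number of query terms that appear in the document.
    # A real system would use BM25, a learned model, etc.
    q_terms = set(query.lower().split())
    return sum(1 for t in doc.lower().split() if t in q_terms)

def rank(query, docs):
    # Produce a ranking of the documents by descending score f(q, d).
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

docs = ["learning to rank for information retrieval",
        "a cooking recipe for soup",
        "machine learning and ranking models"]
print(rank("learning to rank", docs))
```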
Traditional approach to “learning to rank”
Probabilistic Model: given a query q and documents d_1, d_2, ..., d_N, each document is scored by its probability of relevance,
    d_i ~ P(r | q, d_i),   r ∈ {1, 0},
and the documents are ranked by P(r = 1 | q, d).
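A short sketch of ranking by estimated relevance probability; the estimator here is an invented toy (a logistic function of term overlap), standing in for any probabilistic relevance model:

```python
import math

def p_relevant(query, doc):
    # Toy stand-in for P(r = 1 | q, d): logistic function of query-term overlap.
    overlap = sum(1 for t in set(query.split()) if t in doc.split())
    return 1.0 / (1.0 + math.exp(-(overlap - 1)))

def rank_by_relevance(query, docs):
    # Rank documents by descending estimated probability of relevance.
    return sorted(docs, key=lambda d: p_relevant(query, d), reverse=True)

docs = ["rank documents by relevance probability",
        "unrelated text about gardening"]
print(rank_by_relevance("rank documents", docs))
```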
Modern approach to learning to rank
Learning to Rank: a Learning System learns a ranking model from labeled training data; a Ranking System applies the model to rank documents for new queries.
Learning to Rank: Training Process
1. Data Labeling (rank)
2. Feature Extraction
3. Learning f(x)
Learning to Rank: Testing Process
1. Data Labeling (rank)
2. Feature Extraction
3. Ranking with f(x)
4. Evaluation → Evaluation Result
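A compact end-to-end sketch of the training and testing steps above (labels → features → learn f(x) → rank → evaluate), using a pointwise linear model fit by least squares; all data and feature values are invented for illustration:

```python
import numpy as np

# Training data: one feature vector x per query-document pair, with a relevance label.
X_train = np.array([[3.2, 0.8], [1.1, 0.3], [0.2, 0.1],   # query 1 documents
                    [2.5, 0.9], [0.4, 0.2]])              # query 2 documents
y_train = np.array([2, 1, 0, 2, 0])                        # graded relevance labels

# 3. Learning: fit a linear scoring function f(x) = w . x by least squares.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Testing: extract features for a new query's documents and rank with f(x).
X_test = np.array([[2.8, 0.7], [0.5, 0.4], [1.6, 0.6]])
scores = X_test @ w
ranking = np.argsort(-scores)           # document indices, best first
print("predicted ranking:", ranking)

# 4. Evaluation: compare with ground-truth labels, e.g., check the top result.
y_test = np.array([2, 0, 1])
print("top-ranked doc is relevant:", y_test[ranking[0]] > 0)
```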
Issues in learning to rank
• Data Labeling
• Feature Extraction
• Evaluation Measure
Data Labeling
E.g., relevance of documents (Doc A, Doc B, Doc C) w.r.t. a query.
Data Labeling: Supervision takes different forms
• Multiple levels (e.g., relevant, partially relevant, irrelevant)
‣ Widely used in IR
• Ordered Pairs
‣ e.g., ordered pairs between documents (A > B, B > C)
‣ e.g. implicit relevance judgments derived from click-through data
• Fully ordered list (i.e. a permutation) of documents is given
Data Labeling: Implicit Relevance Judgement
E.g., the search system ranks Doc A above Doc B, but users often clicked on Doc B, so we derive the preference B > A.
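A small sketch of deriving such implicit pairwise preferences from click-through data: if a clicked document was ranked below an unclicked one, infer clicked > unclicked (the log format here is invented for illustration):

```python
def preferences_from_clicks(ranking, clicked):
    """Infer pairwise preferences: a clicked document is preferred over
    every unclicked document ranked above it (a common heuristic)."""
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for higher in ranking[:pos]:
                if higher not in clicked:
                    prefs.append((doc, higher))   # doc > higher
    return prefs

# Search system showed A, B, C in that order; users often clicked only B.
print(preferences_from_clicks(["Doc A", "Doc B", "Doc C"], {"Doc B"}))
# -> [('Doc B', 'Doc A')], i.e., B > A
```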
Feature Extraction
E.g., for a query and documents Doc A, Doc B, Doc C, each query-document pair is represented by a feature vector containing query-document features (e.g., BM25) and document features (e.g., PageRank).
Feature Extraction: Example Features
• BM25 (a classical ranking score function)
• Proximity/distance measures between query and document
• Whether query exactly occurs in document
• PageRank
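A sketch of building a feature vector for one query-document pair from features like those listed above; the BM25 and PageRank values are placeholders (a real system computes BM25 from corpus statistics and takes PageRank from the link graph):

```python
def extract_features(query, doc_text, bm25_score, pagerank):
    """Build a feature vector x for a (query, document) pair.
    bm25_score and pagerank are assumed to be precomputed elsewhere."""
    q_terms = query.lower().split()
    d_terms = doc_text.lower().split()
    overlap = sum(1 for t in set(q_terms) if t in d_terms)
    exact_match = 1.0 if query.lower() in doc_text.lower() else 0.0
    return [
        bm25_score,                      # query-document feature
        overlap / max(len(q_terms), 1),  # proximity-style overlap feature
        exact_match,                     # whether the query occurs verbatim
        pagerank,                        # document-only feature
    ]

x = extract_features("learning to rank",
                     "a tutorial on learning to rank documents",
                     bm25_score=7.3, pagerank=0.12)
print(x)
```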
Evaluation Measures
• Allow us to quantify how good the predicted ranking is compared with the ground-truth ranking
• Typical evaluation metrics care more about ranking top results correctly
• Popular Measures
‣ NDCG (Normalized Discounted Cumulative Gain)
‣ MAP (Mean Average Precision)
‣ MRR (Mean Reciprocal Rank)
‣ WTA (Winners Take All)
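As a concrete example of a top-weighted measure, a minimal NDCG@k sketch (using the common 2^rel - 1 gain and log2 discount; conventions vary):

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain over the top k positions.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (perfectly sorted) ranking.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance of documents in the order the system ranked them.
print(ndcg_at_k([2, 0, 1, 0], k=3))
```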
Kendall’s Tau
Compares the agreement between two permutations.
Let P be the number of concordant pairs and Q the number of discordant pairs, out of n(n-1)/2 pairs in total:

    τ = (P - Q) / (n(n-1)/2)

τ = 1: the permutations completely agree; τ = -1: they completely disagree.
Kendall’s Tau
Example: permutations A B C and C A B.
Of the 3 pairs, 1 is concordant and 2 are discordant, so τ = (1 - 2) / 3 = -1/3.
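The same computation as a small Python sketch:

```python
from itertools import combinations

def kendall_tau(perm_a, perm_b):
    # tau = (P - Q) / (n(n-1)/2), where P/Q count concordant/discordant pairs.
    pos_a = {item: i for i, item in enumerate(perm_a)}
    pos_b = {item: i for i, item in enumerate(perm_b)}
    p = q = 0
    for x, y in combinations(perm_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            p += 1   # concordant: same relative order in both permutations
        else:
            q += 1   # discordant: relative order differs
    n = len(perm_a)
    return (p - q) / (n * (n - 1) / 2)

print(kendall_tau(["A", "B", "C"], ["C", "A", "B"]))   # -0.333...
```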
Learning to Rank Methods
• Three major approaches
‣ Pointwise approach
‣ Pairwise approach
‣ Listwise approach
Pointwise Approach
• Transforming ranking to regression, classification, or ordinal regression
Pointwise Approach (Learning)
Input:  feature vector x
Model:  ranking function f(x)

                     Output             Prediction            Loss Function
Regression           Real number        y = f(x)              Regression loss
Classification       Category           y = sign(f(x))        Classification loss
Ordinal Regression   Ordered category   y = threshold(f(x))   Ordinal regression loss
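A sketch of the three pointwise prediction rules from the table, sharing one scoring function f(x) = w · x; the weights, thresholds, and feature values are invented for illustration:

```python
import numpy as np

def f(x, w):
    # Shared ranking function f(x) = w . x used by all three pointwise variants.
    return float(np.dot(w, x))

def regression_predict(x, w):
    return f(x, w)                        # y = f(x): real-valued relevance

def classification_predict(x, w):
    return 1 if f(x, w) >= 0 else -1      # y = sign(f(x)): relevant vs. not

def ordinal_predict(x, w, thresholds=(0.5, 1.5)):
    # y = threshold(f(x)): map the score into ordered categories 0 < 1 < 2.
    return sum(f(x, w) > t for t in thresholds)

w = np.array([0.8, 0.3])                  # toy weights, e.g., for [BM25, PageRank]
x = np.array([2.0, 0.4])
print(regression_predict(x, w), classification_predict(x, w), ordinal_predict(x, w))
```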
Pairwise Approach
• Transforming ranking to pairwise classification
Pairwise Approach (Learning)

Input          Ordered feature vector pair (x_i, x_j)
Output         Classification on the order of the pair: y_{i,j} = sign(f(x_i) - f(x_j))
Model          Ranking function f(x)
Loss Function  Pairwise classification loss
Pairwise Ranking
Converting a document list to document pairs: given a query with ranked documents Doc A, Doc B, Doc C, form ordered pairs between documents at different rank levels (e.g., rank 1 vs. rank 2).
Transformed Pairwise Classification Problem
Learn a linear scoring function f(x; w) on difference vectors:
Positive examples (+1): x_1 - x_3, x_2 - x_3, x_1 - x_2
Negative examples (-1): x_2 - x_1, x_3 - x_2, x_3 - x_1
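A sketch of this transformation, assuming toy feature vectors x1, x2, x3 with preference x1 > x2 > x3: build the difference vectors with labels ±1 and train a linear classifier on them (RankSVM-style; a simple perceptron is used here only to stay dependency-free):

```python
import numpy as np

# Toy feature vectors, with preference x1 > x2 > x3.
x1, x2, x3 = np.array([3.0, 1.0]), np.array([2.0, 0.5]), np.array([0.5, 0.2])

# Difference vectors: +1 for correctly ordered pairs, -1 for the reversed pairs.
pairs = [(x1 - x3, +1), (x2 - x3, +1), (x1 - x2, +1),
         (x3 - x1, -1), (x3 - x2, -1), (x2 - x1, -1)]

# Train a linear model f(x; w) = w . x with a simple perceptron update rule.
w = np.zeros(2)
for _ in range(100):
    for x, y in pairs:
        if y * np.dot(w, x) <= 0:
            w += y * x

# f(x; w) now scores documents; a higher score means ranked higher.
for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, float(np.dot(w, x)))
```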
Listwise Approach
• The entire list of documents for a query is one input instance
• The query-document group structure is used
• Straightforwardly represents the learning-to-rank problem
Listwise Approach (Learning & Ranking)

Input          Feature vectors {x_i}, i = 1, ..., n
Output         Permutation of the feature vectors: sort({f(x_i)}, i = 1, ..., n)
Model          Ranking function f(x)
Loss Function  Listwise loss function (based on a ranking evaluation measure)
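One possible listwise loss, sketched below in the spirit of ListNet: the cross-entropy between the top-one probability distributions induced by the ground-truth labels and by the model scores (other listwise losses optimize the evaluation measure more directly):

```python
import numpy as np

def top_one_prob(scores):
    # Softmax over the whole list: probability each item would be ranked first.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def listnet_loss(labels, scores):
    # Cross-entropy between top-one distributions from labels and from f(x).
    p_true = top_one_prob(np.asarray(labels, dtype=float))
    p_pred = top_one_prob(np.asarray(scores, dtype=float))
    return float(-np.sum(p_true * np.log(p_pred + 1e-12)))

labels = [2, 1, 0]            # ground-truth relevance of the documents in the list
good_scores = [2.1, 0.9, -0.5]
bad_scores  = [-0.5, 0.9, 2.1]
print(listnet_loss(labels, good_scores), listnet_loss(labels, bad_scores))
```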