Evaluate Summarization¶
Author(s) | Renato Leite (renatoleite@), Egon Soares (egon@) |
Last updated | 09/05/2023 |
ROUGE-L¶
ROUGE-L uses an LCS-based F-measure to estimate the similarity between two summaries: a reference summary sentence X of length m and a candidate summary sentence Y of length n, as follows:
$Recall_{lcs} = \cfrac{LCS(X,Y)}{m}$
$Precision_{lcs} = \cfrac{LCS(X,Y)}{n}$
$F_{lcs} = \cfrac{(1+\beta^2)\,Recall_{lcs}\,Precision_{lcs}}{\beta^2 Precision_{lcs}+Recall_{lcs}}$
$\beta = \cfrac{Precision_{lcs}}{Recall_{lcs}}$
$ROUGE\text{-}L = \cfrac{\left(1+\left(\cfrac{Precision_{lcs}}{Recall_{lcs}}\right)^2\right)Recall_{lcs}\,Precision_{lcs}}{\left(\cfrac{Precision_{lcs}}{Recall_{lcs}}\right)^2 Precision_{lcs}+Recall_{lcs}}$
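As a quick sanity check of the formulas above (a sketch; the values simply mirror the example worked out step by step later in this notebook), note that when precision equals recall, $\beta = 1$ and the F-measure collapses to that shared value:

```python
# Plug equal precision and recall into the ROUGE-L F-measure.
r_lcs = 0.75  # Recall_lcs
p_lcs = 0.75  # Precision_lcs
beta = p_lcs / r_lcs  # 1.0 whenever precision == recall

f_lcs = ((1 + beta**2) * r_lcs * p_lcs) / (beta**2 * p_lcs + r_lcs)
print(f_lcs)  # → 0.75
```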
LCS¶
Size of LCS:
$LCS(X_i, Y_j) = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ LCS(X_{i-1}, Y_{j-1}) + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j \\ \max\{LCS(X_i, Y_{j-1}),\, LCS(X_{i-1}, Y_j)\} & \text{if } i,j > 0 \text{ and } x_i \ne y_j \end{cases}$
String of LCS:
$LCS(X_i, Y_j) = \begin{cases} \epsilon & \text{if } i = 0 \text{ or } j = 0 \\ LCS(X_{i-1}, Y_{j-1}) \frown x_i & \text{if } i,j > 0 \text{ and } x_i = y_j \\ \text{longest of } LCS(X_i, Y_{j-1}) \text{ and } LCS(X_{i-1}, Y_j) & \text{if } i,j > 0 \text{ and } x_i \ne y_j \end{cases}$
$\epsilon \implies \text{empty string}$
$\frown \implies \text{append element}$
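The size recurrence above can be checked directly with a short memoized implementation (a sketch using `functools.lru_cache`; the character strings are illustrative, not from this notebook):

```python
from functools import lru_cache

def lcs_size(X, Y):
    # Memoized top-down implementation of the LCS size recurrence.
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == 0 or j == 0:
            return 0
        if X[i - 1] == Y[j - 1]:
            return 1 + rec(i - 1, j - 1)
        return max(rec(i, j - 1), rec(i - 1, j))
    return rec(len(X), len(Y))

print(lcs_size("ABCBDAB", "BDCAB"))  # → 4 (e.g. "BCAB")
```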
In [1]:
reference = "police killed the gunman"
candidate = "police kill the gunman"
In [2]:
# Recursive LCS
def lcs(X, Y, m, n):
    if m == 0 or n == 0:
        return 0
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))
In [3]:
def lcs_sequence(X, Y, m, n):
    if m == 0 or n == 0:
        return []
    elif X[m-1] == Y[n-1]:
        return lcs_sequence(X, Y, m-1, n-1) + [X[m-1]]
    else:
        a = lcs_sequence(X, Y, m, n-1)
        b = lcs_sequence(X, Y, m-1, n)
        return a if len(a) > len(b) else b
In [4]:
X = reference.split()
Y = candidate.split()
lcs(X, Y, len(X), len(Y))
Out[4]:
3
In [5]:
" ".join(lcs_sequence(X, Y, len(X), len(Y)))
Out[5]:
'police the gunman'
In [6]:
# Dynamic Programming LCS
def lcs_dp(X, Y, m, n, dp):
    if m == 0 or n == 0:
        return 0
    elif dp[m][n] != -1:
        return dp[m][n]
    elif X[m - 1] == Y[n - 1]:
        dp[m][n] = 1 + lcs_dp(X, Y, m - 1, n - 1, dp)
        return dp[m][n]
    dp[m][n] = max(lcs_dp(X, Y, m, n - 1, dp), lcs_dp(X, Y, m - 1, n, dp))
    return dp[m][n]
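The same memoization can be written bottom-up as an iterative table fill, which avoids Python's recursion limit on long texts (a sketch; `lcs_tab` is not part of the original notebook):

```python
def lcs_tab(X, Y):
    # Bottom-up dynamic programming: table[i][j] holds LCS(X[:i], Y[:j]).
    m, n = len(X), len(Y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

print(lcs_tab("police killed the gunman".split(),
              "police kill the gunman".split()))  # → 3
```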
In [7]:
# dp needs len(X)+1 rows and len(Y)+1 columns, since lcs_dp indexes dp[m][n]
dp = [[-1 for i in range(len(Y) + 1)] for j in range(len(X) + 1)]
lcs_score = lcs_dp(X, Y, len(X), len(Y), dp)
lcs_score
Out[7]:
3
In [8]:
r_lcs = lcs_score/len(X)
p_lcs = lcs_score/len(Y)
In [9]:
r_lcs
Out[9]:
0.75
In [10]:
p_lcs
Out[10]:
0.75
In [11]:
# ROUGE-L sets beta = precision / recall; a larger beta weights recall more heavily
beta = p_lcs / r_lcs
beta
Out[11]:
1.0
In [12]:
num = (1 + (beta**2)) * r_lcs * p_lcs
denom = r_lcs + ((beta**2) * p_lcs)
rouge_l = num / denom
In [13]:
rouge_l
Out[13]:
0.75
In [14]:
def rouge_l(reference, candidate):
    X = reference.split()
    Y = candidate.split()
    m = len(X)
    n = len(Y)
    if m == 0 or n == 0:
        return 0
    dp = [[-1 for i in range(n + 1)] for j in range(m + 1)]
    lcs_score = lcs_dp(X, Y, m, n, dp)
    r_lcs = lcs_score/m
    p_lcs = lcs_score/n
    epsilon = 1e-12  # Prevents division by 0
    r_lcs = epsilon if r_lcs == 0 else r_lcs
    beta = p_lcs / (r_lcs + epsilon)
    num = (1 + (beta**2)) * r_lcs * p_lcs
    denom = r_lcs + ((beta**2) * p_lcs)
    denom = epsilon if denom == 0 else denom
    return num / denom
In [15]:
rouge_l(reference, candidate)
Out[15]:
0.75
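Because ROUGE-L is built on subsequences rather than unigram overlap, it penalizes a candidate that shares every word with the reference but reorders them. A quick check (the helpers are repeated in iterative form here so the snippet runs on its own):

```python
def lcs_len(X, Y):
    # Iterative LCS length, equivalent to the memoized lcs_dp above.
    m, n = len(X), len(Y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

def rouge_l_score(reference, candidate):
    X, Y = reference.split(), candidate.split()
    if not X or not Y:
        return 0.0
    score = lcs_len(X, Y)
    r_lcs, p_lcs = score / len(X), score / len(Y)
    if r_lcs == 0:
        return 0.0
    beta = p_lcs / r_lcs
    return ((1 + beta**2) * r_lcs * p_lcs) / (r_lcs + beta**2 * p_lcs)

# Same bag of words as the reference, but reordered:
# the LCS drops to 2 ("the gunman"), so the score halves.
print(rouge_l_score("police killed the gunman",
                    "the gunman killed police"))  # → 0.5
```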
Google Research Implementation¶
In [16]:
!pip install rouge-score
In [17]:
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=False)
In [18]:
scorer.score(reference, candidate)
Out[18]:
{'rougeL': Score(precision=0.75, recall=0.75, fmeasure=0.75)}
In [19]:
scorer.score('The quick brown fox jumps over the lazy dog',
             'The quick brown dog jumps on the log.')
Out[19]:
{'rougeL': Score(precision=0.625, recall=0.5555555555555556, fmeasure=0.5882352941176471)}