Evaluate Summarization¶
Author(s) | Renato Leite (renatoleite@), Egon Soares (egon@) |
Last updated | 09/05/2023 |
ROUGE-L¶
ROUGE-L uses an LCS-based F-measure to estimate the similarity between two summaries: a reference summary sentence X of length m and a candidate summary sentence Y of length n, as follows:
$Recall_{lcs} = \cfrac{LCS(X,Y)}{m}$
$Precision_{lcs} = \cfrac{LCS(X,Y)}{n}$
$F_{lcs} = \cfrac{(1+\beta^2)Recall_{lcs} Precision_{lcs}}{\beta^2 Precision_{lcs}+Recall_{lcs}}$
$\beta = \cfrac{Precision_{lcs}}{Recall_{lcs}}$
$\text{ROUGE-L} = \cfrac{\left(1+\left(\cfrac{Precision_{lcs}}{Recall_{lcs}}\right)^2\right)Recall_{lcs} Precision_{lcs}}{\left(\cfrac{Precision_{lcs}}{Recall_{lcs}}\right)^2 Precision_{lcs}+Recall_{lcs}}$
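A useful special case worth noting: when precision and recall are equal, say $Precision_{lcs} = Recall_{lcs} = s$, then $\beta = 1$ and the F-measure collapses to that common value:

$F_{lcs} = \cfrac{(1+1)\,s \cdot s}{s + 1 \cdot s} = \cfrac{2s^2}{2s} = s$

This is exactly what happens in the worked example below, where precision, recall, and ROUGE-L all come out to 0.75.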
LCS¶
Size of LCS:
$ LCS(X_i, Y_j) = \begin{cases} 0 & \quad \text{if } i=0 \text{ or } j=0 \\ LCS(X_{i-1}, Y_{j-1}) + 1 & \quad \text{if } i,j>0 \text{ and } x_i=y_j \\ max\left\{LCS(X_i, Y_{j-1}),LCS(X_{i-1}, Y_j)\right\} & \quad \text{if } i,j>0 \text{ and } x_i \neq y_j \end{cases} $
String of LCS:
$ LCS(X_i, Y_j) = \begin{cases} \epsilon & \quad \text{if } i=0 \text{ or } j=0 \\ LCS(X_{i-1}, Y_{j-1})\frown x_i & \quad \text{if } i,j>0 \text{ and } x_i=y_j \\ max\left\{LCS(X_i, Y_{j-1}),LCS(X_{i-1}, Y_j)\right\} & \quad \text{if } i,j>0 \text{ and } x_i \neq y_j \end{cases} $
$\epsilon \implies \text{empty string}$
$\frown \implies \text{append element}$
reference = "police killed the gunman"
candidate = "police kill the gunman"
# Recursive LCS (length)
def lcs(X, Y, m, n):
    if m == 0 or n == 0:
        return 0
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))
# Recursive LCS (returns the matched token sequence)
def lcs_sequence(X, Y, m, n):
    if m == 0 or n == 0:
        return []
    elif X[m-1] == Y[n-1]:
        return lcs_sequence(X, Y, m-1, n-1) + [X[m-1]]
    else:
        a = lcs_sequence(X, Y, m, n-1)
        b = lcs_sequence(X, Y, m-1, n)
        return a if len(a) > len(b) else b
X = reference.split()
Y = candidate.split()
lcs(X, Y, len(X), len(Y))
3
" ".join(lcs_sequence(X, Y, len(X), len(Y)))
'police the gunman'
# Dynamic Programming (memoized) LCS
def lcs_dp(X, Y, m, n, dp):
    if m == 0 or n == 0:
        return 0
    elif dp[m][n] != -1:
        return dp[m][n]
    elif X[m - 1] == Y[n - 1]:
        dp[m][n] = 1 + lcs_dp(X, Y, m - 1, n - 1, dp)
        return dp[m][n]
    else:
        dp[m][n] = max(lcs_dp(X, Y, m, n - 1, dp), lcs_dp(X, Y, m - 1, n, dp))
        return dp[m][n]
# dp[m][n] is indexed by reference position m and candidate position n,
# so the table needs (len(X) + 1) rows and (len(Y) + 1) columns
dp = [[-1 for _ in range(len(Y) + 1)] for _ in range(len(X) + 1)]
lcs_score = lcs_dp(X, Y, len(X), len(Y), dp)
lcs_score
3
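The memoized version above can also be written bottom-up. A minimal iterative sketch (tabulation over the same recurrence, avoiding recursion-depth limits):

```python
def lcs_table(X, Y):
    """Bottom-up LCS: fills an (m+1) x (n+1) table of prefix LCS lengths."""
    m, n = len(X), len(Y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

lcs_table("police killed the gunman".split(), "police kill the gunman".split())
```

3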
r_lcs = lcs_score/len(X)
p_lcs = lcs_score/len(Y)
r_lcs
0.75
p_lcs
0.75
# beta weights precision against recall; the original ROUGE-L definition
# sets beta = precision / recall (here 1.0, i.e. balanced)
beta = p_lcs / r_lcs
beta
1.0
num = (1 + (beta**2)) * r_lcs * p_lcs
denom = r_lcs + ((beta**2) * p_lcs)
rouge_l = num / denom
rouge_l
0.75
def rouge_l(reference, candidate):
    X = reference.split()
    Y = candidate.split()
    m = len(X)
    n = len(Y)
    if m == 0 or n == 0:
        return 0
    dp = [[-1 for i in range(n + 1)] for j in range(m + 1)]
    lcs_score = lcs_dp(X, Y, m, n, dp)
    r_lcs = lcs_score/m
    p_lcs = lcs_score/n
    epsilon = 1e-12  # Prevents division by 0
    r_lcs = epsilon if r_lcs == 0 else r_lcs
    beta = p_lcs / (r_lcs + epsilon)
    num = (1 + (beta**2)) * r_lcs * p_lcs
    denom = r_lcs + ((beta**2) * p_lcs)
    denom = epsilon if denom == 0 else denom
    return num / denom
rouge_l(reference, candidate)
0.75
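Because the two example sentences happen to have the same length, precision and recall coincide. A sketch with inputs of unequal length shows them diverge (self-contained here, restating the same computation with a bottom-up LCS):

```python
def rouge_l_score(reference, candidate):
    # Same computation as above, inlined: bottom-up LCS, then the
    # beta = precision / recall F-measure. Returns (precision, recall, f).
    X, Y = reference.split(), candidate.split()
    m, n = len(X), len(Y)
    if m == 0 or n == 0:
        return 0.0, 0.0, 0.0
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    lcs_len = table[m][n]
    r, p = lcs_len / m, lcs_len / n
    if r == 0 or p == 0:
        return p, r, 0.0
    beta = p / r
    f = (1 + beta**2) * r * p / (r + beta**2 * p)
    return p, r, f

rouge_l_score("police killed the gunman near the bank", "police killed the gunman")
```

Here the candidate is a perfect subsequence of the reference, so precision is 1.0 while recall is 4/7. Note that with beta = p/r the F-measure leans toward the larger of the two values; with beta = 1 it would be the plain harmonic mean 2·p·r/(p+r).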
Google Research Implementation¶
!pip install rouge-score
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=False)
scorer.score(reference, candidate)
{'rougeL': Score(precision=0.75, recall=0.75, fmeasure=0.75)}
scorer.score('The quick brown fox jumps over the lazy dog',
'The quick brown dog jumps on the log.')
{'rougeL': Score(precision=0.625, recall=0.5555555555555556, fmeasure=0.5882352941176471)}
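Two details explain any gap between the library's numbers and the hand-rolled function above. First, `rouge_score` computes `fmeasure` as the harmonic mean of precision and recall (beta = 1), not the beta = precision/recall variant derived earlier. Second, it runs its own tokenizer before scoring, which lowercases and drops non-alphanumeric characters, so "log." becomes "log". A rough stdlib sketch of that preprocessing (an approximation for illustration, not the library's exact tokenizer):

```python
import re

def simple_tokenize(text):
    # Approximation of rouge_score's default preprocessing:
    # lowercase, then keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

simple_tokenize('The quick brown dog jumps on the log.')
```

['the', 'quick', 'brown', 'dog', 'jumps', 'on', 'the', 'log']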