# Summarization

## Eval Recipe for model migration
This Eval Recipe demonstrates how to compare the performance of a summarization prompt on Gemini 1.0 and Gemini 2.0 using the Vertex AI Evaluation Service.
- Use case: summarize a news article.
- The Evaluation Dataset[^1] includes 5 news articles stored as plain text files, and a JSONL file with ground truth labels: `dataset.jsonl`. Each record in this file includes 2 attributes:
    - `document`: relative path to the plain text file containing the news article
    - `reference`: ground truth label (short summary of the article)
- Prompt Template is a zero-shot prompt located in `prompt_template.txt` with the variable `document` that gets populated from the corresponding dataset attribute.
- Python script `eval.py` configures the evaluation (see the sketch after this list):
    - `run_eval`: configures the evaluation task, runs it on the 2 models and prints the results.
    - `load_dataset`: loads the dataset including the contents of all documents.
- Shell script `run.sh` installs the required Python libraries and runs `eval.py`.
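
For orientation, here is a minimal sketch of how `load_dataset` and `run_eval` could be structured with the Vertex AI Evaluation SDK (`vertexai.evaluation` in `google-cloud-aiplatform`). The metric list, location, experiment name, and file handling below are illustrative assumptions rather than the recipe's exact code:

```python
# Hypothetical sketch, assuming the GA Vertex AI Evaluation SDK
# (vertexai.evaluation); not the recipe's exact implementation.
import json

import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples
from vertexai.generative_models import GenerativeModel


def load_dataset(dataset_path: str = "dataset.jsonl") -> pd.DataFrame:
    """Loads the JSONL labels and inlines the full text of each referenced document."""
    records = []
    with open(dataset_path) as labels:
        for line in labels:
            record = json.loads(line)  # {"document": <path>, "reference": <summary>}
            with open(record["document"]) as doc:
                record["document"] = doc.read()  # replace the path with the article text
            records.append(record)
    return pd.DataFrame(records)


def run_eval(project_id: str, baseline_model: str, candidate_model: str) -> None:
    """Runs the same summarization EvalTask on both models and prints the scores."""
    vertexai.init(project=project_id, location="us-central1")  # location is an assumption
    with open("prompt_template.txt") as template:
        prompt_template = template.read()
    task = EvalTask(
        dataset=load_dataset(),
        metrics=[
            "rouge_l_sum",  # compares responses against the `reference` column
            MetricPromptTemplateExamples.Pointwise.SUMMARIZATION_QUALITY,
        ],
        experiment="summarization-model-upgrade",  # illustrative experiment name
    )
    for model_name in (baseline_model, candidate_model):
        results = task.evaluate(
            model=GenerativeModel(model_name),
            prompt_template=prompt_template,  # contains the {document} variable
        )
        print(model_name, results.summary_metrics)
```

Refer to `eval.py` in the repository for the authoritative implementation.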
## How to run this Eval Recipe
- Configure your Google Cloud environment and clone this GitHub repo to your environment. We recommend Cloud Shell or Vertex AI Workbench.

    ```bash
    git clone --filter=blob:none --sparse https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples.git && \
    cd applied-ai-engineering-samples && \
    git sparse-checkout init && \
    git sparse-checkout set genai-on-vertex-ai/gemini/model_upgrades && \
    git pull origin main
    ```
- Navigate to the Eval Recipe directory in a terminal, set your Google Cloud Project ID, and run the shell script `run.sh`.

    ```bash
    cd genai-on-vertex-ai/gemini/model_upgrades/summarization/vertex_script
    export PROJECT_ID="[your-project-id]"
    ./run.sh
    ```
- The resulting metrics will be displayed in the script output. You can find the prompts and model responses stored in `candidate_results.metrics_table` (see the snippet below).
- You can use Vertex AI Experiments to view the history of evaluations for each experiment, including the final metric scores.
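
As an illustration, assuming `candidate_results` is the `EvalResult` object returned by `EvalTask.evaluate()` for the candidate model, the stored results could be inspected like this:

```python
# Illustrative only: candidate_results is assumed to be the EvalResult
# returned by EvalTask.evaluate() for the candidate model.
print(candidate_results.summary_metrics)  # aggregate score per metric
print(candidate_results.metrics_table)    # pandas DataFrame with per-article prompts, responses and scores
```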
## How to customize this Eval Recipe
- Edit the Python script `eval.py`:
    - set the `project` parameter of `vertexai.init` to your Google Cloud Project ID
    - set the parameter `baseline_model` to the model that is currently used by your application
    - set the parameter `candidate_model` to the model that you want to compare with your current model
    - configure a unique `experiment_name` for tracking purposes
- Replace the contents of `dataset.jsonl` with your custom data in the same format (see the example record after this list).
- Replace the contents of `prompt_template.txt` with your custom prompt template. Make sure that the prompt template variables map to the dataset attributes.
- Please refer to our documentation if you want to further customize your evaluation. Vertex AI Evaluation Service has many features that are not included in this recipe.
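
For reference, a record in `dataset.jsonl` could look like the following; the file path and summary here are made-up placeholders, not part of the recipe's dataset:

```json
{"document": "articles/article_1.txt", "reference": "A one-sentence ground-truth summary of the article."}
```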
[^1]: Dataset (XSum) citation: @InProceedings{xsum-emnlp, author = {Shashi Narayan and Shay B. Cohen and Mirella Lapata}, title = {Don't Give Me the Details, Just the Summary! {T}opic-Aware Convolutional Neural Networks for Extreme Summarization}, booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing}, year = {2018}, address = {Brussels, Belgium}}