# Multi-Turn Conversation (Chat)
## Eval Recipe for model migration
This Eval Recipe demonstrates how to evaluate multi-turn conversations (chat) on Gemini 1.0 and Gemini 2.0 using the open-source evaluation tool Promptfoo.

- Use case: multi-turn conversation.
- Evaluation Dataset is based on the Multi-turn Prompts Dataset. It includes 5 conversations: `dataset.jsonl`. Each record in this file links to a JSON file with the system instruction followed by a few messages from the User and responses from the Assistant. This dataset does not include any ground truth labels.
- Prompt Template located in `prompt_template.txt` injects the `chat` variable from our dataset, which represents the conversation history.
- `promptfooconfig.yaml` contains all Promptfoo configuration (a minimal sketch follows this list):
    - `providers`: list of models that will be evaluated
    - `prompts`: location of the prompt template file
    - `tests`: location of the dataset file
    - `defaultTest`: configures the evaluation metric:
        - `type`: the `select-best` auto-rater decides which of the two models configured above generated the best response
        - `provider`: configures the judge model
        - `value`: configures the custom criteria that is evaluated by the `select-best` auto-rater
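
For orientation, here is a minimal, hypothetical sketch of what such a configuration might look like. The provider IDs, judge model, and criteria text are illustrative assumptions rather than the repo's actual values; refer to the real `promptfooconfig.yaml` for the exact contents.

```yaml
# Illustrative sketch only -- provider IDs, judge model, and criteria are assumptions.
prompts:
  - file://prompt_template.txt          # template that injects the {{chat}} conversation history
providers:
  - vertex:gemini-1.0-pro               # baseline model (assumed ID)
  - vertex:gemini-2.0-flash             # candidate model (assumed ID)
tests: file://dataset.jsonl             # each record supplies the chat variable
defaultTest:
  assert:
    - type: select-best                 # auto-rater picks the better of the two responses
      provider: vertex:gemini-2.0-flash # judge model (assumption)
      value: Choose the response that best continues the conversation and follows the system instruction.
```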
## How to run this Eval Recipe
- Google Cloud Shell is the easiest option as it automatically clones our GitHub repo.
- Alternatively, you can use the following commands to clone this repo to any Linux environment with a configured Google Cloud environment:

    ```bash
    git clone --filter=blob:none --sparse https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples.git && \
    cd applied-ai-engineering-samples && \
    git sparse-checkout init && \
    git sparse-checkout set genai-on-vertex-ai/gemini/model_upgrades && \
    git pull origin main
    cd genai-on-vertex-ai/gemini/model_upgrades
    ```

- Install Promptfoo using these instructions (an example installation command follows this list).
- Navigate to the Eval Recipe directory in a terminal and run the command `promptfoo eval`.
- Run `promptfoo view` to analyze the eval results. You can switch the Display option to `Show failures only` in order to investigate any underperforming prompts.
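
The linked instructions cover installation in detail; as a quick reference, Promptfoo is distributed as an npm package, so an install along these lines should work if Node.js is available (shown as one common approach, not the only one):

```bash
# Install the Promptfoo CLI globally (requires Node.js / npm).
npm install -g promptfoo

# Confirm the CLI is available.
promptfoo --version
```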
## How to customize this Eval Recipe
- Copy the configuration file `promptfooconfig.yaml` to a new folder.
- Add your labeled dataset file with a JSONL schema similar to `dataset.jsonl`.
- Save your prompt template to `prompt_template.txt` and make sure that the template variables map to the variables defined in your dataset.
- That's it! You are ready to run `promptfoo eval`. If needed, add alternative prompt templates or additional metrics to `promptfooconfig.yaml` as explained here (one hypothetical way to add a metric is sketched below).
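
For instance, an additional model-graded metric could be appended to the `defaultTest` assertions; the rubric text below is an illustrative assumption, not part of the recipe:

```yaml
# Hypothetical extension of defaultTest.assert -- adjust the rubric to your own criteria.
defaultTest:
  assert:
    - type: select-best
      value: Choose the response that best continues the conversation.
    - type: llm-rubric                  # extra model-graded metric (example)
      value: The response stays consistent with earlier turns and the system instruction.
```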