# Image-Prompt Alignment
## Eval Recipe for model migration
This Eval Recipe demonstrates how to use a prompt alignment autorater to compare the image generation quality of two models (Imagen 2 and Imagen 3) with the open-source evaluation tool Promptfoo.

- Use case: Image Generation
- We use an unlabeled dataset with 5 image generation prompts stored in the JSONL file `dataset.jsonl`, plus JPEG images generated by Imagen 2 and Imagen 3 based on these prompts. Each record in the dataset includes 3 attributes wrapped in the `vars` object so that Promptfoo can inject this data into the autorater prompt (see the sample record below):
    - `prompt`: full text of the image generation prompt
    - `imagen2`: local path to the JPEG image generated by Imagen 2 based on this prompt
    - `imagen3`: local path to the JPEG image generated by Imagen 3 based on this prompt
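  For reference, a single line of the dataset might look like the example below. This is an illustrative sketch only: the prompt text and image paths are placeholders, not the actual contents of `dataset.jsonl`.

  ```json
  {"vars": {"prompt": "A watercolor painting of a red lighthouse at sunset with three seagulls in the sky", "imagen2": "images/lighthouse_imagen2.jpg", "imagen3": "images/lighthouse_imagen3.jpg"}}
  ```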
 
- All instructions for our prompt alignment autorater are stored in `autorater_instructions.txt`. These instructions are imported into the final multimodal prompt templates `prompt_imagen2.yaml` and `prompt_imagen3.yaml`, which combine the images from our dataset with the autorater instructions.
- `promptfooconfig.yaml` contains all configuration (see the sketch after this list):
    - `providers`: defines the LLM judge model
    - `prompts`: autorater prompt templates for Imagen 2 and Imagen 3
    - `tests`: location of the dataset file
    - `defaultTest`: loads the autorater instructions into the shared variable `autorater_instructions` and configures the custom prompt alignment metric defined in `metrics.py`. This metric parses the JSON response from our autorater and returns the percentage score along with the list of gaps detected by the autorater (each gap describes a prompt requirement that is not satisfied by the image).
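The sketch below illustrates how these pieces fit together. It is an assumption-laden outline, not the recipe's exact files: the judge model name, pass threshold, and JSON field names are placeholders, so refer to the actual `promptfooconfig.yaml` and `metrics.py` in the recipe folder.

```yaml
# Illustrative outline of promptfooconfig.yaml (not the actual file).
providers:
  - vertex:gemini-1.5-pro            # assumed judge model; the recipe may use a different one
prompts:
  - file://prompt_imagen2.yaml       # autorater prompt template for Imagen 2
  - file://prompt_imagen3.yaml       # autorater prompt template for Imagen 3
tests: file://dataset.jsonl          # each record supplies the vars: prompt, imagen2, imagen3
defaultTest:
  vars:
    autorater_instructions: file://autorater_instructions.txt
  assert:
    - type: python                   # custom prompt alignment metric
      value: file://metrics.py
```

A Promptfoo Python assertion exposes a `get_assert(output, context)` function, so a minimal sketch of the metric could look like this, assuming the autorater responds with JSON containing a `score` percentage and a `gaps` list (both field names and the pass threshold are assumptions):

```python
import json

def get_assert(output: str, context) -> dict:
    """Convert the autorater's JSON verdict into a Promptfoo grading result."""
    verdict = json.loads(output)
    score = float(verdict.get("score", 0)) / 100.0   # percentage -> 0..1
    gaps = verdict.get("gaps", [])
    return {
        "pass": score >= 0.5,                        # assumed pass threshold
        "score": score,
        "reason": "; ".join(gaps) if gaps else "All prompt requirements are satisfied.",
    }
```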
 
## How to run this Eval Recipe
- Google Cloud Shell is the easiest option, as it automatically clones our GitHub repo.
- Alternatively, you can use the following commands to clone this repo to any Linux environment with a configured Google Cloud environment:

  ```bash
  git clone --filter=blob:none --sparse https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples.git && \
  cd applied-ai-engineering-samples && \
  git sparse-checkout init && \
  git sparse-checkout set genai-on-vertex-ai/gemini/model_upgrades && \
  git pull origin main
  cd genai-on-vertex-ai/gemini/model_upgrades
  ```
- Install Promptfoo using these instructions.
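  If Promptfoo is not installed yet, a typical setup (assuming Node.js and npm are available) is:

  ```bash
  npm install -g promptfoo
  ```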
- Navigate to the Eval Recipe directory in the terminal and run the command `promptfoo eval`.
- Run `promptfoo view` to analyze the eval results. You can switch the Display option to "Show failures only" in order to investigate any underperforming prompts.
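  For example, from the recipe directory:

  ```bash
  promptfoo eval   # runs the autorater over every record in dataset.jsonl
  promptfoo view   # opens the local results viewer to inspect scores and gaps
  ```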
## How to customize this Eval Recipe
- Copy the eval recipe folder (`promptfoo`) to your environment.
- Create a list of image generation prompts.
- Use your baseline and candidate models to generate images based on the image generation prompts, and save them to the `images` folder.
- Put your image generation prompts into the dataset file `dataset.jsonl` and make sure that each record points to the right images.
- That's it! You are ready to run `promptfoo eval` and view the results using `promptfoo view`.
