Image-Prompt Alignment
Eval Recipe for model migration
This Eval Recipe demonstrates how to use a prompt alignment autorater to compare image generation quality of two models (Imagen2 and Imagen3) with the open source evaluation tool Promptfoo.

- Use case: Image Generation
- We use an unlabeled dataset with 5 image generation prompts stored in a JSONL file, dataset.jsonl, and JPEG images generated by Imagen2 and Imagen3 based on these prompts. Each record in the dataset includes 3 attributes wrapped in the vars object so that Promptfoo can inject this data into the autorater prompt:
  - prompt: full text of the image generation prompt
  - imagen2: local path to the JPEG image generated by Imagen2 based on this prompt
  - imagen3: local path to the JPEG image generated by Imagen3 based on this prompt
- All instructions for our prompt alignment autorater are stored in autorater_instructions.txt. These instructions are imported into the final multimodal prompt templates, prompt_imagen2.yaml and prompt_imagen3.yaml, which combine the images from our dataset with the autorater instructions.
- promptfooconfig.yaml contains all configuration:
  - providers: defines the LLM judge model
  - prompts: autorater prompt templates for Imagen2 and Imagen3
  - tests: location of the dataset file
  - defaultTest: loads the autorater instructions into the shared variable autorater_instructions, and configures the custom prompt alignment metric defined in metrics.py. This metric parses the JSON response from our autorater and returns the percentage score along with the list of gaps detected by the autorater (each gap describes a prompt requirement that is not satisfied by the image).
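The custom metric described above can be sketched roughly as follows. This is not the recipe's actual metrics.py: the JSON keys score (assumed to be a 0-100 percentage) and gaps, and the PASS_THRESHOLD value, are hypothetical, and the get_assert(output, context) entry point returning a dict with pass, score and reason reflects our understanding of Promptfoo's Python assertion interface.

```python
import json
from typing import Any, Dict

# Hypothetical cut-off: images scoring below 80% alignment are marked as failures.
PASS_THRESHOLD = 0.8


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Convert the autorater's JSON reply into a grading result."""
    # Judge models sometimes wrap JSON in extra text, so keep only the
    # outermost {...} span before parsing.
    start, end = output.find("{"), output.rfind("}")
    if start == -1 or end <= start:
        return {"pass": False, "score": 0.0, "reason": "Autorater did not return JSON"}
    try:
        verdict = json.loads(output[start:end + 1])
        # Assumed schema: "score" is a percentage from 0 to 100.
        score = float(verdict.get("score", 0)) / 100.0
    except (json.JSONDecodeError, TypeError, ValueError) as err:
        return {"pass": False, "score": 0.0, "reason": f"Unparseable autorater reply: {err}"}

    score = max(0.0, min(1.0, score))  # clamp to the 0..1 range
    gaps = verdict.get("gaps", [])     # assumed: list of unmet prompt requirements
    reason = "No gaps detected" if not gaps else "Gaps: " + "; ".join(map(str, gaps))
    return {"pass": score >= PASS_THRESHOLD, "score": score, "reason": reason}
```

Returning the gaps in the reason field keeps them visible next to each score in the results view, which helps when comparing the two models prompt by prompt.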
How to run this Eval Recipe¶
- Google Cloud Shell is the easiest option, as it automatically clones our GitHub repo.
- Alternatively, you can use the following commands to clone this repo to any Linux environment with a configured Google Cloud environment:

      git clone --filter=blob:none --sparse https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples.git && \
      cd applied-ai-engineering-samples && \
      git sparse-checkout init && \
      git sparse-checkout set genai-on-vertex-ai/gemini/model_upgrades && \
      git pull origin main
      cd genai-on-vertex-ai/gemini/model_upgrades

- Install Promptfoo using these instructions.
- Navigate to the Eval Recipe directory in a terminal and run the command promptfoo eval.
- Run promptfoo view to analyze the eval results. You can switch the Display option to Show failures only in order to investigate any underperforming prompts.
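If you prefer to launch the evaluation from a script, the eval step can be wrapped as in the sketch below. The helper names eval_command and run_eval are hypothetical; the sketch assumes the promptfoo CLI is already installed and uses its --config and --output options.

```python
import shutil
import subprocess
from typing import List, Optional


def eval_command(config: str = "promptfooconfig.yaml",
                 output: Optional[str] = None) -> List[str]:
    """Build the argument list for a promptfoo eval run."""
    cmd = ["promptfoo", "eval", "--config", config]
    if output:
        # Optionally save the results to a file in addition to the console table.
        cmd += ["--output", output]
    return cmd


def run_eval(config: str = "promptfooconfig.yaml",
             output: Optional[str] = None) -> int:
    """Run the evaluation and return the CLI's exit code."""
    if shutil.which("promptfoo") is None:
        raise RuntimeError("promptfoo is not on PATH; install it first")
    return subprocess.run(eval_command(config, output), check=False).returncode
```

Call run_eval() from the Eval Recipe directory so that relative paths in the config and dataset resolve correctly.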
How to customize this Eval Recipe:
- Copy the eval recipe folder (promptfoo) to your environment.
- Create a list of image generation prompts.
- Use your baseline and candidate models to generate images based on the image generation prompts, and save them to the images folder.
- Put your image generation prompts into the dataset file dataset.jsonl and make sure that each record points to the right images.
- That's it! You are ready to run promptfoo eval and view the results using promptfoo view.
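A small script can help keep dataset.jsonl and the images folder in sync when you swap in your own prompts. The sketch below is illustrative only: the file naming scheme (imagen2_1.jpg, imagen3_1.jpg, ...) and the helper names are hypothetical, while the vars wrapper and the prompt, imagen2 and imagen3 keys follow the record layout described earlier. Check the recipe's own dataset.jsonl for the exact path format it expects.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def build_records(prompts: Iterable[str], images_dir: str = "images") -> List[Dict]:
    """Create one dataset record per prompt, wrapped in a vars object."""
    records = []
    for index, prompt in enumerate(prompts, start=1):
        records.append({
            "vars": {
                "prompt": prompt,
                # Hypothetical naming scheme; adjust to match your saved images.
                "imagen2": f"{images_dir}/imagen2_{index}.jpg",
                "imagen3": f"{images_dir}/imagen3_{index}.jpg",
            }
        })
    return records


def missing_images(records: List[Dict], root: str = ".") -> List[str]:
    """Return every referenced image path that does not exist under root."""
    base = Path(root)
    return [
        path
        for record in records
        for key, path in record["vars"].items()
        if key != "prompt" and not (base / path).is_file()
    ]


def write_jsonl(records: List[Dict], path: str = "dataset.jsonl") -> None:
    """Write one JSON object per line, as the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Running missing_images before promptfoo eval catches records that point at the wrong file, which would otherwise surface only as a failed or misleading autorater call.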