# Image-Prompt Alignment

## Eval Recipe for model migration

This Eval Recipe demonstrates how to use a prompt alignment autorater to compare the image generation quality of two models (Imagen 2 and Imagen 3) with the open source evaluation tool Promptfoo.
- Use case: Image Generation.

- We use an unlabeled dataset with 5 image generation prompts stored in the JSONL file `dataset.jsonl`, and JPEG images generated by Imagen 2 and Imagen 3 based on these prompts. Each record in the dataset includes 3 attributes wrapped in the `vars` object so that Promptfoo can inject this data into the autorater prompt (see the example record after this list):
    - `prompt`: full text of the image generation prompt
    - `imagen2`: local path to the JPEG image generated by Imagen 2 based on this prompt
    - `imagen3`: local path to the JPEG image generated by Imagen 3 based on this prompt
- All instructions for our prompt alignment autorater are stored in `autorater_instructions.txt`. These instructions are imported into the final multimodal prompt templates `prompt_imagen2.yaml` and `prompt_imagen3.yaml`, which combine the images from our dataset with the autorater instructions.
- `promptfooconfig.yaml` contains all configuration:
    - `providers`: defines the LLM judge model
    - `prompts`: autorater prompt templates for Imagen 2 and Imagen 3
    - `tests`: location of the dataset file
    - `defaultTest`: loads the autorater instructions into the shared variable `autorater_instructions` and configures the custom prompt alignment metric defined in `metrics.py`. This metric parses the JSON response from our autorater and returns a percentage score along with the list of gaps detected by the autorater (each gap describes a prompt requirement that is not satisfied by the image). A sketch of such a metric is shown after this list.
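For illustration, a record in `dataset.jsonl` could look like the following minimal sketch; the prompt text and file names are made up, and only the `vars` wrapper and the three attribute names are taken from the description above:

```json
{"vars": {"prompt": "A watercolor painting of a red lighthouse at sunset", "imagen2": "images/imagen2/lighthouse.jpg", "imagen3": "images/imagen3/lighthouse.jpg"}}
```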
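And as a rough sketch of how such a metric can be implemented: Promptfoo's Python assertions call a `get_assert(output, context)` function and expect a grading result. The autorater's JSON field names used below (`score`, `gaps`) are assumptions based on the description above, not the recipe's actual schema.

```python
# metrics.py sketch: a Promptfoo Python assertion (field names of the autorater JSON are assumed).
import json

def get_assert(output: str, context) -> dict:
    """Parse the autorater's JSON response into a Promptfoo grading result."""
    result = json.loads(output)                    # the autorater is instructed to respond with JSON
    score = float(result.get("score", 0)) / 100.0  # assumed: percentage of satisfied prompt requirements
    gaps = result.get("gaps", [])                  # assumed: list of unmet prompt requirements
    return {
        "pass": not gaps,                          # fail whenever the autorater reports any gap
        "score": score,
        "reason": "; ".join(gaps) if gaps else "All prompt requirements are satisfied.",
    }
```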
## How to run this Eval Recipe

- Google Cloud Shell is the easiest option as it automatically clones our GitHub repo.

- Alternatively, you can use the following commands to clone this repo to any Linux environment with a configured Google Cloud environment:

  ```bash
  git clone --filter=blob:none --sparse https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples.git && \
  cd applied-ai-engineering-samples && \
  git sparse-checkout init && \
  git sparse-checkout set genai-on-vertex-ai/gemini/model_upgrades && \
  git pull origin main
  cd genai-on-vertex-ai/gemini/model_upgrades
  ```
- Install Promptfoo using these instructions.

- Navigate to the Eval Recipe directory in the terminal and run the command `promptfoo eval` (a typical command sequence is shown after this list).

- Run `promptfoo view` to analyze the eval results. You can switch the Display option to `Show failures only` in order to investigate any underperforming prompts.
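For reference, a typical end-to-end run looks roughly like this (assuming Node.js is installed; see the Promptfoo documentation for other installation options):

```bash
npm install -g promptfoo   # or run ad hoc with: npx promptfoo@latest
promptfoo eval             # from the Eval Recipe directory; scores the Imagen 2 and Imagen 3 images
promptfoo view             # opens the local results viewer in your browser
```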
## How to customize this Eval Recipe

- Copy the eval recipe folder (`promptfoo`) to your environment.
- Create a list of image generation prompts.
- Use your baseline and candidate models to generate images based on the image generation prompts, and save them to the `images` folder (a sketch of this step is shown after this list).
- Put your image generation prompts into the dataset file `dataset.jsonl` and make sure that each record points to the right images.
- That's it! You are ready to run `promptfoo eval` and view the results using `promptfoo view`.
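To illustrate the image-generation step, the sketch below uses the Vertex AI SDK to render every prompt in `dataset.jsonl` with one model. The project ID, model ID, and output paths are placeholders; swap in your own baseline and candidate models and repeat the loop for each of them.

```python
# generate_images.py sketch: render dataset prompts with one model (IDs and paths are placeholders).
import json

import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")
model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")  # example candidate model ID

with open("dataset.jsonl") as dataset:
    for i, line in enumerate(dataset):
        prompt = json.loads(line)["vars"]["prompt"]
        response = model.generate_images(prompt=prompt, number_of_images=1)
        # Save to the path that the corresponding dataset record points to.
        response.images[0].save(location=f"images/imagen3/{i:04d}.jpg")
```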