Google - GenAI Product Catalog

Attributes

Functions related to attribute generation.

Module members: json, logging, Optional, m, embeddings, utils, nearest_neighbors, Config, bq_client, llm
join_attributes_desc

def join_attributes_desc(ids: list[str]) -> dict[str, dict]

Gets the attributes and description for the given product IDs.

Args:
    ids: The product IDs to get the attributes for.

Returns:
    Dict mapping product IDs to attributes and descriptions. Each ID maps to a dict with the following keys:
    attributes: e.g. {'color': 'green', 'pattern': 'striped'}
    description: e.g. 'This is a description'
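The shape of the return value can be illustrated with a minimal sketch. This is not the repository's implementation: the real function presumably queries BigQuery via bq_client, so the in-memory ROWS table and its column names are assumptions for illustration only.

```python
# Hypothetical in-memory stand-in for the product table the real
# module reads from BigQuery; row shape is an assumption.
ROWS = [
    {"id": "p1", "attributes": {"color": "green", "pattern": "striped"},
     "description": "A striped green shirt"},
    {"id": "p2", "attributes": {"color": "red"},
     "description": "A red hat"},
]

def join_attributes_desc(ids: list[str]) -> dict[str, dict]:
    """Map each requested product ID to its attributes and description."""
    by_id = {row["id"]: row for row in ROWS}
    return {
        pid: {"attributes": by_id[pid]["attributes"],
              "description": by_id[pid]["description"]}
        for pid in ids
        if pid in by_id  # silently skip unknown IDs in this sketch
    }
```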

retrieve

def retrieve(desc: str,
             category: Optional[str] = None,
             image: Optional[str] = None,
             base64: bool = False,
             filters: list[str] = []) -> list[dict]

Returns list of attributes based on nearest neighbors.

Embeds the provided desc and (optionally) image and returns the attributes corresponding to the closest products in embedding space.

Args:
    desc: User-provided description of the product.
    category: Category of the product.
    image: Can be a local file path, GCS URI, or base64-encoded image.
    base64: True indicates image is base64-encoded; False (default) means image is interpreted as a path (either local or GCS).
    filters: Category prefixes to restrict results to.

Returns:
    List of candidates sorted by embedding distance. Each candidate is a dict with the following keys:
    id: Product ID.
    attributes: Attributes in dict form, e.g. {'color': 'green', 'pattern': 'striped'}.
    description: String describing the product.
    distance: Embedding distance in the range [0, 1], 0 being the closest match.
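The nearest-neighbor step can be sketched with a toy in-memory index. This is an assumption-heavy illustration, not the module's code: cosine_distance, toy_retrieve, and the INDEX structure are hypothetical, standing in for the embedding model and vector store the real retrieve uses.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means identical direction.
    For non-negative embeddings this stays within [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Hypothetical index: (candidate metadata, embedding) pairs.
INDEX = [
    ({"id": "p1", "attributes": {"color": "green"},
      "description": "green shirt"}, [1.0, 0.0]),
    ({"id": "p2", "attributes": {"color": "red"},
      "description": "red hat"}, [0.0, 1.0]),
]

def toy_retrieve(query_embedding: list[float]) -> list[dict]:
    """Return candidates sorted by embedding distance, closest first."""
    candidates = [
        {**meta, "distance": cosine_distance(query_embedding, emb)}
        for meta, emb in INDEX
    ]
    return sorted(candidates, key=lambda c: c["distance"])
```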

generate_prompt

def generate_prompt(desc: str, candidates: list[dict]) -> str

Populate LLM prompt template.

Args:
    desc: Product description.
    candidates: List of dicts with the following keys:
        attributes: Attributes in dict form, e.g. {'color': 'green', 'pattern': 'striped'}.
        description: String describing the product.

Returns:
    Prompt to feed to the LLM.
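The template population can be sketched as follows. The TEMPLATE text and toy_generate_prompt are assumptions; the real prompt lives in the module and almost certainly reads differently, but the few-shot structure (candidate examples followed by the new product) follows the description above.

```python
# Hypothetical prompt template; the module's real template is not shown here.
TEMPLATE = (
    "Here are similar products and their attributes:\n"
    "{examples}\n"
    "List the attributes of this product as '|'-separated key:value pairs.\n"
    "Product: {desc}\n"
    "Attributes:"
)

def toy_generate_prompt(desc: str, candidates: list[dict]) -> str:
    """Render each candidate as a few-shot example, then append the query."""
    examples = "\n".join(
        f"Product: {c['description']}\n"
        "Attributes: "
        + "|".join(f"{k}:{v}" for k, v in c["attributes"].items())
        for c in candidates
    )
    return TEMPLATE.format(examples=examples, desc=desc)
```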

parse_answer

def parse_answer(ans: str) -> dict[str, str]

Translate LLM response into dict.

Args:
    ans: '|'-separated key:value pairs, e.g. 'color:red|size:large'.

Returns:
    ans as a dictionary.
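Given the stated input format, the translation can be sketched in a few lines. This is a minimal reimplementation under the assumption that malformed pairs (no ':') are skipped; the module's actual error handling may differ.

```python
def parse_answer(ans: str) -> dict[str, str]:
    """Split '|'-separated key:value pairs into a dict.

    Pairs without a ':' separator or with an empty key are skipped,
    an assumption about how malformed LLM output is handled.
    """
    attrs: dict[str, str] = {}
    for pair in ans.split("|"):
        key, sep, value = pair.partition(":")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs
```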

generate_attributes

def generate_attributes(desc: str, candidates: list[dict]) -> m.AttributeValue

Use an LLM to determine attributes given nearest-neighbor candidates.

Args:
    desc: Product description.
    candidates: List of dicts with the following keys:
        attributes: Attributes in dict form, e.g. {'color': 'green', 'pattern': 'striped'}.
        description: String describing the product.

Returns:
    Attributes in dict form, e.g. {'color': 'green', 'pattern': 'striped'}.

retrieve_and_generate_attributes

def retrieve_and_generate_attributes(
        desc: str,
        category: Optional[str] = None,
        image: Optional[str] = None,
        base64: bool = False,
        filters: list[str] = []) -> m.ProductAttributes

RAG approach to generating product attributes.

Since LLM answers are not always well formatted, we fall back to a greedy retrieval approach when parsing the LLM answer fails.

Args:
    desc: User-provided description of the product.
    category: Category of the product.
    image: Can be a local file path, GCS URI, or base64-encoded image.
    base64: True indicates image is base64-encoded; False (default) means image is interpreted as a path (either local or GCS).
    num_neighbors: Number of nearest neighbors to return for EACH embedding.
    filters: Category prefixes to restrict results to.

Returns:
    Attributes in dict form, e.g. {'color': 'green', 'pattern': 'striped'}.
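The fallback control flow described above can be sketched as follows. rag_attributes and the inlined parse_answer are hypothetical names for illustration; the greedy fallback here takes the attributes of the nearest candidate, which is an assumption about what "greedy retrieval" means in this module.

```python
def parse_answer(ans: str) -> dict[str, str]:
    """'|'-separated key:value pairs -> dict, skipping malformed pairs."""
    attrs: dict[str, str] = {}
    for pair in ans.split("|"):
        key, sep, value = pair.partition(":")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs

def rag_attributes(candidates: list[dict], llm_answer: str) -> dict[str, str]:
    """Prefer the parsed LLM answer; fall back to the nearest candidate.

    candidates are assumed sorted by embedding distance (closest first),
    as retrieve returns them.
    """
    attrs = parse_answer(llm_answer)
    if not attrs and candidates:
        # Greedy fallback: reuse the closest neighbor's attributes.
        attrs = candidates[0]["attributes"]
    return attrs
```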