Google - GenAI Product Catalog

Categories

Functions related to product categorization.

logging

re

defaultdict

Optional

m

embeddings

nearest_neighbors

utils

Config

bq_client

llm

category_depth

allow_trailing_nulls

number_of_neighbors

bq

table_product

column_id

column_categories

join_categories

def join_categories(ids: list[str]) -> dict[str, list[str]]

Given list of product IDs, join category names.

Args:
    ids: list of product IDs used to join against the master product table

Returns:
    dict mapping each product ID to its category names. The category names are a list of strings, e.g. ['level 1 category', 'level 2 category']
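The shape of the return value can be illustrated with a minimal sketch. It uses a hypothetical in-memory table (`PRODUCT_ROWS`, with invented column names `c1` and `c2`) in place of the master product table, and it assumes that empty category levels are simply dropped; neither detail comes from the library itself.

```python
# Hypothetical stand-in for the master product table; the documented function
# joins against a real table instead. Column names here are invented.
PRODUCT_ROWS = [
    {"id": "p1", "c1": "Apparel", "c2": "Shoes"},
    {"id": "p2", "c1": "Home", "c2": None},
    {"id": "p3", "c1": "Toys", "c2": "Puzzles"},
]
CATEGORY_COLUMNS = ("c1", "c2")


def join_categories_sketch(ids: list[str]) -> dict[str, list[str]]:
    """Map each requested product ID to its category levels as a list."""
    wanted = set(ids)
    result: dict[str, list[str]] = {}
    for row in PRODUCT_ROWS:
        if row["id"] in wanted:
            # Assumption: levels without a value are omitted from the list.
            result[row["id"]] = [
                row[col] for col in CATEGORY_COLUMNS if row[col] is not None
            ]
    return result


categories = join_categories_sketch(["p1", "p2"])
```

IDs that do not appear in the table are absent from the result in this sketch; how the real function treats them is not stated in the docstring.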

retrieve

def retrieve(desc: str,
             image: Optional[str] = None,
             base64: bool = False,
             filters: list[str] = []) -> list[dict]

Returns list of categories based on nearest neighbors.

This is a ‘greedy’ retrieval approach that embeds the provided desc and (optionally) image and returns the categories corresponding to the closest products in embedding space.

Args:
    desc: user-provided description of the product
    image: local file path, GCS URI, or base64-encoded image
    base64: True indicates image is base64-encoded. False (default) means image is interpreted as a path (either local or GCS)
    filters: category prefix to restrict results to

Returns:
    List of candidates sorted by embedding distance. Each candidate is a dict with the following keys:
        id: product ID
        category: category in list form, e.g. ['level 1 category', 'level 2 category']
        distance: embedding distance in range [0,1], 0 being the closest match

_rank

def _rank(desc: str, candidates: list[list[str]]) -> list[list[str]]

See rank() for docstring.

rank

def rank(desc: str, candidates: list[list[str]]) -> list[list[str]]

Use an LLM to rank candidates by description.

Args:
    desc: user-provided description of the product
    candidates: list of categories. Each category is itself a list, e.g. ['level 1 category', 'level 2 category'], so candidates is a list of lists

Returns:
    The candidates ranked by the LLM from most to least relevant. Duplicate candidates are removed before returning.
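The input and output contract (a list of lists in, a deduplicated and reordered list of lists out) can be sketched without an LLM. In the sketch below the relevance score is a plain word-overlap count, which is a deliberately simple stand-in and not how the documented function ranks; `dedupe` and `rank_sketch` are hypothetical names.

```python
def dedupe(candidates: list[list[str]]) -> list[list[str]]:
    """Drop repeated categories while keeping first-seen order."""
    seen: set[tuple[str, ...]] = set()
    unique = []
    for category in candidates:
        key = tuple(category)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique.append(category)
    return unique


def rank_sketch(desc: str, candidates: list[list[str]]) -> list[list[str]]:
    """Order candidates by a stand-in score (word overlap), best first."""
    desc_words = set(desc.lower().split())

    def score(category: list[str]) -> int:
        category_words = set(" ".join(category).lower().split())
        return len(desc_words & category_words)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(dedupe(candidates), key=score, reverse=True)


ranked = rank_sketch(
    "red running shoes",
    [["Home", "Lamps"], ["Apparel", "Shoes"], ["Home", "Lamps"]],
)
```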

retrieve_and_rank

def retrieve_and_rank(desc: str,
                      image: Optional[str] = None,
                      base64: bool = False,
                      filters: list[str] = []) -> m.CategoryList

Wrapper function to sequence retrieve and rank functions.

Args:
    desc: user-provided description of the product
    image: local file path, GCS URI, or base64-encoded image
    base64: True indicates image is base64-encoded. False (default) means image is interpreted as a path (either local or GCS)
    num_neighbors: number of nearest neighbors to return for EACH embedding (listed in the docstring but absent from the signature above)
    filters: category prefix to restrict results to

Returns:
    The candidates ranked by the LLM from most to least relevant. Duplicate candidates are removed before returning.
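The sequencing described above can be sketched as a small wrapper that takes the two steps as callables. Everything here is illustrative: `retrieve_and_rank_sketch`, `fake_retrieve`, and `fake_rank` are hypothetical names, the stubs return canned data, and the result is a plain list of lists rather than the `m.CategoryList` type in the documented signature.

```python
from typing import Callable, Optional


def retrieve_and_rank_sketch(
    desc: str,
    retrieve_fn: Callable[..., list[dict]],
    rank_fn: Callable[[str, list[list[str]]], list[list[str]]],
    image: Optional[str] = None,
    base64: bool = False,
    filters: Optional[list[str]] = None,
) -> list[list[str]]:
    """Retrieve nearest-neighbor candidates, then hand their categories to the ranker."""
    neighbors = retrieve_fn(desc, image, base64, filters or [])
    # Only the category of each neighbor is passed on to the ranking step.
    candidates = [neighbor["category"] for neighbor in neighbors]
    return rank_fn(desc, candidates)


# Hypothetical stubs standing in for the real retrieval and ranking steps.
def fake_retrieve(desc, image, base64, filters):
    return [
        {"id": "p3", "category": ["Home", "Lamps"], "distance": 0.1},
        {"id": "p1", "category": ["Apparel", "Shoes"], "distance": 0.2},
        {"id": "p4", "category": ["Home", "Lamps"], "distance": 0.3},
    ]


def fake_rank(desc, candidates):
    unique = []
    for category in candidates:
        if category not in unique:
            unique.append(category)
    return sorted(unique)  # alphabetical order as a placeholder ranking


result = retrieve_and_rank_sketch("red running shoes", fake_retrieve, fake_rank)
```

Injecting the two steps as arguments keeps the sketch testable without any network access; the documented wrapper calls the module's own retrieve and rank functions directly.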