Categories
Functions related to product categorization.
def join_categories(ids: list[str]) -> dict[str, list[str]]
Given a list of product IDs, look up the category for each one.
Args:
    ids: list of product IDs used to join against the master product table
Returns:
    dict mapping each product ID to its category. The category is a list of strings, e.g. ['level 1 category', 'level 2 category']
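The join described above can be sketched as follows. This is a minimal illustration, not the library's implementation: MASTER_PRODUCT_TABLE, its contents, and join_categories_sketch are hypothetical, and skipping unknown IDs is an assumption, since the source does not say how missing IDs are handled.

```python
# Hypothetical stand-in for the master product table: product ID -> category path.
MASTER_PRODUCT_TABLE: dict[str, list[str]] = {
    "sku-001": ["Electronics", "Headphones"],
    "sku-002": ["Home", "Kitchen"],
    "sku-003": ["Electronics", "Cameras"],
}


def join_categories_sketch(ids: list[str]) -> dict[str, list[str]]:
    """Map each known product ID to its category path."""
    # Assumption: IDs missing from the table are skipped (inner-join behaviour).
    return {
        product_id: MASTER_PRODUCT_TABLE[product_id]
        for product_id in ids
        if product_id in MASTER_PRODUCT_TABLE
    }


result = join_categories_sketch(["sku-001", "sku-404", "sku-003"])
print(result)
```

Each value is a list ordered from the top-level category down, matching the return shape documented above.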
def retrieve(desc: str,
             image: Optional[str] = None,
             base64: bool = False,
             filters: list[str] = []) -> list[dict]
Returns a list of candidate categories based on nearest neighbors.
This is a 'greedy' retrieval approach: it embeds the provided desc and (optionally) image, then returns the categories of the closest products in embedding space.
Args:
    desc: user provided description of product
    image: can be local file path, GCS URI or base64 encoded image
    base64: True indicates image is base64. False (default) will be interpreted as image path (either local or GCS)
    filters: category prefix to restrict results to
Returns:
    List of candidates sorted by embedding distance. Each candidate is a dict with the following keys:
        id: product ID
        category: category in list form e.g. ['level 1 category', 'level 2 category']
        distance: embedding distance in range [0,1], 0 being the closest match
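To make the nearest-neighbor retrieval concrete, here is a minimal text-only sketch. Everything in it is an assumption for illustration: INDEX, embed, cosine_distance, retrieve_sketch and the k parameter are hypothetical, the bag-of-words embedding stands in for a real embedding model, image handling is omitted, and filters is interpreted as a prefix of the category path.

```python
import math
from typing import Optional

# Hypothetical index: (product ID, category path, description) triples.
INDEX: list[tuple[str, list[str], str]] = [
    ("sku-001", ["Electronics", "Headphones"], "wireless noise cancelling headphones"),
    ("sku-002", ["Home", "Kitchen"], "stainless steel kitchen knife"),
    ("sku-003", ["Electronics", "Cameras"], "wireless security camera"),
]


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, clamped to [0, 1] (the vectors here are non-negative)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dot / norm))


def retrieve_sketch(desc: str,
                    filters: Optional[list[str]] = None,
                    k: int = 2) -> list[dict]:
    """Return the k closest products, optionally restricted to a category prefix."""
    vocab = sorted({w for _, _, text in INDEX for w in text.split()}
                   | set(desc.lower().split()))
    query = embed(desc, vocab)
    candidates = []
    for product_id, category, text in INDEX:
        # Assumption: a filter matches when it is a prefix of the category path.
        if filters and category[: len(filters)] != filters:
            continue
        candidates.append({
            "id": product_id,
            "category": category,
            "distance": cosine_distance(query, embed(text, vocab)),
        })
    candidates.sort(key=lambda c: c["distance"])  # closest match first
    return candidates[:k]


results = retrieve_sketch("wireless headphones")
print([c["id"] for c in results])
```

Calling retrieve_sketch("wireless headphones", filters=["Home"]) would restrict the search to products whose category path starts with "Home".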
def _rank(desc: str, candidates: list[list[str]]) -> list[list[str]]
See rank() for docstring.
def rank(desc: str, candidates: list[list[str]]) -> list[list[str]]
Use an LLM to rank candidates by description.
Args:
    desc: user provided description of product
    candidates: list of categories. Each category is in list form e.g. ['level 1 category', 'level 2 category'] so it's a list of lists
Returns:
    The candidates ranked by the LLM from most to least relevant. If there are duplicate candidates, the list is deduped prior to returning.
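The dedupe-then-rank behaviour can be sketched without an LLM. In this illustration, dedupe, relevance and rank_sketch are hypothetical names, and the word-overlap score is only a placeholder for the LLM's relevance judgement, which the source does not describe in detail.

```python
def dedupe(candidates: list[list[str]]) -> list[list[str]]:
    """Drop repeated category paths while keeping first-seen order."""
    seen: set[tuple[str, ...]] = set()
    unique: list[list[str]] = []
    for category in candidates:
        key = tuple(category)  # lists are unhashable, so compare as tuples
        if key not in seen:
            seen.add(key)
            unique.append(category)
    return unique


def relevance(desc: str, category: list[str]) -> int:
    """Placeholder for the LLM: count description words found in the category path."""
    desc_words = set(desc.lower().split())
    category_words = set(" ".join(category).lower().split())
    return len(desc_words & category_words)


def rank_sketch(desc: str, candidates: list[list[str]]) -> list[list[str]]:
    """Dedupe the candidates, then order them from most to least relevant."""
    unique = dedupe(candidates)
    # sorted() is stable, so ties keep their original relative order.
    return sorted(unique, key=lambda c: relevance(desc, c), reverse=True)


ranked = rank_sketch(
    "wireless headphones for travel",
    [
        ["Home", "Kitchen"],
        ["Electronics", "Headphones"],
        ["Home", "Kitchen"],
        ["Electronics", "Cameras"],
    ],
)
print(ranked)
```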
def retrieve_and_rank(desc: str,
                      image: Optional[str] = None,
                      base64: bool = False,
                      filters: list[str] = []) -> m.CategoryList
Wrapper function that calls retrieve and then rank on the retrieved categories.
Args:
    desc: user provided description of product
    image: can be local file path, GCS URI or base64 encoded image
    base64: True indicates image is base64. False (default) will be interpreted as image path (either local or GCS)
    num_neigbhors: number of nearest neighbors to return for EACH embedding
    filters: category prefix to restrict results to
Returns:
    The candidates ranked by the LLM from most to least relevant. If there are duplicate candidates, the list is deduped prior to returning.
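The sequencing can be illustrated with stand-ins for the two stages. This is a sketch under stated assumptions: retrieve_and_rank_sketch, fake_retrieve, fake_rank and the retrieve_fn/rank_fn parameters are hypothetical, and the sketch returns a plain list because the structure of m.CategoryList is not shown in the source.

```python
from typing import Callable, Optional


def retrieve_and_rank_sketch(desc: str,
                             image: Optional[str] = None,
                             base64: bool = False,
                             filters: Optional[list[str]] = None,
                             *,
                             retrieve_fn: Callable[..., list[dict]],
                             rank_fn: Callable[[str, list[list[str]]], list[list[str]]],
                             ) -> list[list[str]]:
    """Retrieve nearest-neighbor candidates, then rank their categories."""
    neighbors = retrieve_fn(desc, image, base64, filters or [])
    # Only the category paths are passed on; IDs and distances are dropped.
    categories = [n["category"] for n in neighbors]
    return rank_fn(desc, categories)


def fake_retrieve(desc, image, base64, filters) -> list[dict]:
    """Hypothetical stand-in returning fixed candidates sorted by distance."""
    return [
        {"id": "sku-003", "category": ["Electronics", "Cameras"], "distance": 0.2},
        {"id": "sku-001", "category": ["Electronics", "Headphones"], "distance": 0.3},
        {"id": "sku-004", "category": ["Electronics", "Cameras"], "distance": 0.4},
    ]


def fake_rank(desc: str, candidates: list[list[str]]) -> list[list[str]]:
    """Hypothetical stand-in for the LLM: dedupe, then put leaf-name matches first."""
    unique = [list(c) for c in dict.fromkeys(tuple(c) for c in candidates)]
    words = set(desc.lower().split())
    return sorted(unique, key=lambda c: c[-1].lower() not in words)


ranked = retrieve_and_rank_sketch(
    "wireless headphones",
    retrieve_fn=fake_retrieve,
    rank_fn=fake_rank,
)
print(ranked)
```

Passing the two stages in as arguments is a choice made only to keep the sketch self-contained; the documented function takes no such parameters.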