# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Defining custom attributes based on URL patterns in Vertex AI Search Website Datastores¶
Author(s) | Hossein Mansour
Reviewer(s) | Ismail Najim, Rajesh Thallam
Last updated | 2024-08-09: The first draft
Overview¶
In this notebook, we demonstrate how to create custom attributes based on URL patterns in Vertex AI Search Website datastores.
These custom attributes will act similarly to metadata from page source and can be used for different purposes such as improving recall and precision, influencing results via boosting and filtering, and including additional context to be retrieved together with the documents.
You can find more information about different types of metadata here.
Custom attributes based on URL patterns are particularly helpful in cases where adjusting page source to include relevant information is not feasible due to a need to keep that information private or when organizational complexities make it difficult to influence the page source content (e.g., content being managed by a third party).
Custom attributes can be used, in lieu of page source metadata, in conjunction with page source metadata, or to override poor quality page content via post-processing (e.g., a Title_Override custom attribute to override the actual page title for certain URLs).
Note that basic URL-based boosting and filtering can be done directly, without custom attributes. Custom attributes are intended for more advanced use cases.
If the custom attribute is made searchable, it can be used to implicitly influence retrieval and ranking of the page by providing additional information such as tags and related topics.
We will perform the following steps:
- [Prerequisite] Creating a Vertex AI Search Website Datastore and Search App
- Setting Schema and URL mapping for Custom Attributes
- Getting Schema and URL mapping to confirm this is what we want
- Searching the Datastore and demonstrating how custom attributes can be used for filtering
- Clean up
Please refer to the official documentation for the definition of Datastores and Apps and their relationships to one another.
REST API is used throughout this notebook. Please consult the official documentation for alternative ways to achieve the same goal, namely Client libraries and RPC.
Vertex AI Search¶
Vertex AI Search (VAIS) is a fully-managed platform, powered by large language models, that lets you build AI-enabled search and recommendation experiences for your public or private websites or mobile applications.
VAIS can handle a diverse set of data sources including structured, unstructured, and website data, as well as data from third-party applications such as Jira, Salesforce, and Confluence.
VAIS also has built-in integration with LLMs which enables you to provide answers to complex questions, grounded in your data.
Using this Notebook¶
If you're running outside of Colab, depending on your environment you may need to install pip packages that are included in the Colab environment by default but are not part of the Python Standard Library. Outside of Colab you'll also notice comments in code cells that look like #@something, these trigger special Colab functionality but don't change the behavior of the notebook.
This tutorial uses the following Google Cloud services and resources:
- Service Usage API
- Discovery Engine API
This notebook has been tested in the following environment:
- Python version = 3.10.12
- google.cloud.storage = 2.8.0
- google.auth = 2.27.0
Getting Started¶
The following steps are necessary to run this notebook, no matter what notebook environment you're using.
If you're entirely new to Google Cloud, get started here
Google Cloud Project Setup¶
- Select or create a Google Cloud project. When you first create an account, you get a $300 free credit towards your compute/storage costs
- Make sure that billing is enabled for your project
- Enable the Service Usage API
- Enable the Cloud Storage API
- Enable the Discovery Engine API for your project
Google Cloud Permissions¶
Ideally you should have the Owner role for your project to run this notebook. If that is not an option, you need at least the following roles:
- roles/serviceusage.serviceUsageAdmin to enable APIs
- roles/iam.serviceAccountAdmin to modify service agent permissions
- roles/discoveryengine.admin to modify Discovery Engine assets
Setup Environment¶
Authentication¶
If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud project.
If you're running this notebook somewhere besides Colab, make sure your environment has the right Google Cloud access. If that's a new concept to you, consider looking into Application Default Credentials for your local environment and initializing the Google Cloud CLI. In many cases, running gcloud auth application-default login in a shell on the machine running the notebook kernel is sufficient.
More authentication options are discussed here.
# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print("Authenticated")
from google.auth import default
from google.auth.transport.requests import AuthorizedSession
creds, _ = default()
authed_session = AuthorizedSession(creds)
Import Libraries¶
import json
import pprint
import time
Configure environment¶
The Location of a Datastore is set at creation time, and the same Location must be specified when querying the Datastore. global is typically recommended unless you have a particular reason to use a regional Datastore.
You can find more information regarding the Location of Datastores and associated limitations here.
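Note that regional Datastores are served from location-prefixed API hosts, while the calls in this notebook assume the global endpoint. As a small illustration (the helper name is our own, following the endpoint scheme documented for the Discovery Engine API):

```python
def discoveryengine_endpoint(location: str) -> str:
    """Return the Discovery Engine API host for a Datastore location.

    Global Datastores use the default host; regional ones ("us", "eu")
    are addressed via a location-prefixed host.
    """
    if location == "global":
        return "https://discoveryengine.googleapis.com"
    return f"https://{location}-discoveryengine.googleapis.com"

discoveryengine_endpoint("us")  # → "https://us-discoveryengine.googleapis.com"
```

If you pick a regional Location, adjust the host in the REST calls below accordingly.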
VAIS_BRANCH is the branch of VAIS to use. At the time of writing this notebook, URL mapping for Custom Attributes is only available in v1alpha of the Discovery Engine API.
INCLUDE_URL_PATTERN is the pattern of a website to be included in the datastore, e.g. "www.example.com/", "www.example.com/abc/".
Note that you need to verify the ownership of a domain to be able to index it.
PROJECT_ID = '' # @param {type: 'string'}
DATASTORE_ID = '' # @param {type: 'string'}
APP_ID = '' # @param {type: 'string'}
LOCATION = "global" # @param ["global", "us", "eu"]
VAIS_BRANCH = "v1alpha" # @param {type: 'string'}
INCLUDE_URL_PATTERN = "" # @param {type: 'string'}
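As an optional convenience (not part of the original setup), a small helper can flag settings that are still empty before any API call is made; check_config is a hypothetical name:

```python
def check_config(**settings) -> list:
    """Return the names of settings whose values are still empty."""
    return [name for name, value in settings.items() if not value]

# Example: with an empty INCLUDE_URL_PATTERN, that name is reported.
check_config(PROJECT_ID="my-project", DATASTORE_ID="my-ds", INCLUDE_URL_PATTERN="")
# → ['INCLUDE_URL_PATTERN']
```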
Step 1. [Prerequisite] Create a Website Search Datastore and App¶
In this section, we will programmatically create a VAIS Advanced Website Datastore and App. You can achieve the same goal with a few clicks in the UI.
If you already have an Advanced Website Datastore available, you can skip this section.
Helper functions to issue basic search on a Datastore or an App¶
def search_by_datastore(project_id: str, location: str, datastore_id: str, query: str):
    """Searches a datastore using the provided query."""
    response = authed_session.post(
        f'https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{project_id}/locations/{location}/collections/default_collection/dataStores/{datastore_id}/servingConfigs/default_search:search',
        headers={
            'Content-Type': 'application/json',
        },
        json={
            "query": query,
            "pageSize": 1
        },
    )
    return response
def search_by_app(project_id: str, location: str, app_id: str, query: str):
    """Searches an app using the provided query."""
    response = authed_session.post(
        f'https://discoveryengine.googleapis.com/v1/projects/{project_id}/locations/{location}/collections/default_collection/engines/{app_id}/servingConfigs/default_config:search',
        headers={
            'Content-Type': 'application/json',
        },
        json={
            "query": query,
            "pageSize": 1
        },
    )
    return response
Helper functions to check whether a Datastore or an App already exists¶
def datastore_exists(project_id: str, location: str, datastore_id: str) -> bool:
    """Check if a datastore exists."""
    response = search_by_datastore(project_id, location, datastore_id, "test")
    status_code = response.status_code
    # A 400 response is expected as the URL pattern needs to be set first
    if status_code == 200 or status_code == 400:
        return True
    if status_code == 404:
        return False
    raise Exception(f"Error: {status_code}")
def app_exists(project_id: str, location: str, app_id: str) -> bool:
    """Check if an App exists."""
    response = search_by_app(project_id, location, app_id, "test")
    status_code = response.status_code
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise Exception(f"Error: {status_code}")
Helper functions to create a Datastore or an App¶
def create_website_datastore(vais_branch: str, project_id: str, location: str, datastore_id: str) -> int:
    """Create a website datastore."""
    payload = {
        "displayName": datastore_id,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": "PUBLIC_WEBSITE",
    }
    header = {"X-Goog-User-Project": project_id, "Content-Type": "application/json"}
    es_endpoint = f"https://discoveryengine.googleapis.com/{vais_branch}/projects/{project_id}/locations/{location}/collections/default_collection/dataStores?dataStoreId={datastore_id}"
    response = authed_session.post(es_endpoint, data=json.dumps(payload), headers=header)
    if response.status_code == 200:
        print(f"The creation of Datastore {datastore_id} is initiated.")
        print("It may take a few minutes for the Datastore to become available")
    else:
        print(f"Failed to create Datastore {datastore_id}")
        # response.text is a property, not a method
        print(response.text)
    return response.status_code
def create_app(vais_branch: str, project_id: str, location: str, datastore_id: str, app_id: str) -> int:
    """Create a search app."""
    payload = {
        "displayName": app_id,
        "dataStoreIds": [datastore_id],
        "solutionType": "SOLUTION_TYPE_SEARCH",
        "searchEngineConfig": {
            "searchTier": "SEARCH_TIER_ENTERPRISE",
            "searchAddOns": ["SEARCH_ADD_ON_LLM"],
        }
    }
    header = {"X-Goog-User-Project": project_id, "Content-Type": "application/json"}
    es_endpoint = f"https://discoveryengine.googleapis.com/{vais_branch}/projects/{project_id}/locations/{location}/collections/default_collection/engines?engineId={app_id}"
    response = authed_session.post(es_endpoint, data=json.dumps(payload), headers=header)
    if response.status_code == 200:
        print(f"The creation of App {app_id} is initiated.")
        print("It may take a few minutes for the App to become available")
    else:
        print(f"Failed to create App {app_id}")
        print(response.json())
    return response.status_code
Create a Datastore with the provided ID if it doesn't exist¶
if datastore_exists(PROJECT_ID, LOCATION, DATASTORE_ID):
    print(f"Datastore {DATASTORE_ID} already exists.")
else:
    create_website_datastore(VAIS_BRANCH, PROJECT_ID, LOCATION, DATASTORE_ID)
[Optional] Check if the Datastore is created successfully¶
The Datastore is polled to track when it becomes available.
This may take a few minutes
while not datastore_exists(PROJECT_ID, LOCATION, DATASTORE_ID):
    print(f"Datastore {DATASTORE_ID} is still being created.")
    time.sleep(30)
print(f"Datastore {DATASTORE_ID} is created successfully.")
Create an App with the provided ID if it doesn't exist¶
The App will be connected to a Datastore with the ID provided earlier in this notebook
if app_exists(PROJECT_ID, LOCATION, APP_ID):
    print(f"App {APP_ID} already exists.")
else:
    create_app(VAIS_BRANCH, PROJECT_ID, LOCATION, DATASTORE_ID, APP_ID)
[Optional] Check if the App is created successfully¶
The App is polled to track when it becomes available.
This may take a few minutes
while not app_exists(PROJECT_ID, LOCATION, APP_ID):
    print(f"App {APP_ID} is still being created.")
    time.sleep(30)
print(f"App {APP_ID} is created successfully.")
Upgrade an existing Website Datastore to an Advanced Website Datastore¶
def upgrade_to_advanced(vais_branch: str, project_id: str, location: str, datastore_id: str) -> int:
    """Upgrade the website search datastore to advanced."""
    header = {"X-Goog-User-Project": project_id}
    es_endpoint = f"https://discoveryengine.googleapis.com/{vais_branch}/projects/{project_id}/locations/{location}/collections/default_collection/dataStores/{datastore_id}/siteSearchEngine:enableAdvancedSiteSearch"
    response = authed_session.post(es_endpoint, headers=header)
    if response.status_code == 200:
        print(f"Datastore {datastore_id} upgraded to Advanced Website Search")
    else:
        print(f"Failed to upgrade Datastore {datastore_id}")
        # response.text is a property, not a method
        print(response.text)
    return response.status_code
upgrade_to_advanced(VAIS_BRANCH, PROJECT_ID, LOCATION, DATASTORE_ID)
Set the URLs to Include/Exclude in the Index¶
You can set up to 500 Include and Exclude URL patterns for Advanced website search Datastores.
This function sets a single URL pattern to be included every time it gets executed.
The field type in the payload indicates whether the provided URI pattern should be included or excluded. Here we only use INCLUDE.
The INCLUDE and EXCLUDE URL patterns specified with this function are incremental. You also have options to Delete, List, Batch Create, etc.
For this example, we index http://cloud.google.com/generative-ai-app-builder/*
Note that you need to verify the ownership of a domain to be able to index it.
def include_url_patterns(vais_branch: str, project_id: str, location: str, datastore_id: str, include_url_patterns) -> int:
    """Set an include URL pattern for the Datastore."""
    payload = {
        "providedUriPattern": include_url_patterns,
        "type": "INCLUDE",
    }
    header = {"X-Goog-User-Project": project_id, "Content-Type": "application/json"}
    es_endpoint = f"https://discoveryengine.googleapis.com/{vais_branch}/projects/{project_id}/locations/{location}/dataStores/{datastore_id}/siteSearchEngine/targetSites"
    response = authed_session.post(es_endpoint, data=json.dumps(payload), headers=header)
    if response.status_code == 200:
        print("URL patterns successfully set")
        print("Depending on the size of your domain, the initial indexing may take from minutes to hours")
    else:
        print(f"Failed to set URL patterns for the Datastore {datastore_id}")
        # response.text is a property, not a method
        print(response.text)
    return response.status_code
include_url_patterns(VAIS_BRANCH, PROJECT_ID, LOCATION, DATASTORE_ID, INCLUDE_URL_PATTERN)
Step 2. Schema and URL mapping for Custom Attributes¶
Set the Schema and URL mapping¶
In this example we use VAIS REST API documentation as the source for the datastore. For the mapping we add "REST" tags to all branches of REST documentation. We also add an additional tag to identify each branch (i.e. V1, V1alpha, V1beta). The schema and URL mapping should follow this formatting.
Separately, we identify pages under Samples with a corresponding tag.
As mentioned above, you can only index a website you own; as a result, your mapping will be different from the ones used in this example.
Note that each successful mapping request overrides the previous ones (i.e. mappings are not incremental).
header = {"X-Goog-User-Project": PROJECT_ID}
es_endpoint = f"https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/dataStores/{DATASTORE_ID}/siteSearchEngine:setUriPatternDocumentData"
json_data = {
    "documentDataMap": {
        "https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1/*": {
            "Topic": ["Rest", "V1"]
        },
        "https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1alpha/*": {
            "Topic": ["Rest", "V1alpha"]
        },
        "https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1beta/*": {
            "Topic": ["Rest", "V1beta"]
        },
        "https://cloud.google.com/generative-ai-app-builder/docs/samples*": {
            "Topic": ["Samples"]
        },
    },
    "schema": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "properties": {
            "Topic": {
                "items": {
                    "indexable": True,
                    "retrievable": True,
                    "searchable": True,
                    "type": "string",
                },
                "type": "array",
            }
        },
        "type": "object",
    },
}
set_schema_response = authed_session.post(es_endpoint, headers=header, json=json_data)
print(json.dumps(set_schema_response.json(), indent=1))
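Because each successful setUriPatternDocumentData call replaces the entire mapping, one way to add a pattern without dropping existing entries is to fetch the current documentDataMap, merge in the new entries, and set the merged result. A minimal sketch of the merge step only; merge_document_data_map is our own helper name:

```python
def merge_document_data_map(current: dict, new_entries: dict) -> dict:
    """Merge new URL-pattern entries into an existing documentDataMap.

    Entries in new_entries win on conflict; all other existing mappings
    are preserved, so a follow-up set call does not silently drop them.
    """
    merged = dict(current)
    merged.update(new_entries)
    return merged

current_map = {
    "https://cloud.google.com/generative-ai-app-builder/docs/samples*": {"Topic": ["Samples"]}
}
new_map = merge_document_data_map(
    current_map,
    {"https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1/*": {"Topic": ["Rest", "V1"]}},
)
# new_map now contains both patterns.
```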
Get the Schema and URL mapping¶
Get the Schema and URL mapping to ensure it is updated according to your expectations.
header = {"X-Goog-User-Project": PROJECT_ID}
es_endpoint = f"https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/dataStores/{DATASTORE_ID}/siteSearchEngine:getUriPatternDocumentData"
get_schema_response = authed_session.get(es_endpoint, headers=header)
print(json.dumps(get_schema_response.json(), indent=1))
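To spot-check the response programmatically rather than reading the raw JSON, a small helper (a hypothetical name, assuming the response body carries the documentDataMap field set above) can list each pattern with its attribute names:

```python
def summarize_mapping(response_body: dict) -> dict:
    """Map each URI pattern to its attribute names for a quick overview."""
    data_map = response_body.get("documentDataMap", {})
    return {pattern: sorted(attrs) for pattern, attrs in data_map.items()}

example = {
    "documentDataMap": {
        "https://cloud.google.com/generative-ai-app-builder/docs/samples*": {"Topic": ["Samples"]}
    }
}
summarize_mapping(example)
# → {'https://cloud.google.com/generative-ai-app-builder/docs/samples*': ['Topic']}
```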
Step 3. Run queries with and without a metadata filter¶
Search Parameters¶
- QUERY: Used to query VAIS.
- PAGE_SIZE: The maximum number of results retrieved from VAIS.
QUERY = '' # @param {type: 'string'}
PAGE_SIZE = 5 # @param {type: 'integer'}
Search Without Filter¶
Given that the Topic custom attribute is made retrievable in the Schema, you will get it back in the response when applicable.
Custom attributes are included in the structData field of the result.
search_response = authed_session.post(
f'https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/dataStores/{DATASTORE_ID}/servingConfigs/default_search:search',
headers={
'Content-Type': 'application/json'
},
json={
"query": QUERY,
"pageSize": PAGE_SIZE},
)
print(json.dumps(search_response.json(), indent=1))
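Rather than scanning the raw JSON, the custom attribute can be pulled out of each result. A sketch, assuming the standard search-response shape (results → document → structData); extract_topics is our own helper name:

```python
def extract_topics(search_json: dict) -> list:
    """Return (document id, Topic values) pairs from a search response."""
    pairs = []
    for result in search_json.get("results", []):
        doc = result.get("document", {})
        topics = doc.get("structData", {}).get("Topic", [])
        pairs.append((doc.get("id", ""), topics))
    return pairs

sample = {"results": [{"document": {"id": "d1", "structData": {"Topic": ["Rest", "V1alpha"]}}}]}
extract_topics(sample)
# → [('d1', ['Rest', 'V1alpha'])]
```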
Search with Filter¶
Now we apply a filter so that a search only returns results from the V1alpha branch of the REST documentation. The filter and expected results will be different based on the domain included in your website datastore.
We could also use this indexable field for other purposes such as Boosting, if desired.
search_response = authed_session.post(
f'https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/dataStores/{DATASTORE_ID}/servingConfigs/default_search:search',
headers={
'Content-Type': 'application/json'
},
json={
"query": QUERY,
"filter": "Topic: ANY(\"V1alpha\")",
"pageSize": PAGE_SIZE},
)
print(json.dumps(search_response.json(), indent=1))
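The same attribute can drive boosting instead of hard filtering. A sketch of the request body only, using the boostSpec / conditionBoostSpecs shape from the Discovery Engine search API (the boost value 0.8 is an arbitrary example; boost ranges from -1 to 1):

```python
# Request body favoring V1alpha pages without excluding everything else.
boosted_request = {
    "query": "import documents",
    "pageSize": 5,
    "boostSpec": {
        "conditionBoostSpecs": [
            {
                # Same filter syntax as the filtered search above.
                "condition": 'Topic: ANY("V1alpha")',
                "boost": 0.8,
            }
        ]
    },
}
```

This body would replace the json= argument of the search call above; matching documents are ranked higher rather than being the only results returned.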
Clean up¶
Delete the Search App¶
Delete the App if you no longer need it.
Alternatively, you can follow these instructions to delete an App from the UI.
response = authed_session.delete(
f'https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/engines/{APP_ID}',
headers={
"X-Goog-User-Project": PROJECT_ID
}
)
print(response.text)
Delete the Datastore¶
Delete the Datastore if you no longer need it.
Alternatively, you can follow these instructions to delete a Datastore from the UI.
response = authed_session.delete(
f'https://discoveryengine.googleapis.com/{VAIS_BRANCH}/projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/dataStores/{DATASTORE_ID}',
headers={
"X-Goog-User-Project": PROJECT_ID
}
)
print(response.text)