# E2E GenAI application with LangChain, Ray, Flask API backend, React frontend
In this tutorial you will deploy an end-to-end application that uses a GenAI model from Hugging Face on the backend, Ray Serve for inference, a Flask API backend, and a simple React frontend. A request flows from the React frontend to the Flask API, which calls into a Ray Serve endpoint that runs the model through LangChain.
## Before you begin

Create or select an existing GCP project and open Cloud Shell. You can use these steps.
## Infrastructure Installation
- If needed, `git clone https://github.com/GoogleCloudPlatform/ai-on-gke.git`
- Create a new GKE cluster and install the kuberay operator:
  - `cd gke-platform`
  - Edit `variables.tf` with your GCP settings. Make sure you change `project_id` and `cluster_name`.
  - Run `terraform init`
  - Run `terraform apply`
  - `cd ..`
- Configure credentials to point to the cluster:
  `gcloud container clusters get-credentials <cluster-name> --location=<region>`
- Install Ray on GKE:
  - `cd ray-on-gke/user`
  - Edit `variables.tf` with your GCP settings. Make sure you set `project_id`, `namespace`, and `service_account`.
  - Note the namespace setting. All microservices in this sample will be deployed to this same namespace for simplicity.
  - Run `terraform init`
  - Run `terraform apply`
  - `cd ..`
- Install Jupyter on GKE. These steps are needed for experimentation (see the section below). You can skip them if you want to go straight to building the application.
  - `cd jupyter-on-gke`
  - Edit `variables.tf` with your GCP settings. Make sure that you set `project_id`, `project_number`, and `namespace`. Use the same namespace as above.
  - Configure higher resource limits and guarantees. In `jupyter_config/config.yaml` change the following:

    ```yaml
    singleuser:
      cpu:
        limit: 1
        guarantee: .5
      memory:
        limit: 4G
        guarantee: 1G
    ```

  - Run `terraform init`
  - Run `terraform apply`
  - `cd ..`
## Experimentation
Experiment with the model in a Jupyter notebook:

1. Get the address of your JupyterHub:
   `kubectl get service proxy-public -n <namespace name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`
1. Configure IAP and open JupyterHub by following the steps here.
1. From JupyterHub, open this notebook and run it step by step: https://raw.githubusercontent.com/GoogleCloudPlatform/ai-on-gke/main/tutorials/langchain/nb1.ipynb
1. The first section shows how to run the model directly.
1. The second section shows how to do the same using Ray Serve.
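For orientation, "running the model directly" amounts to a plain `transformers` call; here is a hypothetical cell in that spirit (the model name and prompt are assumptions, the linked notebook has the real code):

```python
# Hypothetical notebook cell: call a Hugging Face model directly,
# with no Ray Serve in between. The model name is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2", max_new_tokens=40)
print(generator("Write a sentence about football.")[0]["generated_text"])
```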
## Build the end-to-end application
- We used the Jupyter notebook to experiment, but now let's build the Flask backend that calls into Ray Serve.
  - Observe `model.py`: it loads the model and creates a Ray Serve deployment that uses the LangChain library to run two nested prompts (hypothetical sketches of both files appear after the backend steps below).
  - Observe `main.py`: it uses the Flask framework to create an API route that calls into the Ray Serve endpoint.
  - Containerize and deploy the backend image to the registry. Do these steps from the `backend` directory:

    ```bash
    PROJECT_ID=<YOUR_PROJECT_ID>
    # configure GCR
    gcloud auth configure-docker
    # build the image
    docker build -t hf-lc-ray:latest .
    # tag the image for GCR
    docker tag hf-lc-ray:latest gcr.io/${PROJECT_ID}/hf-lc-ray:latest
    # push the image to GCR
    docker push gcr.io/${PROJECT_ID}/hf-lc-ray:latest
    ```
  - Deploy the backend to the cluster. Open `src/backend/deploy.yaml` and change `PROJECT_ID` to your project (you can also use `sed`: `sed -i "s/YOUR_PROJECT/${PROJECT_ID}/" src/backend/deploy.yaml`). Then run:

    ```bash
    kubectl apply -f deploy.yaml -n <YOUR_NAMESPACE>
    ```
  - Find the backend IP on the services page (`hf-lc-ray-service`), or with kubectl:

    ```bash
    kubectl get service hf-lc-ray-service -n <namespace name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
    ```
  - To test that the backend works you can run:

    ```bash
    ENDPOINT='http://<IP>/run'
    curl -XPOST "${ENDPOINT}?text=football"
    ```

    You will get a response similar to:

    ```
    ["a football player is a player who plays for a team","Un joueur de football est un player qui joue pour un \u00e9quipe."]
    ```
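  For reference, here is a minimal, hypothetical sketch of the shape of `model.py`; the model name, prompt wording, and identifiers below are assumptions, so consult the file in the repo for the exact code.

  ```python
  # Hypothetical sketch of model.py -- model name, prompts, and names are assumed.
  from langchain.chains import LLMChain
  from langchain.llms import HuggingFacePipeline
  from langchain.prompts import PromptTemplate
  from ray import serve
  from starlette.requests import Request
  from transformers import pipeline


  @serve.deployment
  class LangchainModel:
      def __init__(self):
          # Wrap a Hugging Face text-generation pipeline as a LangChain LLM.
          llm = HuggingFacePipeline(pipeline=pipeline(
              "text-generation", model="gpt2", max_new_tokens=40))
          # Two nested prompts: describe the topic, then translate the answer.
          self.describe = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
              "Write a sentence about {topic}."))
          self.translate = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
              "Translate this sentence to French: {sentence}"))

      async def __call__(self, request: Request) -> list:
          topic = request.query_params["text"]
          sentence = self.describe.run(topic)
          # Feed the first answer into the second prompt; return both,
          # matching the two-element response shown in the curl test above.
          return [sentence, self.translate.run(sentence)]


  # Bound application that Ray Serve can start, e.g. `serve run model:deployment`.
  deployment = LangchainModel.bind()
  ```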
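  And a matching hypothetical sketch of `main.py`; the Ray Serve URL is an assumption (it depends on how your Ray head service is named in the cluster):

  ```python
  # Hypothetical sketch of main.py -- the Ray Serve URL below is assumed.
  import requests
  from flask import Flask, jsonify, request

  app = Flask(__name__)
  # Ray Serve's HTTP proxy, reached through the Ray head Kubernetes service.
  MODEL_ENDPOINT = "http://ray-head-svc:8000/"  # assumed service name

  @app.route("/run", methods=["POST"])
  def run():
      # Forward the ?text= query parameter to the Ray Serve deployment
      # and relay its JSON response (the two generated sentences).
      resp = requests.post(MODEL_ENDPOINT, params={"text": request.args["text"]})
      return jsonify(resp.json())
  ```

  Deployed this way, the `curl` test above exercises the whole path: Flask receives the POST, forwards the text to Ray Serve, and returns the two generated sentences.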
- Finally, let's deploy the React frontend. Note that in a production distributed application you would use a K8s Ingress with routes for the `backend` and `frontend` to avoid taking a dependency on the IP; the IP-based approach is used here for simplicity.
  - Update the `API_ENDPOINT` in `src/frontend/src/index.tsx` to point to the backend IP found above.
  - Containerize and deploy the frontend image to the registry. Do these steps from the `src/frontend` directory:

    ```bash
    PROJECT_ID=<YOUR_PROJECT_ID>
    # configure GCR
    gcloud auth configure-docker
    # build the image
    docker build -t hf-lc-ray-fe:latest .
    # tag the image for GCR
    docker tag hf-lc-ray-fe:latest gcr.io/${PROJECT_ID}/hf-lc-ray-fe:latest
    # push the image to GCR
    docker push gcr.io/${PROJECT_ID}/hf-lc-ray-fe:latest
    ```
  - Deploy the frontend to the cluster. Open `src/frontend/deploy.yaml` and change `PROJECT_ID` to your project (you can also use `sed`: `sed -i "s/YOUR_PROJECT/${PROJECT_ID}/" src/frontend/deploy.yaml`). Then run:

    ```bash
    kubectl apply -f deploy.yaml -n <YOUR_NAMESPACE>
    ```
  - Find the frontend IP on the services page (`hf-lc-ray-fe-service`), or with kubectl: `kubectl get service hf-lc-ray-fe-service -n <namespace name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`
  - Click to navigate and give it a try!