Deploying Kubeflow cluster
This guide describes how to use kubectl and kpt to deploy Kubeflow on Google Cloud.
Deployment steps
Prerequisites
Before installing Kubeflow on the command line:
- You must have created a management cluster and installed Config Connector.
  - If you don't have a management cluster, follow the instructions in the management cluster setup guide.
  - Your management cluster needs a namespace set up to administer the Google Cloud project where Kubeflow will be deployed; that step is covered later in this guide.
- You need to use Linux or Cloud Shell for the ASM installation. Currently, ASM installation doesn't work on macOS because macOS ships with an old version of bash.
- Make sure that your Google Cloud project meets the minimum requirements described in the project setup guide.
- Follow the guide to setting up OAuth credentials to create OAuth credentials for Cloud Identity-Aware Proxy (Cloud IAP).
  - Unfortunately, Google Kubernetes Engine's BackendConfig currently doesn't support creating IAP OAuth clients programmatically.
Install the required tools
- Install gcloud.
- Install the gcloud components:
  gcloud components install kubectl kustomize kpt anthoscli beta
  gcloud components update
  You can install a specific version of kubectl by following the official instructions (for example: Install kubectl on Linux). The latest patch release of any kubectl version from v1.17 to v1.19 works well too.
  Note: Starting from Kubeflow 1.4, kpt v1.0.0-beta.6 or above is required to operate on the googlecloudplatform/kubeflow-distribution repository. gcloud hasn't caught up with this kpt version yet, so for now install kpt separately from https://github.com/GoogleContainerTools/kpt/tags (a sketch is shown after this list). Note that kpt requires Docker to be installed.
  Note: You also need to install the required tools for the ASM installation tool install_asm.
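One possible way to install a standalone kpt binary on Linux is sketched below. The version and release asset name are illustrative; pick the release you need from https://github.com/GoogleContainerTools/kpt/tags and check its asset names before downloading.

# Download a kpt release binary (version and asset name are examples only).
curl -L -o kpt "https://github.com/GoogleContainerTools/kpt/releases/download/v1.0.0-beta.6/kpt_linux_amd64"
chmod +x kpt
sudo mv kpt /usr/local/bin/
kpt version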
Fetch googlecloudplatform/kubeflow-distribution and upstream packages
- If you have already deployed the management cluster, you have googlecloudplatform/kubeflow-distribution locally; you just need to run cd kubeflow to access the Kubeflow cluster manifests. Otherwise, you can run the following commands:
  # Check out the latest Kubeflow
  git clone https://github.com/googlecloudplatform/kubeflow-distribution.git
  cd kubeflow-distribution
  git checkout master
  Alternatively, you can get the package by using kpt:
  # Check out the latest Kubeflow
  kpt pkg get https://github.com/googlecloudplatform/kubeflow-distribution.git@master kubeflow-distribution
  cd kubeflow-distribution
- Run the following commands to pull upstream manifests from the kubeflow/manifests repository:
  # Visit Kubeflow cluster related manifests
  cd kubeflow
  bash ./pull-upstream.sh
Environment Variables
- Log in to gcloud. You only need to run this command once:
  gcloud auth login
- Review and fill in all the environment variables in kubeflow-distribution/kubeflow/env.sh. They will be used by kpt later on, and some of them are used in this deployment guide. Review the comments in env.sh for an explanation of each environment variable (an illustrative sketch follows this list). After defining these environment variables, run:
  source env.sh
- Set environment variables with the OAuth Client ID and Secret for IAP:
  export CLIENT_ID=<Your CLIENT_ID>
  export CLIENT_SECRET=<Your CLIENT_SECRET>
Note
Do not omit the export, because scripts triggered by make need these environment variables. Do not check these two environment variables into source control: they are secrets.
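As a rough sketch of the kind of variables involved (these names are the ones referenced elsewhere in this guide; env.sh in the repository is the source of truth for which variables it actually defines and how):

# Illustrative only -- edit the env.sh shipped in kubeflow-distribution/kubeflow instead.
export KF_PROJECT=<Google Cloud project ID where Kubeflow will be deployed>
export KF_NAME=<name of your Kubeflow deployment>
export ZONE=<zone of your Kubeflow cluster, for example us-central1-a>
export MGMT_PROJECT=<project where your management cluster is deployed>
export MGMT_NAME=<kubectl context name for the management cluster>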
kpt setter config
Run the following command to configure the kpt setters for your Kubeflow cluster:
bash ./kpt-set.sh
Every time you change the environment variables, make sure you run the command above to apply the kpt setter changes to all packages. Otherwise, kustomize build will not be able to pick up the new changes.
Note: you can find out which setters exist in a package, and their current values, by running the following commands:
kpt fn eval -i list-setters:v0.1 ./apps
kpt fn eval -i list-setters:v0.1 ./common
You can learn more about list-setters in the kpt documentation.
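If you ever need to override a single setter by hand instead of re-running kpt-set.sh, a sketch using kpt's apply-setters function could look like the following. The setter name and value here are purely illustrative; run list-setters first to see which setters your packages actually define.

# Apply one setter value to the apps packages (setter name and value are hypothetical examples).
kpt fn eval ./apps -i apply-setters:v0.1 -- 'name=my-kubeflow'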
Authorize Cloud Config Connector for each Kubeflow project
In the management cluster deployment, we created the Google Cloud service account kcc-${KF_PROJECT}@${MGMT_PROJECT}.iam.gserviceaccount.com. This is the service account that Config Connector will use to create any Google Cloud resources in ${KF_PROJECT}. You need to grant this Google Cloud service account sufficient privileges to create the desired resources in the Kubeflow project.
You only need to perform the steps below once for each Kubeflow project, but make sure to do so even when KF_PROJECT and MGMT_PROJECT are the same project.
The easiest way to do this is to grant the Google Cloud service account owner permissions on one or more projects, as sketched below.
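For example, a minimal sketch of such a grant, assuming project-level owner permissions are acceptable in your environment (you can substitute narrower roles to match your own policies):

# Grant the Config Connector service account broad permissions on the Kubeflow project.
gcloud projects add-iam-policy-binding "${KF_PROJECT}" \
  --member="serviceAccount:kcc-${KF_PROJECT}@${MGMT_PROJECT}.iam.gserviceaccount.com" \
  --role=roles/owner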
- Set the management environment variables if you haven't already:
  MGMT_PROJECT=<the project where you deploy your management cluster>
  MGMT_NAME=<the kubectl context name for the management cluster>
- Apply the ConfigConnectorContext for ${KF_PROJECT} in the management cluster (you can optionally verify the result as sketched after this list):
  make apply-kcc
Configure Kubeflow
Make sure you are using KF_PROJECT in the gcloud CLI tool:
gcloud config set project ${KF_PROJECT}
Deploy Kubeflow
To deploy Kubeflow, run the following command:
make apply
- If deployment returns an error due to missing resources in the serving.kserve.io API group, rerun make apply. This is due to a race condition between CRD and runtime resources in KServe.
  - This issue is being tracked in googlecloudplatform/kubeflow-distribution#384.
- If resources can't be created because webhook.cert-manager.io is unavailable, wait and then rerun make apply.
  - This issue is being tracked in kubeflow/manifests#1234.
- If resources can't be created with an error message like:
  error: unable to recognize ".build/application/app.k8s.io_v1beta1_application_application-controller-kubeflow.yaml": no matches for kind "Application" in version "app.k8s.io/v1beta1"
  this issue occurs when the CRD endpoint isn't established in the Kubernetes API server at the time the CRD's custom object is applied. This is expected and can happen multiple times for different kinds of resources. To resolve the issue, try running make apply again; you can also wait for the CRD to become established before retrying, as sketched after this list.
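As a minimal sketch for the "Application" example above, you can wait for the reported CRD to become established before rerunning make apply (substitute whichever CRD your error message reports):

# Wait up to two minutes for the Application CRD to be served by the API server.
kubectl wait --for=condition=Established crd/applications.app.k8s.io --timeout=120s
# Then retry the deployment.
make apply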
Check your deployment
Follow these steps to verify the deployment:
- When the deployment finishes, check the resources installed in the namespace kubeflow in your new cluster. To do this from the command line, first set your kubectl credentials to point to the new cluster:
  gcloud container clusters get-credentials "${KF_NAME}" --zone "${ZONE}" --project "${KF_PROJECT}"
  Then, check what's installed in the kubeflow namespace of your Google Kubernetes Engine cluster:
  kubectl -n kubeflow get all
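Optionally, as a rough readiness check (a sketch; adjust the timeout to your environment), you can wait for all pods in the namespace to report Ready:

# Wait until every pod in the kubeflow namespace is Ready; this can take several minutes.
kubectl -n kubeflow wait --for=condition=Ready pods --all --timeout=600s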
Access the Kubeflow user interface (UI)
To access the Kubeflow central dashboard, follow these steps:
- Use the following command to grant yourself the IAP-secured Web App User role:
  gcloud projects add-iam-policy-binding "${KF_PROJECT}" --member=user:<EMAIL> --role=roles/iap.httpsResourceAccessor
  Note: you need the IAP-secured Web App User role even if you are already an owner or editor of the project. The IAP-secured Web App User role is not implied by the Project Owner or Project Editor roles.
- Enter the following URI into your browser address bar. It can take 20 minutes for the URI to become available:
  https://${KF_NAME}.endpoints.${KF_PROJECT}.cloud.goog/
  You can run the following command to get the URI for your deployment:
  kubectl -n istio-system get ingress
  NAME            HOSTS                                                      ADDRESS         PORTS   AGE
  envoy-ingress   your-kubeflow-name.endpoints.your-gcp-project.cloud.goog   34.102.232.34   80      5d13h
  The following command sets an environment variable named HOST to the URI:
  export HOST=$(kubectl -n istio-system get ingress envoy-ingress -o=jsonpath={.spec.rules[0].host})
Notes:
- It can take 20 minutes for the URI to become available. Kubeflow needs to provision a signed SSL certificate and register a DNS name.
- If you own or manage the domain or a subdomain with Cloud DNS, you can configure this process to be much faster. Check kubeflow/kubeflow#731.
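Once HOST is set, a rough way to probe whether the endpoint is serving yet (a sketch; once the certificate and DNS are ready you should get an HTTP response, typically a redirect to the Google sign-in page, instead of a connection error):

# Probe the Kubeflow endpoint headers without opening a browser.
curl -sI "https://${HOST}"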
Understanding the deployment process
This section gives you more details about the kubectl, kustomize, and Config Connector configuration and deployment process, so that you can customize your Kubeflow deployment if necessary.
Application layout
Your Kubeflow application directory kubeflow-distribution/kubeflow contains the following files and directories:
- Makefile defines rules to automate the deployment process. You can refer to the GNU make documentation for an introduction. The Makefile we provide is designed to be user maintainable: you are encouraged to read, edit, and maintain it to suit your own deployment customization needs.
- apps, common, and contrib are independent component directories containing kustomize packages for deploying Kubeflow components. The structure aligns with the upstream kubeflow/manifests repository.
  - The googlecloudplatform/kubeflow-distribution repository only stores kustomization.yaml and patches for Google Cloud specific resources.
  - ./pull-upstream.sh pulls kubeflow/manifests and stores the manifests in the upstream folder of each component. The googlecloudplatform/kubeflow-distribution repository doesn't store a copy of the upstream manifests.
- build is a directory that will contain the hydrated manifests output by the make rules; each component has its own build directory. You can customize the build path when calling the make command.
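An illustrative layout, sketched only from the files and directories described above (your checkout may contain additional files):

kubeflow-distribution/kubeflow/
├── Makefile            # rules that automate the deployment (make apply, make apply-kcc, ...)
├── env.sh              # environment variables consumed by the kpt setters
├── kpt-set.sh          # applies the setter values to all packages
├── pull-upstream.sh    # pulls kubeflow/manifests into each component's upstream folder
├── apps/               # Kubeflow application components (kustomize packages)
├── common/             # shared, cluster-level components (kustomize packages)
└── contrib/            # optional contributed components

Each component additionally gets its own build directory with hydrated manifests once the make rules run.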
Source Control
It is recommended that you check your entire local repository into source control.
Checking in the build directories is recommended so you can easily see the differences in the manifests with git diff before applying them.
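For example, a minimal sketch of that workflow using plain git (the commit message is illustrative):

# Review what changed in the hydrated manifests before applying them.
git status
git diff
# Commit the configuration and hydrated manifests.
git add -A
git commit -m "Update Kubeflow configuration and hydrated manifests"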
Google Cloud service accounts
The Kubeflow deployment process creates three service accounts in your Google Cloud project. These service accounts follow the principle of least privilege. The service accounts are:
- ${KF_NAME}-admin is used for some admin tasks, like configuring the load balancers. The principle is that this account is needed to deploy Kubeflow but not needed to actually run jobs.
- ${KF_NAME}-user is intended to be used by training jobs and models to access Google Cloud resources (Cloud Storage, BigQuery, etc.). This account has a much smaller set of privileges compared to admin.
- ${KF_NAME}-vm is used only for the virtual machine (VM) service account. This account has the minimal permissions needed to send metrics and logs to Stackdriver.
Upgrade Kubeflow
Refer to Upgrading Kubeflow cluster.
Next steps
- Run a full ML workflow on Kubeflow, using the end-to-end MNIST tutorial or the GitHub issue summarization Pipelines example.
- Learn how to delete your Kubeflow deployment using the CLI.
- To add users to Kubeflow, go to a dedicated section in Customizing Kubeflow on Google Cloud.
- To tailor your Kubeflow deployment on Google Cloud, go to Customizing Kubeflow on Google Cloud.
- For troubleshooting Kubeflow deployments on Google Cloud, go to the Troubleshooting deployments guide.