Manual Deployment¶

This document describes a method to run a demo deployment of the tool.

Objectives¶

Deploy autoscaler application
Run a Demo job for autoscaling Dataproc workers for the Trino workloads
Verify autoscaling of worker nodes in the monitoring and Trino dashboard

Costs¶

This tutorial uses billable components of Google Cloud, including the following:

Use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin¶

For this tutorial, you need a Google Cloud project. To make cleanup easiest at the end of the tutorial, we recommend that you create a new project for this tutorial.

Create a Google Cloud project
Make sure that billing is enabled for your Google Cloud project
Open Cloud Shell

At the bottom of the Cloud Console, a Cloud Shell session opens and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

In Cloud Shell, clone the source repository and go to the directory for this tutorial:

git clone https://github.com/GoogleCloudPlatform/dataproc-trino-autoscaler.git\
cd dataproc-trino-autoscaler

Setting up your environment¶

Enable APIs for Compute Engine, Cloud Storage, Dataproc, Bigquery, Monitoring and Cloud Build services:

gcloud services enable \
compute.googleapis.com \
bigquery.googleapis.com \
bigqueryconnection.googleapis.com \
cloudbuild.googleapis.com \
dataproc.googleapis.com \
storage.googleapis.com \
monitoring.googleapis.com

In Cloud Shell, set the Cloud Region that you want to create your Dataproc resources in:

PROJECT_ID="<PROJECT_ID>"
REGION="<YOUR_REGION>"
SERVICE_ACCOUNT="<SERVICE_ACCOUNT>"

Make sure that the Service Account used by Dataproc cluster, should have the following roles:

gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/compute.admin && \
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/bigquery.dataViewer && \
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/bigquery.user && \
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/dataproc.editor && \
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/dataproc.worker && \
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/monitoring.viewer && \
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=serviceAccount:${SERVICE_ACCOUNT} --role=roles/storage.objectViewer

How to use¶

The solution is packaged as a JAR file that needs to be executed on the Trino master node as a daemon process.

Build the JAR file
```
./gradlew clean build shadowJar
```
Write your configuration by making a copy of the demo/sample_config.textproto file and update it.

Copy the JAR and config file to the master node of the Dataproc cluster running Trino using gcloud compute scp command.

gcloud compute scp <JAR_FILE_PATH> <CONFIG_FILE_PATH> <MASTER_NODE_NAME>:~/.

Variable	Description
`JAR_FILE_PATH`	full path to the JAR file on your local machine.
`CONFIG_FILE_PATH`	full path to the config file on your local machine.
`MASTER_NODE_NAME`	name of the master node in the Dataproc cluster.

Invoke the autoscaler application using:
```
java -jar <path/to/jarFile> <path/to/configfile> [optional:trino-worker-port:-8060]
```
It is recommended that the autoscaler be installed as a systemd daemon.

Install autoscaler daemon¶

To install autoscaler as a systemd daemon follow the steps:

Create the variable TRINO_AUTO_SCALER_SERVICE_FOLDER if it does not already exist.
Create the variable and directory TRINO_AUTOSCALE_FOLDER if it does not already exist.
Copy the autoscaler jar to TRINO_AUTOSCALE_FOLDER.
Create a log file called trino_autoscaler.log in the directory TRINO_AUTOSCALE_FOLDER.
Create a systemd service file called trino_autoscaler.service in the directory TRINO_AUTO_SCALER_SERVICE_FOLDER.
The service file contains the following configuration:
- The description of the service: Trino Autoscaler Service
- The dependencies of the service: trino.service
- The command to start the service: java -jar ${TRINO_AUTOSCALE_FOLDER}/trino_autoscaler.jar ${TRINO_AUTOSCALE_FOLDER/config.textproto
- The command to stop the service: /bin/kill -15 \${MAINPID} The restart policy for the service: always. The standard output and standard error logs for the service: /var/log/trino_autoscaler.log. Changes the permissions of the service file trino_autoscaler.service to allow read, write, and execute permissions for all users.
- The code if [[ "{ROLE}" == 'Master' ]]; then checks the value of the environment variable ROLE. If the value of the variable is Master, then the setup_trino_autoscaler() function is executed. This ensures that the Trino autoscaler is only set up on the master node.
- The code systemctl daemon-reload reloads the systemd daemon configuration. This ensures that the systemd daemon is aware of the new service file trino_autoscaler.service.
- When you are using a port other than 8060 for Trino, add the port number with a space after config.textproto on the ExecStart command.
```
TRINO_AUTO_SCALER_SERVICE_FOLDER="/usr/lib/systemd/system/"
TRINO_AUTO_SCALER_SERVICE="/usr/lib/systemd/system/trino_autoscaler.service"
TRINO_AUTOSCALE_FOLDER="/opt/trino_autoscaler"
function setup_trino_autoscaler {

cat <<EOF >"${TRINO_AUTO_SCALER_SERVICE}"

[Unit]

Description=Trino autoscaler Service
After=trino.service

[Service]

Type=simple
ExecStart=java -jar ${TRINO_AUTOSCALE_FOLDER}/trino_autoscaler.jar ${TRINO_AUTOSCALE_FOLDER}/config.textproto
ExecStop=/bin/kill -15 \$MAINPID
Restart=always
StandardOutput=append:/var/log/trino_autoscaler.log
StandardError=append:/var/log/trino_autoscaler.log

[Install]

WantedBy=multi-user.target
EOF
chmod a+rx ${TRINO_AUTO_SCALER_SERVICE}
}

if [[ "${ROLE}" == 'Master' ]]; then
# Run only on Master
setup_trino_autoscaler
systemctl daemon-reload
systemctl enable trino_autoscaler
systemctl start trino_autoscaler
fi
```