S3 to Cloud Storage Migration Demo and Inventory¶
This document describes the high-level architecture of the Python application used to generate Amazon S3 inventory reports for migration. The application has a core function for fetching inventory data and an optional function for enriching this data with contextual information from the Google Cloud Architecture Center guide, Migrate from Amazon S3 to Cloud Storage.
Architecture¶
The diagram below visualizes the components and the flow of the application. The
optional functionality is clearly marked with a dotted line.

Component Breakdown¶
The Python application is designed with modularity, allowing the optional AI recommendation step to be included or skipped based on user configuration.
- Amazon S3 (Data Source)
  - Role: The primary data source. The application inventories S3 buckets and their objects.
  - Data Points: It gathers detailed configuration for each bucket (e.g., encryption, versioning, lifecycle rules) and for each object (e.g., size, storage class, last modified date).
- Python Application (Core Logic)
  - S3 Interaction (`utils.py`): Uses the `boto3` AWS SDK to communicate with Amazon S3. It lists all buckets and their configurations and, for a specified bucket, lists all objects and their versions. A minimal sketch follows this list.
  - Web Content Fetching (`web_fetch.py`): Before generating recommendations, the application fetches external guidance from a Google Cloud Architecture Center URL using the `requests` library. This provides up-to-date context for the migration recommendations.
  - AI Recommendation Engine (`gemini.py`): An optional module, skipped if the `--no-gemini-recommendations` flag is used. It uses the `google-genai` library to connect to a Gemini model on Vertex AI, sending the collected S3 inventory data and the fetched web content to the model to generate a migration plan.
  - Orchestration (`s3_inventory.py`): The main script that coordinates the entire process. It handles command-line arguments, calls the inventory functions, fetches web context, and triggers the AI recommendation generation.
- Outputs
  - CSV Reports: `bucket_inventory.csv` and `object_inventory_{bucket_name}.csv`. These files contain the raw inventory data and are always generated.
  - Markdown Report (`migration_recommendations.md`): An optional, AI-generated report that provides migration recommendations in a markdown table.
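To make the S3 interaction concrete, here is a minimal `boto3` sketch of the kind of calls `utils.py` makes; the function names are illustrative, not the module's actual API:

```python
import boto3

def sketch_bucket_inventory():
    """Illustrative only: list buckets plus one piece of per-bucket config."""
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        # Versioning is one of the configurations the real inventory collects.
        status = s3.get_bucket_versioning(Bucket=name).get("Status", "Disabled")
        print(name, status)

def sketch_object_inventory(bucket_name):
    """Illustrative only: page through a bucket's objects."""
    s3 = boto3.client("s3")
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            print(obj["Key"], obj["Size"], obj["StorageClass"], obj["LastModified"])
```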
Implementation Details¶
| Component | Technology/Tool | Interaction/Purpose |
|---|---|---|
| S3 Inventory | Python (`boto3`, `pandas`) | Lists buckets/objects, gets configs, creates CSVs. |
| Web Context | Python (`requests`) | Fetches migration guide from Google Cloud URL. |
| AI Recommendations | Python (`google-genai`) | Generates migration plan using Gemini on Vertex AI. |
| CLI Interface | Python (`argparse`) | Provides a command-line flag to skip AI features. |
| Console Output | Python (`rich`) | Prints AI recommendations as formatted markdown. |
| Configuration | Python (`config.py`) | Manages Gemini model name, prompt, and context URL. |
Sourcing & Context¶
The guidance for this project is sourced from the Google Cloud Architecture Center document: Migrate from Amazon S3 to Cloud Storage.
This script focuses on the Assess phase of the Google Cloud migration framework.
Key Inventory Data Points¶
The Gemini recommendations leverage the following key data points to create a plan for migrating S3 artifacts:
- Server-side encryption and IAM settings.
- Cost allocation tags and S3 Object Lock.
- Object versioning and Intelligent-Tiering.
- Aggregate statistics like object size and count, which are used to estimate time and cost.
Costs¶
This solution uses billable services from both AWS and Google Cloud. Please be aware of the following potential costs:
- AWS S3 CLI/Python SDK Calls: The scripts and tools use S3 API calls such as `ListAllMyBuckets`, `ListBucket`, `CreateBucket`, and `PutObject`. While these are relatively low-cost, they are billable. Review the AWS S3 Pricing page for details on Request and Data Retrieval pricing.
- Google Gemini Python SDK Calls (Vertex AI): The inventory script can call the Gemini API to generate migration recommendations. This is a billable service on Vertex AI. Costs are typically based on the amount of data (tokens) sent to and received from the model. Review the Vertex AI Pricing for Generative AI to estimate costs based on your usage.
Prerequisites¶
AWS Permissions¶
The full demo setup and cleanup require the permissions listed in the bullet points below.
If you only intend to run the `s3_inventory.py` script without the AI
recommendations based on inventory data, you can use the following minimum
permissions policy (a sketch of applying it via the CLI follows the policy):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": "*"
    }
  ]
}
```
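One hypothetical way to apply this from the command line; the policy name, user name, and file name below are placeholders:

```sh
# Save the policy above as inventory-policy.json, then create it in IAM.
aws iam create-policy \
  --policy-name s3-inventory-readonly \
  --policy-document file://inventory-policy.json

# Attach the policy to your user (substitute your account ID and user name).
aws iam attach-user-policy \
  --user-name YOUR_USER \
  --policy-arn arn:aws:iam::ACCOUNT_ID:policy/s3-inventory-readonly
```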
(Optional) Create Sample Bucket and Data¶
The scripts found in the `s3-to-cloud-storage/scripts` directory create AWS buckets
and objects for demo purposes.
- An AWS user account with permissions to create S3 buckets and manage IAM.
  While an admin user is simplest for this demo, it is not recommended for
  production environments. For a least-privilege setup, create an IAM policy
  with the following permissions and attach it to your user or role:
  - `s3:CreateBucket`
  - `s3:PutBucketTagging`
  - `s3:PutObject`
  - `s3:ListBucket`
  - `s3:ListAllMyBuckets`
  - `s3:GetBucketLocation`
- An AWS Access Key ID and AWS Secret Access Key for your user to run the AWS CLI.
Google Cloud¶
- A Google Cloud project.
- The Google Cloud SDK (gcloud) or access to the Google Cloud Shell.
- To create and manage Storage Transfer Service jobs, your user account needs
  the Storage Transfer Admin role (`roles/storagetransfer.admin`).
- To use the Gemini recommendation feature, your user account needs the
  Vertex AI User role (`roles/aiplatform.user`).
Local Environment¶
- Python 3.6+
- AWS CLI installed and configured.
Setup¶
Configure AWS CLI¶
First, ensure you have the AWS CLI installed and configured.
- Verify the AWS CLI version (see the example after this list).
- Configure AWS credentials: provide your AWS Access Key ID, Secret Access Key, and default region for a user or role with the minimum permissions listed above.
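Both steps use standard AWS CLI commands:

```sh
# Confirm the AWS CLI is installed and check its version.
aws --version

# Interactively supply the Access Key ID, Secret Access Key, and default region.
aws configure
```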
(Optional) Create S3 Bucket and Upload Sample Files¶
The `scripts/setup-s3-demo.sh` script will create an S3 bucket, generate sample
files, and upload them.
- Navigate to the `scripts` directory (see the combined sketch after this list).
- Set the environment variables the script expects.
- Run the script. When prompted, enter `yes` to approve the creation of AWS resources.
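Concretely, the three steps might look like the following; the environment variable names are hypothetical, so check `setup-s3-demo.sh` for the names it actually reads:

```sh
# Move into the demo scripts directory.
cd projects/migrate-from-aws-to-google-cloud/s3-to-cloud-storage/scripts

# Hypothetical variable names; verify against setup-s3-demo.sh before running.
export AWS_REGION="us-east-1"
export S3_DEMO_BUCKET="my-s3-demo-bucket"

# Create the demo bucket and upload sample files; answer "yes" when prompted.
./setup-s3-demo.sh
```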
Install Python Requirements¶
The inventory script has Python dependencies.
- Navigate to the `python` directory (see the sketch after this list).
- Install the required libraries using pip.
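A minimal sketch, assuming the project follows the common `requirements.txt` convention and that the `python` directory sits alongside `scripts`:

```sh
# From the scripts directory, move into the Python application directory.
cd ../python

# Install the Python dependencies.
pip install -r requirements.txt
```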
Usage: S3 Inventory and Gemini Recommendations¶
The `s3_inventory.py` script generates an inventory of your S3 buckets and
objects.
Set Environment Variables¶
Set the following environment variables to configure the Gemini client:
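The exact variable names are not spelled out in this extract; with the `google-genai` SDK targeting Vertex AI, the typical ones are shown below (confirm against `python/config.py` and the SDK documentation):

```sh
# Typical google-genai settings for Vertex AI; verify the exact names your
# version of the SDK and this project expect.
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"
export GOOGLE_GENAI_USE_VERTEXAI=true
```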
Run the script¶
To run the script, execute the following command from within the `python`
directory:
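```sh
python s3_inventory.py
```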
To run the script without generating Gemini recommendations (to avoid associated
API costs), use the `--no-gemini-recommendations` flag:
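```sh
python s3_inventory.py --no-gemini-recommendations
```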
The script generates CSV inventory files for your buckets and objects. If the
`--no-gemini-recommendations` flag is not used, it will also generate a
`migration_recommendations.md` file with an AI-powered summary.
Configuration¶
The `python/config.py` file contains the configuration for the Gemini API. You
can modify this file to change the following settings (a placeholder sketch
follows the list):

- `GEMINI_RECOMMENDATION_PROMPT`: The prompt template for generating Gemini recommendations.
- `GEMINI_MODEL`: The Gemini model to use for generating recommendations.
- `GEMINI_USER_AGENT`: The user agent to use when making Gemini API calls.
- `EXTERNAL_CONTEXT_URL`: The URL to fetch external context from for Gemini recommendations.
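As a rough sketch, `config.py` presumably defines these as module-level constants; every value below is a placeholder, not the project's default:

```python
# Placeholder values only; the real config.py ships its own defaults.
GEMINI_MODEL = "gemini-2.0-flash"
GEMINI_USER_AGENT = "s3-migration-demo"
EXTERNAL_CONTEXT_URL = "https://cloud.google.com/architecture/..."  # guide URL
GEMINI_RECOMMENDATION_PROMPT = """
You are a cloud migration specialist. Using the S3 inventory data and the
migration guide provided, produce a migration plan as a markdown table.
"""
```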
Cleanup¶
To clean up the AWS environment created by the `setup-s3-demo.sh` script, run
the `cleanup-s3-demo.sh` script from the `scripts` directory.

```sh
cd projects/migrate-from-aws-to-google-cloud/s3-to-cloud-storage/scripts
./cleanup-s3-demo.sh <your-bucket-name>
```

When prompted, enter `DELETE` to confirm the deletion of the S3 bucket and its
contents.