# GCP Sales Analytics POC
A proof of concept demonstrating an intelligent AI agent that can query both Google Cloud SQL (PostgreSQL) and BigQuery public datasets, automatically choosing the right data source based on the question.
## Architecture
```
┌────────────────────────────────────────────────────────────┐
│                     ADK Agent (Claude)                     │
│  Intelligently routes queries to appropriate data source   │
└───────┬──────────────────────┬──────────────────────┬──────┘
        │                      │                      │
        ▼                      ▼                      ▼
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│  MCP Server  │       │  MCP Server  │       │    Tools     │
│ (Cloud SQL)  │       │  (BigQuery)  │       │   (Schema)   │
└──────┬───────┘       └──────┬───────┘       └──────────────┘
       │                      │
       ▼                      ▼
┌──────────────┐       ┌──────────────┐
│  Cloud SQL   │       │   BigQuery   │
│  PostgreSQL  │       │   thelook_   │
│              │       │  ecommerce   │
│ • customers  │       │ • products   │
│ • orders     │       │ • users      │
│ • vendors    │       │ • events     │
└──────────────┘       │ • inventory  │
                       │ • orders     │
                       └──────────────┘
```
## Features
- **Dual Data Source Access**: Query both Cloud SQL and BigQuery seamlessly
- **Intelligent Routing**: Agent automatically determines which data source to use
- **MCP Server**: Standards-compliant Model Context Protocol server
- **Synthetic Data**: Pre-populated with 50 customers, 50 vendors, and 50 orders
- **Infrastructure as Code**: Terraform for Cloud SQL deployment
- **Robust Foundations**: Error handling, logging, and sensible security defaults (see the security considerations below for production hardening)
## Data Sources
### Cloud SQL (PostgreSQL)
Contains transactional data with three tables:
- **customers**: Customer information (name, email, address, etc.)
- **orders**: Order details (amounts, dates, status, products)
- **vendors**: Vendor information
Use for queries about:
- Specific customer information
- Recent order details
- Vendor data
- Current sales transactions
### BigQuery (thelook_ecommerce)
Public e-commerce analytics dataset with comprehensive data:
- products, users, events
- inventory_items, order_items
- distribution_centers
Use for queries about:
- Product analytics
- User behavior patterns
- Inventory analysis
- Historical trends
- Large-scale analytics
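To make the routing concrete, here are two hypothetical question-to-source pairings. The Cloud SQL column names (e.g. `customer_id`) are illustrative; the BigQuery table and column names come from the public `thelook_ecommerce` dataset.

```python
# Illustrative routing examples: question -> (data source, example SQL).
# The Cloud SQL query assumes an orders.customer_id column (hypothetical);
# the BigQuery query uses real thelook_ecommerce tables.
ROUTING_EXAMPLES = {
    "How many orders did customer 42 place?": (
        "cloudsql",
        "SELECT COUNT(*) FROM orders WHERE customer_id = 42;",
    ),
    "Which product categories sell best historically?": (
        "bigquery",
        "SELECT p.category, SUM(oi.sale_price) AS revenue "
        "FROM `bigquery-public-data.thelook_ecommerce.order_items` oi "
        "JOIN `bigquery-public-data.thelook_ecommerce.products` p "
        "ON oi.product_id = p.id "
        "GROUP BY p.category ORDER BY revenue DESC LIMIT 5;",
    ),
}
```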
## Prerequisites
- Google Cloud Platform account with billing enabled
- GCP Project with appropriate permissions
- Tools installed:
- `gcloud` CLI
- `terraform` (>= 1.0)
- `python3` (>= 3.9)
- `psql` (PostgreSQL client)
- Anthropic API key for Claude
## Setup
### 1. Clone and Configure
```bash
cd /path/to/gcp_mcp
# Copy environment template
cp .env.example .env
# Edit .env with your values
# Required: ANTHROPIC_API_KEY, GCP_PROJECT_ID
nano .env
```
### 2. Install Python Dependencies
```bash
# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### 3. Authenticate with GCP
```bash
# Login to GCP
gcloud auth login
# Set application default credentials
gcloud auth application-default login
# Set your project
gcloud config set project YOUR_PROJECT_ID
```
### 4. Deploy Infrastructure
```bash
# Run the automated deployment script
./scripts/deploy.sh
```
The deployment script will:
1. Deploy Cloud SQL instance with Terraform
2. Create database schema (customers, orders, vendors)
3. Seed database with synthetic data
4. Verify BigQuery access
This process takes approximately 5-10 minutes.
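Step 4 (verifying BigQuery access) can be sketched as a small standalone check, assuming Application Default Credentials are configured and the `google-cloud-bigquery` package is installed; the actual deploy script may implement this differently.

```python
# Sketch: verify read access to the thelook_ecommerce public dataset.
# Assumes `gcloud auth application-default login` has been run.
PUBLIC_DATASET = "bigquery-public-data.thelook_ecommerce"

def public_table(name: str) -> str:
    """Build the fully qualified ID for a thelook_ecommerce table."""
    return f"{PUBLIC_DATASET}.{name}"

def verify_bigquery_access() -> bool:
    """Return True if we can read one row from the public dataset."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery
    client = bigquery.Client()
    rows = client.list_rows(public_table("products"), max_results=1)
    return len(list(rows)) == 1

if __name__ == "__main__":
    print("BigQuery reachable:", verify_bigquery_access())
```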
## Usage
### Running the MCP Server
```bash
# In terminal 1
python3 mcp_server/server.py
```
The MCP server provides tools for:
- `query_cloudsql` - Execute SQL against Cloud SQL
- `query_bigquery` - Execute SQL against BigQuery
- `list_cloudsql_tables` - List Cloud SQL tables
- `list_bigquery_tables` - List BigQuery tables
- `get_cloudsql_schema` - Get table schema from Cloud SQL
- `get_bigquery_schema` - Get table schema from BigQuery
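The query tools restrict execution to read-only statements. A minimal guard along those lines might look like the sketch below; the actual validation in `mcp_server/server.py` may be more thorough.

```python
import re

def is_read_only(sql: str) -> bool:
    """Simplified read-only guard: accept a single SELECT (or WITH/CTE)
    statement and reject anything else, including multi-statement input."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:  # empty or multi-statement payload
        return False
    first_word = re.split(r"\s+", stripped, maxsplit=1)[0].upper()
    return first_word in ("SELECT", "WITH")
```

A guard like this is a defense-in-depth measure, not a substitute for parameterized queries or least-privilege database roles.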
### Running the Agent (Interactive Mode)
```bash
# In terminal 2
python3 adk_agent/agent.py
```
Example interactions:
```
📊 You: What are the total sales from our database?
🤖 Agent: I'll query the Cloud SQL database to calculate total sales...
[Uses query_cloudsql tool]
The total sales amount from all orders is $24,567.89 across 50 orders.
📊 You: Show me the top 5 products from BigQuery
🤖 Agent: I'll query the BigQuery thelook_ecommerce dataset...
[Uses query_bigquery tool]
Here are the top 5 products by sales:
1. Product A - $15,234
2. Product B - $12,456
...
```
### Running Tests
```bash
# Run automated tests
python3 scripts/test_agent.py
```
## Project Structure
```
gcp_mcp/
├── README.md # This file
├── requirements.txt # Python dependencies
├── .env.example # Environment template
├── .gitignore # Git ignore rules
│
├── terraform/ # Infrastructure as Code
│ ├── main.tf # Cloud SQL resources
│ ├── variables.tf # Input variables
│ ├── outputs.tf # Output values
│ └── terraform.tfvars.example
│
├── data/ # Database schema
│ └── schema.sql # Table definitions
│
├── scripts/ # Deployment and utilities
│ ├── deploy.sh # Automated deployment
│ ├── cleanup.sh # Resource cleanup
│ ├── generate_seed_data.py # Synthetic data generation
│ └── test_agent.py # Agent tests
│
├── mcp_server/ # MCP Server implementation
│ └── server.py # Database MCP server
│
└── adk_agent/ # ADK Agent implementation
└── agent.py # Sales analytics agent
```
## How It Works
### Agent Decision Making
The agent uses Claude's function calling capabilities with a specialized system prompt that guides data source selection:
1. **Question Analysis**: Agent analyzes the user's question
2. **Schema Discovery**: May first list tables to understand available data
3. **Source Selection**: Chooses Cloud SQL or BigQuery based on:
- Keywords (customers, vendors → Cloud SQL)
- Query type (analytics, trends → BigQuery)
- Data recency requirements
4. **Query Execution**: Formulates and executes appropriate SQL
5. **Result Presentation**: Formats and explains results
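Step 3 can be approximated by a keyword heuristic. This is a deliberate simplification for illustration: in the real agent, Claude makes the choice from the system prompt and tool descriptions, not from a hardcoded keyword table.

```python
# Hypothetical keyword heuristic mirroring the routing rules above.
CLOUDSQL_HINTS = {"customer", "customers", "vendor", "vendors", "recent", "my"}
BIGQUERY_HINTS = {"trend", "trends", "analytics", "inventory",
                  "historical", "behavior"}

def pick_source(question: str) -> str:
    """Choose a data source for a question; transactional hints win."""
    words = set(question.lower().replace("?", "").split())
    if words & CLOUDSQL_HINTS:
        return "cloudsql"
    if words & BIGQUERY_HINTS:
        return "bigquery"
    return "bigquery"  # default to the analytical store when unsure
```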
### Example Decision Flow
```
User: "What are my recent orders?"
↓
Agent thinks: "recent orders" + "my" suggests transactional data
↓
Decision: Use Cloud SQL
↓
Tool: query_cloudsql
↓
SQL: SELECT * FROM orders ORDER BY order_date DESC LIMIT 10
```
## Configuration
### Environment Variables (.env)
```bash
# GCP Configuration
GCP_PROJECT_ID=your-project-id
GCP_REGION=us-central1
CLOUDSQL_INSTANCE_NAME=sales-poc-db
CLOUDSQL_DATABASE=salesdb
CLOUDSQL_USER=salesuser
CLOUDSQL_PASSWORD=your-secure-password
# BigQuery
BIGQUERY_DATASET=bigquery-public-data.thelook_ecommerce
# Anthropic
ANTHROPIC_API_KEY=your-api-key
# Database connection (set after deployment)
DB_HOST=your-cloudsql-ip
```
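As a sketch of how Python code might consume these variables, the snippet below assembles a libpq-style connection string from the names in the template above (the project's actual connection code may differ):

```python
import os

def build_dsn(env=os.environ) -> str:
    """Assemble a libpq-style DSN from the .env variable names above.
    Defaults mirror the template values; override via the environment."""
    return (
        f"host={env.get('DB_HOST', 'localhost')} "
        f"dbname={env.get('CLOUDSQL_DATABASE', 'salesdb')} "
        f"user={env.get('CLOUDSQL_USER', 'salesuser')} "
        f"password={env.get('CLOUDSQL_PASSWORD', '')}"
    )

# Usage (e.g. with psycopg2): psycopg2.connect(build_dsn())
```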
### Terraform Variables
See [terraform/terraform.tfvars.example](terraform/terraform.tfvars.example)
## Security Considerations
This is a **proof of concept** with simplified security:
- Cloud SQL has public IP (uses authorized networks)
- Database credentials in environment variables
- No VPC or private networking
- Basic query restriction (SELECT-only statements) rather than full SQL injection protection
For production:
- Use Cloud SQL Proxy or private IP
- Store credentials in Secret Manager
- Implement VPC and private networking
- Add query validation and sanitization
- Enable Cloud SQL backups
- Use IAM authentication
- Implement rate limiting
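For the Secret Manager recommendation, a hedged sketch is shown below. It assumes the `google-cloud-secret-manager` package, and the secret ID `cloudsql-password` is hypothetical.

```python
def secret_version_name(project_id: str, secret_id: str,
                        version: str = "latest") -> str:
    """Canonical Secret Manager resource name for a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"

def fetch_db_password(project_id: str) -> str:
    """Read the Cloud SQL password from Secret Manager instead of .env."""
    from google.cloud import secretmanager  # pip install google-cloud-secret-manager
    client = secretmanager.SecretManagerServiceClient()
    # "cloudsql-password" is a hypothetical secret ID for this sketch.
    name = secret_version_name(project_id, "cloudsql-password")
    response = client.access_secret_version(name=name)
    return response.payload.data.decode("utf-8")
```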
## Cost Estimation
Approximate costs for running this POC:
- **Cloud SQL** (db-f1-micro): ~$7-10/month
- **BigQuery**: Pay per query (~$5/TB scanned after the 1 TB/month free tier; public dataset *storage* is free, but queries against them are billed)
- **Anthropic API**: Pay per token
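The BigQuery line item can be sanity-checked with simple arithmetic. The rate below mirrors the ~$5/TB figure above; check current GCP pricing before relying on it.

```python
TB = 1024 ** 4  # bytes per tebibyte (BigQuery bills on-demand per TiB scanned)

def bigquery_cost(bytes_scanned: int, price_per_tb: float = 5.0,
                  free_tb_per_month: float = 1.0) -> float:
    """Estimated on-demand query cost in USD after the monthly free tier."""
    billable_tb = max(0.0, bytes_scanned / TB - free_tb_per_month)
    return billable_tb * price_per_tb

# e.g. scanning 3 TiB in a month with 1 TiB free -> 2 TiB * $5 = $10
```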
**Important**: Run `./scripts/cleanup.sh` when done to avoid ongoing charges.
## Cleanup
To destroy all resources:
```bash
./scripts/cleanup.sh
```
This will:
- Destroy the Cloud SQL instance
- Remove all data
- Clean up Terraform state
## Troubleshooting
### Connection Issues
```bash
# Test Cloud SQL connection
psql -h $DB_HOST -U $CLOUDSQL_USER -d $CLOUDSQL_DATABASE
# Check instance status
gcloud sql instances describe sales-poc-db-XXXX
```
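If `psql` hangs instead of failing, a quick TCP reachability check can separate network problems from authentication problems. This is a generic sketch; pass whatever host your `.env` specifies.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# port_open(DB_HOST, 5432) returning False usually points at a firewall
# or authorized-networks issue rather than bad credentials.
```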
### API Access Issues
```bash
# Enable required APIs
gcloud services enable sqladmin.googleapis.com
gcloud services enable bigquery.googleapis.com
# Check authentication
gcloud auth list
```
### Terraform Issues
```bash
cd terraform
# Re-initialize
terraform init
# Check state
terraform show
```
## Extending the POC
Ideas for enhancement:
1. **Add More Data Sources**
- Cloud Spanner
- Firestore
- External APIs
2. **Enhanced Agent Capabilities**
- Data visualization
- Report generation
- Predictive analytics
3. **Production Features**
- Caching layer
- Query optimization
- Audit logging
- Monitoring and alerts
4. **Advanced MCP Features**
- Streaming responses
- Batch operations
- Transaction support
## Resources
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
- [Anthropic Claude API](https://docs.anthropic.com/)
- [Google Cloud SQL](https://cloud.google.com/sql)
- [BigQuery Public Datasets](https://cloud.google.com/bigquery/public-data)
- [Terraform GCP Provider](https://registry.terraform.io/providers/hashicorp/google/latest/docs)
## License
This is a proof of concept for demonstration purposes.
## Support
For issues or questions:
1. Check the troubleshooting section
2. Review logs in the terminal
3. Check GCP Console for resource status