# CLAUDE.md - Project Context for AI Assistant
> **Purpose**: This file provides context for Claude AI sessions to understand the project without re-exploration.
> **Audience**: Claude (AI Assistant)
> **Last Updated**: 2026-01-06
---
## 1. PROJECT IDENTITY
**Project Name**: GCP Sales Analytics POC
**Location**: `/Users/matthewiames/Desktop/gcp_mcp`
**Status**: ✅ Fully implemented and documented
**GCP Project**: `iames-dq`
**GCP Region**: `us-central1`
**Purpose**: Demonstrate an intelligent AI agent (powered by Claude) that queries both Google Cloud SQL and BigQuery, automatically selecting the right data source based on user questions about sales data.
**Key Achievement**: Agent intelligently routes queries to Cloud SQL (transactional data) or BigQuery (analytics data) without the user needing to specify which database to query.
---
## 2. ARCHITECTURE OVERVIEW
```
┌─────────────────────────────────────────────┐
│           USER ASKS SALES QUESTION          │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ CLAUDE AGENT (Anthropic Claude 3.5 Sonnet)  │
│  - Analyzes question keywords and intent    │
│  - Selects appropriate data source          │
│  - Formulates SQL query                     │
│  - Executes via MCP tools                   │
└──────────┬───────────────────┬──────────────┘
           │                   │
           ▼                   ▼
┌────────────────────┐  ┌─────────────────────────┐
│ CLOUD SQL          │  │ BIGQUERY                │
│ (PostgreSQL)       │  │ (thelook_ecommerce)     │
│                    │  │                         │
│ Tables:            │  │ Tables:                 │
│ • customers (50)   │  │ • products              │
│ • orders (50)      │  │ • users                 │
│ • vendors (50)     │  │ • events                │
│                    │  │ • inventory_items       │
│ Use for:           │  │ • order_items           │
│ - Customer queries │  │ • distribution_centers  │
│ - Recent orders    │  │                         │
│ - Vendor info      │  │ Use for:                │
│ - Transactions     │  │ - Product analytics     │
│                    │  │ - User behavior         │
│                    │  │ - Inventory analysis    │
│                    │  │ - Historical trends     │
└────────────────────┘  └─────────────────────────┘
```
### Data Source Selection Rules
The agent decides which data source to use based on:
| User Query Contains | → Use This Data Source | Reason |
|---------------------|------------------------|---------|
| "customer", "customers" | Cloud SQL | Customer table exists here |
| "vendor", "vendors" | Cloud SQL | Vendor table exists here |
| "order", "recent orders" | Cloud SQL | Transactional order data |
| "product", "products" | BigQuery | Product catalog in BigQuery |
| "inventory" | BigQuery | Inventory data in BigQuery |
| "user", "users" | BigQuery | User behavior data |
| "events", "analytics" | BigQuery | Event tracking data |
| "trends", "historical" | BigQuery | Large-scale analytics |
---
## 3. CRITICAL FILE MAP
### 📁 Project Root: `/Users/matthewiames/Desktop/gcp_mcp/`
**Documentation Files** (Read-only, informational):
- `README.md` - Complete user guide (10KB) - setup, architecture, usage
- `QUICKSTART.md` - 15-minute quick start guide (4KB)
- `PROJECT_SUMMARY.md` - Technical deep-dive (10KB)
- `CLAUDE.md` - **THIS FILE** - Context for Claude AI sessions
**Configuration Files**:
- `.env.example` - Environment variable template (COPY to `.env` for deployment)
- `.gitignore` - Git exclusions
- `requirements.txt` - Python dependencies (anthropic, google-cloud-bigquery, psycopg2, mcp, faker)
**Core Application** (Modify these for features):
- `adk_agent/integrated_agent.py` - **PRIMARY AGENT** (functional, 15KB)
- **MODIFY HERE**: To change system prompt or routing logic
- Uses Claude 3.5 Sonnet (`claude-3-5-sonnet-20241022`)
- Direct database connections (not via MCP server)
- Lines 42-66: System prompt (guides data source selection)
- `adk_agent/agent.py` - Conceptual framework (10KB)
- Reference implementation with placeholder tool execution
- Shows agentic architecture pattern
- **DO NOT MODIFY**: This is for reference only
- `mcp_server/server.py` - MCP Server implementation (12KB)
- **MODIFY HERE**: To add new MCP tools
- Provides 6 tools: query_cloudsql, query_bigquery, list_cloudsql_tables, list_bigquery_tables, get_cloudsql_schema, get_bigquery_schema
- Standards-compliant Model Context Protocol
**Infrastructure** (Modify for deployment changes):
- `terraform/main.tf` - Cloud SQL resource definitions (2KB)
- PostgreSQL 15, db-f1-micro instance
- Public IP with authorized networks
- `terraform/variables.tf` - Configurable parameters (1KB)
- Region, instance name, database name, credentials
- `terraform/outputs.tf` - Connection details (1KB)
- Instance name, connection string, public IP
- `terraform/terraform.tfvars.example` - Variable values template
**Data & Schema**:
- `data/schema.sql` - Database schema (2KB)
- 3 tables: customers, orders, vendors
- Includes indexes for performance
- **MODIFY HERE**: To change database structure
- `scripts/generate_seed_data.py` - Synthetic data generator (6KB)
- Uses Faker library
- Generates 50 records per table (configurable)
**Automation Scripts**:
- `scripts/deploy.sh` - One-command deployment (5KB, Bash)
- Runs Terraform, creates schema, seeds data
- **RUN THIS**: To deploy infrastructure
- `scripts/cleanup.sh` - Resource teardown (1KB, Bash)
- Destroys Cloud SQL instance
- **RUN THIS**: When done to avoid charges
- `scripts/test_agent.py` - Automated tests (2KB, Python)
---
## 4. KEY IMPLEMENTATION DETAILS
### System Prompt (Primary Agent Behavior Controller)
**Location**: `adk_agent/integrated_agent.py` lines 42-66
```python
self.system_prompt = """You are a sales analytics assistant with access to two data sources:
1. **Cloud SQL Database** (query_cloudsql, get_cloudsql_schema, list_cloudsql_tables):
- Contains recent transactional data
- Tables: customers, orders, vendors
- Use for: specific customer queries, recent order details, vendor information, current sales data
2. **BigQuery thelook_ecommerce Dataset** (query_bigquery, get_bigquery_schema, list_bigquery_tables):
- Public e-commerce analytics dataset
- Tables: products, users, events, inventory_items, order_items, orders, distribution_centers
- Use for: product analytics, user behavior, inventory analysis, historical trends, large-scale analytics
**Decision Guidelines:**
- For questions about "customers", "vendors", or recent "orders" → use Cloud SQL
- For questions about "products", "inventory", "users", "events", or broad analytics → use BigQuery
- When uncertain, first list tables from both sources to understand the data
- Always explain which data source you're using and why
- Combine data from both sources if needed to answer comprehensively
**Important:**
- Use descriptive SQL queries with clear column selections
- Always limit results to reasonable amounts (use LIMIT clause)
- Format numeric results clearly (e.g., currency with $ symbol)
- Provide context and insights, not just raw data
- When showing data, format it in a readable way (tables, lists, etc.)"""
```
**To modify agent behavior**: Edit this system prompt in `adk_agent/integrated_agent.py`
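The prompt above asks the agent to present results readably (tables, lists). As an illustrative aside, a small helper like the hypothetical `to_markdown_table` below (not in the repo) shows one way tool results could be rendered:

```python
# Illustrative only: render a list of row dicts as a markdown table,
# the kind of formatting the system prompt asks the agent to produce.
def to_markdown_table(rows):
    """rows: list of dicts sharing the same keys, in display order."""
    if not rows:
        return "(no rows)"
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)
```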
### MCP Tools (6 Tools Available)
**Location**: `mcp_server/server.py` lines 38-128
1. **query_cloudsql**
- Execute SQL against Cloud SQL PostgreSQL
- Input: `{"query": "SELECT ..."}`
- Security: SELECT-only queries enforced
- Returns: JSON with row_count and data array
2. **query_bigquery**
- Execute SQL against BigQuery thelook_ecommerce
- Input: `{"query": "SELECT ..."}`
- Auto-prefixes dataset path if needed
- Returns: JSON with row_count, bytes_processed, data
3. **list_cloudsql_tables**
- List all Cloud SQL tables with column info
- Input: none
- Returns: Table names with column types
4. **list_bigquery_tables**
- List all BigQuery dataset tables
- Input: none
- Returns: Table names with row counts
5. **get_cloudsql_schema**
- Get detailed schema for specific Cloud SQL table
- Input: `{"table_name": "customers"}`
- Returns: Columns with data types, nullability, defaults
6. **get_bigquery_schema**
- Get detailed schema for specific BigQuery table
- Input: `{"table_name": "products"}`
- Returns: Columns with types, mode, description
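`query_cloudsql` enforces SELECT-only access. A minimal sketch of what such a guard could look like (the actual check lives in `mcp_server/server.py` and may differ; `ensure_select_only` is an assumed name):

```python
# Hedged sketch of a SELECT-only guard for query_cloudsql.
def ensure_select_only(query: str) -> str:
    """Return the cleaned query, or raise ValueError for non-SELECT SQL."""
    stripped = query.strip().rstrip(";").strip()
    if not stripped or stripped.split(None, 1)[0].upper() != "SELECT":
        raise ValueError("Only SELECT queries are allowed")
    # Reject statement chaining like "SELECT 1; DROP TABLE customers"
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
    return stripped
```

Note this is string-level screening only; as section 8 says, production use would need real query validation and sanitization.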
### Database Schema
**Cloud SQL Tables** (defined in `data/schema.sql`):
```sql
-- CUSTOMERS table (50 records)
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
email VARCHAR(255) UNIQUE,
phone VARCHAR(20),
address VARCHAR(255),
city VARCHAR(100),
state VARCHAR(50),
zip_code VARCHAR(20),
country VARCHAR(100),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
-- VENDORS table (50 records)
CREATE TABLE vendors (
vendor_id SERIAL PRIMARY KEY,
vendor_name VARCHAR(255),
contact_name VARCHAR(100),
email VARCHAR(255),
phone VARCHAR(20),
address VARCHAR(255),
city VARCHAR(100),
state VARCHAR(50),
zip_code VARCHAR(20),
country VARCHAR(100),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
-- ORDERS table (50 records)
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
vendor_id INTEGER REFERENCES vendors(vendor_id),
order_date TIMESTAMP,
ship_date TIMESTAMP,
total_amount DECIMAL(10, 2),
status VARCHAR(50), -- 'pending', 'processing', 'shipped', 'delivered', 'cancelled'
product_name VARCHAR(255),
quantity INTEGER,
unit_price DECIMAL(10, 2),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
```
**Indexes**: Created on customer_id, vendor_id, order_date, status for performance
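As a worked example against this schema, here is the kind of SQL the agent is expected to generate for "top customers by spending" (the `top_customers_query` helper is illustrative, not part of the repo):

```python
# Illustrative: build a LIMIT-bounded SELECT against the schema above,
# following the system prompt's rule to always include a LIMIT clause.
def top_customers_query(limit: int = 5) -> str:
    """Return SQL for the top customers by total spend (Cloud SQL schema)."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT c.customer_id, c.first_name, c.last_name, "
        "SUM(o.total_amount) AS total_spent "
        "FROM customers c JOIN orders o USING (customer_id) "
        "GROUP BY c.customer_id, c.first_name, c.last_name "
        f"ORDER BY total_spent DESC LIMIT {limit}"
    )
```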
### Why Two Agent Versions?
1. **`adk_agent/agent.py`** (Conceptual):
- Demonstrates agentic architecture pattern
- Uses placeholder tool execution
- Shows how MCP tool schemas are defined
- Useful for understanding the pattern
2. **`adk_agent/integrated_agent.py`** (Functional):
- **THIS IS THE ONE TO USE**
- Direct database connections (psycopg2, BigQuery client)
- Actually executes queries
- Full implementation with error handling
- Interactive CLI for testing
**When to use which**:
- Run `integrated_agent.py` to actually query databases
- Read `agent.py` to understand the architecture pattern
---
## 5. ENVIRONMENT & CONFIGURATION
### Required Environment Variables
**File**: `.env` (created from `.env.example`)
```bash
# GCP Configuration
GCP_PROJECT_ID=iames-dq
GCP_REGION=us-central1
CLOUDSQL_INSTANCE_NAME=sales-poc-db
CLOUDSQL_DATABASE=salesdb
CLOUDSQL_USER=salesuser
CLOUDSQL_PASSWORD=your-password-here
# Database connection (set after deployment)
DB_HOST=<Cloud SQL Public IP>
# BigQuery
BIGQUERY_DATASET=bigquery-public-data.thelook_ecommerce
# Anthropic
ANTHROPIC_API_KEY=your-api-key-here
```
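A minimal sketch of how these variables might be assembled into connection parameters (the `build_db_config` helper is hypothetical; the agent assembles similar values in `integrated_agent.py`):

```python
# Illustrative: turn the .env variables above into psycopg2 connection kwargs.
import os

def build_db_config(env=None):
    """Build connection parameters from environment variables (or a dict)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "localhost"),
        "database": env.get("CLOUDSQL_DATABASE", "salesdb"),
        "user": env.get("CLOUDSQL_USER", "salesuser"),
        "password": env["CLOUDSQL_PASSWORD"],  # required; fail fast if missing
    }
```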
### Python Dependencies
**File**: `requirements.txt`
```
google-cloud-bigquery>=3.14.0
cloud-sql-python-connector>=1.5.0
pg8000>=1.30.0
sqlalchemy>=2.0.0
psycopg2-binary>=2.9.9
mcp>=1.0.0
anthropic>=0.39.0
faker>=22.0.0
python-dotenv>=1.0.0
```
### GCP Resources
- **Cloud SQL Instance**: PostgreSQL 15, db-f1-micro tier
- **Cost**: ~$7-10/month (if not cleaned up!)
- **Security**: Public IP with authorized networks (POC-level, not production)
- **BigQuery**: Uses public dataset (minimal cost)
---
## 6. COMMON MODIFICATION PATTERNS
### How to Add a New MCP Tool
**Edit**: `mcp_server/server.py`
1. Add tool definition in `list_tools()` (line ~38):
```python
Tool(
name="your_tool_name",
description="What this tool does and when to use it",
inputSchema={
"type": "object",
"properties": {
"param_name": {
"type": "string",
"description": "Parameter description"
}
},
"required": ["param_name"]
}
)
```
2. Add handler in `call_tool()` (line ~90):
```python
elif name == "your_tool_name":
result = await self._your_tool_handler(arguments["param_name"])
return [TextContent(type="text", text=json.dumps(result))]
```
3. Implement handler method:
```python
async def _your_tool_handler(self, param_name: str) -> Dict[str, Any]:
# Tool implementation
return {"result": "data"}
```
### How to Modify System Prompt (Change Routing Logic)
**Edit**: `adk_agent/integrated_agent.py` lines 42-66
Change the `self.system_prompt` string to modify:
- Which keywords trigger which data source
- Query formatting instructions
- Response formatting preferences
- Additional constraints or guidelines
**Example**: To add a new keyword routing rule:
```python
**Decision Guidelines:**
- For questions about "customers", "vendors", or recent "orders" → use Cloud SQL
- For questions about "products", "inventory", "users", "events", or broad analytics → use BigQuery
- For questions about "shipments" → use Cloud SQL # NEW RULE
```
### How to Add a New Data Source
1. **Add database connection** in `integrated_agent.py` `__init__()`:
```python
self.new_db_config = {
'host': os.getenv('NEW_DB_HOST'),
'database': os.getenv('NEW_DB_NAME')
}
```
2. **Add tool definition** to `self.tools` list:
```python
{
"name": "query_new_db",
"description": "Execute SQL against new database...",
"input_schema": {...}
}
```
3. **Implement query method**:
```python
async def _query_new_db(self, query: str) -> str:
# Connection and query logic
return json.dumps(result)
```
4. **Update system prompt** to include new data source in decision guidelines
### How to Change Database Schema
**Edit**: `data/schema.sql`
1. Add new table or modify existing:
```sql
CREATE TABLE new_table (
id SERIAL PRIMARY KEY,
...
);
```
2. Update indexes if needed:
```sql
CREATE INDEX idx_new_table_field ON new_table(field);
```
3. Update `scripts/generate_seed_data.py` to generate data for new table
4. Re-run deployment:
```bash
./scripts/deploy.sh
```
---
## 7. OPERATIONAL CONTEXT
### Deployment Workflow
```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with ANTHROPIC_API_KEY and verify GCP_PROJECT_ID
# 2. Authenticate with GCP
gcloud auth login
gcloud auth application-default login
# 3. Deploy everything (one command)
./scripts/deploy.sh
# This runs: Terraform apply → Create schema → Seed data → Verify BigQuery
# Takes ~5-10 minutes
```
### Running the Agent
```bash
# Option 1: Integrated agent (functional)
python3 adk_agent/integrated_agent.py
# Option 2: MCP server (separate terminal)
python3 mcp_server/server.py
```
### Example Queries to Test
```
What are the total sales from Cloud SQL?
How many customers do we have?
Show me orders with status 'delivered'
What's the average order value?
List the top 5 customers by total spending
How many products are in BigQuery?
Show me user events from BigQuery
```
### Cleanup (Important!)
```bash
# Destroy all GCP resources
./scripts/cleanup.sh
# Confirms before destroying
# Prevents ongoing charges (~$7-10/month)
```
### Testing
```bash
# Run automated tests
python3 scripts/test_agent.py
# Manual database connection test
psql -h $DB_HOST -U salesuser -d salesdb
```
---
## 8. IMPORTANT CONSTRAINTS & CONSIDERATIONS
### Security Model (POC-Level)
⚠️ **This is a proof of concept with simplified security**:
- **Public IP**: Cloud SQL has public IP (uses authorized networks)
- **Credentials**: Stored in `.env` file (not Secret Manager)
- **SQL Injection**: Basic prevention (SELECT-only enforcement)
- **No VPC**: No private networking
- **No IAM Auth**: Uses password authentication
**For Production**, implement:
- Cloud SQL Proxy or private IP
- Secret Manager for credentials
- VPC networking
- IAM database authentication
- Query validation and sanitization
- Rate limiting
- Audit logging
### Data Constraints
- **Cloud SQL**: Only 50 records per table (small dataset for POC)
- **Synthetic Data**: Generated with Faker (not real customer data)
- **Two Separate Systems**: Cloud SQL and BigQuery are not synchronized
- **No Transactions**: Each query is independent
- **SELECT-Only**: No INSERT/UPDATE/DELETE allowed (safety)
### Cost Considerations
- **Cloud SQL**: ~$7-10/month (db-f1-micro instance)
- **BigQuery**: Pay per query (~$5/TB scanned; the monthly free query tier typically covers POC-scale usage of the public dataset)
- **Anthropic API**: Pay per token used
- **Total**: ~$10-15/month if left running
**Always run `./scripts/cleanup.sh` when done!**
### Known Limitations
1. **Agent doesn't combine data across sources** (could be added)
2. **No caching** (queries re-execute every time)
3. **No query optimization** (agent may generate inefficient SQL)
4. **No streaming responses** (all results returned at once)
5. **No pagination** (relies on LIMIT clauses)
---
## 9. QUICK REFERENCE
### MCP Tools Cheat Sheet
| Tool Name | Purpose | When to Use | Input | Output |
|-----------|---------|-------------|-------|--------|
| `query_cloudsql` | Execute SQL | Customer/vendor/order queries | SQL string | JSON with data |
| `query_bigquery` | Execute SQL | Product/analytics queries | SQL string | JSON with data |
| `list_cloudsql_tables` | Show schema | Discover available tables | None | Table list |
| `list_bigquery_tables` | Show schema | Discover BQ tables | None | Table list |
| `get_cloudsql_schema` | Table details | Understand table structure | Table name | Column details |
| `get_bigquery_schema` | Table details | Understand BQ table | Table name | Column details |
### File Modification Guide
| To Do This | Edit This File | Look For |
|------------|----------------|----------|
| Change agent routing logic | `adk_agent/integrated_agent.py` | Line 42: `self.system_prompt` |
| Add new MCP tool | `mcp_server/server.py` | Line 38: `list_tools()` |
| Change database schema | `data/schema.sql` | Table definitions |
| Modify deployment config | `terraform/variables.tf` | Variable defaults |
| Add new data source | `adk_agent/integrated_agent.py` | `__init__()` and tools list |
| Change instance size | `terraform/main.tf` | Line 27: `tier` setting |
| Update seed data count | `scripts/generate_seed_data.py` | Lines 19-21: NUM_* constants |
### Command Cheat Sheet
```bash
# Deployment
./scripts/deploy.sh # Deploy everything
./scripts/cleanup.sh # Destroy resources
# Running
python3 adk_agent/integrated_agent.py # Run agent (interactive)
python3 mcp_server/server.py # Run MCP server
python3 scripts/test_agent.py # Run tests
# Database
psql -h $DB_HOST -U salesuser -d salesdb # Connect to Cloud SQL
psql -h $DB_HOST -U salesuser -d salesdb -f data/schema.sql # Apply schema
# Terraform
cd terraform && terraform plan # Preview changes
cd terraform && terraform apply # Apply changes
cd terraform && terraform destroy # Destroy resources
# GCP
gcloud sql instances list # List Cloud SQL instances
gcloud sql instances describe INSTANCE_NAME # Get instance details
bq ls bigquery-public-data:thelook_ecommerce # List BigQuery tables
```
### Data Source Selection Quick Reference
| User Query Keyword | → Data Source | Reason |
|--------------------|---------------|---------|
| customer, buyer | Cloud SQL | `customers` table |
| vendor, supplier | Cloud SQL | `vendors` table |
| order (recent/specific) | Cloud SQL | `orders` table (transactional) |
| sales, revenue | Cloud SQL | Sum `orders.total_amount` |
| product, item | BigQuery | `products` table |
| inventory, stock | BigQuery | `inventory_items` table |
| user, shopper | BigQuery | `users` table |
| event, behavior | BigQuery | `events` table |
| analytics, trends | BigQuery | Large-scale data |
---
## 10. AGENTIC LOOP IMPLEMENTATION
### How the Agent Works (Step-by-Step)
**Location**: `adk_agent/integrated_agent.py` `process_message()` method (line ~193)
1. **User Input**: User asks a question
2. **API Call**: Send to Claude with system prompt + tools
3. **Claude Response**: Returns text and/or tool calls
4. **Tool Execution**: If tool calls present, execute them
5. **Result Formatting**: Format tool results as JSON
6. **Follow-up**: Send tool results back to Claude
7. **Repeat**: Continue loop until no more tool calls
8. **Final Response**: Return accumulated text to user
```python
# Simplified agentic loop (see process_message() for the full version)
iteration = 0
while iteration < max_iterations:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        system=system_prompt,
        tools=self.tools,
        messages=conversation_history,
    )
    conversation_history.append(
        {"role": "assistant", "content": response.content}
    )

    tool_calls = [b for b in response.content if b.type == "tool_use"]
    if not tool_calls:
        break  # No more tool calls: Claude's text is the final answer

    # Execute each requested tool and feed results back to Claude
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": call.id,
            "content": self._execute_tool(call.name, call.input),
        }
        for call in tool_calls
    ]
    conversation_history.append({"role": "user", "content": tool_results})
    iteration += 1
```
### Conversation History Management
The agent maintains conversation history to:
- Enable multi-turn conversations
- Allow follow-up questions
- Provide context across queries
**Reset with**: `agent.reset_conversation()` (clears history)
---
## 11. TROUBLESHOOTING CONTEXT
### Common Issues
**"Connection refused" to Cloud SQL**:
- Check `DB_HOST` in `.env` (should be public IP from Terraform output)
- Verify instance is running: `gcloud sql instances list`
- Check authorized networks include your IP
**"Anthropic API key not found"**:
- Ensure `ANTHROPIC_API_KEY` is set in `.env`
- Load environment: `source .env` or restart terminal
**"Table doesn't exist"**:
- Run `psql -h $DB_HOST -U salesuser -d salesdb -f data/schema.sql`
- Or re-run `./scripts/deploy.sh`
**"BigQuery permission denied"**:
- Ensure BigQuery API is enabled: `gcloud services enable bigquery.googleapis.com`
- Check authentication: `gcloud auth application-default login`
**Agent always chooses wrong data source**:
- Check system prompt in `integrated_agent.py` lines 42-66
- Verify tool descriptions are clear
- Add more specific keywords to decision guidelines
---
## 12. EXTENSION IDEAS (For Future Development)
### Short-term Enhancements
1. **Caching**: Add Redis for query result caching
2. **Visualization**: Generate charts/graphs from query results
3. **Data Sync**: Sync Cloud SQL data to BigQuery for unified queries
4. **Query History**: Store and retrieve past queries
5. **Explain Plans**: Show SQL execution plans
### Medium-term Features
1. **Web Interface**: Build React frontend for agent
2. **Scheduled Reports**: Generate automated reports
3. **Alerts**: Send notifications for anomalies
4. **Multi-tenant**: Support multiple users/databases
5. **Advanced Analytics**: Add ML predictions
### Long-term Vision
1. **Natural Language to SQL**: Improve query generation
2. **Data Discovery**: Auto-suggest interesting queries
3. **Performance Optimization**: Query optimization suggestions
4. **Data Governance**: Add access controls and audit logs
5. **Production Hardening**: Full security implementation
---
## SUMMARY FOR CLAUDE
**When you see this project**:
1. ✅ You're working with a fully implemented sales analytics POC
2. ✅ Primary file to run: `adk_agent/integrated_agent.py`
3. ✅ To change behavior: Edit system prompt at line 42 of integrated_agent.py
4. ✅ To add features: Add MCP tools in `mcp_server/server.py`
5. ✅ Data sources: Cloud SQL (transactional) + BigQuery (analytics)
6. ✅ Deploy: `./scripts/deploy.sh`, Cleanup: `./scripts/cleanup.sh`
**Most important files**:
- `adk_agent/integrated_agent.py` - The agent (modify system prompt here)
- `mcp_server/server.py` - MCP tools (add new tools here)
- `data/schema.sql` - Database schema
- `.env` - Configuration (API keys, DB host)
**This POC demonstrates**: Intelligent multi-source data querying with Claude-powered decision making using the Model Context Protocol standard.
---
**End of CLAUDE.md**