# Phase 1: Foundation Enhancement - Implementation Plan
**Version:** 1.1.0
**Started:** November 13, 2024
**Target Completion:** 2-3 weeks
**Status:** 🟢 IN PROGRESS
---
## 🎯 Phase 1 Goals
1. **PostgreSQL Migration** - Scalable database backend
2. **Vector Search** - Semantic similarity matching with embeddings
3. **PDF Upload** - Technical drawing analysis capability
4. **Enhanced Audit Logging** - User tracking and compliance
5. **Performance Optimization** - Handle 10,000+ quotes efficiently
---
## 📋 Implementation Checklist
### Week 1: Database Foundation
#### Day 1-2: PostgreSQL Setup & Schema Design
- [ ] Install PostgreSQL dependencies (pg, prisma)
- [ ] Design database schema (quotes, evaluations, users, audit_logs)
- [ ] Create Prisma schema file
- [ ] Set up migrations
- [ ] Create database connection pool
- [ ] Add database configuration to .env
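The "Create Prisma schema file" task above can be sketched as a fragment of `prisma/schema.prisma`. Field names here are illustrative and mirror the SQL schema later in this plan; `Unsupported("vector(1536)")` is how Prisma represents a pgvector column it cannot model natively:

```prisma
// Fragment only — quotes table, abbreviated to the matching fields
model Quote {
  id           String   @id @db.VarChar(50)
  quoteDate    DateTime @map("quote_date")
  customerName String   @map("customer_name") @db.VarChar(255)
  material     String?  @db.VarChar(100)
  processes    String[]
  costPerUnit  Decimal? @map("cost_per_unit") @db.Decimal(10, 2)
  leadDays     Int?     @map("lead_days")
  embedding    Unsupported("vector(1536)")?
  createdAt    DateTime @default(now()) @map("created_at")

  @@map("quotes")
}
```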
#### Day 3-4: Data Access Layer
- [ ] Create database adapter interface
- [ ] Implement PostgreSQL storage service
- [ ] Add connection retry logic
- [ ] Implement transaction support
- [ ] Create data migration scripts (JSON → PostgreSQL)
- [ ] Add backward compatibility layer
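A minimal sketch of the adapter interface and its backward-compatibility layer, with illustrative names (`QuoteStorage`, `InMemoryQuoteStorage`); the PostgreSQL implementation would satisfy the same interface on top of the connection pool:

```typescript
// Record shape mirrors the quotes table; abbreviated for the sketch.
interface QuoteRecord {
  id: string;
  customerName: string;
  material?: string;
  costPerUnit?: number;
}

// The adapter interface every storage backend implements.
interface QuoteStorage {
  save(quote: QuoteRecord): Promise<void>;
  findById(id: string): Promise<QuoteRecord | null>;
  findByMaterial(material: string): Promise<QuoteRecord[]>;
}

// In-memory implementation — useful as the JSON-era compatibility
// layer and in unit tests; a PostgresQuoteStorage would implement
// the same interface using the pg pool.
class InMemoryQuoteStorage implements QuoteStorage {
  private quotes = new Map<string, QuoteRecord>();

  async save(quote: QuoteRecord): Promise<void> {
    this.quotes.set(quote.id, quote);
  }

  async findById(id: string): Promise<QuoteRecord | null> {
    return this.quotes.get(id) ?? null;
  }

  async findByMaterial(material: string): Promise<QuoteRecord[]> {
    return [...this.quotes.values()].filter(q => q.material === material);
  }
}
```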
#### Day 5: Testing & Migration
- [ ] Write unit tests for database layer
- [ ] Test data migration with sample quotes
- [ ] Verify query performance
- [ ] Create rollback procedures
- [ ] Update documentation
### Week 2: Vector Search Implementation
#### Day 6-7: Embedding Infrastructure
- [ ] Add OpenAI API integration
- [ ] Create embedding service
- [ ] Add pgvector extension to PostgreSQL
- [ ] Design vector storage schema
- [ ] Implement embedding generation for historical quotes
- [ ] Create batch embedding processor
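The batch embedding processor largely boils down to chunking texts and submitting each chunk. A sketch with the embedding call injected as a parameter (in production this would wrap the OpenAI embeddings endpoint, which accepts an array of inputs per request):

```typescript
// Split an array into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts batch by batch; `embedBatch` is the injected call
// to the embeddings API, returning one vector per input text.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
): Promise<number[][]> {
  const out: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    out.push(...(await embedBatch(batch)));
  }
  return out;
}
```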
#### Day 8-9: Semantic Search
- [ ] Implement cosine similarity search
- [ ] Create hybrid search (vector + metadata filters)
- [ ] Add search result ranking logic
- [ ] Implement search caching
- [ ] Optimize query performance
- [ ] Create search API endpoints
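For in-process re-ranking of candidates, cosine similarity can be computed directly (in SQL, pgvector's `<=>` operator returns cosine *distance*, i.e. `1 - similarity`):

```typescript
// Cosine similarity between two equal-length vectors: dot product
// divided by the product of the vector norms. Range is [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```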
#### Day 10: Integration & Testing
- [ ] Integrate vector search with matcher.ts
- [ ] Create fallback to rule-based matching
- [ ] Add A/B testing framework (rule-based vs vector)
- [ ] Performance benchmarking
- [ ] Update test cases
### Week 3: PDF Upload & Enhancements
#### Day 11-12: PDF Processing
- [ ] Add PDF parsing dependencies (pdf-parse, pdfjs-dist)
- [ ] Create PDF upload endpoint
- [ ] Implement text extraction from PDFs
- [ ] Add basic drawing metadata extraction
- [ ] Create file storage service (local or S3)
- [ ] Add virus scanning (optional)
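"Basic drawing metadata extraction" can start as pattern matching over the text pdf-parse returns. The annotation formats below (`MATERIAL:`, `TOL:`, `QTY:`) are assumptions for illustration, not a finished parser:

```typescript
interface DrawingMetadata {
  material?: string;
  tolerance?: string;
  quantity?: number;
}

// Pull common title-block fields out of extracted PDF text.
// Patterns are illustrative; real drawings will need tuning.
function extractDrawingMetadata(text: string): DrawingMetadata {
  const meta: DrawingMetadata = {};
  const material = text.match(/MATERIAL:\s*([A-Za-z0-9- ]+)/i);
  if (material) meta.material = material[1].trim();
  const tolerance = text.match(/TOL(?:ERANCE)?:\s*([^\n]+)/i);
  if (tolerance) meta.tolerance = tolerance[1].trim();
  const qty = text.match(/QTY:\s*(\d+)/i);
  if (qty) meta.quantity = parseInt(qty[1], 10);
  return meta;
}
```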
#### Day 13-14: Enhanced Audit Logging
- [ ] Design audit log schema
- [ ] Create audit service
- [ ] Add user context tracking
- [ ] Implement activity logging middleware
- [ ] Create audit query API
- [ ] Add compliance reporting
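A sketch of the audit service core: a framework-agnostic logger that the activity middleware would call, with a pluggable sink (an `INSERT` into `audit_logs` in production, an array in tests). Names are illustrative:

```typescript
// Field names mirror the audit_logs table in this plan.
interface AuditEntry {
  timestamp: Date;
  userId: number | null;
  action: string;            // e.g. "ingest_rfp", "approve"
  resourceType?: string;     // e.g. "evaluation"
  resourceId?: string;
  details?: Record<string, unknown>;
}

type AuditSink = (entry: AuditEntry) => Promise<void>;

// Returns a logging function; the middleware supplies user context,
// the sink decides where entries go.
function createAuditLogger(sink: AuditSink) {
  return (action: string, opts: Omit<AuditEntry, "timestamp" | "action">) =>
    sink({ timestamp: new Date(), action, ...opts });
}
```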
#### Day 15: Polish & Documentation
- [ ] Update all documentation
- [ ] Create migration guide (v1.0 → v1.1)
- [ ] Write database backup procedures
- [ ] Create performance tuning guide
- [ ] Update CHANGELOG.md
- [ ] Prepare release notes
---
## 🗄️ Database Schema (PostgreSQL)
```sql
-- Enable the pgvector extension before any table uses the vector type
CREATE EXTENSION IF NOT EXISTS vector;

-- Quotes table (replaces quotes.json)
CREATE TABLE quotes (
id VARCHAR(50) PRIMARY KEY,
quote_date TIMESTAMPTZ NOT NULL,
customer_name VARCHAR(255) NOT NULL,
contact_email VARCHAR(255),
-- Normalized fields for matching
material VARCHAR(100),
processes TEXT[], -- Array of process names
qty_min INTEGER,
qty_max INTEGER,
tolerances VARCHAR(100),
finish VARCHAR(100),
-- Costs and timing
cost_per_unit DECIMAL(10,2),
total_cost DECIMAL(10,2),
lead_days INTEGER,
actual_lead_days INTEGER,
-- Metadata
approved BOOLEAN DEFAULT false,
won BOOLEAN,
notes TEXT,
raw_rfp TEXT,
-- Vector embedding for semantic search
embedding vector(1536), -- OpenAI ada-002 dimensions
-- Audit fields
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
created_by VARCHAR(100)
);

-- Indexes (inline INDEX clauses are MySQL syntax; PostgreSQL
-- requires separate CREATE INDEX statements)
CREATE INDEX idx_quotes_material ON quotes (material);
CREATE INDEX idx_quotes_quote_date ON quotes (quote_date DESC);
CREATE INDEX idx_quotes_customer ON quotes (customer_name);
-- Vector similarity index
CREATE INDEX ON quotes USING ivfflat (embedding vector_cosine_ops);
-- Evaluations table (replaces evaluations.json)
CREATE TABLE evaluations (
id VARCHAR(64) PRIMARY KEY,
idempotency_key VARCHAR(64) UNIQUE NOT NULL,
-- Input RFP
raw_text TEXT NOT NULL,
qty INTEGER,
contact_email VARCHAR(255),
customer_name VARCHAR(255),
-- Parsed data
parsed_rfp JSONB,
-- Results
matches JSONB, -- Array of similar quotes with scores
estimate JSONB, -- Cost and lead time estimate
quote_doc JSONB, -- Generated quote document
confidence VARCHAR(10), -- high/medium/low
-- Status
status VARCHAR(20) DEFAULT 'draft', -- draft/approved/sent
approved_by VARCHAR(100),
approved_at TIMESTAMPTZ,
sent_at TIMESTAMPTZ,
-- Audit
created_at TIMESTAMPTZ DEFAULT NOW(),
created_by VARCHAR(100)
);

CREATE INDEX idx_evaluations_created_at ON evaluations (created_at DESC);
CREATE INDEX idx_evaluations_status ON evaluations (status);
CREATE INDEX idx_evaluations_customer ON evaluations (customer_name);
-- Users table (for Phase 2 auth, but define now)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(100) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
role VARCHAR(20) NOT NULL, -- engineer/manager/admin
active BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
last_login TIMESTAMPTZ
);
-- Audit logs table
CREATE TABLE audit_logs (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ DEFAULT NOW(),
user_id INTEGER REFERENCES users(id),
action VARCHAR(100) NOT NULL, -- ingest_rfp/generate_quote/approve/send
resource_type VARCHAR(50), -- evaluation/quote
resource_id VARCHAR(100),
details JSONB,
ip_address INET,
user_agent TEXT
);
-- PDF uploads table
CREATE TABLE pdf_uploads (
id SERIAL PRIMARY KEY,
evaluation_id VARCHAR(64) REFERENCES evaluations(id),
filename VARCHAR(255) NOT NULL,
file_path VARCHAR(500) NOT NULL,
file_size INTEGER,
mime_type VARCHAR(100),
extracted_text TEXT,
metadata JSONB,
uploaded_at TIMESTAMPTZ DEFAULT NOW(),
uploaded_by VARCHAR(100)
);
```
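As an example of the hybrid search this schema enables (vector similarity plus metadata filters), a parameterized query might look like:

```sql
-- Find the five most similar approved quotes for the same material.
-- $1 is the RFP embedding, $2 the material; <=> is pgvector's
-- cosine distance operator, so similarity = 1 - distance.
SELECT id, customer_name, cost_per_unit,
       1 - (embedding <=> $1) AS similarity
FROM quotes
WHERE material = $2 AND approved = true
ORDER BY embedding <=> $1
LIMIT 5;
```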
---
## 📦 New Dependencies
```json
{
"dependencies": {
"@prisma/client": "^5.6.0",
"pg": "^8.11.3",
"pgvector": "^0.1.5",
"openai": "^4.20.1",
"pdf-parse": "^1.1.1",
"multer": "^1.4.5-lts.1"
},
"devDependencies": {
"prisma": "^5.6.0",
"@types/pg": "^8.10.9",
"@types/multer": "^1.4.11"
}
}
```
---
## 🔧 Configuration Updates
### .env additions:
```bash
# PostgreSQL
DATABASE_URL="postgresql://user:password@localhost:5432/quoting_db"
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10
# OpenAI for embeddings
OPENAI_API_KEY="sk-..."
# File upload
UPLOAD_DIR="./uploads"
MAX_FILE_SIZE_MB=10
# Feature flags
ENABLE_VECTOR_SEARCH=true
ENABLE_RULE_BASED_FALLBACK=true
```
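A sketch of how these settings could be read with typed defaults; `loadConfig` is a hypothetical helper, not existing code:

```typescript
interface AppConfig {
  databaseUrl: string;
  poolMax: number;
  maxFileSizeMb: number;
  enableVectorSearch: boolean;
}

// Parse the .env additions above with safe defaults; the env
// parameter makes the function easy to unit test.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    databaseUrl: env.DATABASE_URL ?? "",
    poolMax: parseInt(env.DATABASE_POOL_MAX ?? "10", 10),
    maxFileSizeMb: parseInt(env.MAX_FILE_SIZE_MB ?? "10", 10),
    enableVectorSearch: env.ENABLE_VECTOR_SEARCH !== "false",
  };
}
```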
---
## 🔄 Migration Strategy
### Step 1: Parallel Operation (Week 1)
- Both JSON and PostgreSQL work simultaneously
- Writes go to both systems
- Reads prefer PostgreSQL, fallback to JSON
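The dual-write/fallback pattern above can be sketched generically (`KeyValueStore` and `DualStore` are illustrative names; the real adapters would implement the quote storage interface):

```typescript
interface KeyValueStore {
  write(id: string, value: string): Promise<void>;
  read(id: string): Promise<string | null>;
}

// Step 1 behavior: writes go to both stores; reads prefer the
// primary (PostgreSQL) and fall back to the legacy (JSON) store.
class DualStore implements KeyValueStore {
  constructor(
    private primary: KeyValueStore,
    private legacy: KeyValueStore,
  ) {}

  async write(id: string, value: string): Promise<void> {
    await Promise.all([
      this.primary.write(id, value),
      this.legacy.write(id, value),
    ]);
  }

  async read(id: string): Promise<string | null> {
    return (await this.primary.read(id)) ?? this.legacy.read(id);
  }
}
```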
### Step 2: Data Migration (Week 1 End)
- Run migration script: `npm run migrate:json-to-db`
- Verify data integrity
- Keep JSON files as backup
### Step 3: PostgreSQL Only (Week 2)
- Remove JSON storage code
- Delete data/*.json files (after backup)
- Update all documentation
---
## 📊 Performance Targets
- **Query Response**: < 100ms for similarity search
- **Embedding Generation**: < 500ms per RFP
- **PDF Processing**: < 2s for typical drawing
- **Database Connections**: Pool of 10 connections
- **Concurrent Requests**: Handle 50 simultaneous evaluations
- **Data Volume**: Support 10,000+ historical quotes
---
## 🧪 Testing Strategy
### Unit Tests
- Database connection and pooling
- CRUD operations for each table
- Vector similarity calculations
- PDF text extraction
- Embedding generation
### Integration Tests
- End-to-end RFP evaluation with database
- Vector search vs rule-based comparison
- Data migration validation
- Concurrent request handling
### Performance Tests
- Load test with 1000 concurrent requests
- Query performance with 10,000+ quotes
- Vector search benchmark
- Memory usage profiling
---
## 📝 Documentation Updates
- [ ] Update README.md with database setup instructions
- [ ] Create DATABASE-SETUP.md guide
- [ ] Document vector search configuration
- [ ] Create PDF upload API documentation
- [ ] Update QUICKSTART.md with new features
- [ ] Add troubleshooting section for PostgreSQL
---
## 🔙 Rollback Plan
If Phase 1 encounters critical issues:
1. **Database Issues**: Revert to JSON storage (code preserved)
2. **Vector Search Issues**: Disable via feature flag, use rule-based
3. **PDF Upload Issues**: Disable endpoint, manual text entry
4. **Data Loss Prevention**: Daily automated backups
5. **Version Control**: Tagged releases for quick rollback
---
## 📚 Learning Resources
- [Prisma Documentation](https://www.prisma.io/docs)
- [pgvector Guide](https://github.com/pgvector/pgvector)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [PostgreSQL Performance Tuning](https://wiki.postgresql.org/wiki/Performance_Optimization)
---
## 📈 Success Metrics
- ✅ All existing functionality preserved
- ✅ 10x improvement in query performance
- ✅ Semantic search accuracy > 90% vs rule-based
- ✅ Zero data loss during migration
- ✅ All tests passing
- ✅ Documentation complete and accurate
---
## 🚦 Go/No-Go Criteria
Before moving to Phase 2:
- [ ] All Phase 1 features working in production
- [ ] Performance targets met
- [ ] Zero critical bugs
- [ ] Documentation complete
- [ ] User acceptance testing passed
- [ ] Database backups automated
---
**Next Steps**: Let's start with Day 1-2: PostgreSQL Setup & Schema Design!