# Phase 1: Foundation Enhancement - Implementation Plan
**Version:** 1.1.0
**Started:** November 13, 2024
**Target Completion:** 2-3 weeks
**Status:** 🟢 IN PROGRESS
---
## 🎯 Phase 1 Goals
1. **PostgreSQL Migration** - Scalable database backend
2. **Vector Search** - Semantic similarity matching with embeddings
3. **PDF Upload** - Technical drawing analysis capability
4. **Enhanced Audit Logging** - User tracking and compliance
5. **Performance Optimization** - Handle 10,000+ quotes efficiently
---
## 📋 Implementation Checklist
### Week 1: Database Foundation
#### Day 1-2: PostgreSQL Setup & Schema Design
- [ ] Install PostgreSQL dependencies (pg, prisma)
- [ ] Design database schema (quotes, evaluations, users, audit_logs)
- [ ] Create Prisma schema file
- [ ] Set up migrations
- [ ] Create database connection pool
- [ ] Add database configuration to .env
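The "Create Prisma schema file" task above can be sketched as a fragment of `prisma/schema.prisma`. Field names here are illustrative and mirror the SQL schema later in this plan; `Unsupported("vector(1536)")` is how Prisma represents a pgvector column it cannot model natively:

```prisma
// Fragment only — quotes table, abbreviated to the matching fields
model Quote {
  id           String   @id @db.VarChar(50)
  quoteDate    DateTime @map("quote_date")
  customerName String   @map("customer_name") @db.VarChar(255)
  material     String?  @db.VarChar(100)
  processes    String[]
  costPerUnit  Decimal? @map("cost_per_unit") @db.Decimal(10, 2)
  leadDays     Int?     @map("lead_days")
  embedding    Unsupported("vector(1536)")?
  createdAt    DateTime @default(now()) @map("created_at")

  @@map("quotes")
}
```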
#### Day 3-4: Data Access Layer
- [ ] Create database adapter interface
- [ ] Implement PostgreSQL storage service
- [ ] Add connection retry logic
- [ ] Implement transaction support
- [ ] Create data migration scripts (JSON → PostgreSQL)
- [ ] Add backward compatibility layer
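A minimal sketch of the adapter interface and its backward-compatibility layer, with illustrative names (`QuoteStorage`, `InMemoryQuoteStorage`); the PostgreSQL implementation would satisfy the same interface on top of the connection pool:

```typescript
// Record shape mirrors the quotes table; abbreviated for the sketch.
interface QuoteRecord {
  id: string;
  customerName: string;
  material?: string;
  costPerUnit?: number;
}

// The adapter interface every storage backend implements.
interface QuoteStorage {
  save(quote: QuoteRecord): Promise<void>;
  findById(id: string): Promise<QuoteRecord | null>;
  findByMaterial(material: string): Promise<QuoteRecord[]>;
}

// In-memory implementation — useful as the JSON-era compatibility
// layer and in unit tests; a PostgresQuoteStorage would implement
// the same interface using the pg pool.
class InMemoryQuoteStorage implements QuoteStorage {
  private quotes = new Map<string, QuoteRecord>();

  async save(quote: QuoteRecord): Promise<void> {
    this.quotes.set(quote.id, quote);
  }

  async findById(id: string): Promise<QuoteRecord | null> {
    return this.quotes.get(id) ?? null;
  }

  async findByMaterial(material: string): Promise<QuoteRecord[]> {
    return [...this.quotes.values()].filter(q => q.material === material);
  }
}
```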
#### Day 5: Testing & Migration
- [ ] Write unit tests for database layer
- [ ] Test data migration with sample quotes
- [ ] Verify query performance
- [ ] Create rollback procedures
- [ ] Update documentation
### Week 2: Vector Search Implementation
#### Day 6-7: Embedding Infrastructure
- [ ] Add OpenAI API integration
- [ ] Create embedding service
- [ ] Add pgvector extension to PostgreSQL
- [ ] Design vector storage schema
- [ ] Implement embedding generation for historical quotes
- [ ] Create batch embedding processor
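The batch embedding processor largely boils down to chunking texts and submitting each chunk. A sketch with the embedding call injected as a parameter (in production this would wrap the OpenAI embeddings endpoint, which accepts an array of inputs per request):

```typescript
// Split an array into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts batch by batch; `embedBatch` is the injected call
// to the embeddings API, returning one vector per input text.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 100,
): Promise<number[][]> {
  const out: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    out.push(...(await embedBatch(batch)));
  }
  return out;
}
```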
#### Day 8-9: Semantic Search
- [ ] Implement cosine similarity search
- [ ] Create hybrid search (vector + metadata filters)
- [ ] Add search result ranking logic
- [ ] Implement search caching
- [ ] Optimize query performance
- [ ] Create search API endpoints
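For in-process re-ranking of candidates, cosine similarity can be computed directly (in SQL, pgvector's `<=>` operator returns cosine *distance*, i.e. `1 - similarity`):

```typescript
// Cosine similarity between two equal-length vectors: dot product
// divided by the product of the vector norms. Range is [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```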
#### Day 10: Integration & Testing
- [ ] Integrate vector search with matcher.ts
- [ ] Create fallback to rule-based matching
- [ ] Add A/B testing framework (rule-based vs vector)
- [ ] Performance benchmarking
- [ ] Update test cases
### Week 3: PDF Upload & Enhancements
#### Day 11-12: PDF Processing
- [ ] Add PDF parsing dependencies (pdf-parse, pdfjs-dist)
- [ ] Create PDF upload endpoint
- [ ] Implement text extraction from PDFs
- [ ] Add basic drawing metadata extraction
- [ ] Create file storage service (local or S3)
- [ ] Add virus scanning (optional)
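"Basic drawing metadata extraction" can start as pattern matching over the text pdf-parse returns. The annotation formats below (`MATERIAL:`, `TOL:`, `QTY:`) are assumptions for illustration, not a finished parser:

```typescript
interface DrawingMetadata {
  material?: string;
  tolerance?: string;
  quantity?: number;
}

// Pull common title-block fields out of extracted PDF text.
// Patterns are illustrative; real drawings will need tuning.
function extractDrawingMetadata(text: string): DrawingMetadata {
  const meta: DrawingMetadata = {};
  const material = text.match(/MATERIAL:\s*([A-Za-z0-9- ]+)/i);
  if (material) meta.material = material[1].trim();
  const tolerance = text.match(/TOL(?:ERANCE)?:\s*([^\n]+)/i);
  if (tolerance) meta.tolerance = tolerance[1].trim();
  const qty = text.match(/QTY:\s*(\d+)/i);
  if (qty) meta.quantity = parseInt(qty[1], 10);
  return meta;
}
```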
#### Day 13-14: Enhanced Audit Logging
- [ ] Design audit log schema
- [ ] Create audit service
- [ ] Add user context tracking
- [ ] Implement activity logging middleware
- [ ] Create audit query API
- [ ] Add compliance reporting
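A sketch of the audit service core: a framework-agnostic logger that the activity middleware would call, with a pluggable sink (an `INSERT` into `audit_logs` in production, an array in tests). Names are illustrative:

```typescript
// Field names mirror the audit_logs table in this plan.
interface AuditEntry {
  timestamp: Date;
  userId: number | null;
  action: string;            // e.g. "ingest_rfp", "approve"
  resourceType?: string;     // e.g. "evaluation"
  resourceId?: string;
  details?: Record<string, unknown>;
}

type AuditSink = (entry: AuditEntry) => Promise<void>;

// Returns a logging function; the middleware supplies user context,
// the sink decides where entries go.
function createAuditLogger(sink: AuditSink) {
  return (action: string, opts: Omit<AuditEntry, "timestamp" | "action">) =>
    sink({ timestamp: new Date(), action, ...opts });
}
```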
#### Day 15: Polish & Documentation
- [ ] Update all documentation
- [ ] Create migration guide (v1.0 → v1.1)
- [ ] Write database backup procedures
- [ ] Create performance tuning guide
- [ ] Update CHANGELOG.md
- [ ] Prepare release notes
---
## 🗄️ Database Schema (PostgreSQL)
```sql
-- Enable the pgvector extension before any table uses the vector type
CREATE EXTENSION IF NOT EXISTS vector;

-- Quotes table (replaces quotes.json)
CREATE TABLE quotes (
id VARCHAR(50) PRIMARY KEY,
quote_date TIMESTAMPTZ NOT NULL,
customer_name VARCHAR(255) NOT NULL,
contact_email VARCHAR(255),
-- Normalized fields for matching
material VARCHAR(100),
processes TEXT[], -- Array of process names
qty_min INTEGER,
qty_max INTEGER,
tolerances VARCHAR(100),
finish VARCHAR(100),
-- Costs and timing
cost_per_unit DECIMAL(10,2),
total_cost DECIMAL(10,2),
lead_days INTEGER,
actual_lead_days INTEGER,
-- Metadata
approved BOOLEAN DEFAULT false,
won BOOLEAN,
notes TEXT,
raw_rfp TEXT,
-- Vector embedding for semantic search
embedding vector(1536), -- OpenAI ada-002 dimensions
-- Audit fields
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
created_by VARCHAR(100)
);

-- Indexes (inline INDEX clauses are MySQL syntax; PostgreSQL
-- requires separate CREATE INDEX statements)
CREATE INDEX idx_quotes_material ON quotes (material);
CREATE INDEX idx_quotes_quote_date ON quotes (quote_date DESC);
CREATE INDEX idx_quotes_customer ON quotes (customer_name);
-- Vector similarity index
CREATE INDEX ON quotes USING ivfflat (embedding vector_cosine_ops);
-- Evaluations table (replaces evaluations.json)
CREATE TABLE evaluations (
id VARCHAR(64) PRIMARY KEY,
idempotency_key VARCHAR(64) UNIQUE NOT NULL,
-- Input RFP
raw_text TEXT NOT NULL,
qty INTEGER,
contact_email VARCHAR(255),
customer_name VARCHAR(255),
-- Parsed data
parsed_rfp JSONB,
-- Results
matches JSONB, -- Array of similar quotes with scores
estimate JSONB, -- Cost and lead time estimate
quote_doc JSONB, -- Generated quote document
confidence VARCHAR(10), -- high/medium/low
-- Status
status VARCHAR(20) DEFAULT 'draft', -- draft/approved/sent
approved_by VARCHAR(100),
approved_at TIMESTAMPTZ,
sent_at TIMESTAMPTZ,
-- Audit
created_at TIMESTAMPTZ DEFAULT NOW(),
created_by VARCHAR(100)
);

CREATE INDEX idx_evaluations_created_at ON evaluations (created_at DESC);
CREATE INDEX idx_evaluations_status ON evaluations (status);
CREATE INDEX idx_evaluations_customer ON evaluations (customer_name);
-- Users table (for Phase 2 auth, but define now)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(100) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
role VARCHAR(20) NOT NULL, -- engineer/manager/admin
active BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
last_login TIMESTAMPTZ
);
-- Audit logs table
CREATE TABLE audit_logs (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ DEFAULT NOW(),
user_id INTEGER REFERENCES users(id),
action VARCHAR(100) NOT NULL, -- ingest_rfp/generate_quote/approve/send
resource_type VARCHAR(50), -- evaluation/quote
resource_id VARCHAR(100),
details JSONB,
ip_address INET,
user_agent TEXT
);
-- PDF uploads table
CREATE TABLE pdf_uploads (
id SERIAL PRIMARY KEY,
evaluation_id VARCHAR(64) REFERENCES evaluations(id),
filename VARCHAR(255) NOT NULL,
file_path VARCHAR(500) NOT NULL,
file_size INTEGER,
mime_type VARCHAR(100),
extracted_text TEXT,
metadata JSONB,
uploaded_at TIMESTAMPTZ DEFAULT NOW(),
uploaded_by VARCHAR(100)
);
```
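As an example of the hybrid search this schema enables (vector similarity plus metadata filters), a parameterized query might look like:

```sql
-- Find the five most similar approved quotes for the same material.
-- $1 is the RFP embedding, $2 the material; <=> is pgvector's
-- cosine distance operator, so similarity = 1 - distance.
SELECT id, customer_name, cost_per_unit,
       1 - (embedding <=> $1) AS similarity
FROM quotes
WHERE material = $2 AND approved = true
ORDER BY embedding <=> $1
LIMIT 5;
```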
---
## 📦 New Dependencies
```json
{
"dependencies": {
"@prisma/client": "^5.6.0",
"pg": "^8.11.3",
"pgvector": "^0.1.5",
"openai": "^4.20.1",
"pdf-parse": "^1.1.1",
"multer": "^1.4.5-lts.1"
},
"devDependencies": {
"prisma": "^5.6.0",
"@types/pg": "^8.10.9",
"@types/multer": "^1.4.11"
}
}
```
---
## 🔧 Configuration Updates
### .env additions:
```bash
# PostgreSQL
DATABASE_URL="postgresql://user:password@localhost:5432/quoting_db"
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10
# OpenAI for embeddings
OPENAI_API_KEY="sk-..."
# File upload
UPLOAD_DIR="./uploads"
MAX_FILE_SIZE_MB=10
# Feature flags
ENABLE_VECTOR_SEARCH=true
ENABLE_RULE_BASED_FALLBACK=true
```
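A sketch of how these settings could be read with typed defaults; `loadConfig` is a hypothetical helper, not existing code:

```typescript
interface AppConfig {
  databaseUrl: string;
  poolMax: number;
  maxFileSizeMb: number;
  enableVectorSearch: boolean;
}

// Parse the .env additions above with safe defaults; the env
// parameter makes the function easy to unit test.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    databaseUrl: env.DATABASE_URL ?? "",
    poolMax: parseInt(env.DATABASE_POOL_MAX ?? "10", 10),
    maxFileSizeMb: parseInt(env.MAX_FILE_SIZE_MB ?? "10", 10),
    enableVectorSearch: env.ENABLE_VECTOR_SEARCH !== "false",
  };
}
```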
---
## 🔄 Migration Strategy
### Step 1: Parallel Operation (Week 1)
- Both JSON and PostgreSQL work simultaneously
- Writes go to both systems
- Reads prefer PostgreSQL, fallback to JSON
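The dual-write/fallback pattern above can be sketched generically (`KeyValueStore` and `DualStore` are illustrative names; the real adapters would implement the quote storage interface):

```typescript
interface KeyValueStore {
  write(id: string, value: string): Promise<void>;
  read(id: string): Promise<string | null>;
}

// Step 1 behavior: writes go to both stores; reads prefer the
// primary (PostgreSQL) and fall back to the legacy (JSON) store.
class DualStore implements KeyValueStore {
  constructor(
    private primary: KeyValueStore,
    private legacy: KeyValueStore,
  ) {}

  async write(id: string, value: string): Promise<void> {
    await Promise.all([
      this.primary.write(id, value),
      this.legacy.write(id, value),
    ]);
  }

  async read(id: string): Promise<string | null> {
    return (await this.primary.read(id)) ?? this.legacy.read(id);
  }
}
```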
### Step 2: Data Migration (Week 1 End)
- Run migration script: `npm run migrate:json-to-db`
- Verify data integrity
- Keep JSON files as backup
### Step 3: PostgreSQL Only (Week 2)
- Remove JSON storage code
- Delete data/*.json files (after backup)
- Update all documentation
---
## 📊 Performance Targets
- **Query Response**: < 100ms for similarity search
- **Embedding Generation**: < 500ms per RFP
- **PDF Processing**: < 2s for typical drawing
- **Database Connections**: Pool of 10 connections
- **Concurrent Requests**: Handle 50 simultaneous evaluations
- **Data Volume**: Support 10,000+ historical quotes
---
## 🧪 Testing Strategy
### Unit Tests
- Database connection and pooling
- CRUD operations for each table
- Vector similarity calculations
- PDF text extraction
- Embedding generation
### Integration Tests
- End-to-end RFP evaluation with database
- Vector search vs rule-based comparison
- Data migration validation
- Concurrent request handling
### Performance Tests
- Load test with 1000 concurrent requests
- Query performance with 10,000+ quotes
- Vector search benchmark
- Memory usage profiling
---
## 📝 Documentation Updates
- [ ] Update README.md with database setup instructions
- [ ] Create DATABASE-SETUP.md guide
- [ ] Document vector search configuration
- [ ] Create PDF upload API documentation
- [ ] Update QUICKSTART.md with new features
- [ ] Add troubleshooting section for PostgreSQL
---
## 🔙 Rollback Plan
If Phase 1 encounters critical issues:
1. **Database Issues**: Revert to JSON storage (code preserved)
2. **Vector Search Issues**: Disable via feature flag, use rule-based
3. **PDF Upload Issues**: Disable endpoint, manual text entry
4. **Data Loss Prevention**: Daily automated backups
5. **Version Control**: Tagged releases for quick rollback
---
## 📚 Learning Resources
- [Prisma Documentation](https://www.prisma.io/docs)
- [pgvector Guide](https://github.com/pgvector/pgvector)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [PostgreSQL Performance Tuning](https://wiki.postgresql.org/wiki/Performance_Optimization)
---
## 📈 Success Metrics
- ✅ All existing functionality preserved
- ✅ 10x improvement in query performance
- ✅ Semantic search accuracy > 90% vs rule-based
- ✅ Zero data loss during migration
- ✅ All tests passing
- ✅ Documentation complete and accurate
---
## 🚦 Go/No-Go Criteria
Before moving to Phase 2:
- [ ] All Phase 1 features working in production
- [ ] Performance targets met
- [ ] Zero critical bugs
- [ ] Documentation complete
- [ ] User acceptance testing passed
- [ ] Database backups automated
---
**Next Steps**: Let's start with Day 1-2: PostgreSQL Setup & Schema Design!