Skip to main content
Glama

FedMCP - Federal Parliamentary Information

OPENPARLIAMENT_INGESTION_STATUS.md8.13 kB
# OpenParliament PostgreSQL → Neo4j Ingestion Status ## Overview Migrating OpenParliament PostgreSQL database data into Neo4j graph database to fill identified gaps: - **3.67M Hansard statements** - **5,280 bill texts** - **35K politician info records** - **21K election candidacies** ## Completed Modules ✅ ### 1. Hansard Statements (`hansard.py`) **Status**: Module created and currently testing **Records**: 3.67M statements, 18,416 documents **File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ingest/hansard.py` (562 lines) **Functions**: - `ingest_hansard_documents()` - Import debate documents - `ingest_hansard_statements()` - Import 3.67M statement records - `link_statements_to_mps()` - Create MADE_BY relationships - `link_statements_to_documents()` - Create PART_OF relationships - `link_bill_mentions()` - Create MENTIONS relationships for bill references - `create_hansard_indexes()` - Create full-text search indexes - `ingest_hansard_sample()` - Sample import (1,000 statements) - `ingest_hansard_full()` - Full import (3.67M statements) **Test Results** (currently running in background): ``` ✅ Created 18,416 Document nodes ✅ Created 1,000 Statement nodes ⚠️ 0 MADE_BY relationships (MPs not yet imported from PostgreSQL) 🔄 Creating PART_OF relationships (155,000+ created and counting) ``` **Fixes Applied**: 1. Fixed SQL query error: Changed `SELECT DISTINCT document_id ORDER BY time` to include `time` in SELECT list 2. Fixed API method naming: Changed all `execute_query` to `run_query` throughout module ### 2. Bill Texts (`bill_text.py`) **Status**: Module created, testing pending **Records**: 5,280 bill texts with English/French text and summaries **File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ingest/bill_text.py` **Functions**: - `ingest_bill_texts()` - Import bill text records - `link_texts_to_bills()` - Create HAS_TEXT relationships - `create_bill_text_schema()` - Create indexes and constraints - `ingest_bill_text_sample()` - Sample import (10 texts) - `ingest_bill_text_full()` - Full import (5,280 texts) **Neo4j Schema**: - Node: `BillText` with properties: id, docid, text_en, text_fr, summary_en, created - Relationship: `(Bill)-[:HAS_TEXT]->(BillText)` - Full-text indexes on text_en, text_fr, summary_en ### 3. Politician Info (`politician_info.py`) **Status**: Module created, testing pending **Records**: 2,958 politicians with 38,641 info records **File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ingest/politician_info.py` **Functions**: - `enrich_politician_info()` - Add biographical data to existing Politician nodes - `ingest_politician_info_sample()` - Sample enrichment (10 politicians) - `ingest_politician_info_full()` - Full enrichment (2,958 politicians) **Info Schema Types** (18 total, top 10): - alternate_name: 14,731 records - parl_affil_id: 12,569 records - parl_id: 2,958 records - favourite_word: 1,315 records - email: 921 records - phone: 920 records - constituency_offices: 916 records - parl_mp_id: 886 records - parlinfo_id: 751 records - twitter_id: 539 records ## In Progress 🔄 ### Hansard Sample Test **Background Process**: Process ID `95adb9` **Log File**: `/tmp/hansard_test_v3.log` (note: filename has space, actual path may vary) **Status**: Creating PART_OF relationships (155,000+ created, still running) **Expected Completion**: Should finish within next 10-20 minutes ## Remaining Work ⏳ ### 1. Elections Module (`elections.py`) **Not Yet Created** **Records**: 21,000 election candidacies **PostgreSQL Table**: `elections_candidacy` **Planned Functions**: - `ingest_election_candidacies()` - Import candidacy records - `link_candidacies_to_politicians()` - Create RAN_FOR relationships - `ingest_elections_sample()` - Sample import (50 candidacies) - `ingest_elections_full()` - Full import (21K candidacies) **Schema Info Needed**: ```sql -- Run to get table structure: \d elections_candidacy -- Key fields likely include: -- - politician_id (FK to core_politician) -- - election_id -- - riding/constituency -- - party -- - elected (boolean) -- - vote_count ``` ### 2. Comprehensive Test Script **Not Yet Created** **File**: `test_sample_imports.py` **Should Test**: 1. Hansard sample (1,000 statements) 2. Bill text sample (10 texts) 3. Politician info sample (10 politicians) 4. Elections sample (50 candidacies) **Validation Checks**: - Node counts match expected - Relationship counts correct - Sample data queries work - Full-text search indexes functional ### 3. Testing & Validation **Status**: Only Hansard test running so far **Remaining Tests**: - [ ] Test bill_text.py sample import - [ ] Test politician_info.py sample enrichment - [ ] Create and test elections.py - [ ] Run comprehensive test script with all modules ### 4. Documentation **Optional Final Step**: - Import guide for each module - Performance benchmarks - Common issues and solutions ## Database Connection Info **PostgreSQL** (OpenParliament): ``` Host: localhost Port: 5432 Database: openparliament User: fedmcp Password: fedmcp2024 ``` **Neo4j** (Graph Database): ``` URI: bolt://localhost:7687 User: neo4j Password: canadagpt2024 ``` **Environment File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/.env` ## Next Steps 1. **Wait for Hansard test to complete** (~10-20 minutes) - Check progress: `tail -f /tmp/hansard_test_v3.log` - Or check background process status 2. **Test bill_text.py module**: ```bash /Users/matthewdufresne/FedMCP/venv/bin/python \ /Users/matthewdufresne/FedMCP/test_bill_text_sample.py ``` 3. **Create elections.py module**: - Check PostgreSQL schema: `\d elections_candidacy` - Follow pattern from hansard.py and bill_text.py - Include sample and full import functions 4. **Create comprehensive test script**: - Test all 4 modules in sequence - Validate counts and relationships - Report any errors or gaps 5. **Run full imports** (after sample tests pass): - Hansard: 3.67M statements (~2-4 hours estimated) - Bill texts: 5,280 texts (~5-10 minutes) - Politician info: 2,958 politicians (~2-5 minutes) - Elections: 21K candidacies (~5-10 minutes) ## Key Files ``` /Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ ├── ingest/ │ ├── __init__.py │ ├── hansard.py (✅ created, 562 lines) │ ├── bill_text.py (✅ created) │ ├── politician_info.py (✅ created) │ └── elections.py (⏳ not yet created) ├── utils/ │ ├── neo4j_client.py (✅ existing) │ ├── postgres_client.py (✅ existing) │ └── progress.py (✅ existing) /Users/matthewdufresne/FedMCP/ ├── test_hansard_sample.py (✅ created, running) ├── test_bill_text_sample.py (✅ created, not tested) ├── test_sample_imports.py (⏳ not yet created) └── packages/data-pipeline/.env (✅ configured) ``` ## Performance Notes - **Hansard**: Sample of 1,000 statements took ~12 minutes (including 18K documents, 155K+ relationships) - **Batch Sizes**: - Documents: 5,000 per batch - Statements: 10,000 per batch - Relationships: Created iteratively with progress tracking - **Full Import Estimates**: - 3.67M statements: ~2-4 hours - Creating all relationships: additional 1-2 hours ## Issues Fixed 1. **PostgreSQL DISTINCT/ORDER BY Error**: - Error: "ORDER BY expressions must appear in select list" with SELECT DISTINCT - Fix: Added ORDER BY column to SELECT, then filtered unique values in Python 2. **Neo4j API Method Name**: - Error: 'Neo4jClient' object has no attribute 'execute_query' - Fix: Changed to correct method name `run_query` throughout codebase 3. **Test Script API Consistency**: - Updated test_hansard_sample.py to use correct `run_query` method - Ensures consistency across test scripts and modules --- **Last Updated**: 2025-11-05 04:37 AM **Status**: 3 of 4 modules created, Hansard test in progress **Next**: Complete Hansard test validation, then proceed with remaining modules

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/northernvariables/FedMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server