FedMCP - Federal Parliamentary Information

OPENPARLIAMENT_INGESTION_STATUS.md•7.94 KiB

# OpenParliament PostgreSQL → Neo4j Ingestion Status ## Overview Migrating OpenParliament PostgreSQL database data into Neo4j graph database to fill identified gaps: - **3.67M Hansard statements** - **5,280 bill texts** - **35K politician info records** - **21K election candidacies** ## Completed Modules ✅ ### 1. Hansard Statements (`hansard.py`) **Status**: Module created and currently testing **Records**: 3.67M statements, 18,416 documents **File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ingest/hansard.py` (562 lines) **Functions**: - `ingest_hansard_documents()` - Import debate documents - `ingest_hansard_statements()` - Import 3.67M statement records - `link_statements_to_mps()` - Create MADE_BY relationships - `link_statements_to_documents()` - Create PART_OF relationships - `link_bill_mentions()` - Create MENTIONS relationships for bill references - `create_hansard_indexes()` - Create full-text search indexes - `ingest_hansard_sample()` - Sample import (1,000 statements) - `ingest_hansard_full()` - Full import (3.67M statements) **Test Results** (currently running in background): ``` ✅ Created 18,416 Document nodes ✅ Created 1,000 Statement nodes ⚠️ 0 MADE_BY relationships (MPs not yet imported from PostgreSQL) 🔄 Creating PART_OF relationships (155,000+ created and counting) ``` **Fixes Applied**: 1. Fixed SQL query error: Changed `SELECT DISTINCT document_id ORDER BY time` to include `time` in SELECT list 2. Fixed API method naming: Changed all `execute_query` to `run_query` throughout module ### 2. Bill Texts (`bill_text.py`) **Status**: Module created, testing pending **Records**: 5,280 bill texts with English/French text and summaries **File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ingest/bill_text.py` **Functions**: - `ingest_bill_texts()` - Import bill text records - `link_texts_to_bills()` - Create HAS_TEXT relationships - `create_bill_text_schema()` - Create indexes and constraints - `ingest_bill_text_sample()` - Sample import (10 texts) - `ingest_bill_text_full()` - Full import (5,280 texts) **Neo4j Schema**: - Node: `BillText` with properties: id, docid, text_en, text_fr, summary_en, created - Relationship: `(Bill)-[:HAS_TEXT]->(BillText)` - Full-text indexes on text_en, text_fr, summary_en ### 3. Politician Info (`politician_info.py`) **Status**: Module created, testing pending **Records**: 2,958 politicians with 38,641 info records **File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ingest/politician_info.py` **Functions**: - `enrich_politician_info()` - Add biographical data to existing Politician nodes - `ingest_politician_info_sample()` - Sample enrichment (10 politicians) - `ingest_politician_info_full()` - Full enrichment (2,958 politicians) **Info Schema Types** (18 total, top 10): - alternate_name: 14,731 records - parl_affil_id: 12,569 records - parl_id: 2,958 records - favourite_word: 1,315 records - email: 921 records - phone: 920 records - constituency_offices: 916 records - parl_mp_id: 886 records - parlinfo_id: 751 records - twitter_id: 539 records ## In Progress 🔄 ### Hansard Sample Test **Background Process**: Process ID `95adb9` **Log File**: `/tmp/hansard_test_v3.log` (note: filename has space, actual path may vary) **Status**: Creating PART_OF relationships (155,000+ created, still running) **Expected Completion**: Should finish within next 10-20 minutes ## Remaining Work ⏳ ### 1. Elections Module (`elections.py`) **Not Yet Created** **Records**: 21,000 election candidacies **PostgreSQL Table**: `elections_candidacy` **Planned Functions**: - `ingest_election_candidacies()` - Import candidacy records - `link_candidacies_to_politicians()` - Create RAN_FOR relationships - `ingest_elections_sample()` - Sample import (50 candidacies) - `ingest_elections_full()` - Full import (21K candidacies) **Schema Info Needed**: ```sql -- Run to get table structure: \d elections_candidacy -- Key fields likely include: -- - politician_id (FK to core_politician) -- - election_id -- - riding/constituency -- - party -- - elected (boolean) -- - vote_count ``` ### 2. Comprehensive Test Script **Not Yet Created** **File**: `test_sample_imports.py` **Should Test**: 1. Hansard sample (1,000 statements) 2. Bill text sample (10 texts) 3. Politician info sample (10 politicians) 4. Elections sample (50 candidacies) **Validation Checks**: - Node counts match expected - Relationship counts correct - Sample data queries work - Full-text search indexes functional ### 3. Testing & Validation **Status**: Only Hansard test running so far **Remaining Tests**: - [ ] Test bill_text.py sample import - [ ] Test politician_info.py sample enrichment - [ ] Create and test elections.py - [ ] Run comprehensive test script with all modules ### 4. Documentation **Optional Final Step**: - Import guide for each module - Performance benchmarks - Common issues and solutions ## Database Connection Info **PostgreSQL** (OpenParliament): ``` Host: localhost Port: 5432 Database: openparliament User: fedmcp Password: fedmcp2024 ``` **Neo4j** (Graph Database): ``` URI: bolt://localhost:7687 User: neo4j Password: canadagpt2024 ``` **Environment File**: `/Users/matthewdufresne/FedMCP/packages/data-pipeline/.env` ## Next Steps 1. **Wait for Hansard test to complete** (~10-20 minutes) - Check progress: `tail -f /tmp/hansard_test_v3.log` - Or check background process status 2. **Test bill_text.py module**: ```bash /Users/matthewdufresne/FedMCP/venv/bin/python \ /Users/matthewdufresne/FedMCP/test_bill_text_sample.py ``` 3. **Create elections.py module**: - Check PostgreSQL schema: `\d elections_candidacy` - Follow pattern from hansard.py and bill_text.py - Include sample and full import functions 4. **Create comprehensive test script**: - Test all 4 modules in sequence - Validate counts and relationships - Report any errors or gaps 5. **Run full imports** (after sample tests pass): - Hansard: 3.67M statements (~2-4 hours estimated) - Bill texts: 5,280 texts (~5-10 minutes) - Politician info: 2,958 politicians (~2-5 minutes) - Elections: 21K candidacies (~5-10 minutes) ## Key Files ``` /Users/matthewdufresne/FedMCP/packages/data-pipeline/fedmcp_pipeline/ ├── ingest/ │ ├── __init__.py │ ├── hansard.py (✅ created, 562 lines) │ ├── bill_text.py (✅ created) │ ├── politician_info.py (✅ created) │ └── elections.py (⏳ not yet created) ├── utils/ │ ├── neo4j_client.py (✅ existing) │ ├── postgres_client.py (✅ existing) │ └── progress.py (✅ existing) /Users/matthewdufresne/FedMCP/ ├── test_hansard_sample.py (✅ created, running) ├── test_bill_text_sample.py (✅ created, not tested) ├── test_sample_imports.py (⏳ not yet created) └── packages/data-pipeline/.env (✅ configured) ``` ## Performance Notes - **Hansard**: Sample of 1,000 statements took ~12 minutes (including 18K documents, 155K+ relationships) - **Batch Sizes**: - Documents: 5,000 per batch - Statements: 10,000 per batch - Relationships: Created iteratively with progress tracking - **Full Import Estimates**: - 3.67M statements: ~2-4 hours - Creating all relationships: additional 1-2 hours ## Issues Fixed 1. **PostgreSQL DISTINCT/ORDER BY Error**: - Error: "ORDER BY expressions must appear in select list" with SELECT DISTINCT - Fix: Added ORDER BY column to SELECT, then filtered unique values in Python 2. **Neo4j API Method Name**: - Error: 'Neo4jClient' object has no attribute 'execute_query' - Fix: Changed to correct method name `run_query` throughout codebase 3. **Test Script API Consistency**: - Updated test_hansard_sample.py to use correct `run_query` method - Ensures consistency across test scripts and modules --- **Last Updated**: 2025-11-05 04:37 AM **Status**: 3 of 4 modules created, Hansard test in progress **Next**: Complete Hansard test validation, then proceed with remaining modules

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/northernvariables/FedMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

OPENPARLIAMENT_INGESTION_STATUS.md•7.94 KiB