FedMCP - Federal Parliamentary Information

MIT License

FedMCP

OPENPARLIAMENT_INTEGRATION.md•7.91 kB

# OpenParliament Database Integration ## Overview This document tracks the integration of the complete OpenParliament PostgreSQL database into FedMCP's Neo4j knowledge graph. This adds 500K+ individual MP votes, historical MP terms, election results, Hansard debates, and political contributions to enable deep accountability analysis. ## Current Status: Phase 1 - Setup ✅ ### Completed - ✅ Analyzed OpenParliament SQL dump (1.3GB, 63 tables, 6.3M lines) - ✅ Reviewed OpenParliament source code (Django models, importers) - ✅ Created integration plan with 6-week phased rollout - ✅ Installed PostgreSQL 14 - ✅ Created database setup script (`scripts/setup_openparliament_db.sh`) ### Next Steps 1. **Run the setup script** to load the OpenParliament dump: ```bash cd /Users/matthewdufresne/FedMCP ./scripts/setup_openparliament_db.sh ``` ⚠️ This will take 15-30 minutes to decompress and load ~1.3GB of data 2. **Add PostgreSQL credentials** to `packages/data-pipeline/.env`: ``` POSTGRES_URI=postgresql://fedmcp:fedmcp2024@localhost:5432/openparliament ``` 3. **Create PostgreSQL client utilities** (`packages/data-pipeline/fedmcp_pipeline/clients/postgres_client.py`) 4. **Begin data import** starting with MemberVote records (highest ROI) ## Integration Plan ### Phase 1: Core Accountability Data (Weeks 1-2) 🔵 IN PROGRESS **Goal:** Enable corruption detection and MP performance analysis | Task | Status | Est. Rows | Impact | |------|--------|-----------|--------| | PostgreSQL setup | ✅ Complete | - | Foundation | | Import MemberVote | ⏳ Pending | 500K | Individual vote records | | Import ElectedMember | ⏳ Pending | 5K | Historical MP terms | | Enhance MP nodes | ⏳ Pending | 350 | Add historical fields | | Test queries | ⏳ Pending | - | Verify "How did X vote on Y?" | **Deliverables:** - Query "How did MP X vote on Bill Y?" - Track party-switching and riding changes - Analyze party loyalty and dissent patterns ### Phase 2: Electoral Context (Week 3) ⚪ PLANNED **Goal:** Understand electoral competitiveness - Create `:Election` and `:Candidacy` node types - Import 50K candidacy records - Enable electoral history queries per riding ### Phase 3: Debate Search (Weeks 4-5) ⚪ PLANNED **Goal:** Enable natural language search of speeches - Import recent Hansard statements (~400K, last 2 parliaments) - Set up Neo4j full-text search - Create `(MP)-[:SPOKE]->(Statement)-[:MENTIONED]->(Bill)` chains ### Phase 4: Financial Transparency (Week 6) ⚪ PLANNED **Goal:** Track donation patterns - Import 200K political contributions - Create donor network relationships - Enable donation-vote correlation analysis ## Database Schema ### New Node Types ```cypher // Phase 1 (:Term { id: STRING, start_date: DATE, end_date: DATE, parliament: INTEGER, session: STRING }) // Phase 2 (:Election {id: STRING, date: DATE, type: STRING}) (:Candidacy {vote_total: INTEGER, vote_percent: FLOAT, elected: BOOLEAN}) // Phase 3 (:Statement {content_en: TEXT, time: DATETIME, wordcount: INTEGER}) (:Document {date: DATE, type: STRING, parliament: INTEGER}) // Phase 4 (:Donor {name: STRING, city: STRING, province: STRING}) (:Contribution {amount: FLOAT, date: DATE, recipient_type: STRING}) ``` ### New Relationships ```cypher // Phase 1 (MP)-[:CAST_VOTE {position: STRING, dissent: BOOLEAN}]->(Vote) (MP)-[:SERVED_TERM]->(Term)-[:IN_PARTY]->(Party) (Term)-[:REPRESENTED]->(Riding) // Phase 2 (MP)-[:RAN_AS]->(Candidacy)-[:IN_ELECTION]->(Election) // Phase 3 (MP)-[:SPOKE {wordcount: INT}]->(Statement) (Statement)-[:MENTIONED]->(Bill) // Phase 4 (Donor)-[:DONATED]->(Contribution)-[:TO_PARTY|TO_MP]-> ``` ## Data Sources & Update Strategy ### Primary Sources 1. **OpenParliament PostgreSQL dump** - Complete historical data (1994-present) - Frequency: Monthly full sync - Coverage: 500K votes, 2M statements, 50K elections, 200K contributions - Source of truth for conflicts 2. **OpenParliament API** - Current session only - Frequency: Daily incremental updates - Coverage: Latest bills, votes, MPs - Fast, reliable 3. **LEGISinfo JSON** - Bill details - Frequency: Daily - Coverage: Current parliament bills ### Update Schedule (Recommended) - **Daily (6am ET):** API-based incremental sync - **Monthly (1st Sunday, 2am ET):** Full PostgreSQL sync ## Key Findings from Analysis ### Critical Missing Data (Now Available) 1. **Individual Vote Records** - How each MP voted (Y/N/P) on every question 2. **Historical Terms** - MP party switches, riding changes over time 3. **Hansard Speeches** - 2M+ statements with full text 4. **Election Results** - Vote totals, margins, competitiveness 5. **Political Contributions** - 200K donations with amounts and dates ### Database Statistics - **Total Size:** 1.3GB compressed, ~6GB uncompressed - **Total Tables:** 63 (13 core, 5 bills, 5 hansard, 7 committees, etc.) - **Largest Tables:** - `hansards_statement`: 2M+ rows (speeches) - `bills_membervote`: 500K+ rows (individual votes) - `financials_contribution`: 200K+ rows (donations) - `elections_candidacy`: 50K+ rows (election results) ## Success Metrics ### After Phase 1 - ✅ Can answer "How did every MP vote on Bill C-11?" - ✅ Can track "When did MP X switch from Party A to Party B?" - ✅ Can detect "MPs who voted against party line" ### After All Phases - ✅ Can search "Find Trudeau speeches mentioning climate" - ✅ Can analyze "Who donated to Conservatives in 2019?" - ✅ Can correlate "Donations → Votes → Party loyalty" ## Technical Considerations ### Performance - **Expected Neo4j Scale:** 5M+ nodes, 15M+ relationships - **Storage:** ~50GB (with full Hansard text) - **RAM Recommendation:** ≥16GB for Neo4j ### Preservation of Existing Functionality - **Zero breaking changes** - All existing queries work unchanged - **Additive approach** - New relationships supplement existing ones - **Backward compatible** - `mp.party` and `mp.riding` remain for current lookup ## OpenParliament Source Code Insights ### Django Models (Most Relevant) - **core/models.py** - Politician, Party, Session, ElectedMember - **bills/models.py** - Bill, VoteQuestion, MemberVote, PartyVote - **hansards/models.py** - Document, Statement (with bill/politician links) - **committees/models.py** - Committee, CommitteeMeeting, CommitteeReport - **elections/models.py** - Election, Candidacy ### Import Scripts - **legisinfo.py** - Fetches bills from parl.ca API - **parl_document.py** - Downloads Hansard XML - **parl_cmte.py** - Scrapes committee data - **parlvotes.py** - Imports voting records - **mps.py** - Updates MP information - **election.py** - Election results ## Files Created ### Scripts - `scripts/setup_openparliament_db.sh` - PostgreSQL database setup and dump load ### Documentation - `OPENPARLIAMENT_INTEGRATION.md` (this file) - Integration tracking doc ### To Be Created - `packages/data-pipeline/fedmcp_pipeline/clients/postgres_client.py` - PostgreSQL connection utilities - `packages/data-pipeline/fedmcp_pipeline/ingest/openparliament.py` - Import functions - `packages/data-pipeline/scripts/openparliament_sync.py` - Monthly sync script ## Resources - **OpenParliament GitHub:** https://github.com/michaelmulley/openparliament - **SQL Dump Location:** `/Users/matthewdufresne/FedMCP/openparliament.public.sql.bz2` - **PostgreSQL Path:** `/opt/homebrew/opt/postgresql@14/bin` - **Database Name:** `openparliament` - **Database User:** `fedmcp` / `fedmcp2024` ## Contact For questions or to continue implementation, refer to: 1. This document for current status 2. The comprehensive analysis report generated by the Plan agent 3. The 6-week implementation plan in the ExitPlanMode output --- **Last Updated:** 2025-11-05 **Status:** Phase 1 - PostgreSQL Setup Complete **Next Action:** Run `./scripts/setup_openparliament_db.sh`

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/northernvariables/FedMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server