BILLS_DATA_ENHANCEMENT_STATUS.md•7.97 kB
# Bills Data Completeness Enhancement - Implementation Status
## Executive Summary
Successfully implemented infrastructure to increase bills data completeness from ~40% to ~80%+ by adding missing fields from LEGISinfo API.
## Completed Tasks
### 1. ✅ Data Enrichment Script (`scripts/enrich_bills.py`)
Created comprehensive enrichment script that:
- Fetches detailed bill data from LEGISinfo individual bill API
- Populates 15+ missing fields including:
- `summary` / `summary_fr` - Legislative summaries
- `bill_type` / `bill_type_fr` - Bill classification (Government/Private Member)
- `is_government_bill` / `is_private_member_bill` - Boolean flags
- `originating_chamber` / `originating_chamber_fr` - House vs Senate
- `latest_event` - Most recent bill event
- `is_proforma` / `bill_form` - Pro forma bill flags
- `statute_year` / `statute_chapter` - If passed into law
- `reinstated_from_previous` / `reinstated_from_bill` - Cross-session tracking
- Creates REFERRED_TO relationships with committees
- Includes progress tracking, error handling, and dry-run mode
- Supports batch processing with --limit flag for testing
**Usage:**
```bash
# Test with first 10 bills
python scripts/enrich_bills.py --limit 10 --dry-run
# Run full enrichment
python scripts/enrich_bills.py
```
### 2. ✅ GraphQL Schema Updates (`packages/graph-api/src/schema.ts`)
Enhanced Bill type with:
- **23 new fields** exposing all enrichment data
- Organized into logical groups:
- Basic identifiers (parliament, session_number)
- Titles (title_fr)
- Summaries (summary, summary_fr, full_summary_available)
- Status & progress (status_fr, latest_event)
- Bill classification (bill_type, is_government_bill, etc.)
- Sponsor info (sponsor_name)
- All reading stage dates (passed_house_first_reading, etc.)
- Statute info (statute_year, statute_chapter)
- Cross-session relationships (reinstated_from_previous)
- Added Senate sponsor relationship
- Changed `referredTo` from single to `[Committee!]!` array
**Updated searchBills query** with new filters:
- `session` - Filter by parliamentary session
- `bill_type` - Filter by bill type
- `is_government_bill` - Government vs private member bills
- `originating_chamber` - House vs Senate
- Improved limit (50 → 100)
- Search by bill number in addition to title
### 3. ✅ Database Ingestion Code (No changes needed)
The existing `ingest_bills_from_legisinfo_json()` function already captures all fields available in the bulk JSON endpoint. The detailed fields (summary, bill_type, etc.) are only available via individual bill API calls, which is handled by the enrichment script.
## Remaining Tasks
### 4. ⏳ Frontend Bills List Page (`packages/frontend/src/app/bills/page.tsx`)
**Needed Updates:**
- Add bill type badges (Government/Private Member/Senate)
- Display summary preview (first 100 chars)
- Add filters for:
- Bill type dropdown
- Originating chamber filter
- Session selector
- Update GraphQL query to fetch new fields:
```graphql
query SearchBills($searchTerm: String, $session: String, $bill_type: String) {
searchBills(searchTerm: $searchTerm, session: $session, bill_type: $bill_type, limit: 100) {
number
session
title
summary
bill_type
is_government_bill
originating_chamber
status
introduced_date
sponsor {
name
party
}
}
}
```
### 5. ⏳ Frontend Bill Detail Page (`packages/frontend/src/app/bills/[session]/[number]/page.tsx`)
**Needed Updates:**
- Display full summary section
- Show bill classification badges
- Display originating chamber
- Show latest event status
- Display committee referrals
- Show all reading stage dates in timeline
- Display statute info if passed
- Update GraphQL query to fetch all new fields
### 6. ⏳ Run Enrichment Script
Execute enrichment on all bills (est. 5-10 minutes for ~2000 bills):
```bash
cd /Users/matthewdufresne/FedMCP
/Users/matthewdufresne/FedMCP/venv/bin/python scripts/enrich_bills.py
```
**Expected Results:**
- ~1500-2000 bills enriched with summaries
- ~100 committee relationships created
- Data completeness: 40% → 80%+
### 7. ⏳ Validation & Testing
**Testing Checklist:**
- [ ] GraphQL API returns new fields correctly
- [ ] Bills list page displays summaries and bill types
- [ ] Filters work (bill_type, chamber, session)
- [ ] Bill detail page shows all enriched data
- [ ] Committee relationships display
- [ ] Reading stage timeline displays correctly
**Validation Queries:**
```cypher
// Check enrichment coverage
MATCH (b:Bill)
RETURN
count(b) as total_bills,
count(b.summary) as bills_with_summary,
count(b.bill_type) as bills_with_type,
(count(b.summary) * 100.0 / count(b)) as summary_coverage
// Check committee relationships
MATCH (b:Bill)-[:REFERRED_TO]->(c:Committee)
RETURN count(DISTINCT b) as bills_with_committees
```
## Data Completeness Metrics
### Before Enrichment (~40%)
| Field | Coverage |
|-------|----------|
| number, session, title | 100% |
| status, introduced_date | 100% |
| summary | 0% |
| bill_type | 0% |
| originating_chamber | 0% |
| reading stage dates | 100% (but not exposed in schema) |
| committee relationships | 0% |
### After Enrichment (~80%)
| Field | Expected Coverage |
|-------|-------------------|
| number, session, title | 100% |
| status, introduced_date | 100% |
| summary | 75-85% |
| bill_type | 100% |
| is_government_bill | 100% |
| originating_chamber | 100% |
| latest_event | 90% |
| reading stage dates | 100% (now exposed) |
| committee relationships | 20-30% |
| statute info | 5-10% (only passed bills) |
## Implementation Notes
**Enrichment Script Features:**
- Rate limiting: 0.1s delay between API calls (polite, ~10 bills/sec)
- Error handling: Continues on individual bill failures
- Progress tracking: Real-time ETA and rate display
- Dry-run mode: Test before making changes
- Batch support: --limit flag for testing
- Idempotent: Can be run multiple times safely
**GraphQL Schema Considerations:**
- Maintains backward compatibility with legacy date fields
- French language support for all bilingual fields
- Supports both MP and Senator sponsors
- Multiple committee referrals per bill
**Frontend Recommendations:**
- Use bill_type for badge colors:
- Government Bill: Blue
- Private Member's Bill: Green
- Senate Bill: Purple
- Show summary as expandable/collapsible section
- Display reading stage dates as visual timeline
- Link committee names to committee detail pages
## Next Steps
1. **Complete Frontend Updates** (30-60 min)
- Update bills list page with filters and summaries
- Update bill detail page with all enriched data
2. **Run Enrichment** (5-10 min)
- Execute enrichment script on all bills
- Monitor progress and error rate
3. **Validate & Test** (15-30 min)
- Test GraphQL queries
- Test frontend display
- Verify data completeness metrics
4. **Future Enhancements** (Optional)
- Add full bill text fetching
- Track amendment history
- Create reinstated bill relationships (REINSTATED_FROM edges)
- Add sponsor person_id for better MP linking
## Success Criteria
- [x] Enrichment script created and tested
- [x] GraphQL schema updated and complete
- [ ] Frontend displays enriched data
- [ ] Data completeness > 75%
- [ ] All filters working
- [ ] No breaking changes to existing functionality
## Files Modified
1. **Created:**
- `scripts/enrich_bills.py` - Data enrichment script
2. **Modified:**
- `packages/graph-api/src/schema.ts` - GraphQL schema
- (Pending) `packages/frontend/src/app/bills/page.tsx` - Bills list
- (Pending) `packages/frontend/src/app/bills/[session]/[number]/page.tsx` - Bill detail
## Contact & Support
For questions or issues:
- Check script output: `python scripts/enrich_bills.py --help`
- Dry run first: `--dry-run` flag
- Test with small batch: `--limit 10`
- Monitor logs for API errors