# Feature Development Roadmap
## Current State Analysis
### What Works Well β
- Fetches recent case law from AustLII
- Filters out journal articles (primary sources only)
- Extracts neutral citations and jurisdictions
- Preserves paragraph numbers in `[N]` format (402 instances found in test)
- Handles HTML and PDF documents
### Current Limitations π΄
#### 1. **Search Quality Issues**
- **Problem**: Searching "Donoghue v Stevenson" returns recent 2025 cases that merely cite it, NOT the actual 1932 case
- **Root cause**: Sorting by date prioritises recent cases over relevance
- **Impact**: Users can't find the specific case they're looking for
#### 2. **Limited Sources**
- **Current**: Only searches AustLII
- **Missing**: jade.io (superior reported judgments), BarNet Jade, other authoritative sources
- **Impact**: May miss best/most authoritative version of judgments
#### 3. **Paragraph Number Preservation**
- **Current**: Text extraction strips HTML structure
- **Found**: `[N]` format markers ARE preserved (402 instances)
- **Issue**: Page numbers from reported judgments are lost
- **Impact**: Can't generate accurate pinpoints for reported citations
#### 4. **No Ranking/Relevance**
- **Problem**: No way to prioritise authoritative sources
- **Missing**: Reported vs unreported distinction, court hierarchy weighting
## Proposed Solutions
### Phase 1: Fix Search Relevance (HIGH PRIORITY)
**Goal**: Return the ACTUAL case being searched for, not just cases that cite it
**Implementation**:
1. **Add search mode parameter**: `relevance` vs `date` sorting
2. **Smart query detection**:
- If query looks like case name (e.g. "X v Y"), use relevance
- If query is topic (e.g. "negligence"), use date for recency
3. **Title matching boost**: Prioritise exact title matches
4. **Citation matching**: Parse citations from query and match
**Code changes**:
```typescript
interface SearchOptions {
jurisdiction?: "cth" | "vic" | "federal" | "other";
limit?: number;
type: "case" | "legislation";
sortBy?: "relevance" | "date" | "auto"; // NEW
}
```
### Phase 2: Multi-Source Integration (COMPLETED β
)
**Goal**: Search multiple authoritative sources and return best results
**Status**: β
Implemented without API access using AustLII cross-referencing
**Sources integrated**:
1. **AustLII** - Comprehensive unreported coverage (original source)
2. **jade.io** - Superior reported judgments with better formatting (NEW)
**Implementation approach**:
- jade.io is a GWT SPA with no public search API
- Search works by: AustLII search β filter results with neutral citations β resolve jade.io articles by probing article pages β extract metadata from HTML `<title>` tag
- Maximum 5 concurrent jade.io article resolutions to avoid overwhelming the server
- Graceful fallback: if jade.io resolution fails, AustLII results are still returned
- jade.io results are preferred when deduplicating (better formatting)
**Implemented functions**:
```typescript
// Search jade.io via AustLII cross-reference
searchJade(query, options) β SearchResult[]
// Find jade.io article by neutral citation
searchJadeByCitation(citation) β SearchResult
// Deduplicate results by neutral citation (jade.io preferred)
deduplicateResults(results) β SearchResult[]
// Merge results from both sources
mergeSearchResults(austlii, jade) β SearchResult[]
```
**New MCP tools**:
- `search_jade` - Search jade.io for cases/legislation
- `search_jade_by_citation` - Find jade.io article by neutral citation
- `includeJade` parameter added to `search_cases` and `search_legislation`
**Future sources to consider**:
- **BarNet Jade** - Free access to some reported cases
### Phase 3: Enhanced Paragraph/Page Preservation (HIGH PRIORITY)
**Goal**: Preserve both paragraph numbers AND page numbers for accurate pinpoint citations
**Current state**:
- `[N]` paragraph markers: β
Preserved (402 found)
- Page numbers: β Lost in text extraction
**Implementation**:
1. **Improve HTML parsing** to preserve structural markers:
```typescript
// Keep paragraph markers
<p class="Judg-Para-1">[1]</p> β "[1]"
// Extract page markers (when present in reported versions)
<span class="page-num">123</span> β "[Page 123]"
```
2. **Return structured content**:
```typescript
interface EnhancedFetchResponse extends FetchResponse {
paragraphs?: Array<{
number: number;
text: string;
pageNumber?: number;
}>;
}
```
3. **Pinpoint generation helper**:
```typescript
function generatePinpoint(
text: string,
searchPhrase: string,
): { paragraph?: number; page?: number } {
// Find paragraph/page containing phrase
}
```
### Phase 4: Intelligent Result Ranking (MEDIUM PRIORITY)
**Goal**: Return best/most authoritative version of each case
**Ranking criteria** (in order):
1. **Reported vs unreported**: Reported judgments rank higher
2. **Court hierarchy**: HCA > Full Court > Single judge
3. **Completeness**: Judgments with page numbers > without
4. **Recency**: For topic searches, prefer recent
5. **Relevance**: Title/citation exact match > partial match
**Implementation**:
```typescript
function calculateAuthorityScore(result: SearchResult): number {
let score = 0;
// Reported judgment
if (result.citation && !result.neutralCitation) score += 100;
// Court hierarchy
if (result.url.includes("/HCA/")) score += 50;
else if (result.url.includes("/FCA/")) score += 30;
// ... etc
// Has page numbers
if (result.metadata?.hasPageNumbers) score += 20;
return score;
}
```
## Implementation Status
### β
Phase 1: Search Relevance (COMPLETED)
**Implemented features:**
1. β
Smart query detection: Auto-detects case names ("X v Y", "Re X", citations) vs topic searches
2. β
`sortBy` parameter: "auto" (default), "relevance", or "date" modes
3. β
Title matching boost: Prioritizes exact case name matches in results
4. β
Auto mode intelligence:
- Case name queries β relevance sorting to find specific cases
- Topic queries β date sorting for recent jurisprudence
5. β
Comprehensive test suite: 7 new tests covering all sorting scenarios
**What was fixed:**
- β **OLD**: Searching "Donoghue v Stevenson" returned 2025 cases citing it
- β
**NEW**: Search returns the actual case being searched for
**Technical details:**
- Pattern detection for "X v Y", "Re X", citations, and quoted queries
- Title scoring algorithm with party name matching
- Configurable sorting with sensible defaults
## Implementation Priority
### Must Have (Next Sprint)
1. β
~~Fix search relevance for case name queries~~ (COMPLETED)
2. β
~~Preserve paragraph numbers properly~~ (already working)
3. β
~~Add search mode parameter (relevance/date/auto)~~ (COMPLETED)
### β
Phase 2A: Reported Citations & jade.io Support (COMPLETED)
**Implemented features:**
1. β
Reported citation extraction from AustLII results
- Extracts citations like `(2024) 350 ALR 123`, `(1992) 175 CLR 1`
- Supports common law report patterns (CLR, ALR, ALJR, etc.)
- Automatically extracted from titles and summaries
2. β
jade.io URL support in document fetcher
- Users can paste jade.io URLs they have access to
- Special HTML parsing for jade.io document structure
- Falls back to generic extraction when needed
3. β
Enhanced SearchResult interface
- Added `reportedCitation` field
- Updated `source` to support both "austlii" and "jade"
4. β
New test coverage (4 additional tests)
**What this enables:**
- Users can now see both neutral and reported citations
- More complete citation information for legal research
- jade.io integration without needing API access
- Users leverage their own jade.io subscriptions
**Technical implementation:**
- `extractReportedCitation()` function with regex patterns
- `extractTextFromJadeHtml()` for jade.io-specific parsing
- Updated test suite with 18 total scenarios
### β
Phase 2B: jade.io Search Integration (COMPLETED)
**Implemented features:**
1. β
jade.io search via AustLII cross-referencing (no API access required)
- `searchJade()` searches by cross-referencing AustLII results with jade.io metadata
- `searchJadeByCitation()` finds jade.io articles by neutral citation
- Maximum 5 concurrent resolutions to avoid overwhelming jade.io
2. β
Multi-source result merging and deduplication
- `deduplicateResults()` deduplicates by neutral citation, preferring jade.io
- `mergeSearchResults()` merges results from AustLII and jade.io
3. β
New MCP tools
- `search_jade` tool for jade.io case/legislation search
- `search_jade_by_citation` tool for citation-based lookup
- `includeJade` parameter on `search_cases` and `search_legislation`
4. β
Graceful fallback: if jade.io resolution fails, AustLII results still returned
**Technical implementation:**
- jade.io is a GWT SPA with no public search API
- Approach: AustLII search β filter results with neutral citations β probe jade.io article pages β extract metadata from HTML `<title>` tag
- Concurrency limited to 5 simultaneous jade.io resolutions
- jade.io results preferred during deduplication (better formatting)
### Should Have (Following Sprint)
1. πΆ Implement page number extraction (Phase 3)
2. πΆ Add authority-based ranking (Phase 4)
### Nice to Have (Future)
1. π BarNet Jade integration
2. π Citation parsing and validation
3. π Automatic pinpoint generation
4. π Related cases/legislation suggestions
## Testing Requirements
Each phase must include:
1. **Unit tests** for new parsing/ranking logic
2. **Integration tests** with real judgments
3. **Comparison tests** - verify improvements over current state
4. **Performance tests** - ensure multi-source doesn't timeout
## Success Metrics
1. **Search accuracy**: "Donoghue v Stevenson" returns the 1932 case, not 2025 cases
2. **Pinpoint accuracy**: Can generate `[2025] HCA 26 at [42]` style citations
3. **Source coverage**: Returns reported judgment when available
4. **Response time**: < 5 seconds for multi-source search