AusLaw MCP

architecture.md•4.04 KiB

# Architecture Notes ## Objective Deliver an MCP server that can: - Search Australian legislation and case law from public sources (AustLII first, official registers where available). - Return clean full text for use by LLMs, including OCR conversion for scanned PDFs. - Prefer reported citations when available but fall back to neutral citations when paywalled or inaccessible. ## Key Components ### MCP Server (`src/index.ts`) - Registers `search_legislation`, `search_cases`, `search_jade`, `search_jade_by_citation`, `fetch_document_text`, `resolve_jade_article`, and `jade_citation_lookup` tools. - Normalises tool arguments and orchestrates downstream services. - Formats responses for LLM consumption (structured JSON with citation metadata). - Supports `includeJade` parameter on `search_cases` and `search_legislation` for multi-source merging. ### AustLII Service (`src/services/austlii.ts`) - Executes HTTP searches against AustLII (`sinosrch.cgi`) with scoped filters. - Parses result HTML, capturing: - Title - Reported citation (if provided) - Neutral citation - Jurisdiction code and year - Snippets for relevance - Returns consistent `SearchResult` objects for cases and legislation. - TODO: Add pagination handling and graceful degradation on rate limits. ### jade.io Service (`src/services/jade.ts`) - Searches jade.io by cross-referencing AustLII results with jade.io article metadata. - **Strategy**: jade.io is a GWT SPA with no public search API. Instead: 1. Perform an AustLII search to get results with neutral citations 2. For each result with a neutral citation, probe jade.io to resolve article info 3. Extract metadata from the jade.io article page `<title>` tag - Maximum 5 concurrent jade.io article resolutions to avoid overwhelming the server. - Graceful fallback: if jade.io resolution fails, AustLII results are still returned. - **Key functions**: - `searchJade(query, options)` – Full jade.io search via AustLII cross-reference - `searchJadeByCitation(citation)` – Find jade.io article by neutral citation - `deduplicateResults(results)` – Deduplicate by neutral citation, preferring jade.io - `mergeSearchResults(austlii, jade)` – Merge results from both sources ### Document Fetcher (`src/services/fetcher.ts`) - Retrieves HTML or PDF content from provided URLs. - Extracts text via: - Cheerio for HTML - `pdf-parse` for text-enabled PDFs - `node-tesseract-ocr` for image-based PDFs (triggered when PDF returns minimal text). - Produces `FetchResponse` with metadata (`ocrUsed`, content type, detected citations inside the document). - TODO: Cache downloaded files (tmpdir) and cleanup. ### Citation Normaliser – planned - Will recognise neutral citation patterns (e.g. `[2021] HCA 12`) and form fallback URLs. - Currently, neutral citation extraction is handled inline in `austlii.ts` and `jade.ts` using the `NEUTRAL_CITATION_PATTERN` constant from `src/constants.ts`. - Reported citation extraction is handled by `extractReportedCitation()` in `austlii.ts`. ## Deployment - Node.js 18+ runtime with system-level Tesseract (`tesseract-ocr` package on Debian/Ubuntu). - Docker image based on `node:20-alpine`, installing Tesseract + dependencies. - CI workflow (GitHub Actions) to lint, test, build, and publish container image. ## Testing Strategy - Unit tests with Vitest using recorded fixtures for AustLII HTML responses. - Unit tests for configuration, constants, errors, logger, and formatter modules. - Mocked tests for austlii and fetcher services (network-isolated). - Integration tests hitting live AustLII and jade.io endpoints. - Performance benchmark tests for search latency and concurrent requests. - OCR path tests using sample scanned PDF (placed in `test/fixtures`). ## Open Questions - Which additional sources should we index for redundancy (e.g. Federal Register of Legislation API, Victorian Legislation & Parliamentary Documents API)? - How aggressively should we cache results to avoid re-hitting public endpoints? - Should we implement rate limiting/backoff within the server to respect source usage policies?

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/russellbrenner/auslaw-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

architecture.md•4.04 KiB