# Architecture Notes

## Objective

Deliver an MCP server that can:

- Search Australian legislation and case law from public sources (AustLII first, official registers where available).
- Return clean full text for use by LLMs, including OCR conversion for scanned PDFs.
- Prefer reported citations when available but fall back to neutral citations when paywalled or inaccessible.

## Key Components

### MCP Server (`src/index.ts`)

- Registers `search_legislation`, `search_cases`, and `fetch_document_text` tools.
- Normalises tool arguments and orchestrates downstream services.
- Formats responses for LLM consumption (structured JSON with citation metadata).

### AustLII Service (`src/services/austlii.ts`)

- Executes HTTP searches against AustLII (`sinosrch.cgi`) with scoped filters.
- Parses result HTML, capturing:
  - Title
  - Reported citation (if provided)
  - Neutral citation
  - Jurisdiction code and year
  - Snippets for relevance
- Returns consistent `SearchResult` objects for cases and legislation.
- TODO: Add pagination handling and graceful degradation on rate limits.

### Document Fetcher (`src/services/fetcher.ts`)

- Retrieves HTML or PDF content from provided URLs.
- Extracts text via:
  - Cheerio for HTML
  - `pdf-parse` for text-enabled PDFs
  - `node-tesseract-ocr` for image-based PDFs (triggered when PDF returns minimal text).
- Produces `FetchResponse` with metadata (`ocrUsed`, content type, detected citations inside the document).
- TODO: Cache downloaded files (tmpdir) and cleanup.

### Citation Normaliser (`src/services/citation.ts`) – planned

- Recognises neutral citation patterns (e.g. `[2021] HCA 12`).
- Forms fallback URLs (AustLII, JADE, BarNet LawCite) when original download fails.
- Produces machine-readable structure for LLM prompts.

## Deployment

- Node.js 20+ runtime with system-level Tesseract (`tesseract-ocr` package on Debian/Ubuntu).
- Docker image based on `node:20-bookworm-slim`, installing Tesseract + dependencies.
- CI workflow (GitHub Actions) to lint, test, build, and publish container image.

## Testing Strategy

- Unit tests with Vitest using recorded fixtures for AustLII HTML responses.
- Integration tests behind `npm run test:e2e` hitting live endpoints (skipped in CI without opt-in).
- OCR path tests using sample scanned PDF (placed in `test/fixtures`).

## Open Questions

- Which additional sources should we index for redundancy (e.g. Federal Register of Legislation API, Victorian Legislation & Parliamentary Documents API)?
- How aggressively should we cache results to avoid re-hitting public endpoints?
- Should we implement rate limiting/backoff within the server to respect source usage policies?
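
## Sketch: Neutral Citation Recognition

The Citation Normaliser is only planned, so the following is a minimal sketch of how the neutral-citation recognition described above might look, not the project's implementation. The `NeutralCitation` interface, the `findNeutralCitations` function, and the exact regular expression are assumptions made for illustration. The pattern only covers the `[year] COURT number` form shown in the example `[2021] HCA 12` and can also match bracketed-year reported citations such as `[2021] AC 12`, so a real version would need a list of known court codes.

```typescript
// Hypothetical result shape; the planned module's real types are not
// shown in these notes.
interface NeutralCitation {
  year: number;
  court: string;
  number: number;
  raw: string;
}

// Finds citations of the form "[2021] HCA 12": a bracketed four-digit
// year, a court code starting with a capital letter, then a judgment number.
// Assumption: court codes are 2 to 12 alphanumeric characters.
function findNeutralCitations(text: string): NeutralCitation[] {
  // Build the regex per call so the global flag's lastIndex state
  // is never shared between invocations.
  const pattern = /\[(\d{4})\]\s+([A-Z][A-Za-z0-9]{1,11})\s+(\d+)/g;
  const results: NeutralCitation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    results.push({
      year: Number(match[1]),
      court: match[2],
      number: Number(match[3]),
      raw: match[0],
    });
  }
  return results;
}
```

Reported citations in the round-bracket style, such as `(2021) 270 CLR 1`, are deliberately not matched here. Under the stated objective they would be preferred when available, with the neutral form as the fallback.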
