What can you do with this server?

A comprehensive legal research tool for accessing and analyzing Polish legal acts from the Sejm API, covering both Dziennik Ustaw (Official Journal of Laws) and Monitor Polski (Polish Monitor). Search & Discovery - Advanced multi-criteria search by year, title, keywords, document type, effectiveness dates, and active status; browse complete annual collections by publisher with pagination support Document Analysis - Retrieve full metadata, content (PDF/HTML), hierarchical table of contents, legal relationships (references and amendments), and document lifecycle tracking Reference Data - Access legal keywords, publishers, document statuses (active/repealed/consolidated), document types (laws/regulations/ordinances), and involved institutions (ministries, authorities, organizations) Date Utilities - Get current date in legal format (YYYY-MM-DD) and calculate date offsets (days/months/years) for legal periods and deadlines

Which integrations are available for this server?

Utilizes Git version control for repository management and development workflow Hosted on GitHub for source code management, distribution, and collaborative development Built using Python programming language with FastMCP framework for MCP server implementation Displays project status badges for Python version, license, and version information

How do I use Law Scrapper MCP?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Law Scrapper MCP Find recent regulations about data protection from the last 2 years" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

Law Scrapper MCP

A comprehensive Model Context Protocol (MCP) server for accessing and analyzing Polish legal acts from the Sejm API, enabling AI-powered legal research and document analysis.

Python version License Version

Features

Comprehensive legal act access - Full access to Polish legal acts from Dziennik Ustaw (DU) and Monitor Polski (MP)
Advanced search and filtering - Multi-criteria search by date, type, keywords, publisher, and status
Result Store with chained filtering - Store search results and filter with regex, type/status/year match, date ranges, sorting
Document Store pattern - Load acts into memory for efficient section-level navigation and search
Detailed document analysis - Metadata, structure, references, and content retrieval
Content processing - Automatic PDF-to-text and HTML-to-Markdown conversion
Date calculations - Specialized date utilities for legal document analysis
System metadata - Keywords, statuses, document types, and institution data
FastMCP integration - Built with FastMCP framework, flexible transport options
Async HTTP client - Efficient httpx client with retry logic and connection pooling
TTL caching - Intelligent response caching with configurable TTL
Structured logging - JSON and text log formats for easy debugging
Docker support - Containerized deployment with docker-compose
Comprehensive documentation - Examples and clear parameter descriptions

Requirements

Python: 3.13 or higher
Package manager: uv (recommended) or pip
Internet connection: Required for accessing Sejm API endpoints
MCP-compatible tool: Cursor IDE, Claude Code, or other MCP clients

Installation

Using uv (recommended)

# Clone the repository
git clone https://github.com/numikel/law-scrapper-mcp.git
cd law-scrapper-mcp

# Install dependencies
uv sync

# Install with dev dependencies
uv sync --extra dev

Using pip

# Clone the repository
git clone https://github.com/numikel/law-scrapper-mcp.git
cd law-scrapper-mcp

# Install dependencies
pip install -e .

Using uvx (no installation required)

For quick testing without cloning the repository:

# Run the server directly from GitHub
uvx --from git+https://github.com/numikel/law-scrapper-mcp law-scrapper

Quick start

STDIO transport (default)

STDIO is the default transport for MCP communication. Start the server and connect from your MCP client:

# Run the server
uv run python -m law_scrapper_mcp

# Or use the installed script
law-scrapper

Configure in your MCP client (e.g., Cursor .cursor/mcp.json):

{
  "mcpServers": {
    "law-scrapper-mcp": {
      "command": "law-scrapper"
    }
  }
}

For Claude Code:

claude mcp add law-scrapper "uvx --from git+https://github.com/numikel/law-scrapper-mcp law-scrapper"

HTTP transport (streamable-http)

Run the server on HTTP with streamable-http transport:

# Run with HTTP transport on port 7683
LAW_MCP_TRANSPORT=streamable-http uv run python -m law_scrapper_mcp

# Or specify custom host and port
LAW_MCP_TRANSPORT=streamable-http LAW_MCP_HOST=0.0.0.0 LAW_MCP_PORT=8080 uv run python -m law_scrapper_mcp

Configure in your MCP client:

{
  "mcpServers": {
    "law-scrapper-mcp": {
      "url": "http://localhost:7683/mcp",
      "transport": "streamable-http"
    }
  }
}

Note: The URL must include the /mcp path. FastMCP exposes the streamable-http endpoint at /mcp, not at the root. Using http://localhost:7683 without /mcp results in 404 (Not Found).

Docker

Build and run with Docker:

# Build the image
docker build -t law-scrapper-mcp .

# Run with STDIO transport (default)
docker run -it law-scrapper-mcp

# Run with HTTP transport on port 7683
docker run -it -p 7683:7683 -e LAW_MCP_TRANSPORT=streamable-http law-scrapper-mcp

Or use docker-compose:

# Run with STDIO transport
docker compose up

# Run with HTTP transport (set TRANSPORT=streamable-http in docker-compose.yml)
docker compose -f docker-compose.yml up

Configuration

All settings are configured via environment variables with the LAW_MCP_ prefix:

Variable	Default	Description
`LAW_MCP_TRANSPORT`	`stdio`	Transport: `stdio` or `streamable-http`
`LAW_MCP_HOST`	`0.0.0.0`	HTTP server host (when using streamable-http)
`LAW_MCP_PORT`	`7683`	HTTP server port (when using streamable-http)
`LAW_MCP_API_TIMEOUT`	`30.0`	HTTP request timeout in seconds
`LAW_MCP_API_MAX_CONCURRENT`	`10`	Maximum concurrent API requests
`LAW_MCP_API_MAX_RETRIES`	`3`	Maximum API request retries
`LAW_MCP_CACHE_METADATA_TTL`	`86400`	Metadata cache TTL (24 hours)
`LAW_MCP_CACHE_SEARCH_TTL`	`600`	Search results cache TTL (10 minutes)
`LAW_MCP_CACHE_BROWSE_TTL`	`3600`	Browse results cache TTL (1 hour)
`LAW_MCP_CACHE_DETAILS_TTL`	`3600`	Act details cache TTL (1 hour)
`LAW_MCP_CACHE_CHANGES_TTL`	`300`	Changes tracking cache TTL (5 minutes)
`LAW_MCP_CACHE_MAX_ENTRIES`	`1000`	Maximum cache entries
`LAW_MCP_DOC_STORE_MAX_DOCUMENTS`	`10`	Maximum documents in Document Store
`LAW_MCP_DOC_STORE_MAX_SIZE_BYTES`	`5242880`	Maximum Document Store size (5 MB)
`LAW_MCP_DOC_STORE_TTL`	`7200`	Document Store TTL (2 hours)
`LAW_MCP_CIRCUIT_BREAKER_THRESHOLD`	`5`	Failures before circuit breaker opens
`LAW_MCP_CIRCUIT_BREAKER_RECOVERY_TIMEOUT`	`60.0`	Seconds before trying recovery
`LAW_MCP_CIRCUIT_BREAKER_HALF_OPEN_MAX_CALLS`	`3`	Test calls in half-open state
`LAW_MCP_LOG_LEVEL`	`INFO`	Log level: DEBUG, INFO, WARNING, ERROR
`LAW_MCP_LOG_FORMAT`	`text`	Log format: `text` or `json`

Example environment configuration:

export LAW_MCP_TRANSPORT=streamable-http
export LAW_MCP_PORT=7683
export LAW_MCP_LOG_LEVEL=DEBUG
export LAW_MCP_CACHE_METADATA_TTL=86400

Tools reference

Law Scrapper MCP provides 13 tools for legal research and analysis:

1. get_system_metadata(category)

Retrieve system metadata for filtering and searching legal acts.

Parameters:

category (string, default: "all") - Metadata category: "keywords", "publishers", "statuses", "types", "institutions", or "all"

Returns: Keywords, publishers, document types, statuses, and institutions available in the system

Examples:

- Get all available search keywords
- Retrieve all legal document types
- List all publishers (DU, MP)
- Get all document statuses
- Get complete system metadata

2. search_legal_acts(publisher, year, keywords, detail_level, status, type)

Search for legal acts with advanced filtering options.

Parameters:

publisher (string) - Publisher code: "DU" (Dziennik Ustaw) or "MP" (Monitor Polski)
year (integer) - Publication year (e.g., 2024)
keywords (string) - Search keywords (AND logic - use multiple searches for OR)
detail_level (string, default: "standard") - Response detail: "minimal", "standard", or "full"
status (string, optional) - Document status filter
type (string, optional) - Document type filter

Returns: List of matching legal acts with metadata

Search note: Multiple keywords use AND logic. Search one keyword at a time for OR behavior.

Examples:

- Search DU 2024 for "environment protection" acts
- Find all MP 2023 acts with status "active"
- Search for COVID-19 related legislation
- Find acts by specific type (e.g., "regulation")
- Get minimal detail results for quick scanning

3. browse_acts(publisher, year, detail_level)

Browse all legal acts published in a specific year by publisher.

Parameters:

publisher (string) - Publisher code: "DU" or "MP"
year (integer) - Publication year
detail_level (string, default: "standard") - Response detail: "minimal", "standard", or "full"

Returns: Complete list of acts published in the specified year

Examples:

- Browse all DU acts from 2024
- Get minimal details of all MP acts from 2023
- Browse full details of DU 2022 legislation
- Get an overview of acts by publisher and year
- Track legislation published in a specific year

4. filter_results(result_set_id, pattern, field, type_equals, ...)

Filter and narrow down previously retrieved search/browse/changes results.

Parameters:

result_set_id (string) - Result set ID from a previous search/browse/changes call (e.g., "rs_1")
pattern (string, optional) - Regex pattern for text search (supports OR: "podatek|VAT|akcyza")
field (string, default: "title") - Field to search: "title", "eli", "status", "type", "publisher"
type_equals (string, optional) - Exact match on document type (e.g., "Ustawa", "Rozporządzenie")
status_equals (string, optional) - Exact match on status (e.g., "akt obowiązujący", "akt uchylony")
year_equals (integer, optional) - Exact match on publication year
date_field (string, optional) - Date field for range filter: "promulgation_date" or "effective_date"
date_from / date_to (string, optional) - Date range (YYYY-MM-DD)
sort_by (string, optional) - Sort field: "title", "year", "pos", "promulgation_date", etc.
sort_desc (boolean, default: false) - Sort descending
limit (integer, optional) - Maximum results to return

Returns: Filtered results with a new result_set_id for chained filtering

Examples:

- Filter search results to only "Rozporządzenie" type
- Search titles with regex "zdrow|apteka|lekar"
- Filter by date range and sort by promulgation date
- Chain filters: first by type, then by regex pattern
- Get top 10 most recent results

5. get_act_details(eli, load_content, detail_level)

Retrieve detailed information about a specific legal act and optionally load its content.

Parameters:

eli (string) - Act identifier in format "PUBLISHER/YEAR/NUMBER" (e.g., "DU/2024/1")
load_content (boolean, default: false) - Load act content into Document Store for section reading
detail_level (string, default: "standard") - Response detail: "minimal", "standard", or "full"

Returns: Act metadata (title, publication date, status, type, etc.), table of contents if load_content=true

Examples:

- Get metadata for act DU/2024/1
- Load act content for section-level reading
- Get full details including table of contents
- Retrieve act status and publication information
- Load multiple acts for comparison

6. read_act_content(eli, section)

Read content from a specific section of a loaded legal act.

Parameters:

eli (string) - Act identifier (must be loaded first via get_act_details with load_content=true)
section (string) - Section to read (e.g., "Art. 1", "Chapter 2", "Preamble")

Returns: Content of the requested section

Workflow note: Must call get_act_details(eli="...", load_content=true) first, then use this tool.

Examples:

- Read Article 1 from loaded act
- Get Chapter 2 content
- Read the Preamble section
- Access specific numbered articles
- Navigate act by chapters

7. search_in_act(eli, query)

Search for specific terms within a loaded legal act.

Parameters:

eli (string) - Act identifier (must be loaded first via get_act_details with load_content=true)
query (string) - Search term or phrase

Returns: Matching sections with context and location

Examples:

- Find all mentions of "penalty" in loaded act
- Search for specific legal terms
- Locate articles containing "fine" or "punishment"
- Find definitional sections
- Search for specific references

8. analyze_act_relationships(eli, relationship_type)

Analyze legal relationships and references of an act (amendments, references, etc.).

Parameters:

eli (string) - Act identifier
relationship_type (string, default: "all") - Type: "amends", "amended_by", "references", "referenced_by", or "all"

Returns: List of related acts and their relationships

Examples:

- Find which acts amend this legislation
- See what acts this legislation amends
- Get all legal references in the act
- Find acts that reference this legislation
- Analyze complete act relationship network

9. track_legal_changes(date_from, date_to, publisher, keywords)

Track legal changes and new acts within a date range.

Parameters:

date_from (string) - Start date (YYYY-MM-DD format)
date_to (string) - End date (YYYY-MM-DD format)
publisher (string, optional) - Filter by publisher: "DU" or "MP"
keywords (string, optional) - Filter by keywords

Returns: Legal acts published in the date range

Examples:

- Track changes from 2024-01-01 to 2024-12-31
- Find new DU acts from last month
- Get changes published in past 7 days
- Track legislation on specific topics over time
- Monitor legal changes by publisher and date range

10. calculate_legal_date(days, months, years, base_date)

Calculate legal dates with intuitive sign convention.

Parameters:

days (integer, default: 0) - Days offset (+future, -past)
months (integer, default: 0) - Months offset (+future, -past)
years (integer, default: 0) - Years offset (+future, -past)
base_date (string, optional) - Base date (YYYY, YYYY-MM, or YYYY-MM-DD format, defaults to today)

Returns: Calculated date and relative description

Sign convention: Positive = future, Negative = past

Examples:

- Get current date (call with no parameters)
- Calculate date 30 days in the future (+30)
- Calculate date 6 months in the past (-6 months)
- Calculate date 1 year from a specific date
- Calculate legal deadlines and periods

11. compare_acts(eli_a, eli_b)

Compare metadata of two legal acts.

Parameters:

eli_a (string) - ELI identifier of the first act (e.g., "DU/2024/1692")
eli_b (string) - ELI identifier of the second act (e.g., "DU/2024/1716")

Returns: Comparison of titles, types, statuses, dates, keywords overlap and differences

Examples:

- Compare two acts from the same year
- Compare old and new versions of legislation
- Identify metadata differences between related acts

12. list_result_sets()

Display active result sets stored in memory.

Returns: List of result sets with IDs, query summaries, counts, and creation times

13. list_loaded_documents()

Display documents loaded into the Document Store.

Returns: List of loaded documents with ELIs, sizes, section counts, and timestamps

Document Store workflow

The Document Store pattern enables efficient content navigation and search within legal acts:

Workflow steps

Load an act - Call get_act_details(eli="DU/2024/1", load_content=true) to load the act into the Document Store
Read sections - Use read_act_content(eli="DU/2024/1", section="Art. 1") to read specific sections
Search within act - Use search_in_act(eli="DU/2024/1", query="penalty") to find terms

Benefits

Efficient memory usage (configurable max documents and TTL)
Fast section-level navigation without refetching
Search within loaded acts without API calls
Automatic content processing (PDF→text, HTML→Markdown)

Configuration

LAW_MCP_DOC_STORE_MAX_DOCUMENTS - How many acts to keep in memory (default: 10)
LAW_MCP_DOC_STORE_MAX_SIZE_BYTES - Maximum memory usage (default: 5 MB)
LAW_MCP_DOC_STORE_TTL - How long to keep acts in memory (default: 2 hours)

Project structure

law-scrapper-mcp/
├── src/law_scrapper_mcp/
│   ├── __init__.py
│   ├── __main__.py              # Entry point for python -m
│   ├── server.py                # FastMCP app, lifespan, transport config
│   ├── config.py                # Pydantic settings (env vars)
│   ├── logging_config.py        # Structured logging setup
│   ├── models/                  # Pydantic models
│   │   ├── enums.py            # Enumerations
│   │   ├── api_responses.py    # Sejm API response models
│   │   ├── tool_inputs.py      # Tool input models
│   │   └── tool_outputs.py     # Tool output models
│   ├── client/                  # HTTP client
│   │   ├── sejm_client.py      # AsyncClient with retry and circuit breaker
│   │   ├── cache.py            # Async TTL cache implementation
│   │   ├── circuit_breaker.py  # Circuit breaker for API protection
│   │   └── exceptions.py       # Custom exceptions (Polish messages)
│   ├── services/                # Business logic
│   │   ├── metadata_service.py    # Metadata retrieval
│   │   ├── search_service.py      # Search and browse
│   │   ├── act_service.py         # Act details and content
│   │   ├── changes_service.py     # Change tracking
│   │   ├── document_store.py      # In-memory act storage
│   │   ├── result_store.py        # Search result persistence and filtering
│   │   ├── content_processor.py   # PDF/HTML processing
│   │   └── response_enrichment.py # Response hints
│   └── tools/                   # MCP tool definitions
│       ├── metadata.py          # get_system_metadata
│       ├── search.py            # search_legal_acts
│       ├── browse.py            # browse_acts
│       ├── act_details.py       # get_act_details
│       ├── act_content.py       # read_act_content
│       ├── act_search.py        # search_in_act
│       ├── relationships.py     # analyze_act_relationships
│       ├── filter_results.py    # filter_results, list_result_sets
│       ├── changes.py           # track_legal_changes
│       ├── compare.py           # compare_acts
│       ├── dates.py             # calculate_legal_date
│       └── error_handling.py    # Centralized @handle_tool_errors decorator
├── tests/
│   ├── unit/                    # Unit tests
│   └── integration/             # Integration tests with Sejm API
├── Dockerfile                   # Container image definition
├── docker-compose.yml           # Multi-service setup
├── pyproject.toml              # Project metadata and dependencies
├── uv.lock                      # Reproducible dependency lock
└── README.md                    # This file

Docker

Dockerfile

The included Dockerfile builds a containerized Law Scrapper MCP server:

FROM python:3.13-slim
WORKDIR /app
COPY . .
RUN pip install -e .
EXPOSE 7683
CMD ["law-scrapper"]

Build and run:

# Build the image
docker build -t law-scrapper-mcp .

# Run with STDIO transport
docker run -it law-scrapper-mcp

# Run with HTTP transport
docker run -it -p 7683:7683 -e LAW_MCP_TRANSPORT=streamable-http law-scrapper-mcp

# With custom settings
docker run -it -p 7683:7683 \
  -e LAW_MCP_TRANSPORT=streamable-http \
  -e LAW_MCP_LOG_LEVEL=DEBUG \
  law-scrapper-mcp

docker-compose.yml

Deployment with docker-compose:

# Start service
docker compose up -d

# View logs
docker compose logs -f

# Stop service
docker compose down

Migration guide (v1 to v2)

If upgrading from v1.0.2, note these breaking changes:

v1.0.2 (old)	v2.0.0 (new)	Notes
`get_current_date`	`calculate_legal_date()`	Call with no parameters for current date
`calculate_date_offset`	`calculate_legal_date(days/months/years)`	Use intuitive +future/-past sign convention
`get_legal_keywords`	`get_system_metadata(category="keywords")`	Consolidated into one tool
`get_legal_publishers`	`get_system_metadata(category="publishers")`	Consolidated into one tool
`get_legal_statuses`	`get_system_metadata(category="statuses")`	Consolidated into one tool
`get_legal_types`	`get_system_metadata(category="types")`	Consolidated into one tool
`get_legal_institutions`	`get_system_metadata(category="institutions")`	Consolidated into one tool
`get_publisher_details`	N/A	Use `get_system_metadata(category="publishers")`
`search_legal_acts`	`search_legal_acts`	Enhanced with `detail_level` parameter
`get_publisher_year_acts`	`browse_acts`	Renamed for clarity
`get_act_comprehensive_details`	`get_act_details`	Added `load_content` and `detail_level`
`get_act_content`	`read_act_content`	Requires pre-loading with `get_act_details`
`get_act_table_of_contents`	`get_act_details`	TOC included in details response
`get_act_relationships`	`analyze_act_relationships`	Renamed for clarity
ELI format	Single string "DU/2024/1"	Changed from separate parameters
SSE transport	STDIO (default)	STDIO is default, HTTP via streamable-http
Port 7683	Port 7683	Same default HTTP port

What's new in v2.3.1

uvx / FastMCP fix — Fixed NameError: name 'Annotated' is not defined when running via uvx --from "git+https://github.com/numikel/law-scrapper-mcp" law-scrapper. Removed from __future__ import annotations from compare.py so parameter type hints resolve correctly during tool registration.

What's new in v2.3.0

3 new tools — compare_acts, list_result_sets, list_loaded_documents (total: 13 tools)
Circuit breaker — Protects against cascading failures when Sejm API is unavailable
Centralized error handling — @handle_tool_errors decorator with error classification and full tracebacks
asyncio.Lock migration — All stores use asyncio.Lock for proper async compatibility
Default search limit — Search/browse return max 20 results by default to limit token usage
Health endpoint — /health for Docker deployments with streamable-http transport
Polish error messages — All exception messages in Polish for consistent user experience
Decision tree docstrings — "When to use" / "When NOT to use" for all tools

Development

Setup

# Install dependencies
uv sync

# Install with dev dependencies
uv sync --extra dev

Running tests

# Run unit tests
uv run pytest tests/unit/ -v

# Run integration tests (requires internet)
uv run pytest tests/integration/ -v -m integration

# Run all tests with coverage
uv run pytest --cov=law_scrapper_mcp --cov-report=term-missing

# Run with timeout for slow tests
uv run pytest --timeout=10 -v

Code quality

The project follows FastMCP best practices:

Modular architecture - Separated concerns (models, client, services, tools)
Type hints - Full type annotation with Pydantic models
Async throughout - Async/await for all I/O operations
Comprehensive examples - Minimum 5 examples per tool
Tagged tools - Organized by category for easy discovery
Annotated parameters - Clear descriptions for all inputs
Structured logging - Configurable JSON/text formats

Running the server

# STDIO transport (default)
uv run python -m law_scrapper_mcp

# HTTP transport
LAW_MCP_TRANSPORT=streamable-http uv run python -m law_scrapper_mcp

# With debug logging
LAW_MCP_LOG_LEVEL=DEBUG uv run python -m law_scrapper_mcp

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes using Conventional Commits format
Add tests for new functionality
Ensure all tests pass and coverage is maintained
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development guidelines

Follow FastMCP best practices for tool definitions
Include comprehensive examples and parameter descriptions
Add appropriate tags for tool categorization
Write async code throughout
Add tests for all new functionality
Update CHANGELOG.md with your changes
Use English for all code comments and documentation

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

@numikel

Developed with help from:

Cursor Claude Code

And with models:

Claude Opus 4.6 Claude Opus 4.5 Claude Sonnet 4.5 Claude Haiku 4.5

Legal disclaimer: This tool provides access to Polish legal documents for research purposes. Always consult with qualified legal professionals for legal advice and interpretation of laws.

Law Scrapper MCP

Features

Requirements

Installation

Using uv (recommended)

Using pip

Using uvx (no installation required)

Quick start

STDIO transport (default)

HTTP transport (streamable-http)

Docker

Configuration

Tools reference

1. get_system_metadata(category)

2. search_legal_acts(publisher, year, keywords, detail_level, status, type)

3. browse_acts(publisher, year, detail_level)

4. filter_results(result_set_id, pattern, field, type_equals, ...)

5. get_act_details(eli, load_content, detail_level)

6. read_act_content(eli, section)

7. search_in_act(eli, query)

8. analyze_act_relationships(eli, relationship_type)

9. track_legal_changes(date_from, date_to, publisher, keywords)

10. calculate_legal_date(days, months, years, base_date)

11. compare_acts(eli_a, eli_b)

12. list_result_sets()

13. list_loaded_documents()

Document Store workflow

Workflow steps

Benefits

Configuration

Project structure

Docker

Dockerfile

docker-compose.yml

Migration guide (v1 to v2)

What's new in v2.3.1

What's new in v2.3.0

Development

Setup

Running tests

Code quality

Running the server

Contributing

Development guidelines

License

Author

Resources

Tools

Appeared in Searches

Latest Blog Posts

MCP directory API