# mcp-open-data-hk
This is an MCP (Model Context Protocol) server that provides access to data from [DATA.GOV.HK](https://DATA.GOV.HK), the official open data portal of the Hong Kong government.
## Installation
### Installing via Smithery
To install mcp-open-data-hk for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mcp-open-data-hk/mcp-open-data-hk):
```bash
npx -y @smithery/cli install @mcp-open-data-hk/mcp-open-data-hk --client claude
```
### Using uv (recommended)
When using [`uv`](https://docs.astral.sh/uv/), no specific installation is needed. Use
[`uvx`](https://docs.astral.sh/uv/guides/tools/) to run _mcp-open-data-hk_ directly.
### Using PIP
Alternatively, you can install `mcp-open-data-hk` via pip:
```
pip install mcp-open-data-hk
```
After installation, you can run it as a script using:
```
python -m mcp_open_data_hk
```
Then configure your MCP-compatible client (such as Cursor, Claude Code, or Claude Desktop) by adding the following to its settings.json:
<details>
<summary>Using uvx</summary>
```json
{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "uvx",
      "args": ["mcp-open-data-hk"]
    }
  }
}
```
</details>
<details>
<summary>Using pip installation</summary>
```json
{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "python",
      "args": ["-m", "mcp_open_data_hk"]
    }
  }
}
```
</details>
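If the client's settings file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, assuming the file uses the same top-level `mcpServers` key as the snippets above. `add_mcp_server` and the `other-server` entry are hypothetical names used only for illustration:

```python
import json
from copy import deepcopy


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one server entry added under "mcpServers"."""
    merged = deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("mcpServers", {})[name] = {"command": command, "args": list(args)}
    return merged


# Hypothetical existing settings that already contain another server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_mcp_server(existing, "mcp-open-data-hk", "uvx", ["mcp-open-data-hk"])
print(json.dumps(updated, indent=2))
```

Writing the result back to disk is left out on purpose, since the location of the settings file depends on the client.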
## Features
The server provides the following tools to interact with the DATA.GOV.HK API:
1. `list_datasets` - Get a list of dataset IDs
2. `get_dataset_details` - Get detailed information about a specific dataset
3. `list_categories` - Get a list of data categories
4. `get_category_details` - Get detailed information about a specific category
5. `search_datasets` - Search for datasets by query term with advanced options
6. `search_datasets_with_facets` - Search datasets and return faceted results
7. `get_datasets_by_format` - Get datasets by file format
8. `get_supported_formats` - Get list of supported file formats
## Tools
### list_datasets
Get a list of dataset IDs from DATA.GOV.HK
Parameters:
- `limit` (optional): Maximum number of datasets to return (default: 1000)
- `offset` (optional): Offset of the first dataset to return
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
### get_dataset_details
Get detailed information about a specific dataset
Parameters:
- `dataset_id`: The ID or name of the dataset to retrieve
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
- `include_tracking` (optional): Add tracking information to dataset and resources - defaults to False
### list_categories
Get a list of data categories (groups)
Parameters:
- `order_by` (optional): Field to sort by ('name' or 'packages') - deprecated, use sort instead
- `sort` (optional): Sorting of results ('name asc', 'package_count desc', etc.) - defaults to "title asc"
- `limit` (optional): Maximum number of categories to return
- `offset` (optional): Offset for pagination
- `all_fields` (optional): Return full group dictionaries instead of just names - defaults to False
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
### get_category_details
Get detailed information about a specific category (group)
Parameters:
- `category_id`: The ID or name of the category to retrieve
- `include_datasets` (optional): Include a truncated list of the category's datasets - defaults to False
- `include_dataset_count` (optional): Include the full package count - defaults to True
- `include_extras` (optional): Include the category's extra fields - defaults to True
- `include_users` (optional): Include the category's users - defaults to True
- `include_groups` (optional): Include the category's sub groups - defaults to True
- `include_tags` (optional): Include the category's tags - defaults to True
- `include_followers` (optional): Include the category's number of followers - defaults to True
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
### search_datasets
Search for datasets by query term using the package_search API.
This function searches across dataset titles, descriptions, and other metadata to find datasets matching the query term. It supports advanced Solr search parameters.
Parameters:
- `query` (optional): The Solr query string (e.g., "transport", "weather", or `*:*` for all) - defaults to `*:*`
- `limit` (optional): Maximum number of datasets to return (default: 10, max: 1000)
- `offset` (optional): Offset for pagination - defaults to 0
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
Returns:
A dictionary containing:
- `count`: Total number of matching datasets
- `results`: List of matching datasets (up to limit)
- `search_facets`: Faceted information about the results
- `has_more`: Boolean indicating if there are more results available
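The `has_more` flag makes it possible to page through search results without comparing `count` to the running total by hand. The sketch below shows one way a caller might compute the next `offset`. The helper name `next_offset` and the sample response are illustrative assumptions, not part of the server:

```python
from typing import Optional


def next_offset(response: dict, offset: int) -> Optional[int]:
    """Return the offset for the following page, or None when `has_more` is false."""
    if not response.get("has_more"):
        return None
    return offset + len(response["results"])


# Made-up response in the shape described above (not real portal data).
sample = {
    "count": 23,
    "results": [{"name": f"example-{i}"} for i in range(10)],
    "search_facets": {},
    "has_more": True,
}
print(next_offset(sample, offset=0))
```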
### search_datasets_with_facets
Search for datasets and return faceted results for better data exploration.
This function is useful for exploring what types of data are available by showing counts of datasets grouped by tags, organizations, or other facets.
Parameters:
- `query` (optional): The Solr query string - defaults to `*:*`
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
Returns:
A dictionary containing:
- `count`: Total number of matching datasets
- `search_facets`: Faceted information about the results
- `sample_results`: First 3 matching datasets
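A common follow-up is ranking the entries of one facet by dataset count. The sketch below assumes `search_facets` follows the CKAN layout, where each facet holds an `items` list of dictionaries with `name`, `display_name`, and `count`. The README does not confirm that the server passes this layout through unchanged, and the sample values are made up:

```python
from typing import List, Tuple


def top_facet_items(search_facets: dict, facet: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the `n` most frequent entries of one facet as (label, count) pairs."""
    items = search_facets.get(facet, {}).get("items", [])
    ranked = sorted(items, key=lambda item: item["count"], reverse=True)
    return [(item.get("display_name") or item["name"], item["count"]) for item in ranked[:n]]


# Illustrative facet data, not real portal output.
sample_facets = {
    "tags": {
        "title": "tags",
        "items": [
            {"name": "bus", "display_name": "bus", "count": 4},
            {"name": "traffic", "display_name": "traffic", "count": 9},
            {"name": "ferry", "display_name": "ferry", "count": 2},
        ],
    }
}
print(top_facet_items(sample_facets, "tags", n=2))
```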
### get_datasets_by_format
Get datasets that have resources in a specific file format.
Parameters:
- `file_format`: The file format to filter by (e.g., "CSV", "JSON", "GeoJSON")
- `limit` (optional): Maximum number of datasets to return - defaults to 10
- `language` (optional): Language code (en, tc, sc) - defaults to "en"
Returns:
A dictionary containing:
- `count`: Total number of matching datasets
- `results`: List of matching datasets
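For readers curious how a format filter can be expressed against a CKAN-style search API, the sketch below builds a `package_search` URL with a filter query on the `res_format` field. Both the endpoint in `BASE_URL` and the use of `fq` are assumptions. The README does not state which URL or parameters the server actually uses:

```python
from urllib.parse import parse_qs, urlencode

# Assumed endpoint; the README does not state the URL the server calls.
BASE_URL = "https://data.gov.hk/en-data/api/3/action/package_search"


def format_search_url(file_format: str, limit: int = 10) -> str:
    """Build a package_search URL restricted to one resource format."""
    params = {
        "q": "*:*",                           # match every dataset
        "fq": f'res_format:"{file_format}"',  # filter on the resource format field
        "rows": limit,                        # CKAN's name for the page size
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = format_search_url("CSV", limit=5)
print(url)
# Decode the query string again to confirm the parameters survive encoding.
print(parse_qs(url.split("?", 1)[1]))
```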
### get_supported_formats
Get a list of file formats supported by DATA.GOV.HK
Returns:
A list of supported file formats
## Local Testing
### Run test scripts:
```bash
python tests/test_client.py
python tests/debug_search.py
python tests/comprehensive_test.py
```
### Run server directly:
```bash
python -m src.mcp_open_data_hk
```
### Run unit tests:
```bash
pytest tests/
```
## Understanding Path Configuration
When installed as a package, the server can be referenced by its module name rather than file path. This is more convenient for users as they don't need to specify full file paths.
### Installed Package:
```json
{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "python",
      "args": ["-m", "mcp_open_data_hk"]
    }
  }
}
```
### Local Development (file path approach):
```json
{
  "mcpServers": {
    "mcp-open-data-hk": {
      "command": "python",
      "args": ["-m", "src.mcp_open_data_hk"],
      "cwd": "/full/path/to/mcp-open-data-hk"
    }
  }
}
```
The package installation approach is recommended for end users, while the file path approach is useful for local development and testing.
## Example Queries
Once installed, try these queries with your AI assistant:
1. "List some datasets from the Hong Kong government data portal via mcp-open-data-hk mcp."
2. "Find datasets related to transportation in Hong Kong. Use mcp-open-data-hk."
3. "What categories of data are available on DATA.GOV.HK? Use mcp-open-data-hk."
4. "Get details about the flight information dataset. Use mcp-open-data-hk."
5. "Search for datasets about weather in Hong Kong. Use mcp-open-data-hk."
6. "What file formats are supported by DATA.GOV.HK? Use mcp-open-data-hk."
7. "Find CSV datasets about population. Use mcp-open-data-hk."
8. "Show me the most common tags in transport datasets. Use mcp-open-data-hk."
The AI will automatically use the appropriate tools from your MCP server to fetch the requested information.
## Troubleshooting
### Common Issues
1. **Module not found errors**: Make sure you've installed the dependencies with `pip install -e .` for local development, or `pip install mcp-open-data-hk` for the published package.
2. **Path issues**: Ensure the `cwd` in your IDE configuration is the correct absolute path to the project root.
3. **Permission errors**: On Unix systems, make sure the scripts have execute permissions:
```bash
chmod +x src/mcp_open_data_hk/__main__.py
```
4. **FastMCP not found**: Install it with:
```bash
pip install fastmcp
```
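Before digging into client configuration, it can help to confirm that the interpreter the client launches can see the required packages. The sketch below is a small diagnostic using only the standard library. The helper name `is_importable` is hypothetical, and the script should be run with the same `python` that the MCP client is configured to use:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found by the current interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


# Module names taken from the install instructions and issue list above.
for name in ("mcp_open_data_hk", "fastmcp"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```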
### Testing the Connection
If you're having issues, you can test the connection manually:
1. Run the server in one terminal:
```bash
python -m src.mcp_open_data_hk
```
2. In another terminal, run the test client:
```bash
python tests/test_client.py
```
If this works, the issue is likely in the IDE configuration.
## Extending the Server
You can extend the server by adding more tools in `src/mcp_open_data_hk/server.py`. Follow the existing patterns:
1. Add a new function decorated with `@mcp.tool`
2. Provide a clear docstring explaining the function and parameters
3. Implement the functionality
4. Test with the client
The server automatically exposes all functions decorated with `@mcp.tool` to MCP clients.
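As an illustration of the steps above, here is a sketch of what a new tool function might look like. `count_resource_formats` is a hypothetical tool, and the CKAN-style `resources` / `format` layout of the input is an assumption. The `@mcp.tool` decorator appears only in a comment so that the snippet runs without FastMCP installed:

```python
from collections import Counter

# In server.py this function would sit below the existing FastMCP instance and
# carry the decorator, i.e.:
#
#     @mcp.tool
#     def count_resource_formats(...): ...
#
# The decorator is left off here so the sketch runs without FastMCP installed.


def count_resource_formats(dataset: dict) -> dict:
    """Count a dataset's resources by file format.

    Args:
        dataset: A dataset dictionary with a "resources" list whose entries
            have a "format" field (assumed CKAN-style layout).

    Returns:
        A mapping from upper-cased format name to number of resources.
    """
    formats = (res.get("format") or "UNKNOWN" for res in dataset.get("resources", []))
    return dict(Counter(fmt.upper() for fmt in formats))


# Made-up dataset for illustration only.
example = {"resources": [{"format": "CSV"}, {"format": "csv"}, {"format": "JSON"}, {}]}
print(count_resource_formats(example))
```

Keeping the core logic in a plain function like this also makes it easy to unit-test without starting the server.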
## GitHub Workflows
This project includes GitHub Actions workflows for CI/CD:
1. **CI Workflow**: Runs tests across multiple Python versions (3.10-3.12) on every push/PR to main branch
2. **Publish Workflow**: Automatically builds and publishes to TestPyPI on every push to main, and to PyPI on version tags (`v*.*.*`)
3. **Code Quality Workflow**: Checks code formatting and linting on every push/PR
4. **Release Workflow**: Automatically creates GitHub releases when tags are pushed
### Setup for Publishing (Trusted Publishing)
This project uses PyPI's Trusted Publishing, which is more secure than using API tokens. To set it up:
1. Go to https://pypi.org/manage/account/publishing/ and add a new pending publisher with:
   - Project name: `mcp-open-data-hk`
   - Owner: Your GitHub username or organization
   - Repository name: `mcp-open-data-hk`
   - Workflow name: `publish.yml`
   - Environment name: `pypi`
2. Go to https://test.pypi.org/manage/account/publishing/ and add a new pending publisher with the same information, but use `testpypi` as the environment name.
3. In your GitHub repository, go to "Settings" > "Environments" and create two environments:
   - `pypi` - Set "Required reviewers" to your username for security
   - `testpypi` - No additional configuration needed
With Trusted Publishing, no API tokens need to be created or stored as secrets.
### GitHub Environments
For Trusted Publishing to work correctly, you need to create two environments in your GitHub repository settings:
1. `pypi` - This environment requires manual approval for security when publishing to PyPI
2. `testpypi` - This environment doesn't require manual approval and will automatically publish to TestPyPI
To create these environments:
1. Go to your repository's "Settings" tab
2. Click on "Environments" in the left sidebar
3. Click "New environment"
4. Create the `pypi` environment and enable "Required reviewers" with your username
5. Create the `testpypi` environment with no additional settings
### Releasing New Versions
To release a new version:
1. Update the version number in `pyproject.toml`
2. Commit the changes
3. Create and push a new tag:
```bash
git tag -a v1.0.0 -m "Release version 1.0.0"
git push origin v1.0.0
```
Or use the provided release script:
```bash
./release.sh 1.0.0
```
This will automatically trigger the publish workflow to build and publish the package to TestPyPI and PyPI (for tagged releases), and create a GitHub release.
## Contributing
Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) and [Code of Conduct](CODE_OF_CONDUCT.md) for details on how to contribute to this project.
## Project Structure
```
mcp-open-data-hk/
├── src/
│   └── mcp_open_data_hk/         # Main Python package
│       ├── __init__.py           # Package initialization
│       ├── __main__.py           # Package entry point
│       └── server.py             # Main MCP server implementation
├── tests/
│   ├── test_client.py            # Client test script
│   ├── debug_search.py           # Search functionality test
│   ├── comprehensive_test.py     # Comprehensive functionality test
│   └── test_data_gov_hk.py       # Unit tests
├── requirements.txt              # Python dependencies
├── pyproject.toml                # Project configuration
├── README.md                     # This file
├── run_examples.sh               # Example commands script
├── install.sh                    # Installation helper script
├── release.sh                    # Release helper script
└── .gitignore                    # Git ignore file
```
## License
This project is licensed under the MIT License.