get_corpus_statistics
Analyze document collections by calculating statistical metrics from specified URN identifiers to understand corpus characteristics and patterns.
Instructions
Get statistical information about a corpus of documents.
Args: urns: List of URN identifiers for documents
Returns: JSON string containing corpus statistics
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urns | Yes | List of URN identifiers for documents | |
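As a rough illustration of the schema above, the sketch below builds the arguments for a call to this tool. It assumes the standard MCP JSON-RPC `tools/call` request shape; the helper name `build_tool_call` and the URN value are hypothetical placeholders, not part of the dhlab-mcp server.

```python
import json


def build_tool_call(urns, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request for get_corpus_statistics.

    Hypothetical helper for illustration; only the 'urns' argument comes
    from the input schema documented above.
    """
    # The schema requires 'urns' to be a list of URN strings.
    if not isinstance(urns, list) or not all(isinstance(u, str) for u in urns):
        raise TypeError("urns must be a list of strings")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_corpus_statistics",
            "arguments": {"urns": urns},
        },
    }


# Placeholder URN, made up for this example.
request = build_tool_call(["URN:NBN:no-nb_digibok_0000000000000"])
payload = json.dumps(request)
```

In practice an MCP client library would normally build this request for you; the sketch only shows where the required `urns` list ends up.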
Implementation Reference
- src/dhlab_mcp/server.py:274-293 (handler): The primary handler function for the 'get_corpus_statistics' tool, registered via the @mcp.tool() decorator. It fetches metadata for a list of URNs using the external dhlab library's get_metadata function and returns it as a JSON string of records. If no metadata comes back, it returns the string "No metadata available"; any exception is caught and returned as an error message string.

      @mcp.tool()
      def get_corpus_statistics(urns: list[str]) -> str:
          """Get statistical information about a corpus of documents.

          Args:
              urns: List of URN identifiers for documents

          Returns:
              JSON string containing corpus statistics
          """
          try:
              # Imported lazily, inside the handler
              from dhlab.api.dhlab_api import get_metadata

              metadata = get_metadata(urns=urns)
              if metadata is not None and len(metadata) > 0:
                  # One JSON object per document; keep non-ASCII characters as-is
                  return metadata.to_json(orient='records', force_ascii=False)
              return "No metadata available"
          except Exception as e:
              return f"Error getting corpus statistics: {str(e)}"
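Because the handler always returns a plain string (a JSON array of records, the literal "No metadata available", or an error message), a caller has to tell these cases apart. A minimal sketch of one way to do that follows. The function `parse_corpus_statistics` is a hypothetical client-side helper, and the field names in the sample record are made up for illustration; the real columns depend on what dhlab's get_metadata returns.

```python
import json

# The two non-JSON strings the handler can return, copied from its source.
NO_DATA = "No metadata available"
ERROR_PREFIX = "Error getting corpus statistics: "


def parse_corpus_statistics(result: str) -> list:
    """Turn the tool's string result into a list of record dicts.

    Hypothetical helper: returns [] when no metadata is available and
    raises RuntimeError when the handler reported an error.
    """
    if result == NO_DATA:
        return []
    if result.startswith(ERROR_PREFIX):
        raise RuntimeError(result[len(ERROR_PREFIX):])
    return json.loads(result)


# Illustrative result string; the record fields are assumptions.
sample = '[{"urn": "URN:NBN:no-nb_digibok_0000000000000", "title": "Eksempel"}]'
records = parse_corpus_statistics(sample)
empty = parse_corpus_statistics(NO_DATA)
```

Matching on the exact strings is brittle if the handler's messages change, so treat this as a sketch of the idea, not a stable contract.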