# OpenDiscourse GovInfo MCP — Functions Reference for AI Agents

## Overview

- This document details the primary callable functions AI agents should use to ingest GovInfo bulk data.
- All functions are asynchronous entry points exposed in `scripts/ingestion/__init__.py` and implemented in `scripts/ingestion/ingestor.py`.

Module: `scripts.ingestion`

## Functions

### 1) `ingest_congress_data`

- Signature: `ingest_congress_data(congress: int, doc_types: list[str] | None = None, output_dir: Path | None = None, workers: int = WORKERS) -> dict[str, int]`
- Description: Ingests all documents for a single congress across the specified document types. Handles directory traversal via the XML/JSON endpoints, rate limiting, retries, and per-doc-type manifests and failure records.
- Parameters:
  - `congress`: Target congress number (e.g., 118)
  - `doc_types`: Subset of document types (default: all types from config)
  - `output_dir`: Base output directory (default: `config.OUTPUT_DIR`)
  - `workers`: Parallelism level (applied to both the semaphore and the rate limiter)
- Returns:
  - A mapping of each doc type to the number of files downloaded successfully.
- Side effects:
  - Writes files to `{output_dir}/{congress}/{doc_type}/`
  - Writes `manifest.json` and `failures.json` for each doc type
- Exceptions:
  - Errors are logged and failed downloads are recorded in `failures.json`; the function is designed to return the counts of successfully processed files even when some downloads fail.

### 2) `ingest_all_congresses`

- Signature: `ingest_all_congresses(congresses: list[int] | None = None, doc_types: list[str] | None = None, output_dir: Path | None = None, workers: int = WORKERS) -> dict[int, dict[str, int]]`
- Description: Orchestrates ingestion across multiple congresses and document types, with the same robustness as `ingest_congress_data`.
- Parameters:
  - `congresses`: List of congress numbers (default: `config.CONGRESS_SESSIONS`)
  - `doc_types`: Subset of document types (default: `config.DOCUMENT_TYPES`)
  - `output_dir`: Base output directory (default: `config.OUTPUT_DIR`)
  - `workers`: Parallelism level
- Returns:
  - A nested mapping of the form `{congress: {doc_type: success_count}}`.
- Side effects:
  - Same as `ingest_congress_data`, for each congress/doc type combination.

### 3) `GovInfoIngestor` (class)

- Purpose: Provides the core async ingestion implementation. AI agents typically won't construct it directly unless they need fine-grained control.
- Constructor parameters:
  - `output_dir`: Path
  - `workers`: int
  - `timeout`: int (seconds)
  - `max_retries`: int
  - `rate_limit`: int (requests/sec)
  - `chunk_size`: int (bytes)
  - `validate_xml`: bool
- Key methods:
  - `process_congress(session, congress, doc_types)`
  - `process_document_type(session, congress, doc_type)`
  - `get_document_list(session, congress, doc_type)`
  - `download_file(session, url, output_path, doc_type, retries)`

## Usage patterns

- Most agents should call `ingest_congress_data` or `ingest_all_congresses`, not the low-level methods.
- These functions are asynchronous, so they must run inside an asyncio event loop: use `asyncio.run(...)` for top-level calls, or `await` them from code that is already running in a loop (`asyncio.run` cannot be called from a running loop).

## Environment variables (from `scripts/ingestion/config.py`)

- `GOVINFO_DATA_DIR`: Output directory
- `GOVINFO_WORKERS`: Number of parallel downloads
- `GOVINFO_RATE_LIMIT`: Requests per second
- `GOVINFO_VALIDATE_XML`: Enables or disables XML validation
- `LOG_LEVEL`: Logging verbosity

## Return values and artifacts

- Per-doc-type success counts help agents decide whether additional retries are needed.
- The manifest and failures files provide a durable record of completed work and errors.

## Examples

- See `docs/agents/usage_guide.md` for code snippets that use these functions.
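
The retry guidance under Return values and artifacts can be turned into a small post-processing step. The sketch below is a minimal illustration, not part of `scripts.ingestion`. It relies only on the documented return shape of `ingest_all_congresses` and the documented `{output_dir}/{congress}/{doc_type}/` layout. `find_retry_candidates` is a hypothetical helper. It also assumes that `failures.json` holds a JSON list or object with one entry per failed download, which this reference does not specify.

```python
import json
from collections.abc import Mapping
from pathlib import Path


def find_retry_candidates(
    results: Mapping[int, Mapping[str, int]],
    output_dir: Path,
) -> list[tuple[int, str, int]]:
    """Return (congress, doc_type, failure_count) for combinations worth another pass.

    Hypothetical helper: a combination is flagged when it downloaded nothing
    or when its failures.json lists at least one entry.
    """
    candidates: list[tuple[int, str, int]] = []
    for congress, per_type in sorted(results.items()):
        for doc_type, success_count in sorted(per_type.items()):
            # Documented layout: {output_dir}/{congress}/{doc_type}/failures.json
            failures_path = Path(output_dir) / str(congress) / doc_type / "failures.json"
            failure_count = 0
            if failures_path.is_file():
                # ASSUMPTION: failures.json is a JSON list or object with one
                # entry per failed download; adjust if the real schema differs.
                data = json.loads(failures_path.read_text(encoding="utf-8") or "[]")
                if isinstance(data, (list, dict)):
                    failure_count = len(data)
            if success_count == 0 or failure_count > 0:
                candidates.append((congress, doc_type, failure_count))
    return candidates
```

For instance, after `results = asyncio.run(ingest_all_congresses(congresses=[118]))`, an agent could call `find_retry_candidates(results, config.OUTPUT_DIR)` (or pass whichever `output_dir` the run used). It could then hand the flagged doc types back to `ingest_congress_data` through its `doc_types` parameter.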
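
The `workers`, `rate_limit`, `max_retries`, and `timeout` settings map onto a common asyncio pattern. A semaphore caps how many downloads are in flight, a limiter spaces requests apart, and failed requests are retried with backoff. The sketch below shows that pattern in isolation so the parameters are easier to reason about. It is not `GovInfoIngestor`'s actual code. `SimpleRateLimiter`, `bounded_fetch_all`, the injected `fetch` callable, and the exponential backoff are all hypothetical. It also assumes that `max_retries` counts retries after the first attempt.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


class SimpleRateLimiter:
    """Hypothetical limiter: spaces acquisitions at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time at which the next request may start

    async def acquire(self) -> None:
        async with self._lock:  # one caller claims a slot at a time
            now = time.monotonic()
            wait = self._next_slot - now
            if wait > 0:
                await asyncio.sleep(wait)
            self._next_slot = max(now, self._next_slot) + self._interval


async def bounded_fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[bytes]],
    workers: int,
    rate_limit: float,
    max_retries: int,
    timeout: float,
    backoff: float = 0.5,
) -> tuple[dict[str, bytes], dict[str, str]]:
    """Fetch every URL with at most `workers` requests in flight.

    Returns (successes, failures); failures maps URL -> repr of the last error.
    ASSUMPTION: max_retries counts retries after the first attempt.
    """
    semaphore = asyncio.Semaphore(workers)
    limiter = SimpleRateLimiter(rate_limit)
    successes: dict[str, bytes] = {}
    failures: dict[str, str] = {}

    async def fetch_one(url: str) -> None:
        async with semaphore:  # caps concurrent downloads at `workers`
            for attempt in range(max_retries + 1):
                await limiter.acquire()
                try:
                    successes[url] = await asyncio.wait_for(fetch(url), timeout)
                    return
                except Exception as exc:  # includes timeouts raised by wait_for
                    if attempt == max_retries:
                        failures[url] = repr(exc)
                        return
                    await asyncio.sleep(backoff * 2**attempt)  # exponential backoff

    await asyncio.gather(*(fetch_one(url) for url in urls))
    return successes, failures
```

This sketch holds the semaphore while a download waits out its backoff. As a result, a burst of failing URLs can tie up worker slots for a while. Releasing the semaphore between attempts is an alternative with different trade-offs.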
