Claimify: Research-Based Claim Extraction via MCP
An implementation of the "Claimify" methodology for factual claim extraction, delivered as a local Model Context Protocol (MCP) server. This tool implements the multi-stage claim extraction approach detailed in the academic paper "Towards Effective Extraction and Evaluation of Factual Claims" by Metropolitansky & Larson (2025).
Prompts from the paper have been modified for use with Structured Outputs. THIS IS NOT AN OFFICIAL IMPLEMENTATION.
Overview
Claimify extracts verifiable, decontextualized factual claims from text using a sophisticated four-stage pipeline:
Sentence Splitting: Breaks text into individual sentences with surrounding context
Selection: Filters for sentences containing verifiable propositions, excluding opinions and speculation
Disambiguation: Resolves ambiguities or discards sentences that cannot be clarified
Decomposition: Breaks down sentences into atomic, self-contained factual claims
The tool uses OpenAI's structured outputs feature exclusively for improved reliability and exposes its functionality through the Model Context Protocol, making it available to MCP-compatible clients like Cursor and Claude Desktop.
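The flow can be sketched as runnable Python with trivial stand-ins for the LLM calls (function names here are illustrative, not the project's actual API):

```python
import re

# Conceptual sketch of the four-stage pipeline. In the real server each stage
# is an LLM call with structured outputs; trivial stand-ins keep this runnable.

def split_into_sentences(text: str) -> list[str]:
    # Stage 1: sentence splitting (the server uses NLTK's punkt tokenizer)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_verifiable(sentence: str) -> bool:
    # Stage 2: selection -- keep only sentences with verifiable propositions
    return not sentence.lower().startswith(("i think", "in my opinion"))

def disambiguate(sentence: str) -> str | None:
    # Stage 3: disambiguation -- resolve ambiguity or return None to discard
    return sentence

def decompose(sentence: str) -> list[str]:
    # Stage 4: decomposition -- split into atomic, self-contained claims
    return [sentence]

def extract_claims(text: str) -> list[str]:
    claims: list[str] = []
    for sentence in split_into_sentences(text):
        if not is_verifiable(sentence):
            continue
        resolved = disambiguate(sentence)
        if resolved is not None:
            claims.extend(decompose(resolved))
    return claims

print(extract_claims("Python was first publicly released in 1991. I think it is great."))
```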
Features
Research-based methodology: Implements the peer-reviewed Claimify approach
Structured outputs: Uses OpenAI's structured outputs for reliable, type-safe responses
MCP integration: Seamlessly integrates with development environments
Robust parsing: Handles various text formats including lists and paragraphs
Context-aware: Uses surrounding sentences to resolve ambiguities
Multi-language support: Preserves original language while extracting claims
Resource storage: Automatically stores extracted claims as MCP resources for easy retrieval
Comprehensive logging: Detailed logging of all LLM calls, responses, and pipeline stages
Production-ready: Includes error handling, monitoring, and configuration management
Requirements
OpenAI API: Requires an OpenAI API key if your MCP host does not support sampling (GitHub Copilot in VS Code does)
Compatible Model: Must use a model that supports structured outputs:
`gpt-4o` (recommended)
`gpt-4o-mini` (faster and cheaper)
Python 3.10+: For proper type hints and Pydantic support
Quick Start
1. Installation
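A typical setup, assuming the standard clone-and-venv workflow and that the project ships a `requirements.txt` (adjust the repository URL and paths to your checkout):

```bash
git clone <repository-url>
cd claimify
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```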
2. Configuration
Create a `.env` file in the project root and add your API key:
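A minimal example (values are illustrative; `LLM_MODEL` is optional, see Configuration Options below):

```
OPENAI_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o
```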
MCP Client Configuration
For Cursor
Open Cursor and navigate to Settings > MCP
Click "Add a New Global MCP Server"
Add the following configuration to your MCP settings file (usually
~/.cursor/mcp.json):
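A typical entry looks like the sketch below; the server name is taken from this README, the paths are placeholders, and the `env` block is only needed if the key is not already in your shell environment:

```json
{
  "mcpServers": {
    "Claimify Extraction Server": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/claimify_server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here"
      }
    }
  }
}
```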
Replace the paths with the absolute paths to your Python executable and server script
The "Claimify Extraction Server" should now appear as a connected tool in your MCP-enabled chat.
Usage Examples
Once configured, you can use the tool in your MCP client:
Using the Extract Claims Tool
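For example, you might ask your assistant something like the following (phrasing is up to you; the client routes the request to the extraction tool):

```
Extract the factual claims from this text:
"Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
Many people consider it the most innovative company in the world."
```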
Using the Prompts
The server exposes two prompts to help verify and document extracted claims:
1. Verify Single Claim (verify_claim)
Provides a pre-built prompt that instructs the LLM to verify a single factual claim against external sources.
Arguments:
`claim_text` (required): The decontextualized factual claim to check.
Behavior:
LLM is instructed to search authoritative sources (scholarly publications, reputable news outlets, official organizations).
Returns one of three statuses:
VERIFIED: Claim is clearly supported by reliable sources (provides at least 3 references with URLs + justification)
UNCERTAIN: Claim may be correct but lacks precision, has ambiguity, or has limited/conflicting evidence
DISPUTED: Claim is demonstrably false or contradicted by reliable sources
Sources must not be fabricated; preference for primary references.
Example retrieval (conceptual – actual call depends on client API):
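```json
{
  "method": "prompts/get",
  "params": {
    "name": "verify_claim",
    "arguments": { "claim_text": "Stockholm is the capital of Sweden." }
  }
}
```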
Example expected LLM response formats (illustrative; the exact wording depends on the model):
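```
STATUS: VERIFIED
Claim: Stockholm is the capital of Sweden.
Sources:
  1. <URL to an authoritative source>
  2. <URL>
  3. <URL>
Justification: Multiple authoritative sources confirm the claim.

STATUS: UNCERTAIN
Justification: The claim may be correct but the available evidence is limited or conflicting.

STATUS: DISPUTED
Justification: Reliable sources contradict the claim.
```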
2. Create Claims Report (create_claims_report)
Generates an initial CLAIMS.md file with all claims marked as TODO. Claims can then be verified incrementally, updating their status through the workflow: TODO → IN_PROGRESS → VERIFIED/UNCERTAIN/DISPUTED.
Workflow:
Initial Creation: All claims start with status TODO
During Verification: Update individual claims to IN_PROGRESS
After Verification: Update to VERIFIED, UNCERTAIN, or DISPUTED with evidence
Arguments:
None (attach the extraction resource to the context in VS Code)
Behavior:
Parses all claims from the extraction resource attached in the context
Creates CLAIMS.md with all claims marked as TODO
Provides a template structure ready for incremental verification
Example usage: When viewing an extraction resource in VS Code, attach it to the prompt context. The prompt will generate an initial CLAIMS.md file ready for verification.
Initial CLAIMS.md structure:
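An illustrative shape (the actual template generated by the prompt may differ):

```markdown
# Claims Report

| # | Claim | Status | Evidence |
|---|-------|--------|----------|
| 1 | Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne. | TODO | |
| 2 | Stockholm is the capital of Sweden. | TODO | |
```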
After verification updates:
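Again illustrative:

```markdown
| # | Claim | Status | Evidence |
|---|-------|--------|----------|
| 1 | Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne. | VERIFIED | Three authoritative sources with URLs |
| 2 | Stockholm is the capital of Sweden. | IN_PROGRESS | |
```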
Note: The server only supplies the prompts; external searching depends on the client/model capabilities.
Example 1: Simple Factual Text
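An illustrative run (input and output reconstructed to match the claim URIs shown later in this README):

```
Input:
  "Python was first publicly released in 1991. It was created by Guido van Rossum."

Extracted claims:
  1. Python was first publicly released in 1991.
  2. Python was created by Guido van Rossum.
```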
Example 2: Mixed Fact and Opinion
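For instance (illustrative):

```
Input:
  "Apple is incredibly innovative and makes the best products. Apple Inc. was
  founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."

Extracted claims:
  1. Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
```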
(Note: The subjective content about being "incredibly innovative" and having "the best products" is filtered out)
Example 3: Multi-language Support
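For instance, with Swedish input (illustrative):

```
Input:
  "Stockholm är Sveriges huvudstad. Staden grundades på 1200-talet."

Extracted claims:
  1. Stockholm är Sveriges huvudstad.
  2. Staden [Stockholm] grundades på 1200-talet.
```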
(Note: Content preserved in original Swedish, with contextual clarifications added in brackets)
Accessing Extracted Claims as Resources
Each extraction generates two kinds of resources:
Aggregate Extraction Resource (`claim://extraction_<n>_<timestamp>`)
Contains metadata (timestamp, preview, question) and the full list of claims
Returns JSON format
Individual Claim Resources (`claim://<slug>`)
Each claim is accessible via a unique slug (a URL-safe identifier derived from the claim text)
Returns plain text (the claim itself)
Aggregate Extraction JSON Example
URI pattern: `claim://extraction_<n>_<timestamp>`
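A sketch of the JSON shape (field names are assumptions, not the exact schema):

```json
{
  "timestamp": "2025-01-01T12:00:00Z",
  "source_preview": "Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald...",
  "question": null,
  "claims": [
    {
      "text": "Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
      "uri": "claim://apple-inc-was-founded-in-1976-by-steve-jobs-steve-wozniak-and-ronald"
    }
  ]
}
```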
Individual Claim Resource Examples
URI pattern: claim://<slug>
Examples:
`claim://apple-inc-was-founded-in-1976-by-steve-jobs-steve-wozniak-and-ronald`
`claim://stockholm-is-the-capital-of-sweden`
`claim://python-was-first-publicly-released-in-1991`
Content: Plain text of the claim (no JSON wrapper)
Benefits of per-claim resources:
Direct access: Retrieve any claim by its slug
Simple format: Plain text, no parsing needed
Unique identifiers: Each claim has a stable, readable URI
Easy citation: Link directly to individual claims
Project Structure
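The files referenced throughout this README suggest a layout along these lines (indicative, not exhaustive):

```
claimify/
├── claimify_server.py      # MCP server entry point
├── pipeline.py             # Stage functions and pipeline orchestration
├── structured_models.py    # Pydantic models for structured outputs
├── structured_prompts.py   # Prompts adapted for structured outputs
├── .env                    # API key and configuration
└── LICENSE
```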
Architecture
The system follows a modular architecture with structured outputs:
MCP Server: Exposes the claim extraction as a tool via the Model Context Protocol
ClaimifyPipeline: Orchestrates the multi-stage extraction process using structured outputs
LLMClient: Handles communication with OpenAI API using structured outputs and Pydantic models
Structured Models: Pydantic models that define the expected response format for each stage
Stage Functions: Individual functions for Selection, Disambiguation, and Decomposition
Prompt Management: Simplified prompts optimized for structured outputs
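A minimal sketch of how the server component might expose the tool, using the official `mcp` Python SDK's FastMCP helper (the actual wiring in `claimify_server.py` may differ):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Claimify Extraction Server")

def run_pipeline(text: str) -> list[str]:
    # Stand-in for the real ClaimifyPipeline (selection -> disambiguation -> decomposition)
    return [text]

@mcp.tool()
def extract_claims(text: str) -> list[str]:
    """Extract verifiable, decontextualized factual claims from text."""
    return run_pipeline(text)

if __name__ == "__main__":
    mcp.run()
```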
Structured Outputs Benefits
The implementation uses OpenAI's structured outputs feature, which provides:
Type Safety: Responses are automatically validated against Pydantic models
Reliability: No more regex parsing failures or malformed JSON
Explicit Refusals: Safety-based refusals are programmatically detectable
Consistency: Guaranteed adherence to the expected response schema
Performance: Reduced need for retry logic and error handling
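Concretely, a stage call might look like this sketch (the model class and prompt are illustrative, not the project's actual code), using the OpenAI SDK's `client.beta.chat.completions.parse` with a Pydantic response format:

```python
from openai import OpenAI
from pydantic import BaseModel

class SelectionResult(BaseModel):
    contains_verifiable_claim: bool
    reasoning: str

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def select(sentence: str, context: str) -> SelectionResult:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Decide whether the sentence contains a verifiable proposition."},
            {"role": "user", "content": f"Context: {context}\nSentence: {sentence}"},
        ],
        response_format=SelectionResult,  # response is validated against the model
    )
    message = completion.choices[0].message
    if message.refusal:  # safety refusals are surfaced explicitly
        raise RuntimeError(message.refusal)
    return message.parsed
```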
Configuration Options
| Environment Variable | Description | Default | Options |
|---|---|---|---|
| `LLM_MODEL` | Specific model to use | `gpt-4o` | Models supporting structured outputs |
| `OPENAI_API_KEY` | OpenAI API key | None | Your API key |
| `ENABLE_LOGGING` | Enable detailed logging of all LLM interactions | `true` | `true`, `false` |
| `LOG_OUTPUT` | Where to send log output | `console` | `console`, `file` |
| `LOG_FILE` | Log file name (used when `LOG_OUTPUT=file`) | `claimify.log` | Any filename |
Troubleshooting
Common Issues
"Model does not support structured outputs" error
Ensure you're using a compatible model: `gpt-4o-2024-08-06`, `gpt-4o-mini`, or `gpt-4o`
Update your `.env` file: `LLM_MODEL=gpt-4o-2024-08-06`
"API key not set" error
Ensure your `.env` file exists and contains the correct OpenAI API key
Check that the key starts with `sk-`
"NLTK punkt tokenizer not found"
Run `python -c "import nltk; nltk.download('punkt_tab')"` or `python -c "import nltk; nltk.download('punkt')"`
MCP client can't connect
Check that the paths in your MCP configuration are absolute and correct
Ensure your Python virtual environment is activated
Verify the server script is executable: `chmod +x claimify_server.py`
No claims extracted
Check the logs for detailed information about each pipeline stage
Ensure the input text contains verifiable factual statements
Try with simpler, more direct factual sentences first
Development
To extend or modify the system:
Adding new response fields: Update the Pydantic models in `structured_models.py`
Modifying prompts: Edit the prompts in `structured_prompts.py`
Adding new stages: Create new functions in `pipeline.py` following the existing pattern
Testing: Use the built-in logging to debug pipeline behavior
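For example, extending a response model might look like this (a hypothetical model and fields, shown only to illustrate the pattern):

```python
from pydantic import BaseModel, Field

# Hypothetical extension of a response model in structured_models.py:
class DecompositionResult(BaseModel):
    claims: list[str] = Field(description="Atomic, self-contained factual claims")
    source_sentence: str = Field(description="The disambiguated sentence the claims came from")
```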
The structured outputs approach makes the system much more reliable and easier to debug compared to traditional text parsing methods.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
References
Metropolitansky & Larson (2025). "Towards Effective Extraction and Evaluation of Factual Claims"
Support
For issues related to:
Setup and configuration: Check this README and the troubleshooting section
MCP integration: Refer to the Model Context Protocol documentation
Research methodology: Consult the original Claimify paper