# Pinecone Economic Books MCP Server
A Model Context Protocol (MCP) server that provides read-only access to a Pinecone vector database containing economic books and academic papers.
**Primary Feature:** Natural language semantic search powered by Pinecone's inference API: ask in plain English and get relevant results without computing embeddings yourself.
The server also provides specialized metadata search tools for precise filtering by author, subject, book, page range, and more.
## Features
### 10 Tools
**Tools 1-7 use semantic search powered by Pinecone's inference API; tools 8-10 are utilities.**
#### Primary Search Tools
1. **semantic_search** - Natural language search (DEFAULT/SIMPLEST)
   - Automatically embeds your query using Pinecone inference
   - Best for: "theories about market equilibrium", "impact of automation"
2. **semantic_search_with_filters** - Semantic search + metadata filters
   - Combine natural language with precise filtering
   - Best for: "labor productivity" in books by "Wassily Leontief"
#### Filtered Semantic Search
All tools below combine semantic search with metadata filtering:
3. **search_by_author** - Semantic search within a specific author's works
4. **search_by_subject** - Semantic search within content tagged with specific topics
5. **search_by_book** - Semantic search within a specific book
6. **search_by_page_range** - Semantic search within specific page ranges
7. **advanced_search** - Semantic search with multiple filters (author + book + subjects + pages)
#### Utility Tools
8. **get_by_id** - Retrieve a specific document by its ID
9. **get_index_stats** - Get statistics about the Pinecone index
10. **vector_search** - Search using pre-computed embedding vectors (advanced)
### Data Schema
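Each search result has the following shape. The `id` and `metadata` are stored in the index; `score` is the similarity score computed for the query: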
```json
{
  "id": "Author_BookName_PageNumber",
  "score": 0.2712,
  "metadata": {
    "author_name": "Wassily Leontief",
    "book_name": "Leontief_Essays in economics - theories and theorizing_1966",
    "chapter_titles": ["Chapter Title"],
    "chunk_text": "# Page 70\n...",
    "pages": ["70", "71"],
    "subjects": ["income", "national income", "output", "price"]
  }
}
```
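Document IDs follow the `Author_BookName_PageNumber` pattern. Because book names can themselves contain underscores (as in the example above), an ID has to be split from both ends. The helper below is a hypothetical sketch, not part of `server.py`, and it assumes author names never contain an underscore.

```python
def parse_document_id(doc_id: str) -> dict[str, str]:
    """Split an ID of the form Author_BookName_PageNumber into its parts.

    Assumption: the author name contains no underscore; the book name may.
    A malformed ID (fewer than two underscores) raises ValueError on unpacking.
    """
    author, rest = doc_id.split("_", 1)  # first underscore ends the author name
    book, page = rest.rsplit("_", 1)     # last underscore starts the page number
    return {"author_name": author, "book_name": book, "page": page}


print(parse_document_id(
    "Wassily Leontief_Leontief_Essays in economics - theories and theorizing_1966_27"
))
```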
## Installation
### Prerequisites
- Python 3.10 or higher
- Pinecone account with an existing index
- MCP-compatible client (e.g., Claude Desktop, Claude Code)
### Setup
1. **Create the project directory (or clone the repository) and add `server.py`, `requirements.txt`, and `.env.example` to it:**
```bash
mkdir pinecone-econ-mcp
cd pinecone-econ-mcp
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Configure environment variables:**
Copy `.env.example` to `.env` and fill in your credentials:
```bash
cp .env.example .env
```
Edit `.env`:
```env
PINECONE_API_KEY=your-pinecone-api-key-here
PINECONE_INDEX_NAME=economic-books
```
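The server needs both variables at startup. A minimal sketch of that step, which uses `python-dotenv` when it is installed and fails fast when a variable is missing (the function name `load_settings` is hypothetical, not taken from `server.py`):

```python
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # let the sketch run even without python-dotenv
    load_dotenv = None

REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_INDEX_NAME")


def load_settings() -> dict[str, str]:
    """Load .env (if python-dotenv is available) and return the required settings."""
    if load_dotenv is not None:
        load_dotenv()  # by default does not override variables already set
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Raising at startup surfaces a clear message instead of a less obvious Pinecone authentication error on the first query.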
4. **Configure MCP client:**
Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
  "mcpServers": {
    "pinecone-econ": {
      "command": "/opt/homebrew/bin/python3.10",
      "args": ["/absolute/path/to/pinecone-econ-mcp/server.py"]
    }
  }
}
```
Or for Claude Code (`~/.claude.json`):
```json
{
  "mcpServers": {
    "pinecone-econ": {
      "command": "/opt/homebrew/bin/python3.10",
      "args": ["/absolute/path/to/pinecone-econ-mcp/server.py"]
    }
  }
}
```
**Note:** This server requires Python 3.10+. If your python3.10 is in a different location, use `which python3.10` to find it.
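Alternatively, you can print a config entry that points at whichever interpreter runs the script. This is a convenience sketch, not a file shipped with the project (the name `make_config.py` is hypothetical); run it with the same Python 3.10+ interpreter you installed the dependencies into.

```python
import json
import sys
from pathlib import Path


def build_client_config(server_path: str) -> dict:
    """Return an mcpServers entry that launches server.py with this interpreter."""
    return {
        "mcpServers": {
            "pinecone-econ": {
                "command": sys.executable,  # absolute path of the running Python
                "args": [str(Path(server_path).resolve())],  # absolute server path
            }
        }
    }


if __name__ == "__main__":
    # Usage: python make_config.py /path/to/pinecone-econ-mcp/server.py
    target = sys.argv[1] if len(sys.argv) > 1 else "server.py"
    print(json.dumps(build_client_config(target), indent=2))
```

Paste the printed `pinecone-econ` entry into the `mcpServers` section of your client config.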
## Usage Examples
### Semantic Search (Recommended - Default)
#### Basic Semantic Search
```python
# Find content about economic theories using natural language
semantic_search(
    query="theories about market equilibrium and price discovery",
    top_k=10
)
```
#### Semantic Search with Author Filter
```python
# Search for "labor productivity" concepts only in Leontief's work
semantic_search_with_filters(
    query="labor productivity and input-output relationships",
    author_name="Wassily Leontief",
    top_k=5
)
```
#### Semantic Search with Multiple Filters
```python
# Find content about a topic in a specific book
semantic_search_with_filters(
    query="national income and economic aggregates",
    book_name="Leontief_Essays in economics - theories and theorizing_1966",
    subjects=["income", "national income"],
    top_k=10
)
```
### Filtered Semantic Search
All specialized search tools use semantic search combined with metadata filtering.
#### Search by Author
```python
# Search for economic concepts within Leontief's works
search_by_author(
    query="input output analysis and economic modeling",
    author_name="Wassily Leontief",
    top_k=10
)
```
#### Search by Subject
```python
# Search for equilibrium concepts within content tagged "equilibrium"
search_by_subject(
    query="price discovery and market clearing mechanisms",
    subject="equilibrium",
    top_k=15
)
```
#### Search by Book
```python
# Search for specific concepts within a book
search_by_book(
    query="national income accounting methodologies",
    book_name="Leontief_Essays in economics - theories and theorizing_1966",
    top_k=20
)
```
#### Advanced Search
```python
# Semantic search with multiple metadata filters
advanced_search(
    query="economic aggregates and measurement theory",
    author_name="Wassily Leontief",
    subjects=["income", "national income"],
    pages=["70", "71", "72"],
    top_k=10
)
```
#### Search by Page Range
```python
# Search within specific page ranges
search_by_page_range(
    query="theoretical foundations of economics",
    start_page="50",
    end_page="60",
    author_name="Wassily Leontief",
    top_k=10
)
```
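Because `pages` is stored as a list of strings, Pinecone's numeric range operators (`$gte`, `$lte`) cannot be applied to it directly. One way to express a page range with the `$in` operator this server uses is to expand the range into explicit page strings; on a list-valued field, `$in` matches a chunk if any of its pages is in the list. This is an illustrative sketch under that assumption, and `server.py` may implement the range differently.

```python
def page_range_filter(start_page: str, end_page: str) -> dict:
    """Build a Pinecone metadata filter matching chunks that touch any page in range.

    Page numbers are stored as strings (e.g. "70"), so the inclusive range is
    expanded into an explicit list of strings and matched with $in.
    """
    start, end = int(start_page), int(end_page)
    if start > end:
        raise ValueError("start_page must not be greater than end_page")
    return {"pages": {"$in": [str(p) for p in range(start, end + 1)]}}


print(page_range_filter("50", "60"))
```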
### Get Document by ID
```python
# Retrieve a specific document
get_by_id(
    document_id="Wassily Leontief_Leontief_Essays in economics - theories and theorizing_1966_27"
)
```
### Vector Search
```python
# Search with a pre-computed embedding vector
vector_search(
    vector=[0.1, 0.2, 0.3, ...],  # Your embedding vector (same dimension as the index)
    top_k=5,
    include_metadata=True
)
```
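The query vector must have the same dimension as the index and should come from the same embedding model used to build it; Pinecone's index statistics include a `dimension` field you can compare against. A small pre-flight check, written as a hypothetical helper rather than part of the server:

```python
import math
from typing import Sequence


def check_query_vector(vector: Sequence[float], dimension: int) -> list[float]:
    """Validate a query vector against the index dimension and return it as floats."""
    if len(vector) != dimension:
        raise ValueError(f"vector has {len(vector)} values; index expects {dimension}")
    values = [float(v) for v in vector]
    # NaN or infinite components cannot produce a meaningful similarity score.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinite values")
    return values
```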
### Get Index Statistics
```python
# Get information about the index
get_index_stats()
```
## Tool Details
### Semantic Search by Default
**Tools 1-7 use semantic search** powered by Pinecone's integrated inference: you pass a text query and Pinecone converts it to an embedding, so the server makes no separate embedding calls. This requires an index configured with an integrated embedding model. `vector_search` is the exception; it takes a pre-computed vector.
### Read-Only Operations
All tools are **read-only** - they only query and retrieve data from Pinecone. No write, update, or delete operations are exposed.
### Metadata Filtering
The server combines semantic search with Pinecone's metadata filtering capabilities using MongoDB-style query operators:
- `$eq` - Equals
- `$in` - In array
- `$and` - Logical AND
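For example, the author and subject constraints from the `advanced_search` example combine into a single filter. The helper below is a hypothetical sketch of how such a filter can be assembled; only the operator syntax is Pinecone's.

```python
from typing import Optional


def build_filter(
    author_name: Optional[str] = None,
    book_name: Optional[str] = None,
    subjects: Optional[list[str]] = None,
    pages: Optional[list[str]] = None,
) -> Optional[dict]:
    """Combine optional metadata constraints into one Pinecone filter (or None)."""
    clauses = []
    if author_name:
        clauses.append({"author_name": {"$eq": author_name}})
    if book_name:
        clauses.append({"book_name": {"$eq": book_name}})
    if subjects:
        # On a list field, $in matches if any stored subject is in the given list.
        clauses.append({"subjects": {"$in": subjects}})
    if pages:
        clauses.append({"pages": {"$in": pages}})
    if not clauses:
        return None  # no constraints: plain semantic search
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(build_filter(author_name="Wassily Leontief", subjects=["income", "national income"]))
```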
### Result Limits
- Default `top_k`: Varies by tool (5-10)
- Maximum `top_k`: 100 results per query
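A tool can enforce these bounds before it queries Pinecone. The sketch below clamps out-of-range values instead of rejecting them, which keeps a tool call from failing when a client asks for too many results; the names are hypothetical and the server may handle this differently.

```python
MAX_TOP_K = 100  # upper bound from the limits above


def clamp_top_k(top_k: int, default: int = 10) -> int:
    """Return top_k limited to 1..MAX_TOP_K, falling back to default for values below 1."""
    if top_k < 1:
        return default
    return min(top_k, MAX_TOP_K)
```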
### Namespaces
All tools support an optional `namespace` parameter for Pinecone indexes that use multiple namespaces (for example, multi-tenant setups).
## Architecture
### Technology Stack
- **FastMCP**: High-level server API from the official MCP Python SDK
- **Pinecone**: Vector database for semantic search
- **python-dotenv**: Environment variable management
### Components
- `server.py` - Main MCP server implementation
- `requirements.txt` - Python dependencies
- `.env` - Configuration (not committed)
- `.env.example` - Configuration template
## Development
### Project Structure
```
pinecone-econ-mcp/
├── server.py # MCP server implementation
├── requirements.txt # Python dependencies
├── .env # Environment variables (create from .env.example)
├── .env.example # Environment template
├── .gitignore # Git ignore rules
└── README.md # This file
```
### Adding New Tools
To add a new search tool:
1. Define a new function with the `@mcp.tool()` decorator
2. Add comprehensive docstring (used for tool schema)
3. Use type hints for all parameters
4. Return string-formatted results
5. Handle errors gracefully
Example:
```python
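from mcp.server.fastmcp import FastMCP

# server.py already creates the server instance; the name here is illustrative.
mcp = FastMCP("pinecone-econ")


@mcp.tool()
def my_custom_search(
    query_param: str,
    top_k: int = 10
) -> str:
    """
    Description of what this search does.

    Args:
        query_param: Description of parameter
        top_k: Number of results

    Returns:
        Description of return value
    """
    try:
        # `index` and `format_result` are defined in server.py;
        # replace `...` with the query arguments your tool needs.
        results = index.query(...)
        return str(format_result(results.matches))
    except Exception as e:
        # Return the error as text so the MCP client sees a readable message.
        return f"Error: {e}"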
```
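The template relies on a `format_result` helper from `server.py` whose output format is not documented here. A hypothetical stand-in that renders matches shaped like the Data Schema example into readable text might look like this; it works on plain dicts so it can be tested without a live index, and real query results are objects, so adapt the field access accordingly.

```python
def format_matches(matches: list[dict], max_chars: int = 300) -> str:
    """Render search matches as numbered, human-readable text blocks."""
    if not matches:
        return "No results found."
    blocks = []
    for rank, match in enumerate(matches, start=1):
        meta = match.get("metadata", {})
        text = meta.get("chunk_text", "")
        if len(text) > max_chars:  # truncate long chunks to keep output compact
            text = text[:max_chars].rstrip() + "..."
        blocks.append(
            f"{rank}. {match['id']} (score {match['score']:.4f})\n"
            f"   Author: {meta.get('author_name', 'unknown')}\n"
            f"   Book: {meta.get('book_name', 'unknown')}\n"
            f"   Pages: {', '.join(meta.get('pages', []))}\n"
            f"   {text}"
        )
    return "\n\n".join(blocks)
```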
## Troubleshooting
### Common Issues
1. **"API key not found"**
   - Ensure the `.env` file exists and contains a valid `PINECONE_API_KEY`
   - Check that `load_dotenv()` is being called
2. **"Index not found"**
   - Verify that `PINECONE_INDEX_NAME` matches your Pinecone index name
   - Check the Pinecone dashboard to confirm the index exists
3. **"No results returned"**
   - Verify that data exists in your Pinecone index
   - Check that metadata field names match your data schema
   - Run `get_index_stats()` to confirm the index contains vectors
4. **"Module not found"**
   - Run `pip install -r requirements.txt`
   - Ensure you're using Python 3.10+
### Debug Mode
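Run the server directly to check that it starts:
```bash
python server.py
```
If the server uses FastMCP's default stdio transport, it waits for a client on standard input, so no output is expected on success; startup errors such as a missing API key should appear in the terminal. Press Ctrl+C to stop it.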
## Security Notes
- **API Keys**: Never commit `.env` file to version control
- **Read-Only**: Server only performs read operations
- **No Authentication**: Add authentication if exposing externally
- **Rate Limiting**: Consider implementing rate limits for production use
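If you do add rate limiting, a token bucket is one simple option. The sketch below is illustrative and not part of the server: the class name, the capacity and rate values, and the idea of checking `allow()` at the top of each tool are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the bucket can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate-limited."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Inside a tool, returning a string such as `"Error: rate limit exceeded"` when `allow()` is False follows the same error-as-text convention as the tool template above.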
## Resources
- [Model Context Protocol Docs](https://modelcontextprotocol.io)
- [Pinecone Documentation](https://docs.pinecone.io)
- [MCP Python SDK (includes FastMCP) on GitHub](https://github.com/modelcontextprotocol/python-sdk)
## License
MIT License - feel free to modify and use as needed.
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Add tests if applicable
4. Submit a pull request
## Changelog
### v1.0.0 (2025-01-12)
- Initial release
- 10 tools: 7 semantic search tools and 3 utility tools
- Semantic search via Pinecone integrated inference (pass text directly, no manual embedding)
- Simplified API - `index.search(query="text")` for all searches
- Read-only access to Pinecone economic books database
- Advanced metadata filtering combined with semantic search
- MCP server implementation with FastMCP