# Zotero MCP Server
A Model Context Protocol (MCP) server for Zotero that provides semantic search using PostgreSQL with pgvector and OpenAI or Ollama embeddings.
__This is a fork of the [excellent zotero-mcp project](https://github.com/54yyyu/zotero-mcp) with modifications to match my personal workflow (pgvector instead of Chroma, Ollama and OpenAI backends instead of local transformers, etc.). I am still in the process of refactoring this project to fit my personal needs.__
__THIS IS NOT THE OFFICIAL PROJECT AND MY MODIFICATIONS MAY HAVE BUGS.__ I just use this version for my personal research projects.
At the moment I use the version in this repository against my own
OpenAI-compatible [API gateway](https://github.com/tspspi/mini-apigw).
## Features
- **Full Zotero Integration**: Access your Zotero library through MCP tools
- **Semantic Search**: AI-powered semantic search using PostgreSQL + pgvector
- **Multiple Embedding Providers**: Support for OpenAI and Ollama embeddings
- **Lightweight Architecture**: No heavy ML dependencies; torch and transformers are no longer required
- **High Performance**: PostgreSQL backend with an IVFFlat index for cosine-similarity search
- **Flexible Configuration**: Support for local and remote database instances
## Quick Start
### Prerequisites
- Python 3.10+
- PostgreSQL 15+ with the pgvector extension
- Zotero desktop application or Zotero Web API credentials
- OpenAI API key or Ollama installation
### Installation
```bash
pip install -e .
```
### PostgreSQL Setup
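If you have access to a PostgreSQL instance with pgvector installed, create the database, the user, and the extension:
```sql
-- Connect to your PostgreSQL instance with administrative privileges
CREATE DATABASE zotero_mcp;
CREATE USER zotero_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE zotero_mcp TO zotero_user;
-- Switch to the new database and enable the pgvector extension
\c zotero_mcp
CREATE EXTENSION IF NOT EXISTS vector;
-- PostgreSQL 15+ no longer lets ordinary users create objects in "public" by default
GRANT ALL ON SCHEMA public TO zotero_user;
```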
### Configuration
Run the interactive setup:
```bash
zotero-mcp setup
```
### Usage with Claude Desktop
```json
{
  "mcpServers": {
    "zotero": {
      "command": "/path/to/zotero-mcp",
      "env": {
        "ZOTERO_DB_HOST": "your_host",
        "ZOTERO_DB_NAME": "zotero_mcp",
        "ZOTERO_EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_HOST": "your_ollama_host"
      }
    }
  }
}
```
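If you already have other MCP servers configured, the entry above has to be merged into the existing `mcpServers` object rather than replacing the file. The following is a minimal sketch of such a merge; the helper names `add_zotero_server` and `update_config_file` are hypothetical and not part of this project, and the location of the Claude Desktop configuration file is left to the caller.

```python
import json
from pathlib import Path


def add_zotero_server(config: dict, command: str, env: dict) -> dict:
    """Return a copy of a Claude Desktop config with a "zotero" server entry added."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    servers = merged.setdefault("mcpServers", {})
    servers["zotero"] = {"command": command, "env": dict(env)}
    return merged


def update_config_file(path: Path, command: str, env: dict) -> None:
    """Read the config at `path` (if present), add the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = add_zotero_server(existing, command, env)
    path.write_text(json.dumps(updated, indent=2))
```

Entries for other servers are left untouched; only the `zotero` key is created or overwritten.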
## Configuration
### Database Configuration
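Create `~/.config/zotero-mcp/config.json`. Besides the database connection, the same file holds the embedding, chunking, and semantic search settings: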
```json
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "database": "zotero_mcp",
    "username": "zotero_user",
    "password": "your_password",
    "schema": "public",
    "pool_size": 5
  },
  "embedding": {
    "provider": "ollama",
    "openai": {
      "api_key": "sk-...",
      "model": "text-embedding-3-small",
      "batch_size": 100
    },
    "ollama": {
      "host": "192.168.1.189:8182",
      "model": "nomic-embed-text",
      "timeout": 60
    }
  },
  "chunking": {
    "chunk_size": 1000,
    "overlap": 100,
    "min_chunk_size": 100,
    "max_chunks_per_item": 10,
    "chunking_strategy": "sentences"
  },
  "semantic_search": {
    "similarity_threshold": 0.7,
    "max_results": 50,
    "update_config": {
      "auto_update": false,
      "update_frequency": "manual",
      "batch_size": 50,
      "parallel_workers": 4
    }
  }
}
```
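The `chunking` settings are easiest to understand with a small example. The sketch below shows one plausible reading of the `sentences` strategy. It assumes that `chunk_size`, `overlap`, and `min_chunk_size` are measured in characters and that the overlap is taken from the tail of the previous chunk. The function `chunk_sentences` is hypothetical and may differ from what this project actually does.

```python
import re


def chunk_sentences(text, chunk_size=1000, overlap=100,
                    min_chunk_size=100, max_chunks_per_item=10):
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size is kept as one oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one (the overlap).
            tail = current[-overlap:] if overlap > 0 else ""
            current = f"{tail} {sentence}".strip()
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Drop fragments that are too short, but never drop everything.
    kept = [c for c in chunks if len(c) >= min_chunk_size] or chunks[:1]
    return kept[:max_chunks_per_item]
```

With the defaults from the configuration above, a long full text would yield at most ten chunks of roughly 1000 characters each, with consecutive chunks sharing about 100 characters.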
## Available Tools
### Core Zotero Tools
- `zotero_search_items` - Search items by text query
- `zotero_search_by_tag` - Search items by tags
- `zotero_get_item_metadata` - Get item details and metadata
- `zotero_get_item_fulltext` - Extract full text from attachments
- `zotero_get_collections` - List all collections
- `zotero_get_collection_items` - Get items in a collection
- `zotero_get_recent` - Get recently added items
- `zotero_get_tags` - List all tags
- `zotero_batch_update_tags` - Bulk update tags
### Semantic Search Tools
- `zotero_semantic_search` - **AI-powered semantic search**
- `zotero_update_search_database` - Update embedding database
- `zotero_get_search_database_status` - Check database status
### Advanced Tools
- `zotero_get_annotations` - Extract annotations from PDFs
- `zotero_get_notes` - Retrieve notes
- `zotero_search_notes` - Search through notes
- `zotero_create_note` - Create new notes
- `zotero_advanced_search` - Complex multi-criteria search
## Semantic Search
The semantic search uses PostgreSQL with pgvector for efficient vector similarity search.
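To make the `similarity_threshold` and `max_results` settings concrete, here is a pure-Python sketch of what a cosine-similarity search computes. pgvector's `<=>` operator returns the cosine distance, which is `1 - cosine similarity`. That the threshold is compared against the similarity (and not the distance) is an assumption, and `filter_results` is a hypothetical helper, not this project's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_results(query_vec, candidates, similarity_threshold=0.7, max_results=50):
    """candidates: iterable of (item_key, vector); returns [(item_key, similarity)], best first."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in candidates]
    hits = [(key, sim) for key, sim in scored if sim >= similarity_threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:max_results]
```

In the real setup the database performs this ranking with an index instead of scanning every vector in Python.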
### Database Population
```bash
# Initial database population
zotero-mcp update-db --force-rebuild
# Incremental updates
zotero-mcp update-db
# Update with limit (for testing)
zotero-mcp update-db --limit 100
# Check status
zotero-mcp status
```
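An incremental update only needs to re-embed items whose text has changed. One common way to detect this is to store a hash of the content, which is presumably what the `content_hash` column in the database schema is for. The sketch below assumes a SHA-256 hex digest (64 characters, matching `VARCHAR(64)`); the function names are hypothetical and the project's actual change detection may work differently.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the text (64 hexadecimal characters)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def items_needing_update(items: dict, stored_hashes: dict, limit=None) -> list:
    """Return keys of items that are new or whose content changed.

    items:         item_key -> current text
    stored_hashes: item_key -> hash already stored in the database
    limit:         optional cap, comparable to `--limit` on the command line
    """
    stale = [key for key, text in items.items()
             if stored_hashes.get(key) != content_hash(text)]
    return stale if limit is None else stale[:limit]
```

A forced rebuild corresponds to passing an empty `stored_hashes` mapping, so every item is treated as stale.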
## Embedding Providers
### OpenAI (Recommended)
```json
{
  "embedding": {
    "provider": "openai",
    "openai": {
      "api_key": "sk-...",
      "model": "text-embedding-3-small",
      "batch_size": 100,
      "rate_limit_rpm": 3000
    }
  }
}
```
**Models Available**:
- `text-embedding-3-small` (1536 dimensions) - Fast and efficient
- `text-embedding-3-large` (3072 dimensions) - Higher quality
- `text-embedding-ada-002` (1536 dimensions) - Legacy model
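The `batch_size` setting controls how many texts are sent per embeddings request. The following standard-library sketch shows how such batching could look against the OpenAI embeddings endpoint. The helper names are hypothetical, and the URL is the public OpenAI endpoint; an OpenAI-compatible gateway would use a different base URL.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def batched(texts, batch_size=100):
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def build_openai_request(texts, api_key, model="text-embedding-3-small"):
    """Build (but do not send) a POST request for one batch of texts."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, api_key, model="text-embedding-3-small", batch_size=100):
    """Send all batches and return one vector per input text (performs network calls)."""
    vectors = []
    for batch in batched(texts, batch_size):
        with urllib.request.urlopen(build_openai_request(batch, api_key, model)) as resp:
            payload = json.load(resp)
        # Each result carries an "index" that refers to its position in the batch.
        ordered = sorted(payload["data"], key=lambda d: d["index"])
        vectors.extend(d["embedding"] for d in ordered)
    return vectors
```

Rate limiting (`rate_limit_rpm`) and retries are deliberately left out of this sketch.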
### Ollama (Local)
```json
{
  "embedding": {
    "provider": "ollama",
    "ollama": {
      "host": "http://localhost:11434",
      "model": "nomic-embed-text",
      "timeout": 60
    }
  }
}
```
**Popular Models**:
- `nomic-embed-text` - Good general purpose embeddings
- `all-minilm` - Lightweight and fast
- `mxbai-embed-large` - High quality embeddings
To install Ollama models:
```bash
ollama pull nomic-embed-text
```
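To check that an Ollama instance serves embeddings before pointing the server at it, a request can be sent to Ollama's `/api/embeddings` endpoint. The sketch below uses only the standard library. The helper names are hypothetical, and adding `http://` to a host given without a scheme is an assumption about how such a value should be interpreted, not a description of this project's behavior.

```python
import json
import urllib.request


def normalize_host(host: str) -> str:
    """Accept both 'host:port' and full URLs; return a base URL without trailing slash."""
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host.rstrip("/")


def build_ollama_request(text, host="http://localhost:11434", model="nomic-embed-text"):
    """Build (but do not send) a POST request for a single text."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        normalize_host(host) + "/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, host="http://localhost:11434", model="nomic-embed-text", timeout=60):
    """Return the embedding vector for `text` (performs a network call)."""
    request = build_ollama_request(text, host, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["embedding"]
```

The length of the returned vector tells you the embedding dimension of the model, which matters for the `vector(...)` column in the database schema.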
## Architecture
### Component Overview
```
┌─────────────────┐    ┌─────────────────┐
│   Claude MCP    │───▶│ FastMCP Server  │
│     Client      │    │   (server.py)   │
└─────────────────┘    └─────────────────┘
                                │
                                ▼
                    ┌──────────────────────┐
                    │   Semantic Search    │
                    │ (semantic_search.py) │
                    └──────────────────────┘
                                │
                     ┌──────────┴──────────┐
                     ▼                     ▼
             ┌───────────────┐      ┌──────────────┐
             │ Vector Client │      │  Embedding   │
             │(vector_client)│      │   Service    │
             └───────────────┘      │(embedding_   │
                     │              │ service.py)  │
                     ▼              └──────────────┘
             ┌───────────────┐             │
             │  PostgreSQL   │             ▼
             │  + pgvector   │      ┌──────────────┐
             └───────────────┘      │ OpenAI/Ollama│
                                    │     APIs     │
                                    └──────────────┘
```
### Database Schema
```sql
-- Core embeddings table
CREATE TABLE zotero_embeddings (
    id SERIAL PRIMARY KEY,
    item_key VARCHAR(50) UNIQUE NOT NULL,
    item_type VARCHAR(50) NOT NULL,
    title TEXT,
    content TEXT NOT NULL,
    content_hash VARCHAR(64) NOT NULL,
    -- The dimension must match the embedding model (1536 for text-embedding-3-small)
    embedding vector(1536),
    embedding_model VARCHAR(100) NOT NULL,
    embedding_provider VARCHAR(50) NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Optimized indexes
CREATE INDEX idx_zotero_embedding_cosine
    ON zotero_embeddings USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
CREATE INDEX idx_zotero_metadata_gin
    ON zotero_embeddings USING gin(metadata);
```
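A similarity query against this table could look like the sketch below. It builds a parameterized SQL statement using pgvector's cosine distance operator `<=>` and passes the query embedding as a text literal cast to `vector`. The names `SEARCH_SQL`, `to_vector_literal`, and `search_params` are hypothetical; the queries this project actually issues may differ.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal such as [0.1,0.2]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Named placeholders in the style accepted by psycopg.
# Ordering by the raw distance lets the ivfflat index be used.
SEARCH_SQL = """
SELECT item_key, title, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM zotero_embeddings
WHERE 1 - (embedding <=> %(query)s::vector) >= %(threshold)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s
"""


def search_params(query_embedding, threshold=0.7, limit=50):
    """Build the parameter mapping for SEARCH_SQL."""
    return {
        "query": to_vector_literal(query_embedding),
        "threshold": threshold,
        "limit": limit,
    }
```

With a psycopg cursor this would be executed as `cursor.execute(SEARCH_SQL, search_params(vector))`, where `vector` has the same dimension as the `embedding` column.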
## License
MIT License - see [LICENSE](LICENSE) file for details.