MCP Embedding Search
by spences10
Verified
# mcp-embedding-search
A Model Context Protocol (MCP) server that queries a Turso database
containing embeddings and transcript segments. This tool allows users
to search for relevant transcript segments by asking questions,
without generating new embeddings.
## Features
- 🔍 Vector similarity search for transcript segments
- 📊 Relevance scoring based on cosine similarity
- 📝 Complete transcript metadata (episode title, timestamps)
- ⚙️ Configurable search parameters (limit, minimum score)
- 🔄 Efficient database connection pooling
- 🛡️ Comprehensive error handling
- 📈 Performance optimized for quick responses
## Configuration
This server requires configuration through your MCP client. Here are
examples for different environments:
### Cline Configuration
Add this to your Cline MCP settings:
```json
{
"mcpServers": {
"mcp-embedding-search": {
"command": "node",
"args": ["/path/to/mcp-embedding-search/dist/index.js"],
"env": {
"TURSO_URL": "your-turso-database-url",
"TURSO_AUTH_TOKEN": "your-turso-auth-token"
}
}
}
}
```
### Claude Desktop Configuration
Add this to your Claude Desktop configuration:
```json
{
"mcpServers": {
"mcp-embedding-search": {
"command": "node",
"args": ["/path/to/mcp-embedding-search/dist/index.js"],
"env": {
"TURSO_URL": "your-turso-database-url",
"TURSO_AUTH_TOKEN": "your-turso-auth-token"
}
}
}
}
```
## API
The server implements one MCP tool:
### search_embeddings
Search for relevant transcript segments using vector similarity.
Parameters:
- `question` (string, required): The query text to search for
- `limit` (number, optional): Number of results to return (default: 5,
max: 50)
- `min_score` (number, optional): Minimum similarity threshold
(default: 0.5, range: 0-1)
Response format:
```json
[
{
"episode_title": "Episode Title",
"segment_text": "Transcript segment content...",
"start_time": 123.45,
"end_time": 167.89,
"similarity": 0.85
}
// Additional results...
]
```
## Database Schema
This tool expects a Turso database with the following schema:
```sql
CREATE TABLE embeddings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
transcript_id INTEGER NOT NULL,
embedding TEXT NOT NULL,
FOREIGN KEY(transcript_id) REFERENCES transcripts(id)
);
CREATE TABLE transcripts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
episode_title TEXT NOT NULL,
segment_text TEXT NOT NULL,
start_time REAL NOT NULL,
end_time REAL NOT NULL
);
```
The `embedding` column should contain vector embeddings that can be
used with the `vector_distance_cos` function.
## Development
### Setup
1. Clone the repository
2. Install dependencies:
```bash
npm install
```
3. Build the project:
```bash
npm run build
```
4. Run in development mode:
```bash
npm run dev
```
### Publishing
The project uses changesets for version management. To publish:
1. Create a changeset:
```bash
npm run changeset
```
2. Version the package:
```bash
npm run version
```
3. Publish to npm:
```bash
npm run release
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built on the
[Model Context Protocol](https://github.com/modelcontextprotocol)
- Designed for efficient vector similarity search in transcript
databases