
Fast Embedding MCP / SSE — Stable Static Embedding server

Serve RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2 over an OpenAI-compatible HTTP API and an MCP server (stdio).

The model is a ~16M-parameter English static embedding model: 512D native with Matryoshka (MRL) truncation to 256 / 128 / 64 / 32. It is fast (no attention) and tiny.

Install

This project uses uv for environment management. If you don't have uv yet, install it first (the uv documentation lists other installation methods):

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Then clone and sync. uv sync creates a .venv, installs the pinned dependencies from uv.lock, and installs the project itself:

git clone https://github.com/Rikka-Botan/Fast-Embedding-MCP-SSE.git
cd Fast-Embedding-MCP-SSE
uv sync

uv picks a compatible Python (3.10+) automatically — no manual venv or activation needed; prefix commands with uv run. The first server run downloads the model from Hugging Face (~60 MB) and caches it.


HTTP API

uv run python -m sse_embedding.api   # serves on http://0.0.0.0:8000
# or, equivalently:  uv run sse-api

The bind host and port are configurable via the SSE_API_HOST and SSE_API_PORT environment variables.
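Client scripts need a URL to connect to, and 0.0.0.0 is a bind-all address rather than one a client should dial. The sketch below derives a client base URL from the same two variables and polls the /health endpoint until the server is up (useful on the first run, while the model is still downloading). The helper names api_base_url and wait_for_health are hypothetical, and the fallback defaults (0.0.0.0 and 8000) are assumed from the launch command above.

```python
import os
import time
import urllib.request


def api_base_url() -> str:
    """Build a client-side base URL from SSE_API_HOST / SSE_API_PORT.

    Assumption: the server falls back to 0.0.0.0:8000 when they are unset.
    """
    host = os.environ.get("SSE_API_HOST", "0.0.0.0")
    port = os.environ.get("SSE_API_PORT", "8000")
    # 0.0.0.0 means "listen on every interface"; clients should use loopback instead.
    if host in ("0.0.0.0", ""):
        host = "127.0.0.1"
    return f"http://{host}:{port}"


def wait_for_health(base_url: str, timeout_s: float = 60.0) -> bool:
    """Poll GET /health until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused or HTTP error: server not ready yet
        time.sleep(0.5)
    return False
```

For example, `wait_for_health(api_base_url())` can gate a test suite or a batch job until the API answers.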

Endpoints

Method  Path             Purpose
POST    /v1/embeddings   OpenAI-compatible embeddings (supports dimensions)
POST    /similarity      Cosine similarity matrix between two text sets
POST    /search          Rank documents against a query (stateless)
POST    /index/add       Add documents to the in-memory index
POST    /index/query     Query the in-memory index
GET     /index/stats     Index size
POST    /index/clear     Empty the index
GET     /health          Health check
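/similarity and /search both reduce to cosine similarity between embedding vectors. The plain-Python sketch below shows what those results amount to once texts are embedded: a matrix of pairwise cosines, and a ranking of documents by their cosine to a query. The function names are hypothetical and not part of the server's API; the toy 2D vectors in the test stand in for real 512D (or truncated) embeddings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; does not assume the inputs are unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(left: List[Vector], right: List[Vector]) -> List[List[float]]:
    """Row i, column j holds cosine(left[i], right[j]), as /similarity describes."""
    return [[cosine(a, b) for b in right] for a in left]


def rank(query: Vector, docs: List[Vector], top_k: int) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs, best first, like a stateless /search."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The same cosine helper also works on vectors returned by /v1/embeddings, for example to compare two texts client-side.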

OpenAI-compatible example

Works with the OpenAI SDK by pointing base_url at this server:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.embeddings.create(
    model="RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
    input=["hello world", "good morning"],
    dimensions=256,          # MRL truncation: 512/256/128/64/32
)
print(len(resp.data[0].embedding))   # 256

Or call the endpoint directly with curl:

curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world", "dimensions": 128}'
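For scripts that should not depend on the OpenAI SDK, the same request can be made with Python's standard library. This is a sketch: the helper names are hypothetical, the local dimension check simply mirrors the values listed under Matryoshka dimensions, and the response parsing assumes the OpenAI-style {"data": [{"embedding": [...]}]} shape implied by "OpenAI-compatible" rather than anything read from the server code.

```python
import json
import urllib.request
from typing import List, Optional, Union

VALID_DIMS = (512, 256, 128, 64, 32)  # MRL levels listed in this README


def build_embeddings_request(
    texts: Union[str, List[str]],
    dimensions: Optional[int] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST /v1/embeddings request equivalent to the curl call above."""
    if dimensions is not None and dimensions not in VALID_DIMS:
        raise ValueError(f"dimensions must be one of {VALID_DIMS}")
    body = {"input": texts}
    if dimensions is not None:
        body["dimensions"] = dimensions
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, dimensions=None, base_url="http://localhost:8000") -> List[List[float]]:
    """Send the request; assumes an OpenAI-style response body."""
    req = build_embeddings_request(texts, dimensions, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return [item["embedding"] for item in payload["data"]]
```

With the server running, `embed(["hello world", "good morning"], dimensions=128)` should return two 128-element lists.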

Search / index example

curl -X POST http://localhost:8000/index/add \
  -H "Content-Type: application/json" \
  -d '{"documents": ["The cat sat on the mat", "Paris is in France"]}'

curl -X POST http://localhost:8000/index/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Where is Paris?", "top_k": 1}'
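Unlike /search, the /index/* endpoints keep documents in server memory between calls. The class below is a minimal sketch of how such an index can work, assuming each document is embedded once when added and queries are ranked by cosine similarity; it illustrates the idea and is not the server's actual implementation. The InMemoryIndex class is hypothetical, and toy_embed (letter counts) is a placeholder for a real embedding call.

```python
import math
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[str], Sequence[float]]


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


class InMemoryIndex:
    """Hypothetical analogue of /index/add, /index/query, /index/stats, /index/clear."""

    def __init__(self, embed: Embedder) -> None:
        self._embed = embed
        self._docs: List[str] = []
        self._vecs: List[List[float]] = []

    def add(self, documents: List[str]) -> int:
        # Embed once at insertion time; unit-normalize so a dot product equals cosine.
        for doc in documents:
            self._docs.append(doc)
            self._vecs.append(_normalize(self._embed(doc)))
        return len(self._docs)

    def query(self, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        q = _normalize(self._embed(query))
        scored = [(d, sum(a * b for a, b in zip(q, v))) for d, v in zip(self._docs, self._vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def stats(self) -> int:
        return len(self._docs)

    def clear(self) -> None:
        self._docs.clear()
        self._vecs.clear()


def toy_embed(text: str) -> List[float]:
    """Placeholder embedder (letter counts a..z), standing in for the real model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return counts
```

Because the index lives in process memory, it is empty again after a server restart.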

MCP server (stdio)

uv run python -m sse_embedding.mcp_server
# or, equivalently:  uv run sse-mcp

Tools exposed: embed_text, similarity, search, index_add, index_query, index_stats, index_clear.
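MCP clients such as Claude talk to this server with newline-delimited JSON-RPC 2.0 over stdin/stdout: an initialize request, an initialized notification, then tools/call. The standard-library sketch below performs that exchange to call one tool directly, which can help when debugging outside a chat client. The helper names are hypothetical, the protocol version string is one published MCP revision and is an assumption, and each tool's argument names (for example "text" for embed_text) are not documented here, so check the schemas returned by tools/list.

```python
import json
import subprocess
from typing import Any, Dict, Optional

# Launch command from this README; run it from the cloned project directory.
SERVER_CMD = ["uv", "run", "python", "-m", "sse_embedding.mcp_server"]


def rpc(method: str, params: Optional[Dict[str, Any]] = None, msg_id: Optional[int] = None) -> str:
    """Encode one JSON-RPC 2.0 message as a single line (MCP stdio framing)."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id; notifications do not
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


INIT = rpc("initialize", {
    "protocolVersion": "2024-11-05",  # assumption: a protocol revision the server accepts
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},
}, msg_id=1)
INITIALIZED = rpc("notifications/initialized")


def call_tool_msg(tool: str, arguments: Dict[str, Any], msg_id: int = 2) -> str:
    return rpc("tools/call", {"name": tool, "arguments": arguments}, msg_id)


def _read_reply(proc: subprocess.Popen, msg_id: int) -> Dict[str, Any]:
    """Read stdout lines until the response (not a request) with msg_id arrives."""
    while True:
        line = proc.stdout.readline()
        if not line:
            raise RuntimeError("MCP server closed stdout")
        reply = json.loads(line)
        if reply.get("id") == msg_id and "method" not in reply:
            return reply


def call_tool(tool: str, arguments: Dict[str, Any], cwd: str = ".") -> Dict[str, Any]:
    """Spawn the server, do the MCP handshake, call one tool, return its response."""
    proc = subprocess.Popen(SERVER_CMD, cwd=cwd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    try:
        proc.stdin.write(INIT)
        proc.stdin.flush()
        _read_reply(proc, 1)  # wait for the initialize result before continuing
        proc.stdin.write(INITIALIZED + call_tool_msg(tool, arguments, 2))
        proc.stdin.flush()
        return _read_reply(proc, 2)
    finally:
        proc.kill()
        proc.wait()
```

A hypothetical call would be `call_tool("embed_text", {"text": "hello world"}, cwd="/path/to/Fast-Embedding-MCP-SSE")`.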

Register with Claude Code

Requires the Claude Code CLI. If claude is not a recognized command, you are likely using the Claude Desktop app — use the Claude Desktop config below instead.

Run from the cloned project directory:

claude mcp add sse-embedding -- uv run python -m sse_embedding.mcp_server

To make the registration work from any directory, pass the project path to uv with --directory:

claude mcp add sse-embedding -- uv run --directory /path/to/Fast-Embedding-MCP-SSE python -m sse_embedding.mcp_server

Register with Claude Desktop

Add to claude_desktop_config.json, replacing /path/to/... with the absolute path where you cloned this repository. uv run resolves the project's environment from the given directory.

macOS / Linux:

{
  "mcpServers": {
    "sse-embedding": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Fast-Embedding-MCP-SSE", "python", "-m", "sse_embedding.mcp_server"]
    }
  }
}

Windows:

{
  "mcpServers": {
    "sse-embedding": {
      "command": "uv",
      "args": ["run", "--directory", "C:\\path\\to\\Fast-Embedding-MCP-SSE", "python", "-m", "sse_embedding.mcp_server"]
    }
  }
}

If Claude Desktop reports that uv was not found, replace "command": "uv" with the absolute path to the uv executable (which uv on macOS/Linux, (Get-Command uv).Source in PowerShell). Alternatively, point command directly at the .venv interpreter that uv sync created (/path/to/Fast-Embedding-MCP-SSE/.venv/bin/python on macOS/Linux, C:\\path\\to\\Fast-Embedding-MCP-SSE\\.venv\\Scripts\\python.exe on Windows) and set "args": ["-m", "sse_embedding.mcp_server"].
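Hand-editing claude_desktop_config.json is easy to get wrong, especially the doubled backslashes in Windows paths. The sketch below merges the sse-embedding entry into an existing config, keeps any other servers, resolves uv to an absolute path with shutil.which when it can (the fix suggested for the "uv was not found" error), and lets json.dumps handle the escaping. The helper names are hypothetical, and the config file path is passed in explicitly because its location depends on the OS and is not specified here.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Dict, Optional


def mcp_entry(repo_dir: str, uv_path: Optional[str] = None) -> Dict[str, Any]:
    """Build the "sse-embedding" entry, preferring an absolute path to uv."""
    command = uv_path or shutil.which("uv") or "uv"
    return {
        "command": command,
        "args": ["run", "--directory", repo_dir, "python", "-m", "sse_embedding.mcp_server"],
    }


def merge_into_config(config: Dict[str, Any], repo_dir: str,
                      uv_path: Optional[str] = None) -> Dict[str, Any]:
    """Add or replace the entry without touching other configured servers."""
    servers = config.setdefault("mcpServers", {})
    servers["sse-embedding"] = mcp_entry(repo_dir, uv_path)
    return config


def update_config_file(config_path: Path, repo_dir: str) -> None:
    """Read, merge, and rewrite the config file at config_path."""
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    merge_into_config(config, repo_dir)
    # json.dumps escapes Windows backslashes (C:\path becomes C:\\path in the file).
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Restart Claude Desktop after updating the file so it picks up the new entry.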

Matryoshka dimensions

Valid dim / dimensions values are 512, 256, 128, 64, and 32. Smaller dimensions are faster to compare and cheaper to store, at the cost of a gradual drop in retrieval quality. Truncation is applied to the full 512D vector and the result is renormalized to unit length, so cosine similarity remains valid at every level.
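The truncate-then-renormalize step can be sketched in a few lines. This assumes truncation keeps the leading components of the 512D vector, which is how Matryoshka truncation is normally applied; the function name is hypothetical and this is not the server's code.

```python
import math
from typing import List, Sequence

VALID_DIMS = (512, 256, 128, 64, 32)


def truncate_and_renormalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components of a full vector and rescale to unit length."""
    if dim not in VALID_DIMS:
        raise ValueError(f"dim must be one of {VALID_DIMS}")
    if dim > len(vec):
        raise ValueError("cannot truncate to more dimensions than the vector has")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # After renormalization the dot product of two truncated vectors is their cosine.
    return [x / norm for x in head] if norm else head
```

Renormalizing matters because a prefix of a unit vector is generally shorter than unit length; without it, dot-product scores would shrink as dimensions drop.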

License

Apache-2.0
