# MCP Server
### 🧩 Main modules
1. **Main MCP server** - the core of the system, performing the following functions:
- Registry of connected servers
- Routing requests between servers
- Monitoring server status
- Aggregation of information about available tools
2. **Specialized servers** (connect to the main one):
- **Embedding server** - generates vector representations (embeddings) of text
- **PDF extract server** - converts PDF documents and extracts their content as Markdown
- **Reranker server** - reranks text documents by relevance
- **Qdrant server** - manages vector collections in Qdrant
- **PostgreSQL server** - executes SQL queries and inspects database schemas
- **LLM server** - generates and streams LLM responses, lists available models
- **MarkUp server** - marks up text and files using the markup service methods
- **Transcribe server** - uploads audio, reports transcription status and results
### ⚙️ Available tools
| Server | Methods |
|---|---|
| **Embedding server** | `embedding_generate`, `embedding_batch_generate`, `embedding_get_models`, `health_check` |
| **PDF extract server** | `document_convert_to_markdown`, `document_get_supported_formats`, `health_check` |
| **Reranker server** | `rerank_documents`, `health_check` |
| **Qdrant server** | `vector_create_collection`, `vector_get_collection_info`, `vector_upsert_points`, `vector_search`, `vector_delete_points`, `health_check` |
| **PostgreSQL server** | `postgres_execute_query`, `postgres_get_schema`, `postgres_create_table`, `postgres_insert_data`, `health_check` |
| **LLM server** | `llm_chat_completion`, `llm_get_models`, `llm_stream_completion`, `health_check` |
| **MarkUp server** | `markup_get_methods`, `markup_process_text`, `markup_process_file`, `health_check` |
| **Transcribe server** | `transcribe_audio`, `transcribe_get_status`, `transcribe_get_result`, `health_check` |
##### Note:
To add a new FastMCP server, import it in `main_server.py` and add it to the `MCP_SERVERS` array; the main methods of `main_server.py` will then have access to it.
Every FastMCP server must also expose a `health_check` method so its state can be monitored.
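For orientation, here is a sketch of what a new specialized server and its registration could look like (the `weather_server` name, its tool, and the registration comments are hypothetical, not code from this repo):
```python
# weather_server.py — hypothetical specialized FastMCP server (illustrative sketch)
from fastmcp import FastMCP

weather_mcp_server = FastMCP("Weather server")

@weather_mcp_server.tool()
def weather_get_forecast(city: str) -> dict:
    """Example domain tool: return a stub forecast for a city."""
    return {"city": city, "forecast": "sunny"}

@weather_mcp_server.tool()
def health_check() -> dict:
    """Required by the registry: report this server's state."""
    return {"status": "ok"}

# Then, in main_server.py (exact names assumed):
#   from weather_server import weather_mcp_server
#   MCP_SERVERS = [..., weather_mcp_server]
```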
### 📡 Main server methods
```python
get_server_and_tools()                  # List all registered servers and their tools
router(server_name, tool_name, params)  # Route a request to a tool on a specific server
health_check_servers()                  # Check the health of all registered servers
```
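For illustration, a minimal sketch of calling these methods from a FastMCP client over HTTP (the URL matches the Cursor config below; the routed server/tool names and parameters are assumptions):
```python
import asyncio
from fastmcp import Client

async def main():
    # The registry may also require an Authorization header matching MAIN_SERVER_API_KEY.
    async with Client("http://localhost:8000/mcp/") as client:
        servers = await client.call_tool("get_server_and_tools", {})
        print(servers)
        # Hypothetical routed call; real server/tool names come from the registry.
        result = await client.call_tool("router", {
            "server_name": "embedding_server",
            "tool_name": "embedding_generate",
            "params": {"text": "hello world"},
        })
        print(result)

asyncio.run(main())
```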
### Setting up the environment
Create a `.env` file in the project root and define the following environment variables in it (the list matches what the code reads):
```
# Main server
MAIN_SERVER_API_KEY=...
# Embedding server
EMBEDDING_API_KEY=...
EMBEDDING_URL=...
EMBEDDING_MODEL_NAME=...
EMBEDDING_URL_MODELS=...
EMBEDDING_HEALTH_URL=...
# PDF extract server
PDF_EXTRACTOR_URL=...
PDF_HEALTH_URL=...
# Reranker server
RERANK_URL=...
RERANK_MODEL=...
RERANK_HEALTH_URL=...
# Qdrant server
QDRANT_URL=...
QDRANT_API_KEY=...
QDRANT_HEALTH_CHECK_URL=...
# PostgreSQL server
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_HOST=...
POSTGRES_DB=...
# LLM server
LLM_SERVICE_API_KEY=...
LLM_SERVICE_MODEL=...
LLM_SERVICE_CHAT_COMPLETIONS_URL=...
LLM_SERVICE_MODELS_URL=...
LLM_SERVICE_COMPLETIONS_URL=...
LLM_SERVICE_HEALTH_URL=...
# MarkUp server
MARKUP_API_KEY=...
MARKUP_GET_METHODS_URL=...
MARKUP_PROCESS_TEXT_URL=...
MARKUP_PROCESS_FILE_URL=...
MARKUP_HEALTH_CHECK_URL=...
# Transcribe server
TRANSCRIBE_API_KEY=...
TRANSCRIBE_UPLOAD_AUDIO=...
TRANSCRIBE_HEALTH_URL=...
```
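For reference, a minimal sketch of reading these variables in Python (assuming `python-dotenv`; the repo's actual loading code may differ):
```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the project root / current working directory

QDRANT_URL = os.getenv("QDRANT_URL")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
if not QDRANT_URL:
    raise RuntimeError("QDRANT_URL is not set; check your .env file")
```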
# Start the main server
### Installing dependencies and creating a virtual environment
Before starting the server, it is recommended to create a virtual environment and install all dependencies from `requirements.txt`. Run the following commands in the terminal:
1. Creating a virtual environment
```bash
python -m venv venv
```
2. Activating the environment
```bash
# Windows (PowerShell)
.\venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
```
3. Installing dependencies
```bash
pip install -r requirements.txt
```
After setting up the environment, start the server with:
```bash
fastmcp run ./main_server.py:main_mcp_server --transport http
```
### Running in Docker
1. Build the image:
```bash
docker build -t mcp-main-server .
```
2. Run the container, passing `.env` as environment variables:
```bash
docker run --rm -p 8000:8000 --env-file .env mcp-main-server
```
### Configuring the server connection in Cursor
1. Start the server using the command above
2. Open Cursor's settings and go to the MCP section
3. Add the MCP server configuration:
```json
{
  "mcpServers": {
    "main-registry": {
      "url": "http://localhost:8000/mcp/"
    }
  }
}
```
# Local MCP server: proxy_mcp_server
- **What is it**: a proxy MCP server that connects to the main registry (`main_server`), forwards its methods, and provides a high-level pipeline for preparing data for RAG.
- **Where is it**: `proxy_mcp_server/proxy_mcp_server.py`
- **Available tools**:
- `get_server_and_tools()` — get a list of servers and their tools from the registry
- `router(server_name: str, tool_name: str, params: dict)` — universal call router
- `preprocessing_data_for_rag(file_paths: List[str]) -> str` — prepare PDF/text files and create a collection in Qdrant; returns the collection name (see the sketch after this list)
- `health_check_servers()` — check if all services are available
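A hedged sketch of calling this pipeline with the FastMCP client (the file paths are hypothetical; running the proxy from its script path assumes stdio transport):
```python
import asyncio
from fastmcp import Client

async def main():
    # Launch the proxy over stdio from its script path (path assumed).
    async with Client("proxy_mcp_server/proxy_mcp_server.py") as client:
        collection = await client.call_tool(
            "preprocessing_data_for_rag",
            {"file_paths": ["docs/report.pdf", "docs/notes.pdf"]},  # hypothetical files
        )
        print("Created collection:", collection)

asyncio.run(main())
```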
Requirements:
- `main_server` is running and accessible via URL (e.g. `http://localhost:8000/mcp/`).
- A valid API key `MAIN_SERVER_API_KEY` (it must match the `Authorization` header in `proxy_mcp_server.py`).
- If necessary, update `url` and `headers.Authorization` in the `config` object inside `proxy_mcp_server/proxy_mcp_server.py`.
Connection in Cursor (example):
```json
{
  "mcpServers": {
    "proxy-server": {
      "command": "uv",
      "args": [
        "run",
        "fastmcp",
        "run",
        "YOUR_PATH_TO/proxy_mcp_server/proxy_mcp_server.py:proxy_mcp_server"
      ]
    }
  }
}
```
Launch from terminal:
```bash
fastmcp run ./proxy_mcp_server/proxy_mcp_server.py:proxy_mcp_server
```
# RAG inference: interactive launch
- **What is it**: a console assistant for asking questions against a collection of documents in Qdrant, with reranking and LLM answer generation.
- **Location**: `rag_inference/RAG workflow.py`
- **Required environment variables**: `QDRANT_URL`, `QDRANT_API_KEY`, `RERANK_URL`, `RERANK_MODEL`, `LLM_SERVICE_CHAT_COMPLETIONS_URL`, `LLM_SERVICE_API_KEY`, `LLM_SERVICE_MODEL`, `EMBEDDING_URL`, `EMBEDDING_MODEL` (described in the environment section above).
Run (Windows PowerShell):
```bash
python ".\rag_inference\RAG workflow.py" <collection_name>
```
Here `<collection_name>` is the name of a collection in Qdrant. A convenient way to obtain one is to call the `preprocessing_data_for_rag` tool from `proxy_mcp_server` with the list of files to index; the tool returns the name of the created collection.
Example:
```bash
python ".\rag_inference\RAG workflow.py" collection_for_rag_1
```
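For orientation, a heavily hedged sketch of the retrieval loop such a script typically implements. Every endpoint shape below is an assumption (an OpenAI-style embeddings/chat API and a Cohere/Jina-style rerank API); the actual `RAG workflow.py` may differ:
```python
import os
import requests
from qdrant_client import QdrantClient

def answer(question: str, collection: str) -> str:
    # 1. Embed the question (payload shape assumed to be OpenAI-compatible).
    emb = requests.post(
        os.environ["EMBEDDING_URL"],
        json={"model": os.environ["EMBEDDING_MODEL"], "input": question},
    ).json()["data"][0]["embedding"]

    # 2. Retrieve candidate chunks from Qdrant.
    qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])
    hits = qdrant.search(collection_name=collection, query_vector=emb, limit=20)
    docs = [h.payload.get("text", "") for h in hits]  # payload key assumed

    # 3. Rerank the candidates (request shape assumed to follow common rerank APIs).
    rer = requests.post(
        os.environ["RERANK_URL"],
        json={"model": os.environ["RERANK_MODEL"], "query": question, "documents": docs},
    ).json()
    top = [docs[r["index"]] for r in rer["results"][:5]]

    # 4. Generate the answer (OpenAI-style chat completions assumed).
    resp = requests.post(
        os.environ["LLM_SERVICE_CHAT_COMPLETIONS_URL"],
        headers={"Authorization": f"Bearer {os.environ['LLM_SERVICE_API_KEY']}"},
        json={
            "model": os.environ["LLM_SERVICE_MODEL"],
            "messages": [
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": "\n\n".join(top) + "\n\nQuestion: " + question},
            ],
        },
    ).json()
    return resp["choices"][0]["message"]["content"]
```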