# Knowledge Base MCP Server
This MCP server provides tools for listing and retrieving content from different knowledge bases.
<a href="https://glama.ai/mcp/servers/n0p6v0o0a4">
<img width="380" height="200" src="https://glama.ai/mcp/servers/n0p6v0o0a4/badge" alt="Knowledge Base Server MCP server" />
</a>
## Setup Instructions
**Prerequisites**

These instructions assume the following are installed on your system:
* [Node.js](https://nodejs.org/) (version 16 or higher)
* [npm](https://www.npmjs.com/) (Node Package Manager)
1. **Clone the repository:**
```bash
git clone <repository_url>
cd knowledge-base-mcp-server
```
2. **Install dependencies:**
```bash
npm install
```
3. **Configure environment variables:**
* The server requires the `HUGGINGFACE_API_KEY` environment variable to be set. This is the API key for the Hugging Face Inference API, which is used to generate embeddings for the knowledge base content. You can obtain a free API key from the Hugging Face website ([https://huggingface.co/](https://huggingface.co/)).
* The server requires the `KNOWLEDGE_BASES_ROOT_DIR` environment variable to be set. This variable specifies the directory where the knowledge base subdirectories are located. If you don't set this variable, it will default to `$HOME/knowledge_bases`, where `$HOME` is the current user's home directory.
* The server supports the `FAISS_INDEX_PATH` environment variable to specify the path to the FAISS index. If not set, it will default to `$HOME/knowledge_bases/.faiss`.
* The server supports the `HUGGINGFACE_MODEL_NAME` environment variable to specify the Hugging Face model to use for generating embeddings. If not set, it will default to `sentence-transformers/all-MiniLM-L6-v2`.
* You can set these environment variables in your `.bashrc` or `.zshrc` file, or directly in the MCP settings.
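For reference, here is a minimal TypeScript sketch of how these variables and their documented defaults could be resolved. It is an illustration only, not the server's actual code; the `config` object and its field names are made up for the example.

```typescript
import * as os from "os";
import * as path from "path";

// Illustrative only: resolving the documented environment variables
// and their defaults. The real server code may differ.
const HOME = os.homedir();

const config = {
  // Required: Hugging Face Inference API key used to generate embeddings.
  huggingFaceApiKey: process.env.HUGGINGFACE_API_KEY,
  // Root directory containing one subdirectory per knowledge base.
  knowledgeBasesRootDir:
    process.env.KNOWLEDGE_BASES_ROOT_DIR ?? path.join(HOME, "knowledge_bases"),
  // Location of the persisted FAISS index.
  faissIndexPath:
    process.env.FAISS_INDEX_PATH ?? path.join(HOME, "knowledge_bases", ".faiss"),
  // Embedding model served through the Hugging Face Inference API.
  huggingFaceModelName:
    process.env.HUGGINGFACE_MODEL_NAME ?? "sentence-transformers/all-MiniLM-L6-v2",
};

if (!config.huggingFaceApiKey) {
  throw new Error("HUGGINGFACE_API_KEY must be set");
}
```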
4. **Build the server:**
```bash
npm run build
```
5. **Add the server to the MCP settings:**
* Edit the `cline_mcp_settings.json` file in your Cline (Claude Dev) settings directory, e.g. `~/.vscode-server/data/User/globalStorage/saoudrizwan.claude-dev/settings/`.
* Add the following configuration to the `mcpServers` object:
```json
"knowledge-base-mcp": {
"command": "node",
"args": [
"/path/to/knowledge-base-mcp-server/build/index.js"
],
"disabled": false,
"autoApprove": [],
"env": {
"KNOWLEDGE_BASES_ROOT_DIR": "/path/to/knowledge_bases",
"HUGGINGFACE_API_KEY": "YOUR_HUGGINGFACE_API_KEY",
},
"description": "Retrieves similar chunks from the knowledge base based on a query."
},
```
* Replace `/path/to/knowledge-base-mcp-server` with the actual path to the server directory.
* Replace `/path/to/knowledge_bases` with the actual path to the knowledge bases directory.
6. **Create knowledge base directories:**
* Create subdirectories within the `KNOWLEDGE_BASES_ROOT_DIR` for each knowledge base (e.g., `company`, `it_support`, `onboarding`).
* Place text files (e.g., `.txt`, `.md`) containing the knowledge base content within these subdirectories.
* The server recursively reads all text files (e.g., `.txt`, `.md`) within the specified knowledge base subdirectories.
* The server skips hidden files and directories (those starting with a `.`).
* For each file, the server calculates the SHA256 hash and stores it in a file with the same name in a hidden `.index` subdirectory. This hash is used to determine if the file has been modified since the last indexing.
* Each file's content is split into chunks using the `MarkdownTextSplitter` from `langchain/text_splitter`.
* Each chunk is then embedded via the Hugging Face Inference API and added to a FAISS index, which is used for similarity search.
* The FAISS index is automatically initialized when the server starts. It checks for changes in the knowledge base files and updates the index accordingly.
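The indexing pass described above can be sketched roughly as follows. This is a simplified illustration based only on the behavior stated in this README (helper names such as `collectFiles` and `needsReindex` are hypothetical, and the `MarkdownTextSplitter` import path may differ depending on your langchain version), not the server's actual source.

```typescript
import { createHash } from "crypto";
import { promises as fs } from "fs";
import * as path from "path";
import { MarkdownTextSplitter } from "langchain/text_splitter";

// Recursively collect .txt/.md files, skipping hidden files and directories.
async function collectFiles(dir: string): Promise<string[]> {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  const files: string[] = [];
  for (const entry of entries) {
    if (entry.name.startsWith(".")) continue; // skip hidden entries
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...(await collectFiles(full)));
    } else if (/\.(txt|md)$/i.test(entry.name)) {
      files.push(full);
    }
  }
  return files;
}

// Compare the file's SHA256 hash with the one stored in the hidden .index
// subdirectory to decide whether the file needs re-indexing.
async function needsReindex(file: string): Promise<boolean> {
  const content = await fs.readFile(file, "utf-8");
  const hash = createHash("sha256").update(content).digest("hex");
  const hashFile = path.join(path.dirname(file), ".index", path.basename(file));
  const previous = await fs.readFile(hashFile, "utf-8").catch(() => "");
  return previous.trim() !== hash;
}

// Split a new or modified file into chunks ready for embedding.
async function chunkFile(file: string) {
  const text = await fs.readFile(file, "utf-8");
  const splitter = new MarkdownTextSplitter();
  return splitter.createDocuments([text], [{ source: file }]);
}
```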
## Usage
The server exposes two tools:
* `list_knowledge_bases`: Lists the available knowledge bases.
* `retrieve_knowledge`: Retrieves similar chunks from the knowledge bases based on a query. If a knowledge base is specified, only that one is searched; otherwise, all available knowledge bases are searched. By default, at most 10 document chunks with a similarity score below a threshold of 2 are returned. A different threshold can be provided via the `threshold` parameter.
You can use these tools through the MCP interface.
The `retrieve_knowledge` tool performs a semantic search using a FAISS index. The index is automatically updated when the server starts or when a file in a knowledge base is modified.
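For example, a `retrieve_knowledge` call might carry arguments along these lines. Only `threshold` is named in this README; the other parameter names are assumptions for illustration, so check the tool schema exposed by the server (e.g., via `tools/list`) for the exact names.

```typescript
// Hypothetical arguments for a retrieve_knowledge tool call.
// Only "threshold" is documented above; the other keys are assumed.
const retrieveKnowledgeArgs = {
  query: "How do I request a new laptop?", // free-text query to search for
  knowledge_base: "it_support",            // optional: restrict the search to one knowledge base
  threshold: 1.5,                          // optional: override the default score threshold of 2
};
```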
The output of the `retrieve_knowledge` tool is a markdown formatted string with the following structure:
````markdown
## Semantic Search Results
**Result 1:**
[Content of the most similar chunk]
**Source:**
```json
{
"source": "[Path to the file containing the chunk]"
}
```
---
**Result 2:**
[Content of the second most similar chunk]
**Source:**
```json
{
"source": "[Path to the file containing the chunk]"
}
```
> **Disclaimer:** The provided results might not all be relevant. Please cross-check the relevance of the information.
````
Each result includes the content of a matching chunk, its source file, and its similarity score.
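To make the threshold, score, and output structure concrete, here is a rough TypeScript sketch of what the retrieval step could look like when built on langchain's FAISS store and Hugging Face embeddings. It is not the server's actual implementation; import paths vary across langchain versions, and `FaissStore` requires the `faiss-node` package.

```typescript
import * as os from "os";
import * as path from "path";
import { FaissStore } from "@langchain/community/vectorstores/faiss";
import { HuggingFaceInferenceEmbeddings } from "@langchain/community/embeddings/hf";

// Illustrative retrieval: load the FAISS index, search, filter by score,
// and render results in the markdown structure shown above.
async function retrieveKnowledge(query: string, threshold = 2, k = 10): Promise<string> {
  const embeddings = new HuggingFaceInferenceEmbeddings({
    apiKey: process.env.HUGGINGFACE_API_KEY,
    model: process.env.HUGGINGFACE_MODEL_NAME ?? "sentence-transformers/all-MiniLM-L6-v2",
  });
  const indexPath =
    process.env.FAISS_INDEX_PATH ?? path.join(os.homedir(), "knowledge_bases", ".faiss");
  const store = await FaissStore.load(indexPath, embeddings);

  // Lower scores mean closer matches; keep only results under the threshold.
  const results = (await store.similaritySearchWithScore(query, k)).filter(
    ([, score]) => score < threshold
  );

  const sections = results.map(([doc], i) =>
    `**Result ${i + 1}:**\n\n` +
    doc.pageContent +
    "\n\n**Source:**\n" +
    "```json\n" +
    JSON.stringify({ source: doc.metadata.source }, null, 2) +
    "\n```"
  );

  return [
    "## Semantic Search Results",
    sections.join("\n\n---\n\n"),
    "> **Disclaimer:** The provided results might not all be relevant. Please cross-check the relevance of the information.",
  ].join("\n\n");
}
```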