# LocalKB - Local Knowledge Base MCP Server
A standalone, offline knowledge base server implementing the Model Context Protocol (MCP). **LocalKB** enables AI assistants to search through **Wikipedia (static, large-scale knowledge)** and **your local files (dynamic, personal knowledge)** with **full citation support** - showing exactly where information comes from, when it was last modified, and which file contains it. Perfect for design work and technical documentation where source verification is critical.
## Features
- **Multi-Source Search**: Search across Wikipedia AND your local files (Markdown, text) simultaneously
- **Citation Support**: Every search result includes the source file path, last-modified timestamp, and data source, so you can verify information instantly
- **Hybrid Search**: Combines BM25 (keyword matching) + vector embeddings (semantic similarity) for best results
- **Smart Indexing**: The Wikipedia index is cached permanently; local files are rescanned on startup to pick up the latest changes
- **Completely Offline**: No external API dependencies (Google Search, etc.)
- **Free & Fast**: Efficient algorithms for both keyword and semantic search
- **MCP Compatible**: Works with any MCP-compatible client (Claude Desktop, etc.)
- **Ollama Integration**: Includes a test client for Ollama-based agents
- **Easy Setup**: Simple installation with the `uv` package manager
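The citation metadata attached to each hit is small and concrete. As an illustrative sketch (the field names here are hypothetical, not the server's actual schema), a cited result might be modeled as:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CitedResult:
    """One search hit plus the provenance needed to verify it."""
    title: str
    snippet: str
    source: str         # "wikipedia" or "local"
    path: str           # URL or local file path
    modified: datetime  # last-modified timestamp of the source

    def citation(self) -> str:
        # Render a one-line citation suitable for appending to a result.
        stamp = self.modified.strftime("%Y-%m-%d %H:%M")
        return f"[{self.source}] {self.path} (modified {stamp})"

hit = CitedResult(
    title="Meeting notes",
    snippet="Decided to ship v0.2 ...",
    source="local",
    path="/notes/meeting_notes.md",
    modified=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
)
print(hit.citation())  # [local] /notes/meeting_notes.md (modified 2024-05-01 09:30)
```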
## Architecture
```
┌─────────────┐      ┌──────────────────┐      ┌─────────────┐
│   Ollama    │◄────►│   MCP Client     │◄────►│    Human    │
│   (LLM)     │      │  (test script)   │      │             │
└─────────────┘      └──────────────────┘      └─────────────┘
                              │
                              │ MCP Protocol
                              ▼
                     ┌──────────────────┐
                     │    MCP Server    │
                     │ (src/server.py)  │
                     └──────────────────┘
                              │
                ┌─────────────┴─────────────┐
                ▼                           ▼
      ┌──────────────────┐        ┌──────────────────┐
      │ Wikipedia Indexer│        │    Local File    │
      │ (Static/Cached)  │        │ Indexer (Dynamic)│
      └──────────────────┘        └──────────────────┘
                │                           │
                ▼                           ▼
      ┌──────────────────┐        ┌──────────────────┐
      │ BM25 + Vector DB │        │ BM25 + Vector DB │
      │ (1M+ articles)   │        │   (Your files)   │
      └──────────────────┘        └──────────────────┘
```
**Composite Pattern**: Results from both sources are merged using Reciprocal Rank Fusion (RRF) for optimal ranking.
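Reciprocal Rank Fusion is easy to sketch: every document scores `1 / (k + rank)` in each ranked list that contains it, and its per-list scores are summed. A minimal sketch (the server's actual constant and tie-breaking may differ):

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each doc earns 1 / (k + rank) per list it appears in (rank is 1-based);
    k=60 is the constant proposed in the original RRF paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["wiki:Python", "local:notes.md", "wiki:Snake"]
vector = ["wiki:Python", "local:todo.md", "local:notes.md"]
print(rrf_merge([bm25, vector])[0])  # wiki:Python (rank 1 in both lists)
```

Documents that rank well in both the BM25 list and the vector list rise to the top, without the two incompatible score scales ever being compared directly; that insensitivity to score scale is the main reason RRF is a popular fusion choice.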
## Installation
### Prerequisites
- Python 3.10 or higher
- [uv](https://github.com/astral-sh/uv) package manager
- (Optional) Ollama with a tool-compatible model (e.g., command-r) for testing
### Setup
1. Clone the repository:
```bash
git clone https://github.com/yourusername/localkb.git
cd localkb
```
2. Install dependencies:
```bash
uv sync
```
3. Build the Wikipedia index (first run only):
```bash
# Set smaller subset for testing (optional)
export WIKI_SUBSET_SIZE=10000 # Default: 1,000,000
uv run python -m src
# Press Ctrl+C after index is built
```
This will download English Wikipedia and create:
- **BM25 index** (keyword search) in `data/wiki_index.pkl`
- **Vector index** (semantic search) in `data/chroma_db/`
The initial build downloads documents and generates embeddings, which takes time. Default: 1M articles (~5GB). Full dataset: 6.8M articles (~20GB).
4. (Optional) Enable local file search:
```bash
# Set the path to your local documents
export LOCAL_DOCS_PATH="/path/to/your/notes" # e.g., ~/ObsidianVault/Research
```
This enables searching through your:
- Markdown files (`.md`)
- Text files (`.txt`)
- Any personal notes or documentation
The server will scan this directory on each startup to index the latest content.
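Conceptually, the startup scan just walks the directory for supported extensions and records each file's modification time. A sketch of the idea (not the server's exact code):

```python
from pathlib import Path

SUPPORTED = {".md", ".txt"}

def scan_docs(root: str) -> dict[str, float]:
    """Map each supported file under root to its mtime (seconds since epoch)."""
    index: dict[str, float] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            index[str(path)] = path.stat().st_mtime
    return index
```

Keeping the returned mtimes around lets a later startup re-index only the files that have actually changed.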
## Usage
### Running the MCP Server
```bash
# Without local files
uv run python -m src
# With local files
LOCAL_DOCS_PATH="/path/to/your/notes" uv run python -m src
```
The server will:
1. Load the pre-built Wikipedia index (cached, fast)
2. Scan and index local files if `LOCAL_DOCS_PATH` is set (quick for typical document collections)
3. Start listening for MCP requests on stdio
4. Provide search tools: `search`, `search_wikipedia`, and `search_local`
### Running Local-Only (skip Wikipedia)
To run the MCP server that only indexes and serves your local documents (no Wikipedia), set `SKIP_WIKIPEDIA=true` and `LOCAL_DOCS_PATH` before starting the server.
Unix / macOS example:
```bash
export SKIP_WIKIPEDIA=true
export LOCAL_DOCS_PATH="/absolute/path/to/your/notes"
uv run python -m src
```
Windows PowerShell example (explicit absolute `uv` path shown):
```powershell
$env:SKIP_WIKIPEDIA = "true"
$env:LOCAL_DOCS_PATH = "C:\Users\you\Documents\Notes"
# Windows: call uv with an absolute path if it's installed in a virtualenv or not on PATH
& C:\Users\you\.venv\Scripts\uv.exe run python -m src
```
Windows Command Prompt example:
```cmd
set SKIP_WIKIPEDIA=true
set LOCAL_DOCS_PATH=C:\Users\you\Documents\Notes
C:\Users\you\AppData\Roaming\Python\Python310\Scripts\uv.exe run python -m src
```
Notes:
- `SKIP_WIKIPEDIA=true` prevents loading/building the Wikipedia index.
- `LOCAL_DOCS_PATH` should be an absolute path to your documents folder (Markdown/text files).
### Running the included tests with explicit LOCAL_DOCS_PATH
The test scripts spawn their own server subprocess and pass environment variables. You can also run the server directly and then run the client tests.
Run test client with explicit path (Unix/macOS):
```bash
LOCAL_DOCS_PATH="/absolute/path/to/test_docs" uv run python tests/verify_with_ollama.py --local
```
Windows PowerShell example (absolute `uv` path):
```powershell
$env:LOCAL_DOCS_PATH = "C:\absolute\path\to\test_docs"
& C:\path\to\uv.exe run python tests/verify_with_ollama.py --local
```
### Path isolation and index management
**Automatic Path Isolation** (v0.2+): Each `LOCAL_DOCS_PATH` now gets its own ChromaDB collection and state file, preventing data mixing when switching between different document directories. The server automatically:
- Generates a unique collection name based on the directory path hash
- Maintains separate indexing state for each path in `data/indexing_states/`
- Keeps vector embeddings isolated in path-specific ChromaDB collections
This means you can safely switch between different `LOCAL_DOCS_PATH` values without worrying about data contamination.
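One common way to get this kind of isolation, shown here purely as an illustration of the idea (the server's actual hash function and naming prefix may differ), is to derive the collection name from a hash of the normalized path:

```python
import hashlib
import os

def collection_name(docs_path: str) -> str:
    """Deterministic ChromaDB collection name for a docs directory.

    Hashing the normalized absolute path keeps two different directories
    from ever sharing a collection, while the same path always maps to
    the same name across restarts.
    """
    normalized = os.path.normpath(os.path.abspath(docs_path))
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"local_{digest}"

print(collection_name("/home/me/notes"))  # e.g. local_<16 hex chars>
```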
To verify path isolation is working, run the test suite:
```bash
uv run python tests/test_path_isolation.py
```
**Manual cleanup** (optional): If you want to completely reset all local indexes and start fresh:
Unix / macOS:
```bash
rm -rf data/local_chroma_db data/indexing_states
```
Windows PowerShell:
```powershell
Remove-Item -Recurse -Force data\local_chroma_db, data\indexing_states
```
Note: The old `data/indexing_state.json` file is no longer used (replaced by per-path state files in `data/indexing_states/`).
### Testing with Ollama
#### Simple Test (Wikipedia Search, No LLM)
```bash
uv run tests/verify_with_ollama.py --simple
```
This tests the MCP connection and performs a direct Wikipedia search.
#### Local Document Search Test (No LLM)
```bash
uv run tests/verify_with_ollama.py --local
```
This tests the local file search capability with domain-specific queries. By default, it uses VisionSort/Casper KB documents as the test dataset.
Example output:
```
Running Local Document Search Test (VisionSort/Casper KB)...
Local docs path: /Users/ikmx/source/tc/Casper_KB-main
Available tools: ['search', 'search_wikipedia', 'search_local']

--- Test 1: VisionSort 405nm laser output ---
Query: VisionSort 405nm laser output power mW
Expected: 365 mW
PASS: Expected answer found in results!

--- Test 2: Meaning and handling of error code 4015 ---
Query: FluidicSystem error code 4015 CL Leak
Expected: Emergency level, chip holder leak
PASS: Related document found!
```
#### Full Agent Test (Requires Ollama)
```bash
# Make sure Ollama is running with a tool-compatible model
ollama pull llama3.2
ollama serve
# In another terminal:
uv run tests/verify_with_ollama.py
```
Expected output:
```
Starting MCP Client and connecting to Local Search Server...
Connected. Available tools: ['search', 'search_wikipedia', 'search_local']
User Query: Tell me briefly about the history of the Python programming language
Agent requested 1 tool call(s)
  Tool: search_wikipedia
  Args: {'query': 'history of python programming language'}
  Output length: 1523 chars
Agent Answer:
Python was created by Guido van Rossum in the late 1980s...
```
### Integration with Claude Desktop
Add this to your Claude Desktop MCP configuration:
**Wikipedia only:**
```json
{
"mcpServers": {
"localkb": {
"command": "uv",
"args": ["run", "python", "-m", "src"],
"cwd": "/path/to/localkb"
}
}
}
```
**Wikipedia + Local Files:**
```json
{
"mcpServers": {
"localkb": {
"command": "uv",
"args": ["run", "python", "-m", "src"],
"cwd": "/path/to/localkb",
"env": {
"LOCAL_DOCS_PATH": "/Users/yourname/Documents/Notes"
}
}
}
}
```
Then restart Claude Desktop and you can search both Wikipedia and your personal files in conversations!
## Project Structure
```
localkb/
├── pyproject.toml            # Dependencies and project metadata
├── README.md                 # This file
├── .env.example              # Environment variable configuration example
├── data/                     # Index storage (created on first run)
│   ├── .gitkeep
│   ├── wiki_index.pkl        # Wikipedia BM25 index (cached)
│   ├── chroma_db/            # Wikipedia vector index
│   └── local_chroma_db/      # Local files vector index
├── src/
│   ├── __init__.py
│   ├── __main__.py           # Entry point for `python -m src`
│   ├── server.py             # MCP server implementation
│   ├── indexer.py            # Multi-source hybrid indexing
│   └── loaders.py            # Local file loaders
├── test_docs/                # Test documents for CI/CD
│   ├── document1.md          # Sample documents
│   ├── document2.md
│   └── ...
├── test_notes/               # Additional sample test files
│   ├── secret_project.md
│   └── meeting_notes.md
├── tests/
│   ├── __init__.py
│   ├── README.md             # Test documentation
│   ├── test_indexing_search.py  # CI/CD test suite (no LLM)
│   └── verify_with_ollama.py    # LLM integration tests (local only)
└── .github/
    └── workflows/
        ├── test.yml          # CI/CD test workflow
        └── lint.yml          # Code quality checks
```
## Available Tools
### `search` (Multi-Source)
Search across Wikipedia AND your local files simultaneously using hybrid search.
**Parameters:**
- `query` (string, required): Search keywords or question
- `top_k` (integer, optional): Number of results to return per source (default: 5, max: 20)
- `strategy` (string, optional): Search strategy - `"hybrid"` (default), `"keyword"`, or `"semantic"`
- `source` (string, optional): Data source - `"all"` (default), `"wikipedia"`, or `"local"`
**Source Options:**
- **`"all"`** (default): Search both Wikipedia and local files for comprehensive results
- **`"wikipedia"`**: Search only Wikipedia (general knowledge)
- **`"local"`**: Search only your local files (personal knowledge)
**Search Strategies:**
- **`"hybrid"`** (recommended): Combines keyword matching and semantic similarity for best results
- **`"keyword"`**: Traditional BM25 keyword search (exact word matching, fast)
- **`"semantic"`**: Vector similarity search (finds conceptually similar content, even without exact words)
**Returns:**
Formatted search results with titles, URLs/paths, and content snippets. Results from both sources are merged intelligently using Reciprocal Rank Fusion (RRF).
### `search_wikipedia`
Search English Wikipedia only using hybrid search (BM25 + Vector embeddings). Convenience wrapper for `search` with `source="wikipedia"`.
**Parameters:**
- `query` (string, required): Search keywords or question
- `top_k` (integer, optional): Number of results to return (default: 3, max: 10)
- `strategy` (string, optional): Search strategy - `"hybrid"` (default), `"keyword"`, or `"semantic"`
### `search_local`
Search your local files only using hybrid search. Convenience wrapper for `search` with `source="local"`.
**Parameters:**
- `query` (string, required): Search keywords or question
- `top_k` (integer, optional): Number of results to return (default: 5, max: 20)
- `strategy` (string, optional): Search strategy - `"hybrid"` (default), `"keyword"`, or `"semantic"`
**Examples:**
```python
# Hybrid search (best results, default)
result = await session.call_tool(
"search_wikipedia",
arguments={"query": "python programming language", "top_k": 3}
)
# Keyword-only search (fast, exact matches)
result = await session.call_tool(
"search_wikipedia",
arguments={"query": "python programming language", "strategy": "keyword"}
)
# Semantic search (finds similar concepts)
result = await session.call_tool(
"search_wikipedia",
arguments={"query": "snake that inspired a programming language", "strategy": "semantic"}
)
```
## Customization
### Using Simple English Wikipedia (for development)
For faster development/testing, use the lightweight Simple English Wikipedia:
Edit `src/indexer.py`:
```python
# Change this line:
ds = load_dataset("wikimedia/wikipedia", "20231101.en", split="train")
# To (Simple English, limited to 10k articles):
ds = load_dataset("wikimedia/wikipedia", "20231101.simple", split="train[:10000]")
```
This reduces disk space to ~500MB and builds in a few minutes.
### Adjusting Index Size
You can limit the number of articles for testing:
```python
# Limit to 1000 articles
ds = load_dataset("wikimedia/wikipedia", "20231101.en", split="train[:1000]")
```
## Development
### Running Tests
This project has two types of tests:
#### 1. CI/CD Tests (Automated)
These tests run automatically in GitHub Actions and require no LLM:
```bash
# Run the full CI/CD test suite (with local files only, fast)
SKIP_WIKIPEDIA=true uv run python tests/test_indexing_search.py
# Run with Wikipedia indexing (requires ~500MB disk space and internet)
uv run python tests/test_indexing_search.py
```
**What's tested:**
- MCP server connection
- Local document indexing
- Search results quality
- Incremental indexing (mtime-based change detection)
- Search strategies (keyword vs hybrid)
These tests use the `test_docs/` directory containing sample documents in the repository.
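The mtime-based change detection exercised here can be pictured as a diff between the saved `{path: mtime}` state and a fresh scan (an illustrative sketch, not the server's implementation):

```python
def detect_changes(previous: dict[str, float], current: dict[str, float]):
    """Compare two {path: mtime} snapshots and classify each file.

    Returns (added, modified, removed) path lists; only added and
    modified files need to be re-chunked and re-embedded.
    """
    added = [p for p in current if p not in previous]
    modified = [p for p in current if p in previous and current[p] > previous[p]]
    removed = [p for p in previous if p not in current]
    return added, modified, removed

prev = {"a.md": 100.0, "b.md": 200.0}
curr = {"a.md": 100.0, "b.md": 250.0, "c.md": 300.0}
print(detect_changes(prev, curr))  # (['c.md'], ['b.md'], [])
```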
#### 2. LLM Integration Tests (Local Only)
These tests require Ollama and are for local development only:
```bash
# Simple MCP connection test (Wikipedia search, no LLM)
uv run python tests/verify_with_ollama.py --simple
# Local document search test (no LLM)
uv run python tests/verify_with_ollama.py --local
# Q&A test with Ollama (requires llama3.2)
uv run python tests/verify_with_ollama.py --local-qa
# Full agent test with function calling (requires llama3.2 and command-r)
uv run python tests/verify_with_ollama.py
```
**Requirements:**
- Ollama installed and running
- Models: `llama3.2`, `command-r` (install with `ollama pull <model>`)
### Test Options
| Test File | Type | LLM Required | Purpose |
|-----------|------|--------------|---------|
| `test_indexing_search.py` | CI/CD | No | Automated testing of core functionality |
| `verify_with_ollama.py --simple` | Manual | No | Basic connection test |
| `verify_with_ollama.py --local` | Manual | No | Local search validation |
| `verify_with_ollama.py --local-qa` | Manual | Yes | Q&A with local docs |
| `verify_with_ollama.py` | Manual | Yes | Full agent workflow |
See `tests/README.md` for detailed test documentation.
### Customizing Local Document Path
Set the `LOCAL_DOCS_PATH` environment variable to use your own documents:
```bash
export LOCAL_DOCS_PATH="/path/to/your/documents"
uv run python tests/test_indexing_search.py
```
### Rebuilding Index
Delete `data/wiki_index.pkl` and restart the server.
## Troubleshooting
### Server Initialization Timeout
If VS Code shows "Waiting for server to respond to `initialize` request" and times out:
**Solution 1: Skip Wikipedia for faster startup**
```bash
# In your .env file or environment variables
SKIP_WIKIPEDIA=true
```
**Solution 2: Check server status**
The server initializes indices in the background. Use the status resource to check progress:
```bash
# Check if indices are loaded
echo 'config://status' | python src/server.py
```
**Solution 3: Reduce Wikipedia dataset size**
```bash
# In your .env file - use smaller dataset for testing
WIKI_SUBSET_SIZE=10000
```
The server now initializes asynchronously - MCP will respond immediately, and indices load in the background.
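The asynchronous startup can be sketched with a background thread and a ready flag: requests are answered immediately, returning an "initializing" message until the load completes. A simplified illustration (the real server uses MCP's async machinery, not raw threads):

```python
import threading
import time

class LazyIndex:
    """Serve requests immediately; load the heavy index in the background."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._index: list[str] = []
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self) -> None:
        time.sleep(0.05)  # stand-in for loading wiki_index.pkl / ChromaDB
        self._index = ["doc1", "doc2"]
        self._ready.set()

    def search(self, query: str) -> str:
        if not self._ready.is_set():
            return "Indices are still initializing, please retry shortly."
        return f"{len(self._index)} documents searched for {query!r}"

kb = LazyIndex()
print(kb.search("python"))  # may still report "initializing"
kb._ready.wait()            # block until the background load finishes
print(kb.search("python"))
```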
### Index Not Building
- Check disk space (needs ~500MB for Simple Wikipedia, ~20GB for full)
- Ensure stable internet connection for initial download
- Check Python version (3.10+ required)
### Search Returns "Initializing" Message
This is normal on first startup. The server is loading indices in the background. Wait 30-60 seconds and try again.
### Ollama Connection Fails
- Verify Ollama is running: `ollama list`
- Ensure a tool-compatible model is installed: `ollama pull command-r`
- Check Ollama API is accessible: `curl http://localhost:11434`
### MCP Server Not Starting
- Check dependencies: `uv sync`
- Verify Python path in MCP config
- Check for port conflicts
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details