# Environment Setup Guide for MCP WebScout
This guide provides detailed instructions for setting up the mcp-webscout environment from scratch.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Installation Steps](#installation-steps)
- [DeepSeek API Configuration](#deepseek-api-configuration)
- [Proxy Configuration](#proxy-configuration)
- [Environment Variables](#environment-variables)
- [Verification](#verification)
- [Troubleshooting](#troubleshooting)
## Prerequisites
Before installing mcp-webscout, ensure you have the following:
### Required Software
| Software | Version | Purpose | Installation |
| -------- | ------- | ------------------- | -------------------------------- |
| Python | >= 3.10 | Runtime environment | [python.org](https://python.org) |
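To confirm that the interpreter on your `PATH` meets the version requirement in the table, a short check like the following can help. It is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of mcp-webscout.

```python
import sys

# Minimum version taken from the prerequisites table.
MINIMUM = (3, 10)


def meets_minimum(version=sys.version_info, minimum=MINIMUM):
    """Return True if `version` (a tuple like (3, 11, 4)) is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```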
## Installation Steps
### Step 1: Clone the Repository
```bash
git clone https://github.com/yourusername/mcp-webscout.git
cd mcp-webscout
```
### Step 2: Create Virtual Environment (Recommended)
Using the default `python` interpreter (must be Python >= 3.10):
```bash
python -m venv .venv
```
Or using a specific Python version:
```bash
python3.10 -m venv .venv
```
Activate the virtual environment:
- Windows: `.venv\Scripts\activate`
- macOS/Linux: `source .venv/bin/activate`
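If you are unsure whether activation worked, Python itself can tell you. The sketch below relies only on the standard library; the function name `in_virtualenv` is an illustrative choice.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

When the environment is active, the printed interpreter path should sit inside the `.venv` directory.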
### Step 3: Install Dependencies
```bash
pip install -e ".[dev]"
```
### Step 4: Install Playwright Browsers
Crawl4AI requires Playwright browsers to be installed:
```bash
playwright install chromium
```
If you encounter issues, try:
```bash
# Install system dependencies (Ubuntu/Debian)
playwright install-deps chromium
# Or install all browsers
playwright install
```
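To check that the Chromium download is usable, you can try launching it once through Playwright's synchronous API. This is a rough sketch rather than part of mcp-webscout: the helper name `chromium_launches` is hypothetical, and it reports any failure simply as `False`.

```python
def chromium_launches():
    """Try to start and stop headless Chromium; return True on success."""
    try:
        # Imported lazily so the check degrades gracefully when Playwright is missing.
        from playwright.sync_api import sync_playwright
    except ImportError:
        return False
    try:
        with sync_playwright() as p:
            browser = p.chromium.launch()  # headless by default
            browser.close()
        return True
    except Exception:
        # Typically raised when the browser binary or a system library is missing.
        return False


if __name__ == "__main__":
    print("Chromium launch OK" if chromium_launches() else "Chromium launch failed")
```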
## DeepSeek API Configuration
The LLM extraction mode requires a DeepSeek API key.
### Step 1: Register for DeepSeek
1. Visit [platform.deepseek.com](https://platform.deepseek.com/)
2. Sign up for an account
3. Complete verification
### Step 2: Get API Key
1. Go to the "API Keys" section
2. Click "Create New Key"
3. Copy your API key (starts with `sk-`)
### Step 3: Configure in Environment
Add to your `.env` file:
```env
DEEPSEEK_API_KEY=sk-your-actual-key-here
```
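A quick sanity check can catch a missing key or a placeholder that was never replaced. The sketch below is a heuristic based only on the `sk-` prefix mentioned in Step 2; the helper name `looks_like_api_key` is an assumption, and passing the check does not prove the key is valid.

```python
import os

# Placeholder values used in this guide; they must be replaced with a real key.
PLACEHOLDERS = {"sk-your-actual-key-here", "sk-your-key-here"}


def looks_like_api_key(value):
    """Heuristic only: non-empty, starts with 'sk-', and is not a guide placeholder."""
    if not value:
        return False
    value = value.strip()
    return (
        value.startswith("sk-")
        and len(value) > len("sk-")
        and value not in PLACEHOLDERS
    )


if __name__ == "__main__":
    key = os.environ.get("DEEPSEEK_API_KEY", "")
    print("DEEPSEEK_API_KEY looks plausible:", looks_like_api_key(key))
```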
## Proxy Configuration
### For Mainland China Users
DuckDuckGo may not be directly accessible from mainland China, so searches need to go through a proxy.
### Configure Proxy
Add to your `.env` file:
```env
# Your proxy URL
PROXY_URL=http://127.0.0.1:7890
# Enable proxy by default
USE_PROXY=true
```
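The two variables work together: `PROXY_URL` says where the proxy is, and `USE_PROXY` says whether to use it. The following sketch shows one way they could be combined into a `requests`-style proxies mapping. The helper `proxies_from_env` and the set of accepted truthy values are assumptions; mcp-webscout's own parsing may differ.

```python
import os

# Values treated as "enabled"; an assumption, not taken from the project.
TRUTHY = {"1", "true", "yes", "on"}


def proxies_from_env(env=None):
    """Build a requests-style proxies mapping, or None when the proxy is disabled."""
    env = os.environ if env is None else env
    enabled = env.get("USE_PROXY", "").strip().lower() in TRUTHY
    url = env.get("PROXY_URL", "").strip()
    if not (enabled and url):
        return None
    # The same HTTP proxy endpoint is used for both plain and TLS traffic.
    return {"http": url, "https": url}


if __name__ == "__main__":
    print(proxies_from_env())
```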
### Testing Proxy
Test if your proxy is working:
```bash
# Using curl
curl -x http://127.0.0.1:7890 https://www.google.com
# Or use a Python one-liner (requires the requests package)
python -c "import requests; print(requests.get('https://www.google.com', proxies={'https': 'http://127.0.0.1:7890'}).status_code)"
```
## Environment Variables
Create a `.env` file in the project root:
```bash
cp .env.example .env
```
Edit `.env` with your actual values:
```env
# Required for LLM extraction
DEEPSEEK_API_KEY=sk-your-key-here
# Required for mainland China users
PROXY_URL=http://127.0.0.1:7890
USE_PROXY=true
# Optional: Customize default values
DEFAULT_MAX_LENGTH=5000
DEFAULT_TIMEOUT=30
# Optional: Windows UTF-8 support
PYTHONUTF8=1
```
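For readers unfamiliar with the `.env` format, the sketch below shows how such a file can be read as plain `KEY=VALUE` pairs and turned into typed settings. It is illustrative only: `parse_env` and `load_settings` are hypothetical helpers, the project may use a dedicated library instead, and the fallback values 5000 and 30 simply mirror the example above rather than confirmed project defaults.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(text):
    """Apply fallback values (assumed, see lead-in) and convert numeric fields."""
    env = parse_env(text)
    return {
        "api_key": env.get("DEEPSEEK_API_KEY", ""),
        "max_length": int(env.get("DEFAULT_MAX_LENGTH", "5000")),
        "timeout": int(env.get("DEFAULT_TIMEOUT", "30")),
    }


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        print(load_settings(handle.read()))
```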
## Verification
### Test Installation
Run the tests to verify everything is working:
```bash
pytest tests/ -v
```
### Test the Command-Line Entry Point
```bash
python -m mcp_webscout --help
```
### Test with Claude Desktop
Add the following to your Claude Desktop configuration file (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "webscout": {
      "command": "mcp-webscout",
      "env": {
        "DEEPSEEK_API_KEY": "sk-your-key-here",
        "PROXY_URL": "http://127.0.0.1:7890",
        "USE_PROXY": "true"
      }
    }
  }
}
```
Restart Claude Desktop and test the tools.
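If the tools do not show up after a restart, a malformed configuration file is a common cause. The sketch below checks the JSON for the pieces used in the example above. The function `check_server_entry` is a hypothetical helper, and the checks reflect only the structure shown in this guide.

```python
import json


def check_server_entry(config_text, name="webscout"):
    """Return a list of problems found in the named mcpServers entry (empty if none)."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    server = config.get("mcpServers", {}).get(name)
    if server is None:
        return [f"no mcpServers entry named {name!r}"]
    problems = []
    if not server.get("command"):
        problems.append("missing 'command'")
    for key, value in server.get("env", {}).items():
        # The example in this guide uses strings for every env value,
        # e.g. "true" rather than the JSON boolean true.
        if not isinstance(value, str):
            problems.append(f"env value for {key} should be a string")
    return problems
```

An empty list means the entry has the expected shape; it does not guarantee that the `mcp-webscout` command is on Claude Desktop's `PATH`.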
## Troubleshooting
### Common Issues
#### 1. Playwright Installation Fails
**Symptom:** `playwright install chromium` fails
**Solution:**
```bash
# Install system dependencies (Ubuntu/Debian)
sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2
# Or use the install-deps command
playwright install-deps chromium
```
#### 2. DuckDuckGo Search Timeout
**Symptom:** Search fails with a timeout error
**Solution:**
- Check your proxy configuration
- Increase timeout: add `DEFAULT_TIMEOUT=60` to `.env`
- Test proxy: `curl -x http://127.0.0.1:7890 https://duckduckgo.com`
#### 3. DeepSeek API Error
**Symptom:** `401 Unauthorized` or other API errors
**Solution:**
- Verify your API key is correct in `.env`
- Check if you have sufficient API quota
- Ensure proxy is configured if in mainland China
- Test with: `curl -H "Authorization: Bearer sk-your-key" https://api.deepseek.com/v1/models`
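The same key check can be run from Python using only the standard library. This is a sketch that mirrors the `curl` command in the list above; `build_models_request` and `check_key` are illustrative names, and the 30-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request

# Same endpoint as the curl test in the list above.
MODELS_URL = "https://api.deepseek.com/v1/models"


def build_models_request(api_key):
    """Create a GET request carrying the key as a Bearer token."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def check_key(api_key, timeout=30):
    """Return the HTTP status code, e.g. 401 when the key is rejected."""
    try:
        request = build_models_request(api_key)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code


if __name__ == "__main__":
    import os

    print("HTTP status:", check_key(os.environ.get("DEEPSEEK_API_KEY", "")))
```

Note that `urllib` picks up proxies from the standard `HTTPS_PROXY` environment variable, not from this project's `PROXY_URL` setting.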
#### 4. Windows UTF-8 Encoding Error
**Symptom:** `UnicodeEncodeError` or garbled text
**Solution:**
Add to `.env`:
```env
PYTHONUTF8=1
```
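Or set it as a persistent user-level environment variable in PowerShell. Python reads `PYTHONUTF8` at interpreter startup, so this is the more reliable option if `.env` is only loaded after Python starts; open a new terminal for the change to take effect: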
```powershell
[Environment]::SetEnvironmentVariable("PYTHONUTF8", "1", "User")
```
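To see whether the setting actually reached the interpreter, you can inspect Python's encoding state directly. The sketch below uses only the standard library; `utf8_report` is an illustrative name.

```python
import sys


def utf8_report():
    """Collect the settings that decide how Python encodes console output."""
    return {
        # True when UTF-8 mode is active (for example via PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in utf8_report().items():
        print(f"{name}: {value}")
```

If `utf8_mode` is still `False` after the change, the variable was most likely set after the interpreter had already started.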