# GPU Server Deployment Guide
This guide helps you deploy the MCP server on your GPU server after cloning from Git.
## Prerequisites on GPU Server
- Python 3.8+
- CUDA-compatible GPU with drivers installed
- Git
- Network access to download models
## Deployment Steps
### 1. Clone the Repository
```bash
# Clone your repository (replace with your actual repo URL)
git clone https://github.com/yourusername/mcp-server.git
cd mcp-server
```
### 2. Set Up Python Environment
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
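Optionally confirm the GPU is visible from Python before going further; a quick sketch, assuming PyTorch is pulled in by `requirements.txt`:
```python
# gpu_check.py (hypothetical helper) - verify CUDA is usable before loading the model
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device visible - check drivers and CUDA_VISIBLE_DEVICES")
```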
### 3. Configure Environment
```bash
# Copy configuration template
cp config.env .env
# Edit configuration for GPU server
nano .env
```
**GPU Server `.env` configuration:**
```bash
# Model Configuration
MODEL_NAME=Qwen/Qwen3-8B
MAX_NEW_TOKENS=512
TEMPERATURE=0.7
# GPU Configuration (adjust as needed)
CUDA_VISIBLE_DEVICES=0
```
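How `mcp_server.py` consumes these values depends on the code itself; a common pattern (a sketch, assuming the `python-dotenv` package) looks like this. Note that `CUDA_VISIBLE_DEVICES` is read by the CUDA runtime, not by your code:
```python
# Sketch: one common way to read .env values (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen3-8B")
MAX_NEW_TOKENS = int(os.getenv("MAX_NEW_TOKENS", "512"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.7"))
```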
### 4. Test the Installation
```bash
# Test server functionality (without loading the model)
python test_server.py
# If tests pass, run the full server
python mcp_server.py
```
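If you want to confirm HuggingFace access before committing to the full weight download, loading just the tokenizer is cheap (a sketch, assuming `transformers` is in `requirements.txt`):
```python
# Downloads only tokenizer files (a few MB), not the model weights
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
print("Tokenizer loaded:", tok.__class__.__name__)
```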
### 5. Network Access Setup
#### Option A: SSH Tunneling (Recommended for security)
**On your local machine:**
```bash
# Create SSH tunnel to forward MCP server
ssh -L 8080:localhost:8080 username@gpu-server-ip
# Or for stdio-based MCP (more complex)
ssh -t username@gpu-server-ip "cd /path/to/mcp-server && python mcp_server.py"
```
#### Option B: Direct Network Access
**On the GPU server, modify `mcp_server.py` for network access** by adding this HTTP entry point:
```python
# Add to mcp_server.py after line 544, inside the server class (it uses `self`)
async def run_network(self, host: str = "0.0.0.0", port: int = 8080):
    """Run MCP server over HTTP for remote access"""
    import asyncio
    from aiohttp import web
    import aiohttp_cors

    async def handle_mcp_request(request):
        data = await request.json()
        response = await self.server.handle_request(data)
        return web.json_response(response)

    app = web.Application()

    # Enable CORS for cross-origin requests ("*" is permissive; tighten for production)
    cors = aiohttp_cors.setup(app, defaults={
        "*": aiohttp_cors.ResourceOptions(
            allow_credentials=True,
            expose_headers="*",
            allow_headers="*",
        )
    })
    resource = cors.add(app.router.add_resource("/mcp"))
    cors.add(resource.add_route("POST", handle_mcp_request))

    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, host, port)
    await site.start()
    logger.info(f"🌐 MCP Server running on http://{host}:{port}/mcp")  # assumes a module-level logger

    # Keep the server running; always clean up on shutdown
    try:
        await asyncio.Future()  # run forever
    finally:
        await runner.cleanup()
```
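To actually start the server in this mode you also need an entry point; a minimal sketch, where `MCPServer` is a placeholder for whatever class `mcp_server.py` actually defines:
```python
# Hypothetical entry point; replace MCPServer with the real class name
if __name__ == "__main__":
    import asyncio

    server = MCPServer()
    asyncio.run(server.run_network(host="0.0.0.0", port=8080))
```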
### 6. Firewall Configuration
```bash
# Allow MCP server port (if using network access)
sudo ufw allow 8080
# Or for specific IP only:
# sudo ufw allow from YOUR_LOCAL_IP to any port 8080
```
## Client Configuration
### For Claude Desktop (Local Machine)
**Option A: SSH Tunnel + Local Config**
```json
{
  "mcpServers": {
    "local-llm": {
      "command": "ssh",
      "args": [
        "-t",
        "username@gpu-server-ip",
        "cd /path/to/mcp-server && python mcp_server.py"
      ]
    }
  }
}
```
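If the server lives in a virtual environment, fold the activation into the remote command, e.g. `cd /path/to/mcp-server && source venv/bin/activate && python mcp_server.py`.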
**Option B: HTTP Client (if using network mode)**
Claude Desktop communicates over stdio, so network mode needs a local HTTP-to-stdio bridge script that forwards each request to the remote endpoint.
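A minimal bridge sketch, assuming newline-delimited JSON-RPC on stdio and the `/mcp` endpoint from Step 5 (real MCP framing also includes notifications that expect no response, so treat this as a starting point):
```python
#!/usr/bin/env python3
# Hypothetical stdio-to-HTTP bridge: forwards each JSON-RPC line to the remote server
import sys
import urllib.request

ENDPOINT = "http://localhost:8080/mcp"  # via the SSH tunnel; adjust host/port as needed

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    req = urllib.request.Request(
        ENDPOINT,
        data=line.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        sys.stdout.write(resp.read().decode("utf-8") + "\n")
        sys.stdout.flush()
```
Point the Claude Desktop `command` at this script (e.g. `python bridge.py`, a hypothetical filename) instead of `ssh`.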
### For Direct SSH Access
```bash
# Connect to GPU server and run MCP server
ssh username@gpu-server-ip
cd mcp-server
python mcp_server.py
```
## Monitoring and Maintenance
### Check GPU Usage
```bash
# Monitor GPU usage
nvidia-smi -l 1
# Check memory usage
htop
```
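For programmatic monitoring (e.g., from a cron job or health check), a sketch using the NVML bindings, assuming `pip install nvidia-ml-py`:
```python
# Query GPU utilization and memory via NVML (the same data nvidia-smi shows)
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first visible GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"GPU util: {util.gpu}% | memory: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
pynvml.nvmlShutdown()
```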
### Update Deployment
```bash
# Pull latest changes
git pull origin main
# Restart server
pkill -f mcp_server.py
python mcp_server.py
```
### Logs and Debugging
```bash
# Run with verbose logging
python mcp_server.py 2>&1 | tee mcp-server.log
# Check system resources
df -h # Disk space
free -h # Memory
```
## Security Considerations
1. **Use SSH tunneling** instead of direct network access when possible
2. **Configure firewall** to restrict access to MCP server port
3. **Use strong authentication** for SSH access
4. **Keep dependencies updated** regularly
5. **Monitor resource usage** to prevent abuse
## Troubleshooting
### Common Issues
1. **CUDA Out of Memory:**
   - Reduce the model size or use a smaller model
   - Check the `CUDA_VISIBLE_DEVICES` setting
   - Monitor with `nvidia-smi`
2. **Network Connection Issues:**
   - Check firewall settings
   - Verify the SSH tunnel is active
   - Test with `telnet gpu-server-ip 8080` (or the Python check after this list)
3. **Model Download Issues:**
   - Check internet connectivity on the GPU server
   - Verify HuggingFace access
   - Check disk space for model storage
4. **Permission Issues:**
   - Ensure proper file permissions: `chmod +x mcp_server.py`
   - Check that the Python virtual environment is activated
   - Verify write permissions for logs and cache
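The reachability check mentioned above, in Python (raises on failure; `gpu-server-ip` is the placeholder used throughout this guide):
```python
# Succeeds if the port is open; raises ConnectionError or socket.timeout otherwise
import socket

with socket.create_connection(("gpu-server-ip", 8080), timeout=5):
    print("Port 8080 reachable")
```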
## Performance Optimization
1. **Model Caching:** The first run downloads the model weights (on the order of 16GB for Qwen3-8B at bf16 precision; less if quantized); later runs use the local cache
2. **GPU Memory:** Monitor usage with `nvidia-smi`
3. **CPU Usage:** Use `htop` to monitor system resources
4. **Network Latency:** SSH tunneling typically adds ~1-5ms per request; measure your own setup with the sketch below
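A rough end-to-end latency measurement, assuming the HTTP mode from Step 5 and a hypothetical `tools/list` request body:
```python
# Time one round trip through the tunnel to the /mcp endpoint
import json
import time
import urllib.request

body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode()
req = urllib.request.Request(
    "http://localhost:8080/mcp",
    data=body,
    headers={"Content-Type": "application/json"},
)
start = time.perf_counter()
urllib.request.urlopen(req).read()
print(f"Round trip: {(time.perf_counter() - start) * 1000:.1f} ms")
```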