llamacpp-mcp
An MCP (Model Context Protocol) wrapper for running local LLMs using llama-cpp-python. This project provides a framework for integrating local language models as MCP tools with built-in support for specialized models like SmileyLlama.
SmileyLlama Integration
Generate SMILES strings (chemical notation) for drug-like molecules with fine-grained constraints:
Lipinski's Rule of Five validation
Hydrogen bond donor/acceptor limits
Molecular weight and LogP constraints
Warhead SMARTS pattern matching
Macrocycle detection and filtering
And more...
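As a point of reference for the first item, Lipinski's Rule of Five is commonly stated as four thresholds: molecular weight at most 500 Da, LogP at most 5, at most 5 hydrogen bond donors, and at most 10 hydrogen bond acceptors. The sketch below checks precomputed descriptor values against those thresholds. It is an illustration only: the function name `lipinski_violations` is hypothetical, it does not show how SmileyLlama or this project validates molecules, and it assumes the descriptors were computed elsewhere (for example with a cheminformatics toolkit).

```python
def lipinski_violations(mol_weight, logp, hbond_donors, hbond_acceptors):
    """Return the names of the Rule of Five criteria that are violated."""
    checks = {
        "molecular_weight": mol_weight <= 500,
        "logp": logp <= 5,
        "hbond_donors": hbond_donors <= 5,
        "hbond_acceptors": hbond_acceptors <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


# Approximate literature values for aspirin (illustrative, not computed here).
aspirin = lipinski_violations(
    mol_weight=180.16, logp=1.2, hbond_donors=1, hbond_acceptors=4
)
print(aspirin)
```

An empty list means all four criteria are met. Many formulations tolerate a single violation, so a caller might accept any molecule for which the returned list has at most one entry.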
Installation
Prerequisites
Python ≥ 3.13
uv (recommended) or pip
Setup
Clone the repository and install dependencies:
git clone <repository-url>
cd llamacpp-mcp
uv sync
Backend Configuration
The llama-cpp-python library requires compilation with hardware acceleration support. Choose the appropriate backend for your system:
CUDA (NVIDIA GPUs):
CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir
ROCm (AMD GPUs):
CMAKE_ARGS="-DGGML_HIPBLAS=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir
Metal (Apple Silicon):
CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir
CPU-only (no GPU acceleration):
uv sync
Usage
Run the agent example
Set up your example/fastagent.secrets.yaml:
anthropic:
  api_key: your-api-key-here
Then run the agent interface in the terminal:
cd example/
uv run --extra agent agent.py
Running the MCP Server
Start the MCP server with a GGUF model:
uv run llamacpp-mcp -i /path/to/model.gguf
Additional parameters can be passed as command-line arguments:
uv run llamacpp-mcp --input model.gguf -n_gpu_layers -1 -n_threads 8
Common parameters:
-n_gpu_layers: Number of model layers to offload to GPU (-1 for all)
-n_threads: Number of CPU threads to use
-n_ctx: Context window size
-verbose: Verbosity level
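When the server is launched from another program rather than typed by hand, the command line can be assembled programmatically. The sketch below is one way to do that, assuming the flags are spelled exactly as in the example command (`--input` plus single-dash parameter names). The helper `build_server_command` is hypothetical and not part of this project.

```python
import shlex


def build_server_command(model_path, **params):
    """Build the `uv run llamacpp-mcp` argument list for a model and extra parameters."""
    command = ["uv", "run", "llamacpp-mcp", "--input", str(model_path)]
    for name, value in params.items():
        # Flags use a single leading dash, as in the README's example command.
        command += [f"-{name}", str(value)]
    return command


command = build_server_command("model.gguf", n_gpu_layers=-1, n_threads=8)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.Popen(command)` to start the server as a child process. Passing a list rather than a single string avoids shell-quoting problems with model paths that contain spaces.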
Available Tools
generate_smiles
Generate SMILES strings for drug-like molecules with optional constraints.
Parameters:
max_hbond_donors: Maximum hydrogen bond donors
max_hbond_acceptors: Maximum hydrogen bond acceptors
max_molecular_weight: Maximum molecular weight
max_clogp: Maximum calculated LogP
lipinski_rule_of_five: Enforce Lipinski's Rule of Five
rule_of_three: Enforce Rule-of-Three for fragment-like molecules
And additional constraint options...
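MCP clients invoke a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `generate_smiles` as a plain dictionary, to show how the parameters above would travel over the wire. The helper `make_tool_call` is hypothetical, and the value types (integers for counts, a boolean for the rule flag) are assumptions, since the parameter types are not documented here. In practice an MCP client library constructs this message for you.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Constraint values are illustrative; their types are assumed, not documented.
request = make_tool_call(
    1,
    "generate_smiles",
    {
        "max_hbond_donors": 3,
        "max_hbond_acceptors": 6,
        "max_molecular_weight": 400,
        "lipinski_rule_of_five": True,
    },
)
print(json.dumps(request, indent=2))
```

Constraints that are left out of `arguments` are simply not sent, which matches their description as optional.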
Dependencies
Core:
fastmcp>=2.13.1 - MCP server framework
llama-cpp-python>=0.3.16 - LLM inference engine
Optional:
fast-agent-mcp>=0.2.25 - For agent-based integrations
Development
Project Setup
The project uses uv for dependency management. After installing uv, run:
uv sync
This installs all dependencies in a local virtual environment.
Adding New Models
To add a new model type:
Create a subdirectory under src/llamacpp_mcp/models/
Implement models.py with Pydantic constraint definitions
Implement tools.py with a tool registration function
Import and register tools in the main __init__.py
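The split between constraint definitions and a registration function can be sketched as follows. This is a dependency-free illustration of the pattern, not the project's code: a dataclass stands in for the Pydantic model, a plain dictionary stands in for the MCP server's tool registry, and every name here (`ExampleConstraints`, `to_prompt`, `register_tools`, `example_generate`) as well as the prompt wording is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ExampleConstraints:
    """Hypothetical constraint definition (the real project uses Pydantic)."""

    max_molecular_weight: Optional[float] = None
    max_hbond_donors: Optional[int] = None

    def to_prompt(self) -> str:
        """Render only the constraints that were set, comma-separated."""
        parts = []
        if self.max_molecular_weight is not None:
            parts.append(f"MW <= {self.max_molecular_weight:g}")
        if self.max_hbond_donors is not None:
            parts.append(f"H-bond donors <= {self.max_hbond_donors}")
        return ", ".join(parts)


def register_tools(registry: Dict[str, Callable], generate: Callable[[str], str]) -> None:
    """Hypothetical registration function: adds one tool to a name -> callable map."""

    def example_generate(constraints: ExampleConstraints) -> str:
        # Turn the constraints into a prompt and delegate to the model callable.
        return generate(constraints.to_prompt())

    registry["example_generate"] = example_generate


# Stand-ins for the MCP server and the model, so the sketch runs on its own.
tools: Dict[str, Callable] = {}
register_tools(tools, generate=lambda prompt: f"<model output for: {prompt}>")
print(tools["example_generate"](ExampleConstraints(max_molecular_weight=400)))
```

Keeping the constraint model separate from the registration function means the same constraints can be validated and rendered without a running server, while the registration step stays a thin adapter.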
Configuration
Model parameters can be configured via:
Command-line arguments - Pass directly to llamacpp-mcp
Environment variables - Set before running the server
Agent Tool Configuration - See example/fastagent.config.yaml for reference
License
MIT License
Author
Lukas Kim