llamacpp-mcp

by lukasmki

An MCP (Model Context Protocol) wrapper for running local LLMs using llama-cpp-python. This project provides a framework for integrating local language models as MCP tools with built-in support for specialized models like SmileyLlama.

SmileyLlama Integration

Generate SMILES strings (chemical notation) for drug-like molecules with fine-grained constraints:

Lipinski's Rule of Five validation
Hydrogen bond donor/acceptor limits
Molecular weight and LogP constraints
Warhead SMARTS pattern matching
Macrocycle detection and filtering
And more...
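As an illustration of the first constraint in the list, here is a minimal sketch of a Rule of Five check. It is not the project's implementation: the `Descriptors` container and both function names are hypothetical, and the descriptor values are assumed to be precomputed elsewhere (in practice by a cheminformatics toolkit such as RDKit). The thresholds are the standard Lipinski limits, with at most one violation tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    molecular_weight: float  # daltons
    clogp: float             # calculated octanol-water partition coefficient
    hbond_donors: int
    hbond_acceptors: int


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return a description of each Rule of Five limit the molecule exceeds."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500")
    if d.clogp > 5:
        violations.append("cLogP > 5")
    if d.hbond_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if d.hbond_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    return violations


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """The rule is commonly read as allowing at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate literature values for aspirin, used only as sample input.
aspirin = Descriptors(molecular_weight=180.16, clogp=1.2,
                      hbond_donors=1, hbond_acceptors=4)
print(passes_lipinski(aspirin))
```

Returning the list of violations rather than a bare boolean makes it easier to report why a generated molecule was rejected.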

Installation

Prerequisites

Python ≥ 3.13
uv (recommended) or pip

Setup

Clone the repository and install dependencies:

git clone <repository-url>
cd llamacpp-mcp
uv sync

Backend Configuration

To use hardware acceleration, llama-cpp-python must be compiled with the backend that matches your hardware. Choose the appropriate command for your system:

CUDA (NVIDIA GPUs):

CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir

ROCm (AMD GPUs):

CMAKE_ARGS="-DGGML_HIPBLAS=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir

Metal (Apple Silicon):

CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir

CPU-only (no GPU acceleration):

uv sync

Usage

Run the agent example

Set up your example/fastagent.secrets.yaml:

anthropic:
  api_key: your-api-key-here

Then run the agent interface in the terminal:

cd example/
uv run --extra agent agent.py

Running the MCP Server

Start the MCP server with a GGUF model:

uv run llamacpp-mcp -i /path/to/model.gguf

Additional parameters can be passed as command-line arguments:

uv run llamacpp-mcp --input model.gguf -n_gpu_layers -1 -n_threads 8

Common parameters:

-n_gpu_layers: Number of model layers to offload to GPU (-1 for all)
-n_threads: Number of CPU threads to use
-n_ctx: Context window size
-verbose: Verbosity level
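The parameter names above match keyword arguments of the `Llama` class in llama-cpp-python. The sketch below shows how they could be forwarded to it. That this server forwards them exactly this way is an assumption, and `build_llama_kwargs` and `load_model` are hypothetical helpers, not part of the project.

```python
from typing import Any, Optional


def build_llama_kwargs(
    model_path: str,
    n_gpu_layers: int = -1,           # -1 offloads all layers to the GPU
    n_threads: Optional[int] = None,  # None lets llama.cpp pick a default
    n_ctx: Optional[int] = None,      # None keeps the library default
    verbose: bool = False,
) -> dict[str, Any]:
    """Collect constructor arguments, omitting the ones left unset."""
    kwargs: dict[str, Any] = {
        "model_path": model_path,
        "n_gpu_layers": n_gpu_layers,
        "verbose": verbose,
    }
    if n_threads is not None:
        kwargs["n_threads"] = n_threads
    if n_ctx is not None:
        kwargs["n_ctx"] = n_ctx
    return kwargs


def load_model(**kwargs: Any):
    """Instantiate the model; imported lazily so the helper above
    works even where llama-cpp-python is not installed."""
    from llama_cpp import Llama
    return Llama(**kwargs)


kwargs = build_llama_kwargs("model.gguf", n_threads=8)
print(kwargs)
# llm = load_model(**kwargs)  # requires llama-cpp-python and a real GGUF file
```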

Available Tools

generate_smiles

Generate SMILES strings for drug-like molecules with optional constraints.

Parameters:

max_hbond_donors: Maximum hydrogen bond donors
max_hbond_acceptors: Maximum hydrogen bond acceptors
max_molecular_weight: Maximum molecular weight
max_clogp: Maximum calculated LogP
lipinski_rule_of_five: Enforce Lipinski's Rule of Five
rule_of_three: Enforce Rule-of-Three for fragment-like molecules
And additional constraint options...
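MCP clients invoke a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request for `generate_smiles` using the parameter names listed above. The argument types (numbers and booleans) are assumptions, since the tool's schema is not reproduced here, and `build_generate_smiles_request` is a hypothetical helper.

```python
import json
from typing import Any


def build_generate_smiles_request(request_id: int, **constraints: Any) -> dict[str, Any]:
    """Build an MCP tools/call request, dropping constraints left as None."""
    arguments = {k: v for k, v in constraints.items() if v is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_smiles", "arguments": arguments},
    }


request = build_generate_smiles_request(
    1,
    max_molecular_weight=400,
    max_hbond_donors=3,
    max_clogp=None,              # unset constraints are omitted
    lipinski_rule_of_five=True,
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library or agent framework sends this message for you. The sketch only shows what travels over the wire.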

Dependencies

Core:

fastmcp>=2.13.1 - MCP server framework
llama-cpp-python>=0.3.16 - LLM inference engine

Optional:

fast-agent-mcp>=0.2.25 - For agent-based integrations

Development

Project Setup

The project uses uv for dependency management. After installing uv, run:

uv sync

This installs all dependencies in a local virtual environment.

Adding New Models

To add a new model type:

1. Create a subdirectory under src/llamacpp_mcp/models/
2. Implement models.py with Pydantic constraint definitions
3. Implement tools.py with a tool registration function
4. Import and register the tools in the main __init__.py
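The registration step can be sketched as follows. This is an illustrative outline only: `register_tools`, `complete_text`, and `RecordingServer` are hypothetical names, and the real function signature in this project may differ. The sketch assumes the server object exposes a `tool` decorator, as FastMCP does. A small stand-in server is used so the example runs without extra dependencies.

```python
from typing import Callable, Protocol


class ToolServer(Protocol):
    """Anything with a `tool` decorator (FastMCP instances qualify)."""
    def tool(self, fn: Callable) -> Callable: ...


def register_tools(server: ToolServer, generate: Callable[[str], str]) -> None:
    """Hypothetical tools.py entry point: attach this model's tools to the server.

    `generate` wraps the loaded model and maps a prompt to generated text.
    """
    @server.tool
    def complete_text(prompt: str) -> str:
        """Return the model's completion for a prompt."""
        return generate(prompt)


class RecordingServer:
    """Stand-in for a real MCP server; it only records registered tools."""
    def __init__(self) -> None:
        self.tools: dict[str, Callable] = {}

    def tool(self, fn: Callable) -> Callable:
        self.tools[fn.__name__] = fn
        return fn


server = RecordingServer()
register_tools(server, generate=lambda prompt: prompt.upper())  # fake model
print(sorted(server.tools))
```

Passing the model in as a callable keeps tools.py free of loading logic, so the same registration function can be exercised with a fake model in tests.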

Configuration

Model parameters can be configured via:

Command-line arguments - Pass directly to llamacpp-mcp
Environment variables - Set before running the server
Agent Tool Configuration - See example/fastagent.config.yaml for reference

License

MIT License

Author

Lukas Kim
