llamacpp-mcp

An MCP (Model Context Protocol) wrapper for running local LLMs with llama-cpp-python. The project provides a framework for exposing local language models as MCP tools, with built-in support for specialized models such as SmileyLlama.

SmileyLlama Integration

Generate SMILES strings (chemical notation) for drug-like molecules with fine-grained constraints:

  • Lipinski's Rule of Five validation

  • Hydrogen bond donor/acceptor limits

  • Molecular weight and LogP constraints

  • Warhead SMARTS pattern matching

  • Macrocycle detection and filtering

  • And more...

Installation

Prerequisites

  • Python ≥ 3.13

  • uv (recommended) or pip

Setup

Clone the repository and install dependencies:

git clone <repository-url>
cd llamacpp-mcp
uv sync

Backend Configuration

To use GPU acceleration, llama-cpp-python must be compiled with the matching backend. Choose the command for your hardware; a CPU-only install needs no extra build flags:

CUDA (NVIDIA GPUs):

CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir

ROCm (AMD GPUs):

CMAKE_ARGS="-DGGML_HIPBLAS=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir

Metal (Apple Silicon):

CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir

CPU-only (no GPU acceleration):

uv sync

Usage

Run the agent example

Create example/fastagent.secrets.yaml with your API key:

anthropic:
  api_key: your-api-key-here

Then run the agent interface in the terminal:

cd example/
uv run --extra agent agent.py

Running the MCP Server

Start the MCP server with a GGUF model:

uv run llamacpp-mcp -i /path/to/model.gguf

Additional parameters can be passed as command-line arguments:

uv run llamacpp-mcp --input model.gguf -n_gpu_layers -1 -n_threads 8

Common parameters:

  • -n_gpu_layers: Number of model layers to offload to GPU (-1 for all)

  • -n_threads: Number of CPU threads to use

  • -n_ctx: Context window size

  • -verbose: Verbosity level
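The flags n_gpu_layers, n_threads, and n_ctx share their names with keyword arguments of llama_cpp.Llama. The sketch below shows one way such single-dash flags could be parsed and forwarded to the model constructor. It is an illustration under assumptions, not the project's actual argument handling: parse_llama_args and load_model are hypothetical names, and mapping the input path to model_path is assumed.

```python
import argparse


def parse_llama_args(argv):
    """Parse README-style flags into keyword arguments for llama_cpp.Llama."""
    parser = argparse.ArgumentParser(prog="llamacpp-mcp", allow_abbrev=False)
    # Assumption: the input path is forwarded as Llama's model_path argument.
    parser.add_argument("-i", "--input", dest="model_path", required=True)
    parser.add_argument("-n_gpu_layers", type=int, default=None)
    parser.add_argument("-n_threads", type=int, default=None)
    parser.add_argument("-n_ctx", type=int, default=None)
    args = parser.parse_args(argv)
    # Drop unset options so llama-cpp-python falls back to its own defaults.
    return {key: value for key, value in vars(args).items() if value is not None}


def load_model(kwargs):
    """Build the model; imported lazily so parsing works without llama-cpp-python."""
    from llama_cpp import Llama

    return Llama(**kwargs)
```

Calling parse_llama_args(["-i", "model.gguf", "-n_gpu_layers", "-1", "-n_threads", "8"]) yields a dictionary with model_path, n_gpu_layers, and n_threads set; n_ctx is omitted because it was not given.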

Available Tools

generate_smiles

Generate SMILES strings for drug-like molecules with optional constraints.

Parameters:

  • max_hbond_donors: Maximum hydrogen bond donors

  • max_hbond_acceptors: Maximum hydrogen bond acceptors

  • max_molecular_weight: Maximum molecular weight

  • max_clogp: Maximum calculated LogP

  • lipinski_rule_of_five: Enforce Lipinski's Rule of Five

  • rule_of_three: Enforce Rule-of-Three for fragment-like molecules

  • And additional constraint options...
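To make the parameters concrete, the sketch below checks precomputed molecular descriptors against a set of limits. It uses the commonly cited Lipinski thresholds (at most 5 hydrogen bond donors, 10 acceptors, molecular weight 500 Da, LogP 5) and Rule-of-Three thresholds (3 donors, 3 acceptors, 300 Da, LogP 3). Descriptors, Constraints, effective_limits, and violations are hypothetical names, and the descriptor values would in practice come from a cheminformatics toolkit. How the server itself applies these parameters is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptors:
    """Precomputed properties of one molecule (hypothetical container)."""
    hbond_donors: int
    hbond_acceptors: int
    molecular_weight: float
    clogp: float


@dataclass
class Constraints:
    """Mirrors the tool parameters listed above; None means no limit."""
    max_hbond_donors: Optional[int] = None
    max_hbond_acceptors: Optional[int] = None
    max_molecular_weight: Optional[float] = None
    max_clogp: Optional[float] = None
    lipinski_rule_of_five: bool = False
    rule_of_three: bool = False


# Upper bounds per preset: (donors, acceptors, molecular weight, LogP).
LIPINSKI = (5, 10, 500.0, 5.0)
RULE_OF_THREE = (3, 3, 300.0, 3.0)


def effective_limits(c):
    """Combine explicit limits with enabled presets, keeping the tightest bound."""
    limits = [c.max_hbond_donors, c.max_hbond_acceptors,
              c.max_molecular_weight, c.max_clogp]
    presets = []
    if c.lipinski_rule_of_five:
        presets.append(LIPINSKI)
    if c.rule_of_three:
        presets.append(RULE_OF_THREE)
    for preset in presets:
        limits = [p if cur is None else min(cur, p)
                  for cur, p in zip(limits, preset)]
    return limits


def violations(d, c):
    """Return the names of the descriptors that exceed their limit."""
    names = ["hbond_donors", "hbond_acceptors", "molecular_weight", "clogp"]
    values = [d.hbond_donors, d.hbond_acceptors, d.molecular_weight, d.clogp]
    return [name for name, value, limit in zip(names, values, effective_limits(c))
            if limit is not None and value > limit]
```

Returning the list of violated limits, rather than a single boolean, leaves it to the caller to decide how strict to be; Lipinski's rule, for example, is often read as tolerating one violation.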

Dependencies

Core:

  • fastmcp>=2.13.1 - MCP server framework

  • llama-cpp-python>=0.3.16 - LLM inference engine

Optional:

  • fast-agent-mcp>=0.2.25 - For agent-based integrations

Development

Project Setup

The project uses uv for dependency management. After installing uv, run:

uv sync

This installs all dependencies in a local virtual environment.

Adding New Models

To add a new model type:

  1. Create a subdirectory under src/llamacpp_mcp/models/

  2. Implement models.py with Pydantic constraint definitions

  3. Implement tools.py with a tool registration function

  4. Import and register tools in the main __init__.py
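The steps above might look roughly like the following, collapsed into one file for brevity. This is a sketch under assumptions. The project's models use Pydantic, while a standard-library dataclass stands in here so the example runs without dependencies. ToyConstraints, register_tools, generate_toy, and StubServer are hypothetical names. The only thing assumed of the server object is a tool decorator, which FastMCP provides as mcp.tool.

```python
from dataclasses import dataclass


# models.py (step 2): constraint definitions. The project uses Pydantic here;
# a dataclass is a dependency-free stand-in for this sketch.
@dataclass
class ToyConstraints:
    max_length: int = 16

    def __post_init__(self):
        if self.max_length <= 0:
            raise ValueError("max_length must be positive")


# tools.py (step 3): a function that registers tools on a server object.
def register_tools(mcp):
    @mcp.tool
    def generate_toy(prompt: str, max_length: int = 16) -> str:
        """Return the prompt cut to max_length (placeholder for model inference)."""
        constraints = ToyConstraints(max_length=max_length)
        return prompt[: constraints.max_length]

    return generate_toy


# Stand-in for a FastMCP server so the sketch runs on its own.
class StubServer:
    def __init__(self):
        self.tools = {}

    def tool(self, fn):
        # Record the function under its name, as a real server would expose it.
        self.tools[fn.__name__] = fn
        return fn


# __init__.py (step 4): import register_tools and call it on the server.
server = StubServer()
register_tools(server)
```

With a real FastMCP instance in place of StubServer, the same register_tools call would expose generate_toy as an MCP tool; the validation in ToyConstraints would then correspond to a Pydantic model's field constraints.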

Configuration

Model parameters can be configured via:

  1. Command-line arguments - Pass directly to llamacpp-mcp

  2. Environment variables - Set before running the server

  3. Agent Tool Configuration - See example/fastagent.config.yaml for reference

License

MIT License

Author

Lukas Kim
