ArmBench MCP Server
Provides tools for benchmarking and running LLM inference on Arm64 cloud instances, measuring performance metrics like tokens/sec and memory usage, and serving results via an MCP-compatible API.
⚡ ArmBench — Arm64 LLM Inference Benchmark Suite + MCP Server
KleidiAI-optimized LLM benchmarking and inference server for Arm64 cloud infrastructure. Built for the Arm AI Optimization Challenge 2026.
🎯 What is ArmBench?
ArmBench is a one-command benchmarking tool that:
Deploys LLMs (Llama 3.2) on Arm64 cloud instances using llama.cpp + KleidiAI
Measures real performance — tokens/sec, time-to-first-token, memory usage across quantization levels (Q4_K_M vs Q8_0)
Serves results via an MCP-compatible FastAPI server any agent framework can call
Visualizes everything in a clean real-time dashboard
🏗️ Architecture
armbench/
├── benchmark/ # llama.cpp + KleidiAI inference engine + metrics
├── mcp_server/ # FastAPI MCP-compatible LLM endpoint
├── dashboard/ # Real-time results dashboard (HTML)
├── scripts/ # One-command setup + benchmark + server scripts
└── docker/ # Arm64-optimized Docker configuration
🚀 Quick Start (Arm64 Instance)
1. Clone and setup
git clone https://github.com/sirmos/armbench.git
cd armbench
bash scripts/setup.sh
2. Run benchmark
bash scripts/run_benchmark.sh
3. Start MCP server
bash scripts/start_mcp.sh
4. Open dashboard
Navigate to http://your-instance-ip:8000 in your browser.
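The server may take a moment to start accepting connections after step 3. The following is a minimal sketch, not part of the ArmBench scripts, of a helper that polls the port until it is reachable before you open the dashboard. The function name `wait_for_port` and its defaults are hypothetical; only port 8000 comes from the instructions above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example call (replace "localhost" with your instance IP when checking remotely):
#   wait_for_port("localhost", 8000, timeout_s=30.0)
```

This only confirms that the port accepts TCP connections. It does not check that the dashboard page itself renders.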
☁️ Tested Arm64 Platforms
Platform | Instance | Arm CPU |
Oracle Cloud | VM.Standard.A1.Flex | Ampere Altra |
AWS | c7g.large | Graviton3 |
GCP | c4a-standard-4 | Axion |
📊 What We Benchmark
Metric | Description |
Tokens/sec | Inference throughput |
Time to First Token | Latency from prompt to first output token |
Memory (MB) | RAM consumed during inference |
Model size (GB) | Disk footprint per quantization level |
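To make the first two metrics concrete, here is a minimal sketch of how tokens/sec and time to first token could be derived from any iterable that yields tokens as they are generated. This is an illustration under assumptions, not ArmBench's actual measurement code: the names `StreamMetrics` and `measure_stream` are hypothetical, and throughput here is computed over the whole call, so it includes prompt processing time.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamMetrics:
    tokens: int
    ttft_s: Optional[float]   # time to first token; None if nothing was generated
    tokens_per_sec: float     # tokens divided by total wall time of the call


def measure_stream(token_stream: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamMetrics:
    """Consume a token stream and time it with the given clock."""
    start = clock()
    first = None
    count = 0
    for _ in token_stream:
        now = clock()
        if first is None:
            first = now  # timestamp of the first token's arrival
        count += 1
    elapsed = clock() - start
    return StreamMetrics(
        tokens=count,
        ttft_s=None if first is None else first - start,
        tokens_per_sec=count / elapsed if elapsed > 0 else 0.0,
    )
```

Passing the clock as a parameter keeps the function easy to test with a fake time source. A real run would wrap the generator returned by the inference engine.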
Models
Model | Quant | Size | Use case |
Llama-3.2-3B-Instruct | Q4_K_M | 1.9 GB | Speed-optimized |
Llama-3.2-3B-Instruct | Q8_0 | 3.4 GB | Quality-optimized |
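The sizes in the table can be turned into an approximate storage cost per weight. The sketch below is a back-of-the-envelope calculation, not an ArmBench feature. It assumes roughly 3.2 billion parameters for the 3B model, a figure the table does not state, and it treats 1 GB as 10^9 bytes.

```python
def bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average stored bits per parameter, treating 1 GB as 10**9 bytes."""
    return file_size_gb * 1e9 * 8 / n_params


# Assumption: approximate parameter count for a 3B-class model (not from the table).
ASSUMED_PARAMS = 3.2e9

# Sizes taken from the Models table above.
for quant, size_gb in [("Q4_K_M", 1.9), ("Q8_0", 3.4)]:
    print(f"{quant}: ~{bits_per_weight(size_gb, ASSUMED_PARAMS):.2f} bits/weight")
```

Under that assumption the Q8_0 file comes out near 8.5 bits per weight. This matches the Q8_0 format, which stores 32 eight-bit weights plus one 16-bit scale per block. The Q4_K_M file lands a little under 5 bits per weight.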
🔌 MCP Server API
Endpoint | Method | Description |
| GET | Server info |
| GET | Health + platform info |
| GET | List available models |
/generate | POST | Run inference |
| POST | Full benchmark suite |
| GET | MCP-compatible tools listing |
| GET | Interactive API docs |
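An agent written in Python can call the inference endpoint using only the standard library. This is a minimal sketch that assumes the `/generate` path and the `prompt` and `model` JSON fields used in the curl example in this section. The response schema is not documented here, so the sketch returns the decoded JSON unchanged. The function names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str,
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str,
             base_url: str = "http://localhost:8000",
             timeout_s: float = 120.0):
    """Send the request and return the parsed JSON body (schema assumed to be JSON)."""
    req = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call against a running server:
#   generate("What is KleidiAI?", "Llama-3.2-3B-Q4_K_M")
```

Splitting request construction from sending makes the payload easy to inspect or log before any inference time is spent.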
Example: Generate
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is KleidiAI?", "model": "Llama-3.2-3B-Q4_K_M"}'
⚙️ Arm-Specific Optimizations
KleidiAI: Arm's optimized kernel library for ML workloads
llama.cpp Arm SVE: Scalable Vector Extension support enabled at build time
Native CPU tuning: -DLLAMA_NATIVE=ON compiles for the exact CPU microarchitecture
Thread optimization: Automatically uses all available Arm cores
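The thread optimization could be approximated as shown below. This is a sketch under assumptions, not ArmBench's actual logic: it counts the cores the current process is allowed to use and turns that into llama.cpp's `--threads` argument. The helper names are hypothetical.

```python
import os
import platform
from typing import List, Optional


def is_arm64() -> bool:
    """True when the interpreter reports an Arm64 machine type."""
    return platform.machine().lower() in {"aarch64", "arm64"}


def usable_cores() -> int:
    """Cores available to this process; respects CPU affinity where the OS exposes it."""
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return os.cpu_count() or 1


def llama_thread_args(threads: Optional[int] = None) -> List[str]:
    """Build the --threads argument, defaulting to every usable core."""
    n = usable_cores() if threads is None else max(1, threads)
    return ["--threads", str(n)]
```

Using the affinity mask rather than the raw core count matters in containers or on shared instances, where the process may be pinned to a subset of the machine's cores.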
📄 License
MIT License — see LICENSE
Built for the Arm AI Optimization Challenge 2026