License: A
Quality: -
Maintenance: A
InferBench's MCP server lets coding agents run, serve and benchmark local LLMs (text + image, llama.cpp + Stable Diffusion) on your own hardware on demand, measuring real tokens/sec and picking the optimal quant for your GPU from a 124-model catalog. Local-first, no cloud required.
Last updated
2
MIT