Which integrations are available for this server?

Enables LangChain agents to score pronunciation via the MCP server, using tools for audio evaluation with ref_text and returning phoneme-level feedback. Allows LangGraph ReAct agents to utilize the MCP tools for pronunciation evaluation as part of LangChain workflows. Integrates with the OpenAI Agents SDK, allowing agents to call pronunciation scoring tools on audio URLs or base64 data.

CHIVOX speech MCP

by boyzhong123

Overview Schema Related Servers Score Discussions

TypeScript

Remote

TL;DR — LLMs can't hear audio. Chivox MCP is a hosted MCP server that scores pronunciation at the phoneme level — Mandarin tones included. One tools/call returns overall / accuracy / pron / fluency / details[].phone[] (pronunciation, fluency, per-phoneme breakdown) in a stable JSON shape your model can reason over. Not STT. Not a Whisper wrapper.

On this page: Fit check · Quickstart · Response JSON · Tools · Transport · Compare · Coach loop · Mandarin · English · Pricing · FAQ

🎯 Is this for you?

Most production teams run Whisper + Chivox together: Whisper to transcribe what was said, Chivox to score how well. They don't compete.

Related MCP server: brainiall-mcp-server

🚀 Quickstart

Hosted endpoint: https://mcp-global.cloud.chivox.com · every request needs Authorization: Bearer <api_key>. Get a key →

Client	Setup
Cursor	`~/.cursor/mcp.json` — IDE MCP, zero install
LangChain	LangGraph ReAct agent + MCP adapter
OpenAI Agents SDK	`agents.mcp.MCPServerStreamableHttp`
Claude Desktop	Local proxy for mic streaming
Raw MCP SDK	Direct `mcp` Python client

Cursor (zero install)

// ~/.cursor/mcp.json
{
  "mcpServers": {
    "chivox-speech-eval": {
      "type": "streamable-http",
      "url": "https://mcp-global.cloud.chivox.com",
      "headers": { "Authorization": "Bearer <your_api_key>" }
    }
  }
}

LangChain

from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

client = MultiServerMCPClient({
    "chivox": {
        "transport": "streamable_http",
        "url": "https://mcp-global.cloud.chivox.com",
        "headers": {"Authorization": "Bearer <your_api_key>"},
    }
})
tools = await client.get_tools()  # discovers all 16 tools

agent = create_react_agent("openai:gpt-4o-mini", tools)
result = await agent.ainvoke({"messages": [(
    "user",
    "Score https://example.com/audio/sentence.mp3, ref: I think therefore I am",
)]})

OpenAI Agents SDK

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp

chivox = MCPServerStreamableHttp(
    params={
        "url": "https://mcp-global.cloud.chivox.com",
        "headers": {"Authorization": "Bearer <your_api_key>"},
    },
    name="chivox-speech-eval",
)

async with chivox:
    agent = Agent(
        name="coach",
        instructions="Professional speaking coach",
        mcp_servers=[chivox],
    )
    r = await Runner.run(
        agent,
        "Score https://example.com/audio/sentence.mp3, ref: I think therefore I am",
    )
    print(r.final_output)

Claude Desktop (mic streaming via local proxy)

npm install -g chivox-local-mcp

// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "chivox": {
      "command": "chivox-local-mcp",
      "env": {
        "MCP_REMOTE_URL": "https://mcp-global.cloud.chivox.com",
        "MCP_API_KEY": "<your_api_key>"
      }
    }
  }
}

Raw MCP SDK

import asyncio
from mcp.client.streamable_http import streamablehttp_client
from mcp import ClientSession

async def main():
    async with streamablehttp_client(
        "https://mcp-global.cloud.chivox.com",
        headers={"Authorization": "Bearer <your_api_key>"},
    ) as (r, w, _):
        async with ClientSession(r, w) as s:
            await s.initialize()
            out = await s.call_tool("en_sentence_eval", {
                "ref_text": "I think therefore I am",
                "audio_url": "https://example.com/audio/sentence.mp3",
            })
            print(out)

asyncio.run(main())

More clients (Claude Code, Windsurf, Zed, Mastra, function-calling mode) → docs → Clients

🧠 What the LLM actually sees

Every tool returns the same top-level shape — switch locale or granularity with zero schema work. Example for "hello":

{
  "overall": 85,
  "accuracy": 82,
  "pron": 88,
  "integrity": 95,
  "fluency": { "overall": 78, "speed": 65, "pause": 2 },
  "details": [
    {
      "char": "hello",
      "score": 85,
      "phone": [
        { "phoneme": "h",  "score": 90, "dp_type": "normal" },
        { "phoneme": "ɛ",  "score": 82, "dp_type": "normal" },
        { "phoneme": "l",  "score": 88, "dp_type": "normal" },
        { "phoneme": "oʊ", "score": 80, "dp_type": "normal" }
      ]
    }
  ]
}

For English mispronunciations, phoneme_error: { expected, actual } is included. Mandarin adds tone_ref / tone_detected with sandhi-aware dp_type verdicts. Full field list →

🛠️ Tools catalog

Inline audio: pass audio_url or audio_base64 in the tool call — no upload round-trip. Formats: mp3 · wav · ogg · m4a · aac · pcm. Per-tool notes →

🔌 Dual transport

Two ways to feed audio — same result shape, different UX. Function-calling fallback: fc-global.cloud.chivox.com.

⚖️ How it compares

Rule of thumb — use Whisper to know what was said; use Chivox to know how well. They stack.

💬 …and here's what your LLM does with it

Pipe that JSON straight into any chat model with a one-line system prompt — "You are a warm pronunciation coach. Diagnose, then drill." — and you get a real lesson back. No fine-tuning. No audio understanding. Just chat.completion.

Why this works — the LLM never "heard" the audio. The JSON names the problem in fields it already understands (dp_type: "mispron", phoneme_error.actual, tone_ref vs tone_detected), so a vanilla chat.completion can diagnose like a human teacher.

🔁 The three-stage loop

🎤 Input: 1-minute learner recording → Output: warm feedback + targeted drill, end-to-end in < 1.6 seconds.

🏮 The moat: a tireless Mandarin tutor

30M+ learners worldwide study Mandarin — including heritage speakers and adult beginners — yet few platforms score tone errors (mā / má / mǎ / mà) at the phoneme level in English. Chivox's Chinese engine is trained on the same data that powers China's Putonghua Proficiency Test (普通话水平测试, PSC).

🇬🇧 And yes — exam-grade English too

Exam-grade rubrics on the same MCP endpoints: IELTS · TOEFL · Cambridge YLE · K-12 reading assessments for English, plus PSC-aligned Mandarin scoring. Same JSON shape, 20+ scoring dimensions — just change ref_text and accent.

💎 Why developers ship with Chivox MCP

Plus: streaming + inline modes · TLS 1.3 end-to-end · audio discarded after scoring (JSON retained 30 days) · on-prem available for enterprise · limits & privacy →

💳 Pricing

Honest defaults. Start with 600 free calls (30 days) and all 16 tools unlocked — no feature gates, no card. When you need more, pay per successful call at tiered rates — the more you ship, the cheaper each call gets.

Free tier ≠ crippled tier. Every new account gets 600 free calls valid for 30 days with the full 16-tool catalog — same engine, same JSON, same SLA as paid keys. After the trial window or when calls are used up, top up from $10 and let the volume tiers do the rest. Failed calls are never billed.

❓ FAQ

Is this just another wrapper around Whisper?

No. Whisper transcribes; Chivox scores. The engine is trained on exam-graded samples and returns phoneme-level details[].phone[] — not a transcript. Most teams run both.

Does it work offline / on-device?

The hosted MCP server needs outbound access to the scoring engine. For air-gapped deployments, contact us — we ship an on-prem container for enterprise customers.

What about dialects and accents?

Mandarin targets standard Pǔtōnghuà with sandhi-aware tone verdicts. English supports en-US, en-GB, and en-AU rubrics via locale parameters on the relevant tools.

Which LLMs work out of the box?

Any model with OpenAI-style function calling: GPT-4o / 5.x, Claude Sonnet / Opus, Gemini, DeepSeek, GLM, Kimi, Doubao, Qwen. Tool schemas are forwarded verbatim.

Can I use this in a browser?

For quick demos, yes — but production traffic should flow through your backend so the API key stays server-side. Privacy notes →

🤝 Star us · say hi

This server cannot be installed

license - not found

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/boyzhong123/mcp22'

If you have feedback or need assistance with the MCP directory API, please join our Discord server