voting-mcp

Principled social-choice aggregation as MCP tools — with a benchmark that measures the accuracy lift over naive majority vote.

Almost every multi-agent system aggregates votes with Counter(votes).most_common(1), throwing away preference order and confidence. voting-mcp ships the real rules (Borda, Copeland, Condorcet, approval, STV, linear opinion pool) as callable MCP tools — each with its known axiomatic behavior and explicit, documented tie-breaking — plus a reproducible benchmark that aggregates a diverse ensemble of LLMs on a reasoning set and reports accuracy with bootstrap confidence intervals.
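For comparison, here is that naive baseline in plain Python. `collections.Counter.most_common` orders equal counts by first appearance, so a tie is resolved silently by input order instead of being reported:

```python
from collections import Counter

# Naive majority vote: keep only each voter's top choice and count it.
votes = ["B", "A", "A", "B"]  # a 2-2 tie between A and B
print(Counter(votes).most_common(1))  # [('B', 2)]: B "wins" only because it appeared first
```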

The server is pure compute: stdio transport, no network, no file writes, no secrets — clean against the OWASP MCP Top 10 by construction.

Install

# run the server directly (once published)
uvx voting-mcp

# or from source
git clone https://github.com/HrishiKabra/voting-mcp && cd voting-mcp
uv sync
uv run python -m voting_mcp.server

Add it to an MCP client (e.g. Claude Desktop's claude_desktop_config.json):

{
  "mcpServers": {
    "voting": { "command": "uvx", "args": ["voting-mcp"] }
  }
}


Tools

Every tool takes a profile ({candidates, ballots}) and returns a Result with the full co-winner set (winners, so ties are never hidden), the single tie-broken winner (or null when none exists), a ranking, per-candidate scores, and a note.

| Tool | Ballots | Notes |
|---|---|---|
| borda | rankings | positional; Condorcet-inconsistent, clone-sensitive |
| copeland | rankings | Condorcet-consistent pairwise (+1 win, +0.5 tie) |
| condorcet | rankings | returns the pairwise winner, or an explicit no-winner on a cycle |
| approval | approval sets | most-approved wins |
| stv | rankings | single-winner instant-runoff; clone-resistant |
| opinion_pool | distributions | linear pool; preserves confidence, not an argmax vote |
| plurality | rankings | baseline (most first choices) |
| majority | rankings | strict >50% or no winner |
| aggregate_rule | any | dispatch by a rule enum |

Tie-breaking is an explicit parameter: lexicographic (the default), none, or seeded random.

Benchmark

Aggregate an ensemble of 5 models (all queried through one OpenAI-compatible client via OpenRouter) on ARC-Challenge and compare each rule with the naive majority vote:

uv sync --extra bench
uv run python -m bench.fetch_arc --limit 200
# prints a cost estimate and STOPS; add --yes to actually call the API, --mock for a free dry run
uv run python -m bench.run_ensemble --dataset bench/datasets/arc_challenge.jsonl --limit 200 --yes
uv run python -m bench.compare --dataset bench/datasets/arc_challenge.jsonl --limit 200

Every raw response is cached under bench/results/raw/; re-runs never re-call the API, so aggregation tweaks are free.
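A content-addressed cache is one simple way to get that behavior. The sketch below keys each response by a SHA-256 hash of (model, prompt); the `cached_call` name, the key scheme, the JSON file layout, and the `call` callback are all assumptions, not the bench code's actual format:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_call(
    cache_dir: Path, model: str, prompt: str, call: Callable[[str, str], str]
) -> str:
    """Return the stored response if present; otherwise call once and store it."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["response"]
    response = call(model, prompt)  # the only place the (hypothetical) API is hit
    cache_dir.mkdir(parents=True, exist_ok=True)
    record = {"model": model, "prompt": prompt, "response": response}
    path.write_text(json.dumps(record), encoding="utf-8")
    return response
```

Because aggregation only reads stored responses, changing or adding a voting rule never triggers a new API call.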

Results

5-model ensemble (gpt-4o-mini · gemini-2.5-flash-lite · deepseek-v3 · claude-haiku-4.5 · glm-4.7), n = 200, bootstrap 95% CI. Two datasets of different difficulty; full write-up and both plots in RESULTS.md.
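The README does not spell out the bootstrap procedure. A common choice, sketched here as an assumption, is the percentile bootstrap over questions: resample the per-question 0/1 correctness with replacement and take the 2.5th and 97.5th percentiles of the resampled accuracies. The `bootstrap_ci` name and defaults are hypothetical, and the data below is a toy stand-in, not real per-question results.

```python
import random


def bootstrap_ci(
    correct: list[int], n_resamples: int = 10_000, alpha: float = 0.05, seed: int = 0
) -> tuple[float, float]:
    """Percentile bootstrap CI for accuracy, resampling questions with replacement."""
    rng = random.Random(seed)
    n = len(correct)
    # Accuracy of each resampled question set, sorted for percentile lookup.
    means = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    k = int(n_resamples * alpha / 2)  # number of resamples in each tail
    return means[k], means[n_resamples - 1 - k]


# Toy data: 147 of 200 questions correct (accuracy 0.735).
correct = [1] * 147 + [0] * 53
print(bootstrap_ci(correct))
```

At n = 200 and an accuracy near 0.735, the standard error is about 0.031, so the interval is roughly ±0.06 wide, which is the scale of the CIs in the results table.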

MMLU-Pro (hard, baseline 73.5%) — the informative case:

| Rule | Accuracy | 95% CI | Δ vs majority_vote |
|---|---|---|---|
| opinion_pool | 0.755 | [0.695, 0.815] | +0.020 |
| majority_vote (baseline) | 0.735 | [0.679, 0.788] | |
| approval | 0.701 | [0.640, 0.757] | −0.035 |
| stv | 0.693 | [0.630, 0.750] | −0.043 |
| copeland | 0.647 | [0.580, 0.710] | −0.088 |
| condorcet | 0.620 | [0.550, 0.685] | −0.115 |
| majority (strict) | 0.590 | [0.520, 0.655] | −0.145 |
| borda | 0.472 | [0.405, 0.540] | −0.263 |

[Plot: MMLU-Pro results; see RESULTS.md]

The finding (honest): the value isn't "fancy voting beats majority." It's that the confidence-preserving rule (opinion_pool) wins when the crowd is uncertain: +2.0pp, the only rule above baseline, though its CI still overlaps the baseline's, so the gain is suggestive, not conclusive. Forcing the distributions into full rankings actively hurts: borda collapses to 0.472, far below majority, because with 10 answer options the tail of each ranking is mostly noise. Aggregate the confidence; don't throw it away. On ARC-Challenge (baseline 96.8%, near ceiling) nothing separates: every rule lands within overlapping CIs. See RESULTS.md.

Develop

uv run pytest -q
uv run ruff check .
uv run mypy --strict src
# exercise the tools in the MCP Inspector:
npx @modelcontextprotocol/inspector uv run python -m voting_mcp.server

Note: if you keep this repo under an iCloud-synced folder (e.g. ~/Desktop), iCloud can spawn duplicate .pth files that intermittently break the editable install. The tests already use pythonpath=src; if an import fails when running the server, set PYTHONPATH=src or move the repo out of the synced folder.

License

MIT
