clbench-fireworks-rft
Reinforcement Fine-Tuning of Qwen3-8B on the CLBench exploitable_poker task, running on
Fireworks infrastructure via eval-protocol MCP-Gym.
This is a port of the sr-networks/clbench-verifiers
GRPO setup (Will Brown's verifiers framework + PrimeIntellect hosted training) onto Fireworks RFT.
The CLBench poker simulator, action parsing, and reward shaping carry over unchanged; only the
RL-framework glue is rewritten.
Why a port and not a copy
verifiers and Fireworks RFT are different abstractions. What transfers vs. what changes:
| | Fireworks RFT (this repo) | status |
| | | ported |
| | | ported |
| | | ported |
| | MCP tool params are the | obsolete (tool-calling enforces structure) |
| | imported unchanged | no change |
| | | platform |
| TOML configs + | RFT job flags (see Launch) | re-expressed |
Key win: because the poker_act tool's parameters are exactly the PokerAction fields
(action / thinking / amount), the model emits structured tool calls and malformed-JSON
parse failures become impossible — the old guided_json + parse_failure_penalty machinery is no
longer needed.
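A minimal sketch of that idea follows. It assumes only what the text states: three fields named action, thinking and amount, with a plain integer sentinel of -1 for "no amount" (see Gotchas). The action vocabulary and the helper `action_from_tool_call` are hypothetical names for illustration; the real PokerAction lives in the CLBench code and may differ.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary, for illustration only.
LEGAL_ACTIONS = {"fold", "check", "call", "raise"}
NO_AMOUNT = -1  # plain int sentinel instead of Optional[int]


@dataclass(frozen=True)
class PokerAction:
    action: str
    thinking: str = ""
    amount: int = NO_AMOUNT


def action_from_tool_call(arguments: dict) -> PokerAction:
    """Build a PokerAction from already-parsed tool-call arguments."""
    act = PokerAction(
        action=str(arguments["action"]).lower(),
        thinking=str(arguments.get("thinking", "")),
        amount=int(arguments.get("amount", NO_AMOUNT)),
    )
    if act.action not in LEGAL_ACTIONS:
        raise ValueError(f"unknown action: {act.action!r}")
    return act
```

Because the arguments arrive as a parsed dict, the environment has no JSON-parsing step of its own. A value check such as the one on `action` may still be worth keeping, since a well-formed call can carry an illegal move.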
Related MCP server: multivon-mcp
Files
file | role |
|
|
|
|
| MCP-Gym server launcher ( |
| evaluator scoring (the |
|
|
| generates |
| 64-seed training dataset (regenerate with |
| deps Fireworks installs into the rollout container |
| optional connectivity/structured-output smoke test (needs a served model) |
| installs deps + creates the |
Setup
./setup.sh # deps + bin/python shim
export FIREWORKS_API_KEY="fw_..." # https://fireworks.ai/account/api-keys
firectl set-api-key "$FIREWORKS_API_KEY"
export PATH="$PWD/bin:$PATH" # python3.11 shim first (gym spawns `python server.py`)
firectl (Go binary) install:
brew tap fw-ai/firectl && brew trust fw-ai/firectl && brew install fw-ai/firectl/firectl
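Since the gym spawns `python server.py`, it can help to confirm that the first `python` on PATH really is the 3.11 shim before launching anything. The following is a rough sketch using only the standard library; the function names `parse_python_version` and `check_python_shim` are made up for this example and are not part of the repo.

```python
import re
import shutil
import subprocess


def parse_python_version(text: str) -> tuple:
    """Extract (major, minor) from output such as 'Python 3.11.4'."""
    match = re.search(r"Python (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version output: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_python_shim(expected=(3, 11)) -> bool:
    """Return True if the first `python` on PATH reports the expected version."""
    exe = shutil.which("python")
    if exe is None:
        return False
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr, Python 3 to stdout.
    return parse_python_version(result.stdout + result.stderr) == expected
```

Calling `check_python_shim()` after the `export PATH=...` line should return True when the shim is picked up, and False when a system `python` such as 2.7 is found first.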
Launch an RFT job
Two paths. The direct firectl path is what we actually used (it avoids a CLI bug — see Gotchas).
A) Upload the evaluator, then launch with firectl ← used
# 1. upload the evaluator (env + reward) so Fireworks builds the rollout container
eval-protocol create rft \
--evaluator test_poker_rft.py::test_poker_rft \
--dataset poker_dataset.jsonl --mcp-server server.py \
--training-config-base-model accounts/fireworks/models/qwen3-8b \
--dry-run --skip-validation -y # uploads evaluator; ignore the poller timeout
# 2. confirm evaluator is ACTIVE, upload dataset, create the job
firectl create dataset clbench-poker-qwen3-8b-data poker_dataset.jsonl
firectl create reinforcement-fine-tuning-job \
--base-model accounts/fireworks/models/qwen3-8b \
--dataset clbench-poker-qwen3-8b-data \
--evaluator accounts/<ACCOUNT>/evaluators/test-poker-rftpytest-poker-rft \
--output-model clbench-poker-qwen3-8b \
--epochs 2 --learning-rate 1e-6 --temperature 1.0 \
--max-output-tokens 1024 --response-candidates-count 8
B) Pure eval-protocol (once the poller bug is fixed upstream)
eval-protocol create rft --evaluator test_poker_rft.py::test_poker_rft \
--dataset poker_dataset.jsonl --mcp-server server.py \
--training-config-base-model accounts/fireworks/models/qwen3-8b \
--training-config-output-model clbench-poker-qwen3-8b \
--training-config-epochs 2 --training-config-learning-rate 1e-6 \
--inference-parameters-temperature 1.0 --inference-parameters-max-output-tokens 1024 \
--inference-parameters-response-candidates-count 8
Config mapping from the Prime TOML
rollouts_per_example=8 → --response-candidates-count 8 (GRPO group size)
temperature=1.0 → --temperature 1.0
max_tokens=1024 → --max-output-tokens 1024
enable_thinking=false → baked into the gym prompt
guided_json → tool-call schema (free)
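The flag-backed part of this mapping can be written down as a small lookup. This is an illustrative sketch only: the dict layout of `prime_config` and the helper `to_firectl_flags` are assumptions, while the key names, values and firectl flags are the ones quoted in this README.

```python
# Values quoted in the mapping above; the dict layout itself is an assumption.
prime_config = {"rollouts_per_example": 8, "temperature": 1.0, "max_tokens": 1024}

# Prime key -> firectl flag, as listed in the mapping above.
FLAG_FOR_KEY = {
    "rollouts_per_example": "--response-candidates-count",
    "temperature": "--temperature",
    "max_tokens": "--max-output-tokens",
}


def to_firectl_flags(config: dict) -> list:
    """Translate the known Prime keys into a flat list of firectl arguments."""
    flags = []
    for key, flag in FLAG_FOR_KEY.items():
        if key in config:
            flags += [flag, str(config[key])]
    return flags
```

`enable_thinking` and `guided_json` are deliberately absent from the lookup, since the text maps them to the gym prompt and the tool-call schema rather than to job flags.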
Training runs
See RUNS.md for the full log. Summary:
run | job id | base | output model | epochs | candidates | status |
1 | | qwen3-8b (free) | clbench-poker-qwen3-8b | 2 | 8 | launched 2026-06-25, RUNNING |
Monitor: firectl get reinforcement-fine-tuning-job <job-id> · dashboard: https://app.fireworks.ai/dashboard
Gotchas (hard-won)
from __future__ import annotations breaks eval-protocol. It stringifies annotations, so FastMCP tool registration (issubclass("str", Context)) and the @evaluation_test signature validator both fail. Do not use it in poker_mcp.py or test_poker_rft.py.
FastMCP (this version) crashes on Optional[int] tool params while locating the Context arg. poker_act uses a plain int = -1 sentinel instead.
firectl needs firectl set-api-key; it does not read FIREWORKS_API_KEY automatically. (firectl whoami additionally needs OIDC signin; ignore it.)
eval-protocol create rft has a poller bug: it polls …/evaluators/<file>.py::<func>, and the .py:: makes the URL malformed → HTTP 400 → false 10-minute "evaluator not ready" timeout. The evaluator is actually ACTIVE; launch via firectl.
macOS python is often 2.7. The gym spawns python server.py, so bin/python must shim to the 3.11 interpreter that has the deps and be first on PATH.
Rollouts run on Fireworks, in a container built from requirements.txt, so a local serverless deployment of the base model is not required for training (only for local pytest rollouts).
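The first gotcha can be reproduced without eval-protocol or FastMCP installed. The sketch below uses only the standard library and a stand-in function named poker_act (its signature here is illustrative, not the repo's). It shows that the future import turns annotations into strings, and that an issubclass check on such a string raises TypeError, which is the kind of failure described above.

```python
SOURCE = '''
from __future__ import annotations

def poker_act(action: str, amount: int = -1):
    return action, amount
'''

# Compile the snippet as its own module so the future import applies to it.
namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
annotations = namespace["poker_act"].__annotations__


def is_str_subclass(annotation) -> bool:
    """Mimic a registration-time check; strings make issubclass raise TypeError."""
    try:
        return issubclass(annotation, str)
    except TypeError:
        return False
```

Here `annotations` holds the strings "str" and "int" rather than the classes, so `is_str_subclass(annotations["action"])` is False even though the parameter is declared as `str`. A library that calls issubclass without the guard would raise instead.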