clbench-fireworks-rft

by sr-networks

Reinforcement Fine-Tuning of Qwen3-8B on the CLBench exploitable_poker task, running on Fireworks infrastructure via eval-protocol MCP-Gym.

This is a port of the sr-networks/clbench-verifiers GRPO setup (Will Brown's verifiers framework + PrimeIntellect hosted training) onto Fireworks RFT. The CLBench poker simulator, the PokerAction action schema, and the reward shaping carry over unchanged; only the RL-framework glue is rewritten, and free-text action parsing is replaced by tool calling.


Why a port and not a copy

verifiers and Fireworks RFT are different abstractions. What transfers vs. what changes:

| verifiers / Prime (upstream repo) | Fireworks RFT (this repo) | status |
|---|---|---|
| CLBenchEnv(vf.MultiTurnEnv) (env.py) | poker_adapter.py + poker_mcp.py MCP gym | ported |
| task.reset()/.step()/.get_instance_outcomes() | PokerEnv wrapper + poker_act tool | ported |
| rubric.py reward fns | reward.py + test_poker_rft.py evaluator | ported |
| parsing.py + guided_json (vLLM) | MCP tool params are the PokerAction schema | obsolete (tool-calling enforces structure) |
| cl-benchmark poker task | imported unchanged | no change |
| vf.RLTrainer GRPO + local vLLM | firectl create reinforcement-fine-tuning-job | platform |
| TOML configs + prime train | RFT job flags (see Launch) | re-expressed |

Key win: because the poker_act tool's parameters are exactly the PokerAction fields (action / thinking / amount), the model emits structured tool calls and malformed-JSON parse failures become impossible — the old guided_json + parse_failure_penalty machinery is no longer needed.
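To see why the tool signature doubles as the output schema, here is a minimal sketch of deriving a JSON-Schema-style parameter object from a function signature, roughly what MCP tool registration does. The poker_act below is a hypothetical stand-in: the parameter names follow the PokerAction fields, but the types, the return value, and the tool_schema helper are assumptions, not the repo's or FastMCP's actual code.

```python
import inspect
import json


# Hypothetical stand-in for the poker_act tool. Parameter names follow the
# PokerAction fields (action / thinking / amount); the types are assumptions.
# amount uses a plain int = -1 sentinel instead of Optional[int] (see Gotchas).
def poker_act(action: str, thinking: str, amount: int = -1) -> str:
    payload = {"action": action, "thinking": thinking}
    if amount != -1:  # -1 means "no amount supplied"
        payload["amount"] = amount
    return json.dumps(payload)


# Map Python annotation types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn) -> dict:
    """Derive a JSON-Schema-style parameter object from a function signature.

    Simplified illustration of tool registration; not FastMCP's implementation.
    """
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES[param.annotation]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default -> the model must supply it
        else:
            properties[name]["default"] = param.default
    return {"type": "object", "properties": properties, "required": required}


if __name__ == "__main__":
    print(json.dumps(tool_schema(poker_act), indent=2))
```

The model fills in these typed arguments rather than emitting free-form JSON text that a parser must recover, which is what makes a separate guided-decoding step unnecessary.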


Files

| file | role |
|---|---|
| poker_adapter.py | PokerEnv + PokerAdapter: wraps the CLBench task (the env.py port) |
| poker_mcp.py | PokerMcp(McpGym): registers the poker_act tool, control-plane reward/termination |
| server.py | MCP-Gym server launcher (python server.py --port N) |
| reward.py | evaluator scoring (the rubric.py port): mean instance reward + illegal-action penalty |
| test_poker_rft.py | @evaluation_test binding dataset + gym + model + reward |
| make_dataset.py | generates poker_dataset.jsonl (one EvaluationRow per seed) |
| poker_dataset.jsonl | 64-seed training dataset (regenerate with make_dataset.py) |
| requirements.txt | deps Fireworks installs into the rollout container |
| validate_connection.py | optional connectivity/structured-output smoke test (needs a served model) |
| setup.sh | installs deps + creates the bin/python 3.11 shim |
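The reward.py row describes the score as mean instance reward plus an illegal-action penalty. A minimal sketch of that formula, assuming the penalty is subtracted once per illegal action; the EpisodeResult container, its field names, and the 0.1 penalty weight are hypothetical placeholders, not values taken from the repo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeResult:
    """Hypothetical per-rollout summary; field names are assumptions."""
    instance_rewards: List[float] = field(default_factory=list)  # one per CLBench instance
    illegal_actions: int = 0  # poker_act calls the simulator rejected


def score(result: EpisodeResult, illegal_penalty: float = 0.1) -> float:
    """Mean instance reward minus a per-illegal-action penalty (weight is a placeholder)."""
    rewards = result.instance_rewards
    mean_reward = sum(rewards) / len(rewards) if rewards else 0.0
    return mean_reward - illegal_penalty * result.illegal_actions
```

Since GRPO compares the 8 candidates sampled for each prompt against each other, what drives learning is how the penalty shifts a candidate's score relative to its group, so its weight should be on the same scale as the instance rewards.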


Setup

```bash
./setup.sh                                  # deps + bin/python shim
export FIREWORKS_API_KEY="fw_..."           # https://fireworks.ai/account/api-keys
firectl set-api-key "$FIREWORKS_API_KEY"
export PATH="$PWD/bin:$PATH"                 # python3.11 shim first (gym spawns `python server.py`)
```

firectl (Go binary) install:

```bash
brew tap fw-ai/firectl && brew trust fw-ai/firectl && brew install fw-ai/firectl/firectl
```
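Because the gym spawns a bare `python server.py`, it is worth checking which interpreter that name resolves to after exporting PATH. A minimal standard-library sketch; the 3.11 floor comes from the setup notes above, and the script itself is illustrative rather than part of the repo.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_VERSION = (3, 11)  # the shim targets Python 3.11


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse output like 'Python 3.11.9' (Python 2 prints this to stderr)."""
    m = re.search(r"Python (\d+)\.(\d+)(?:\.(\d+))?", text)
    if not m:
        return None
    return tuple(int(g) for g in m.groups() if g is not None)


def python_on_path() -> Tuple[Optional[str], Optional[Tuple[int, ...]]]:
    """Return (path, version) for the `python` the gym would spawn."""
    exe = shutil.which("python")
    if exe is None:
        return None, None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return exe, parse_version(proc.stdout + proc.stderr)


if __name__ == "__main__":
    path, version = python_on_path()
    ok = version is not None and version[:2] >= MIN_VERSION
    print(f"python -> {path} {version}: {'OK' if ok else 'missing or too old, fix PATH'}")
```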


Launch an RFT job

There are two paths. The direct firectl path (A) is what we actually used, because it avoids an eval-protocol CLI bug (see Gotchas).

A) Upload the evaluator, then launch with firectl ← used

```bash
# 1. upload the evaluator (env + reward) so Fireworks builds the rollout container
eval-protocol create rft \
  --evaluator test_poker_rft.py::test_poker_rft \
  --dataset poker_dataset.jsonl --mcp-server server.py \
  --training-config-base-model accounts/fireworks/models/qwen3-8b \
  --dry-run --skip-validation -y          # uploads evaluator; ignore the poller timeout

# 2. confirm evaluator is ACTIVE, upload dataset, create the job
firectl create dataset clbench-poker-qwen3-8b-data poker_dataset.jsonl
firectl create reinforcement-fine-tuning-job \
  --base-model accounts/fireworks/models/qwen3-8b \
  --dataset clbench-poker-qwen3-8b-data \
  --evaluator accounts/<ACCOUNT>/evaluators/test-poker-rftpytest-poker-rft \
  --output-model clbench-poker-qwen3-8b \
  --epochs 2 --learning-rate 1e-6 --temperature 1.0 \
  --max-output-tokens 1024 --response-candidates-count 8
```
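The evaluator ID passed to --evaluator is a slug of the pytest node id, while the buggy poller embeds the raw node id (with its .py::) in the URL. A sketch that reproduces the one observed ID from this README: lowercase, replace "_" with "-", then drop anything outside [a-z0-9-]. This rule is reverse-engineered from a single example and is an assumption, not eval-protocol's documented behavior, so confirm the real ID in the Fireworks dashboard before launching.

```python
import re


def evaluator_id(pytest_node: str) -> str:
    """Guess the evaluator slug for a pytest node id (heuristic, see lead-in)."""
    slug = pytest_node.lower().replace("_", "-")
    return re.sub(r"[^a-z0-9-]", "", slug)  # drops '.' and '::'


def evaluator_resource(account: str, pytest_node: str) -> str:
    """Full resource name in the form used by the firectl command above."""
    return f"accounts/{account}/evaluators/{evaluator_id(pytest_node)}"


if __name__ == "__main__":
    print(evaluator_resource("<ACCOUNT>", "test_poker_rft.py::test_poker_rft"))
```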

B) Pure eval-protocol (once the poller bug is fixed upstream)

```bash
eval-protocol create rft --evaluator test_poker_rft.py::test_poker_rft \
  --dataset poker_dataset.jsonl --mcp-server server.py \
  --training-config-base-model accounts/fireworks/models/qwen3-8b \
  --training-config-output-model clbench-poker-qwen3-8b \
  --training-config-epochs 2 --training-config-learning-rate 1e-6 \
  --inference-parameters-temperature 1.0 --inference-parameters-max-output-tokens 1024 \
  --inference-parameters-response-candidates-count 8
```

Config mapping from the Prime TOML

  • rollouts_per_example=8 → --response-candidates-count 8 (GRPO group size)
  • temperature=1.0 → --temperature 1.0
  • max_tokens=1024 → --max-output-tokens 1024
  • enable_thinking=false → baked into the gym prompt
  • guided_json → tool-call schema (free)
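The mapping above is mechanical enough to script. A minimal sketch that turns Prime config keys into firectl reinforcement-fine-tuning-job flags; it assumes the keys arrive as a flat dict (a real Prime TOML may nest them under sections, which you would flatten first), and the helper name is hypothetical.

```python
from typing import Any, Dict, List

# Prime config key -> firectl reinforcement-fine-tuning-job flag (per the mapping above).
FLAG_MAP = {
    "rollouts_per_example": "--response-candidates-count",  # GRPO group size
    "temperature": "--temperature",
    "max_tokens": "--max-output-tokens",
}

# Keys with no firectl flag because the port handles them elsewhere.
HANDLED_ELSEWHERE = {
    "enable_thinking": "baked into the gym prompt",
    "guided_json": "replaced by the poker_act tool-call schema",
}


def prime_to_firectl_flags(cfg: Dict[str, Any]) -> List[str]:
    """Translate flat Prime config keys into firectl flags; reject unmapped keys."""
    unknown = set(cfg) - set(FLAG_MAP) - set(HANDLED_ELSEWHERE)
    if unknown:
        raise KeyError(f"no mapping for Prime keys: {sorted(unknown)}")
    flags: List[str] = []
    for key, flag in FLAG_MAP.items():
        if key in cfg:
            flags += [flag, str(cfg[key])]
    return flags
```

With Python 3.11+, the dict could come from `tomllib.load(f)` on the upstream config file, after flattening any sections.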


Training runs

See RUNS.md for the full log. Summary:

| run | job id | base | output model | epochs | candidates | status |
|---|---|---|---|---|---|---|
| 1 | hj1u6nxa | qwen3-8b (free) | clbench-poker-qwen3-8b | 2 | 8 | launched 2026-06-25, RUNNING |

Monitor: firectl get reinforcement-fine-tuning-job <job-id> · dashboard: https://app.fireworks.ai/dashboard


Gotchas (hard-won)

  • from __future__ import annotations breaks eval-protocol. It stringifies annotations, so FastMCP tool registration (issubclass("str", Context)) and the @evaluation_test signature validator both fail. Do not use it in poker_mcp.py or test_poker_rft.py.

  • FastMCP (this version) crashes on Optional[int] tool params while locating the Context arg. poker_act uses a plain int = -1 sentinel instead.

  • firectl needs firectl set-api-key; it does not read FIREWORKS_API_KEY automatically. (firectl whoami additionally needs OIDC signin — ignore it.)

  • eval-protocol create rft has a poller bug: it polls …/evaluators/<file>.py::<func> — the .py:: makes the URL malformed → HTTP 400 → false 10-minute "evaluator not ready" timeout. The evaluator is actually ACTIVE; launch via firectl.

  • On macOS, python is often 2.7 (or missing entirely on recent releases). The gym spawns python server.py, so bin/python must shim to the 3.11 interpreter that has the deps and be first on PATH.

  • Rollouts run on Fireworks, in a container built from requirements.txt, so training does not require a serverless deployment of the base model reachable from your machine; that is needed only for running pytest rollouts locally.
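The first gotcha can be reproduced without eval-protocol or FastMCP. A minimal sketch, assuming a naive issubclass scan over parameter annotations like the one the failure points to; Context and poker_act here are stand-ins, not FastMCP's class or the repo's tool.

```python
import inspect
import typing

# Hypothetical tool module that starts with the problematic import.
SRC_POSTPONED = '''
from __future__ import annotations

def poker_act(action: str, thinking: str, amount: int = -1) -> str:
    return action
'''

# The same module without the __future__ import.
SRC_EAGER = SRC_POSTPONED.replace("from __future__ import annotations\n", "")


class Context:
    """Stand-in for a framework Context type (hypothetical)."""


def load_tool(src: str):
    """Execute module source in a fresh namespace and return its poker_act."""
    namespace: dict = {}
    exec(compile(src, "<tool_module>", "exec"), namespace)
    return namespace["poker_act"]


def find_context_param(fn):
    """Naive Context lookup: call issubclass() on each raw annotation."""
    for name, param in inspect.signature(fn).parameters.items():
        ann = param.annotation
        if ann is not inspect.Parameter.empty and issubclass(ann, Context):
            return name
    return None


def context_scan_fails(fn) -> bool:
    """True if the lookup raises, as it does when annotations are strings."""
    try:
        find_context_param(fn)
    except TypeError:  # issubclass() arg 1 must be a class
        return True
    return False


if __name__ == "__main__":
    postponed, eager = load_tool(SRC_POSTPONED), load_tool(SRC_EAGER)
    print(repr(inspect.signature(postponed).parameters["action"].annotation))  # 'str'
    print(context_scan_fails(postponed), context_scan_fails(eager))  # True False
    # typing.get_type_hints() evaluates the strings back into real types.
    print(typing.get_type_hints(postponed)["amount"])  # <class 'int'>
```

typing.get_type_hints() resolves postponed annotations, but only code that calls it benefits; since the lookup lives inside the libraries, removing the import from poker_mcp.py and test_poker_rft.py is the practical fix.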
