Glama

bizhawk_play_input_sequence

Play a series of joypad inputs frame-by-frame in a single round-trip, optionally capturing screenshots and memory reads at fixed intervals, and abort if a specified memory address changes.

Instructions

PURPOSE: Play a pre-built sequence of per-frame joypad inputs back-to-back, advancing one frame per element, ENTIRELY SERVER-SIDE in a single bridge round-trip. Optionally captures screenshots AND labeled memory reads at fixed frame intervals during the play, and optionally aborts play early when a specified memory address changes. All observations come back inline so the agent sees the full trajectory + game state in one tool response.

USAGE: Use whenever you have ≥10 frames of inputs to play in order: TAS movie playback, scripted multi-frame sequences, AI-search of input patterns, agent-driven gameplay. ONE bridge round-trip ships N frames instead of the 2N round-trips you'd pay looping bizhawk_press_buttons + bizhawk_frame_advance(1). For sequences over ~200 frames, CHUNK.

FOR AGENT-DRIVEN PLAY: combine screenshot_every, observe_memory, and stop_on_memory_change for the killer pattern: 'walk right for up to 200 frames, observing screenshot+x+y+hp every second, but STOP the moment the room ID changes'. The agent sees where Samus was at each second, AND whether the goal (room transition) was reached, AND screenshots for visual confirmation, all in one tool response.

BEHAVIOR: For each frames element, calls joypad.set with that frame's buttons, then emu.frameadvance. The bridge's main poll loop is BLOCKED for the duration of the call (no other RPCs, no heartbeat) until the sequence finishes or fails. With screenshot_every, each captured screenshot adds ~1 frame of wall-clock. With observe_memory, each observation also reads the listed memory addresses at the same frame the screenshot was taken; values come back labeled by name in each observation's memory field. With stop_on_memory_change, the bridge records the listed address's value before the first frame, re-reads it after each frame, and aborts the sequence the moment it changes (a final observation is captured at the stop frame regardless of cadence). Returns an error if joypad.set / emu.frameadvance / client.screenshot is missing when needed, if any observe_memory entry references an unknown domain, if any width isn't 'u8' / 'u16' / 'u32', or if any address is out of range.

RETURNS: A text summary ('Played N frames. Final framecount: M. Stopped early: yes/no [reason]. Captured K observations with their memory values') followed by K inline image content blocks (one per observation, in frame order). Each observation in the text summary includes its frame_offset and labeled memory values so the agent can correlate the visible screenshot with the game state at that exact frame.
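As a rough illustration of the agent-driven pattern described above, the following Python sketch assembles an arguments object for one call. The top-level parameter names come from the input schema; everything inside each memory entry other than `name` (the `domain`, `address`, and `width` keys, the "WRAM" domain, and the addresses themselves) is an assumption inferred from the error conditions, and the addresses are placeholders rather than real Super Metroid locations. The shape of `stop_on_memory_change` is likewise assumed to match a single `observe_memory` entry.

```python
import json


def hold(button: str, n_frames: int, player: int = 1) -> list[dict]:
    """Return n_frames per-frame input objects that all hold one button."""
    return [{"buttons": {button: True}, "player": player} for _ in range(n_frames)]


def mem(name: str, address: int, width: str = "u16", domain: str = "WRAM") -> dict:
    """Build one memory-read entry.

    ASSUMPTION: the key names "domain", "address", "width" and the "WRAM"
    domain are guesses; only "name" is confirmed by the tool description.
    """
    return {"name": name, "domain": domain, "address": address, "width": width}


# Placeholder addresses (hypothetical, not real game memory locations).
arguments = {
    "frames": hold("Right", 200),        # walk right for up to 200 frames
    "screenshot_every": 60,              # about one observation per second
    "observe_memory": [mem("x", 0x0000), mem("y", 0x0002), mem("hp", 0x0004)],
    "stop_on_memory_change": mem("room_id", 0x0006),
}

# Print everything except the long frames array for a quick sanity check.
print(json.dumps({k: v for k, v in arguments.items() if k != "frames"}, indent=2))
```

Check the server's JSON Schema for the exact entry shape before sending a payload like this; the sketch only shows how the three optional features fit together in one request.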

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| frames | Yes | Array of per-frame input objects. Each element describes ONE emulated frame: `{"buttons": {"Right": true, ...}, "player": 1}`. Empty `buttons` (or empty object) = no input on that frame. `player` defaults to 1 if omitted. Array order = frame playback order. Chunk longer sequences across multiple calls (≤200 frames each is a reasonable upper bound) to keep the bridge responsive. | |
| screenshot_every | No | Optional. If set, capture a PNG screenshot every N frames during playback (and one extra at the final frame regardless of remainder). Each screenshot costs ~1 wall-clock frame for client.screenshot, so 60 (≈1 sec of game time) is a good default: it captures meaningful state changes without doubling batch latency. Omit to skip screenshots. If `observe_memory` is also set, screenshots and memory reads happen at the same observation points. | |
| screenshot_dir | No | Optional. Directory to write screenshot PNGs into when `screenshot_every` is set. Default: C:/temp. Must exist and be writable. Files are named `<prefix>-NNNN.png` where NNNN is the frame offset within the batch (zero-padded to 4 digits). | |
| screenshot_prefix | No | Optional. Filename prefix for screenshots when `screenshot_every` is set. Default: 'obs'. | |
| observe_memory | No | Optional. List of memory reads to perform at each observation point (alongside screenshots if `screenshot_every` is also set). Each result lands in the observation's `memory` field keyed by `name`. Use this to track game-state values per observation, e.g. on Super Metroid, track HP/X/Y/room-ID at each screenshot so you see how state changes across the play batch. | |
| stop_on_memory_change | No | Optional. If set, the bridge reads the specified memory value before the first frame, re-reads it after every frame, and ABORTS the play sequence the moment it changes. The result will have `stopped_early: true` and `stop_reason: 'memory_changed'`. A final observation is captured at the stop frame even if it's not on the normal cadence. Killer use case: watch the room ID. Samus walks through a door, room ID changes, play stops at the exact frame of the transition. | |
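The chunking advice for `frames` and the screenshot naming rule can be sketched in a few lines of Python. The helper names `chunk_frames` and `screenshot_name` are hypothetical and not part of the server; the 200-frame limit and the `<prefix>-NNNN.png` pattern are taken from the parameter descriptions.

```python
from typing import Iterator

MAX_FRAMES_PER_CALL = 200  # upper bound suggested by the `frames` description


def chunk_frames(frames: list[dict], size: int = MAX_FRAMES_PER_CALL) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` frames, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(frames), size):
        yield frames[start:start + size]


def screenshot_name(prefix: str, frame_offset: int) -> str:
    """Format a filename as <prefix>-NNNN.png, zero-padded to 4 digits."""
    return f"{prefix}-{frame_offset:04d}.png"


# 450 idle frames split into calls of 200, 200, and 50 frames.
idle = [{"buttons": {}} for _ in range(450)]
print([len(chunk) for chunk in chunk_frames(idle)])
print(screenshot_name("obs", 60))
```

Because NNNN is the frame offset within the batch, successive chunks sent with the same prefix would presumably produce the same filenames, so a distinct `screenshot_prefix` per chunk may be worth considering.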
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden of behavioral disclosure. It clearly states that the bridge's main poll loop is BLOCKED during the call, that each screenshot adds ~1 frame of wall-clock, and how the stop_on_memory_change works (records initial value, checks after each frame, aborts on change). Error conditions are listed (missing methods, unknown domain, invalid width, out-of-range address). The description is transparent about all key behaviors.
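The observation cadence and early-stop rules summarized here can be modeled with a small Python simulation. This is not the bridge's code: `play`, `FakeGame`, and the 1-based frame offsets are assumptions made for illustration, and the sketch records only the frame offsets at which an observation would be taken.

```python
from typing import Callable, Optional


def play(frames: list[dict], advance: Callable[[dict], None],
         read_watch: Callable[[], int],
         screenshot_every: Optional[int] = None) -> dict:
    """Simulate the described loop: advance, re-read, observe, maybe stop."""
    baseline = read_watch()          # value recorded before the first frame
    observations: list[int] = []
    played = 0
    stopped_early = False
    for offset, frame in enumerate(frames, start=1):
        advance(frame)               # stands in for joypad.set + emu.frameadvance
        played = offset
        changed = read_watch() != baseline
        on_cadence = screenshot_every is not None and offset % screenshot_every == 0
        if on_cadence or changed:    # the stop frame is observed regardless of cadence
            observations.append(offset)
        if changed:
            stopped_early = True
            break
    # One extra observation at the final frame when it is off-cadence.
    if screenshot_every is not None and played and (not observations or observations[-1] != played):
        observations.append(played)
    return {"played": played, "stopped_early": stopped_early, "observations": observations}


class FakeGame:
    """Toy emulator whose watched value flips at a chosen frame."""

    def __init__(self, door_frame: int) -> None:
        self.frame = 0
        self.door_frame = door_frame

    def advance(self, frame_input: dict) -> None:
        self.frame += 1

    def room_id(self) -> int:
        return 1 if self.frame >= self.door_frame else 0


game = FakeGame(door_frame=130)
inputs = [{"buttons": {"Right": True}} for _ in range(200)]
print(play(inputs, game.advance, game.room_id, screenshot_every=60))
```

With the door at frame 130, the simulated run stops after 130 frames and observes at offsets 60, 120, and 130, matching the rule that the stop frame is captured even when it falls off the normal cadence.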

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured into sections (PURPOSE, USAGE, BEHAVIOR, RETURNS), and the most critical information is front-loaded. However, it is verbose; some details (like the killer pattern) are repeated in both USAGE and BEHAVIOR sections. While every sentence adds value, compactness could be improved without losing clarity. Still, it remains relatively easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema, no annotations), the description is fully complete. It covers purpose, usage, behavior, error conditions, and return format (text summary plus inline images). It explains optional features and their interactions (e.g., screenshot and memory read cadence). No gaps are apparent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (all parameters have descriptions). The description adds substantial value beyond the schema. For example, the `screenshot_every` parameter explains that each screenshot costs ~1 frame and recommends 60 as a good default. The `observe_memory` parameter gives a concrete example for Super Metroid. The `stop_on_memory_change` parameter describes the killer use case. The `frames` parameter includes chunking advice. The descriptions provide meaningful semantics for effective tool use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description begins with a clear statement of purpose: 'Play a pre-built sequence of per-frame joypad inputs back-to-back, advancing one frame per element, ENTIRELY SERVER-SIDE in a single bridge round-trip.' It distinguishes itself from siblings (bizhawk_press_buttons and bizhawk_frame_advance) by noting that this tool batches many frames into one call, reducing round-trips. The optional features (screenshots, memory reads, early stop) are also described, making the tool's capabilities very clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The USAGE section explicitly advises when to use this tool ('whenever you have ≥10 frames of inputs to play in order') and contrasts with looping the two sibling tools, which would cost 2N round-trips. It also recommends chunking for sequences over ~200 frames. However, it does not explicitly state when NOT to use it (e.g., for very short sequences it might be overkill), though the recommendation for ≥10 frames implies exclusion for shorter sequences. The guidance is clear and actionable.
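The round-trip comparison can be made concrete with a short Python calculation. The function names are hypothetical; the 2N figure and the ~200-frame chunk size come from the tool description, and the batched count assumes one call per chunk.

```python
import math


def round_trips_looped(n_frames: int) -> int:
    """Press + frame-advance per frame: two round-trips for every frame."""
    return 2 * n_frames


def round_trips_batched(n_frames: int, chunk_size: int = 200) -> int:
    """One round-trip per chunk of at most chunk_size frames."""
    return math.ceil(n_frames / chunk_size)


for n in (10, 200, 600):
    print(n, round_trips_looped(n), round_trips_batched(n))
```

For a 600-frame sequence this works out to 1200 round-trips when looping the two sibling tools versus 3 batched calls.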

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dmang-dev/mcp-bizhawk'
