create_flowspeech

Converts text or URL content into speech episodes, using either AI-enhanced grammar correction (smart mode) or direct processing with no modifications (direct mode).

Instructions

Create a FlowSpeech episode by converting text or URL content to speech. Supports smart mode (AI-enhanced, fixes grammar) and direct mode (no modifications). This tool will automatically poll until generation is complete.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| sourceType | Yes | Source type: text or url | |
| sourceContent | Yes | Source content (text or URL) | |
| speakerId | Yes | Speaker name or ID. Use the speaker name from the get_speakers tool output (the "name" field, not speakerId). A full speaker ID is also supported. | |
| language | No | Language code (e.g., "zh" for Chinese, "en" for English) | zh |
| mode | No | Generation mode: "smart" (AI-enhanced, fixes grammar) or "direct" (no modifications) | smart |
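
For example, a minimal call might pass arguments like the following; the speaker name "Luna" and the URL are hypothetical placeholders, and real speaker names come from the get_speakers tool:

    // Hypothetical arguments for a create_flowspeech call.
    // "Luna" is a placeholder; use a "name" value returned by get_speakers.
    const args = {
    	sourceType: 'url' as const,
    	sourceContent: 'https://example.com/article',
    	speakerId: 'Luna',
    	language: 'en',
    	mode: 'direct' as const,
    };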

Implementation Reference

  • Executes the create_flowspeech tool: resolves the speaker, prepares the request data, calls the client API, polls status with pollUntilComplete until generation completes, handles errors, and formats the output with formatFlowspeechEpisode.
    async execute(args, {log}: {log: any}) {
    	try {
    		// Resolve speaker name/ID to actual speaker ID
    		log.info('Resolving speaker identifier', {
    			input: args.speakerId,
    			language: args.language,
    		});
    
    		const resolvedSpeakers = await client.resolveSpeakers([args.speakerId]);
    		const speakers = [{speakerId: resolvedSpeakers[0]?.speakerId}];
    		const sources: FlowspeechSource[] = [
    			{
    				type: args.sourceType,
    				content: args.sourceContent,
    			},
    		];
    
    		// Use provided language or infer from resolved speaker
    		const allSpeakers = await client.getCachedSpeakers();
    		const resolvedSpeaker = allSpeakers.find(
    			(s) => s.speakerId === resolvedSpeakers[0]?.speakerId,
    		);
    		const language = args.language ?? resolvedSpeaker?.language ?? 'zh';
    
    		log.info('Creating FlowSpeech episode', {
    			sourceType: args.sourceType,
    			contentLength: args.sourceContent.length,
    			speakerId: resolvedSpeakers[0]?.speakerId,
    			language,
    			mode: args.mode,
    		});
    
    		const requestData: CreateFlowspeechRequest = {
    			sources,
    			speakers,
    			mode: args.mode,
    			language,
    		};
    
    		const submitResponse =
    			await client.flowspeech.createFlowspeech(requestData);
    
    		if (submitResponse.code !== 0) {
    			return `Failed to submit task: ${submitResponse.message ?? 'Unknown error'}`;
    		}
    
    		const episodeId = submitResponse.data?.episodeId;
    		if (!episodeId) {
    			return 'Failed to submit task: No episodeId returned';
    		}
    
    		log.info(`FlowSpeech task submitted successfully`, {episodeId});
    
    		const result = await pollUntilComplete(
    			async () => {
    				const statusResponse =
    					await client.flowspeech.getFlowspeechStatus(episodeId);
    				if (statusResponse.code !== 0) {
    					throw new Error(
    						statusResponse.message ?? 'Failed to query status',
    					);
    				}
    
    				if (!statusResponse.data) {
    					throw new Error('No episode data returned');
    				}
    
    				return statusResponse.data;
    			},
    			{
    				pollInterval: 5000,
    				maxRetries: 120,
    				onProgress(status, retry) {
    					log.debug(`FlowSpeech generation status: ${status}`, {
    						episodeId,
    						retry: `${retry}/120`,
    					});
    				},
    			},
    		);
    
    		if (!result.success) {
    			if (result.error) {
    				log.error('FlowSpeech generation failed', {
    					episodeId,
    					error: result.error,
    				});
    				return `FlowSpeech generation failed: ${result.error}`;
    			}
    
    			log.warn('FlowSpeech generation timeout', {
    				episodeId,
    				lastStatus: result.lastStatus,
    			});
    			return `FlowSpeech generation timeout\nLast status: ${result.lastStatus}\nEpisode ID: ${episodeId}`;
    		}
    
    		const episode = result.data!;
    		log.info('FlowSpeech generation completed', {episodeId});
    
    		return formatFlowspeechEpisode(episode);
    	} catch (error) {
    		const errorMessage = formatError(error);
    		log.error('Failed to create FlowSpeech', {error: errorMessage});
    		return `Failed to create FlowSpeech: ${errorMessage}`;
    	}
    },
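
    pollUntilComplete itself is not shown on this page. A minimal sketch consistent with how it is called above (the options object, the onProgress callback, and the success/error/lastStatus result fields) could look like the following; the 'completed' and 'failed' status strings and the shape of the polled data are assumptions:

    // Hypothetical sketch of pollUntilComplete, inferred from the call site above.
    // The result fields (success, data, error, lastStatus) mirror how the handler
    // reads them; the 'completed'/'failed' status values are assumptions. Errors
    // thrown by fetchStatus propagate to the caller's try/catch.
    type PollResult<T> = {
    	success: boolean;
    	data?: T;
    	error?: string;
    	lastStatus?: string;
    };
    
    async function pollUntilComplete<T extends {status: string; error?: string}>(
    	fetchStatus: () => Promise<T>,
    	options: {
    		pollInterval: number;
    		maxRetries: number;
    		onProgress?: (status: string, retry: number) => void;
    	},
    ): Promise<PollResult<T>> {
    	let lastStatus: string | undefined;
    	for (let retry = 1; retry <= options.maxRetries; retry++) {
    		const data = await fetchStatus();
    		lastStatus = data.status;
    		options.onProgress?.(data.status, retry);
    		if (data.status === 'completed') {
    			return {success: true, data};
    		}
    		if (data.status === 'failed') {
    			return {success: false, error: data.error ?? 'Generation failed', lastStatus};
    		}
    		await new Promise((resolve) => setTimeout(resolve, options.pollInterval));
    	}
    	return {success: false, lastStatus}; // timed out after maxRetries polls
    }
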
  • Zod input schema definition for the tool parameters: sourceType, sourceContent, speakerId, language, mode.
    parameters: z.object({
    	sourceType: z.enum(['text', 'url']).describe('Source type: text or url'),
    	sourceContent: z.string().min(1).describe('Source content (text or URL)'),
    	speakerId: z
    		.string()
    		.min(1)
    		.describe(
    			'Speaker name or ID. Use speaker name from get_speakers tool output (the "name" field, not speakerId). Full speaker ID also supported.',
    		),
    	language: z
    		.string()
    		.optional()
    		.describe(
    			'Language code (e.g., "zh" for Chinese, "en" for English). Default: zh',
    		),
    	mode: z
    		.enum(['smart', 'direct'])
    		.default('smart')
    		.describe(
    			'Generation mode: "smart" (AI-enhanced, fixes grammar) or "direct" (no modifications)',
    		),
    }),
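
    As a small illustration of the schema's defaults, parsing arguments without a mode yields 'smart' (assuming the z.object above is bound to a parameters variable; the speaker name is hypothetical):

    // Illustration only: zod applies .default('smart') when mode is omitted.
    const parsed = parameters.parse({
    	sourceType: 'text',
    	sourceContent: 'Hello world',
    	speakerId: 'Luna', // hypothetical speaker name
    });
    // parsed.mode === 'smart'; parsed.language is undefined, so the handler
    // later falls back to the speaker's language or 'zh'.
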
  • Registers the create_flowspeech tool with the FastMCP server via server.addTool, combining the Zod schema and execute handler shown above with the tool's name, description, and annotations.
    server.addTool({
    	name: 'create_flowspeech',
    	description:
    		'Create a FlowSpeech episode by converting text or URL content to speech. Supports smart mode (AI-enhanced, fixes grammar) and direct mode (no modifications). This tool will automatically poll until generation is complete.',
    	parameters: z.object({
    		// ... Zod input schema as shown above ...
    	}),
    	annotations: {
    		title: 'Create FlowSpeech',
    		openWorldHint: true,
    		readOnlyHint: false,
    	},
    	async execute(args, {log}: {log: any}) {
    		// ... execute handler as shown above ...
    	},
    });
  • TypeScript type definitions for CreateFlowspeechRequest and CreateFlowspeechResponse, used by the client and the tool.
    export type CreateFlowspeechRequest = {
    	sources: FlowspeechSource[];
    	speakers: Array<{speakerId?: string}>;
    	language?: string;
    	mode?: 'smart' | 'direct';
    };
    
    export type CreateFlowspeechResponse = {
    	episodeId: string;
    };
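
    The ApiResponse wrapper is not defined on this page; judging from the code, message, and data checks in the handler, it presumably looks something like this sketch:

    // Hypothetical shape of ApiResponse<T>, inferred from how the handler
    // checks `code !== 0`, reads `message`, and unwraps `data`.
    export type ApiResponse<T> = {
    	code: number; // 0 indicates success
    	message?: string; // error detail when code !== 0
    	data?: T;
    };
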
  • FlowspeechClient.createFlowspeech method: sends a POST request with the request data to the '/v1/flow-speech/episodes' API endpoint.
    async createFlowspeech(
    	data: CreateFlowspeechRequest,
    ): Promise<ApiResponse<CreateFlowspeechResponse>> {
    	const response = await this.axiosInstance.post<
    		ApiResponse<CreateFlowspeechResponse>
    	>('/v1/flow-speech/episodes', data);
    	return response.data;
    }
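
    The matching status method called by the handler, getFlowspeechStatus, is not shown here. A sketch consistent with the call site might look like the following; the GET route and the FlowspeechEpisode payload type are assumptions:

    // Hypothetical companion method; the actual route and payload type
    // are not shown on this page and may differ.
    async getFlowspeechStatus(
    	episodeId: string,
    ): Promise<ApiResponse<FlowspeechEpisode>> {
    	const response = await this.axiosInstance.get<
    		ApiResponse<FlowspeechEpisode>
    	>(`/v1/flow-speech/episodes/${episodeId}`);
    	return response.data;
    }
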
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it reveals that the tool 'automatically polls until generation is complete' (asynchronous behavior) and explains the difference between smart mode (AI-enhanced with grammar fixes) and direct mode (no modifications). While annotations provide readOnlyHint=false and openWorldHint=true, the description adds practical implementation details that help the agent understand runtime behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly front-loaded with the core purpose in the first sentence, followed by mode explanations and behavioral detail. Every sentence earns its place by providing essential information without redundancy. The three-sentence structure is efficient and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a creation tool with no output schema, the description provides good context about what the tool does and its behavioral characteristics. It covers the creation process, mode differences, and polling behavior. However, it doesn't mention what the tool returns (e.g., episode ID, status object) or error conditions, which would be helpful given the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents all 5 parameters thoroughly. The description mentions 'smart mode (AI-enhanced, fixes grammar) and direct mode (no modifications)' which slightly elaborates on the mode parameter, but doesn't add significant semantic value beyond what's already in the schema descriptions. This meets the baseline expectation for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create a FlowSpeech episode'), the resource involved ('by converting text or URL content to speech'), and distinguishes it from siblings by focusing on FlowSpeech rather than podcast-related tools. It provides a precise verb+resource combination that leaves no ambiguity about the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool (converting text/URL to speech) and mentions two modes (smart vs direct) which gives operational guidance. However, it doesn't explicitly state when to choose this tool over sibling tools like create_podcast or generate_podcast_audio, nor does it mention any prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
