devbrother2024

TypeScript MCP Server Boilerplate

generate-image

Creates images from text prompts using the HuggingFace Inference API with the FLUX.1-schnell model via Together. Specify a prompt and the number of inference steps to generate visual content.

Instructions

Generates an image from a text prompt using the HuggingFace Inference API. (FLUX.1-schnell via Together)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Image generation prompt | |
| num_inference_steps | No | Number of inference steps (default: 4, max: 10) | 4 |
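Concretely, a tools/call request satisfying this schema might look like the following (the prompt and step count are illustrative values, not from the source):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "generate-image",
    "arguments": {
      "prompt": "a watercolor fox in a snowy forest",
      "num_inference_steps": 6
    }
  }
}
```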

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| content | Yes | Array of image (base64 data + mimeType) or text items | |

Implementation Reference

  • The handler function for 'generate-image', which uses the HuggingFace InferenceClient to generate an image from a text prompt.
    // From src/index.ts. Assumes `import { InferenceClient } from '@huggingface/inference'`
    // plus module-level HF_PROVIDER, HF_MODEL, and blobToBase64 definitions (not shown here).
    async ({ prompt, num_inference_steps }) => {
        const token = process.env.HF_TOKEN
        if (!token) {
            return {
                content: [
                    {
                        type: 'text' as const,
                        text: 'HF_TOKEN 환경변수가 설정되지 않았습니다. Hugging Face 토큰을 설정해주세요.'
                    }
                ],
                isError: true,
                structuredContent: {
                    content: [
                        {
                            type: 'text' as const,
                            text: 'HF_TOKEN 환경변수가 설정되지 않았습니다. Hugging Face 토큰을 설정해주세요.'
                        }
                    ]
                }
            }
        }
    
        try {
            const client = new InferenceClient(token)
            const blob = (await client.textToImage({
                provider: HF_PROVIDER as 'together',
                model: HF_MODEL,
                inputs: prompt,
                parameters: { num_inference_steps }
            })) as Blob | string
            const base64 = await blobToBase64(blob)
    
            return {
                content: [
                    {
                        type: 'image' as const,
                        data: base64,
                        mimeType: 'image/png'
                    }
                ],
                structuredContent: {
                    content: [
                        {
                            type: 'image' as const,
                            data: base64,
                            mimeType: 'image/png'
                        }
                    ]
                }
            }
        } catch (err) {
            const message = err instanceof Error ? err.message : String(err)
            return {
                content: [
                    {
                        type: 'text' as const,
                        text: `이미지 생성에 실패했습니다: ${message}`
                    }
                ],
                isError: true,
                structuredContent: {
                    content: [
                        {
                            type: 'text' as const,
                            text: `이미지 생성에 실패했습니다: ${message}`
                        }
                    ]
                }
            }
        }
    }
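The handler calls a `blobToBase64` helper that is referenced but not shown in this excerpt. A minimal sketch of what such a helper could look like in Node.js (the actual implementation in `src/index.ts` may differ):

```typescript
// Hypothetical sketch of the blobToBase64 helper referenced above; the real
// implementation in src/index.ts is not shown on this page and may differ.
async function blobToBase64(blob: Blob | string): Promise<string> {
    // Some providers return the image as a base64 string already.
    if (typeof blob === 'string') return blob
    // Otherwise read the Blob's bytes and base64-encode them.
    const buffer = Buffer.from(await blob.arrayBuffer())
    return buffer.toString('base64')
}
```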
  • Input and output schema definition for the 'generate-image' tool using Zod.
    {
        description:
            'HuggingFace Inference API를 사용해 텍스트 프롬프트로 이미지를 생성합니다. (FLUX.1-schnell via Together)',
        inputSchema: z.object({
            prompt: z.string().describe('이미지 생성 프롬프트'),
            num_inference_steps: z
                .number()
                .int()
                .min(1)
                .max(10)
                .optional()
                .default(4)
                .describe('추론 스텝 수 (기본값: 4, 최대: 10)')
        }),
        outputSchema: z.object({
            content: z.array(
                z.union([
                    z.object({
                        type: z.literal('image'),
                        data: z.string(),
                        mimeType: z.string()
                    }),
                    z.object({
                        type: z.literal('text'),
                        text: z.string()
                    })
                ])
            )
        })
    },
  • src/index.ts:377-480 (registration)
    Tool registration for 'generate-image' using server.registerTool.
    server.registerTool(
        'generate-image',
        {
            /* description, inputSchema, and outputSchema exactly as shown above */
        },
        async ({ prompt, num_inference_steps }) => {
            /* handler body exactly as shown above */
        }
    )
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full disclosure burden. It identifies the specific model (FLUX.1-schnell) and provider (Together), which is valuable context, but omits operational characteristics such as typical latency, rate limits, cost implications, or whether results are persisted vs. ephemeral.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence with parenthetical model specification. Information is front-loaded with the core action, and every element (API name, model name, provider) earns its place without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Because an output schema documents the return value and every parameter carries its own description, the tool description supplies the essential context by identifying the AI model and backend service. It could still benefit from noting that this is an external API call with potential latency implications.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, establishing a baseline of 3. The description mentions 'text prompts' generally but does not elaborate on parameter semantics beyond the schema (e.g., explaining how num_inference_steps affects quality for FLUX specifically or prompt engineering best practices).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool generates images from text prompts using the HuggingFace Inference API, specifying both the verb (generate) and resource (images). It clearly distinguishes from siblings (calc, geocode, get-weather, etc.) which handle calculations and data retrieval rather than media generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use exclusions or alternatives are mentioned. However, the tool's purpose (AI image generation) is distinct enough from text-based/calculation siblings that implied usage is reasonably clear, though explicit guidance on when to prefer this over other image generation methods is absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
