
recognize_text

Extract text from images using OCR technology. Supports multiple languages and common image formats to convert visual content into editable text.

Instructions

Recognizes text content in images (OCR). Supports multiple languages, including Chinese and English. Supports PNG, JPG, JPEG, BMP, GIF, and WebP formats.

Input Schema

Name | Required | Description | Default
image_path | Yes | Local absolute path to the image file, e.g. /Users/xxx/Desktop/image.png | (none)
languages | No | Optional array of language codes. Available languages include chi_sim (Simplified Chinese), chi_tra (Traditional Chinese), eng (English), jpn (Japanese), kor (Korean), etc. | ["chi_sim", "eng"] (Simplified Chinese + English)
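As a quick sketch of how these parameters come together: the server joins multiple language codes with '+' (the format Tesseract expects). The path below is illustrative only, not a real file.

```javascript
// Hypothetical arguments for a recognize_text call (path is illustrative)
const args = {
  image_path: '/tmp/sample.png',
  languages: ['chi_sim', 'eng'],  // the documented default
};

// Tesseract.js expects multiple languages joined with '+'
const langString = args.languages.join('+');
console.log(langString);  // → 'chi_sim+eng'
```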

Implementation Reference

  • ocr.js:124-188 (handler)
    The core implementation of the 'recognize_text' tool, which wraps Tesseract.js to perform OCR on images.
import Tesseract from 'tesseract.js';
// validateImage() is assumed to be defined earlier in ocr.js
// (it checks that the file exists and has a supported format)

export async function recognizeText(imagePath, languages = ['chi_sim', 'eng']) {
  // Step 1: validate the image
  const validation = validateImage(imagePath);
  if (!validation.valid) {
    return { success: false, error: validation.error };
  }

  try {
    // Step 2: run OCR recognition
    // Tesseract.recognize() arguments:
    // - arg 1: image path (URLs, Buffers, and Base64 strings are also supported)
    // - arg 2: language codes, joined with '+' for multiple languages
    // - arg 3: options object
    const result = await Tesseract.recognize(
      imagePath,
      languages.join('+'),  // convert the language array to Tesseract format, e.g. 'chi_sim+eng'
      {
        // The logger callback can be used to monitor progress; uncomment when debugging
        // logger: m => console.log(m)
        //
        // Example progress messages:
        // { status: 'loading tesseract core', progress: 0.5 }
        // { status: 'recognizing text', progress: 0.8 }
      }
    );

    // Step 3: process the recognition result
    // result.data contains the full recognition output:
    // - text: all recognized text
    // - confidence: overall confidence
    // - words: array of word-level details
    // - lines: array of line-level details
    const text = result.data.text.trim();  // strip leading/trailing whitespace
    const confidence = result.data.confidence;

    // Handle the empty-result case
    if (!text) {
      return {
        success: true,  // recognition technically succeeded; there was just no text
        text: '',
        confidence,
        message: 'No text detected in the image'
      };
    }

    // Return the success result
    return {
      success: true,
      text,
      confidence,
      message: `Recognition complete, confidence: ${confidence.toFixed(1)}%`
    };

  } catch (error) {
    // Catch and return errors raised during recognition
    // Common errors:
    // - language pack download failure due to network issues
    // - corrupted image file
    // - out of memory (when processing very large images)
    return {
      success: false,
      error: `OCR recognition failed: ${error.message}`
    };
  }
}
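A caller has three result shapes to handle: failure, empty text, and success. The sketch below illustrates them with a hypothetical summarize() helper (not part of ocr.js).

```javascript
// Hypothetical helper showing the three result shapes recognizeText returns
function summarize(result) {
  if (!result.success) return `OCR failed: ${result.error}`;      // error case
  if (!result.text) return 'No text detected';                    // empty-text case
  return `OK (${result.confidence.toFixed(1)}%): ${result.text}`; // success case
}

console.log(summarize({ success: true, text: 'hello', confidence: 91.234 }));
// → 'OK (91.2%): hello'
console.log(summarize({ success: false, error: 'corrupted image file' }));
// → 'OCR failed: corrupted image file'
```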
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions support for languages and formats but does not describe what the tool returns (e.g., text output format, error handling), performance characteristics (e.g., speed, accuracy), or operational constraints (e.g., file size limits, authentication needs). This leaves significant gaps for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
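One way to close this disclosure gap would be to document the result object the handler already produces. A sketch of that shape, with field names taken from the implementation above (the example values are illustrative):

```javascript
// Illustrative result object for a successful recognize_text call
const exampleResult = {
  success: true,        // false on validation or OCR failure
  text: 'Hello world',  // '' when no text is detected
  confidence: 92.5,     // overall confidence score
  message: '...',       // human-readable summary string
};
// A failure instead carries: { success: false, error: '...' }
console.log(Object.keys(exampleResult).join(','));
// → 'success,text,confidence,message'
```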

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, with two sentences that efficiently cover the core functionality, language support, and format support without any wasted words. Each sentence adds value, making it well-structured and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an OCR tool with no annotations and no output schema, the description is incomplete. It lacks details on return values (e.g., structured text, confidence scores), error cases, or behavioral traits like rate limits or dependencies. This makes it inadequate for guiding an agent in practical use beyond basic invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters. The description adds no additional meaning beyond what the schema provides, such as explaining parameter interactions or usage examples. Baseline 3 is appropriate when the schema handles all parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('recognize text content in images') and resource ('images'), and distinguishes it from its sibling 'list_ocr_languages' by focusing on text recognition rather than language listing. It specifies support for multiple languages and image formats, making the purpose explicit and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by mentioning supported languages and image formats, but does not explicitly state when to use this tool versus alternatives or provide any exclusions. It lacks guidance on prerequisites or specific contexts for optimal use, relying on implicit understanding from the purpose statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
