Glama
Mistizz

Japanese Text Analyzer

analyze_file

Analyze Japanese text files for detailed morphological and linguistic features, including sentence complexity, part-of-speech distribution, and vocabulary diversity.

Instructions

Performs detailed morphological analysis and linguistic-feature analysis of a file. Analyzes sentence complexity, part-of-speech ratios, vocabulary diversity, and more.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| filePath | Yes | Path of the file to analyze (Windows-style or WSL/Linux-style absolute path recommended) | |

Implementation Reference

  • src/index.ts:550-572 (registration)
    Registration of the analyze_file tool with MCP server, including name, description, input schema, and handler function.
    this.server.tool(
      'analyze_file', 
      'ファイルの詳細な形態素解析と言語的特徴の分析を行います。文の複雑さ、品詞の割合、語彙の多様性などを解析します。',
      { 
        filePath: z.string().describe('分析するファイルのパス(Windows形式かWSL/Linux形式の絶対パスを推奨)')
      },
      async ({ filePath }) => {
        try {
          // Resolve the file path
          const resolvedPath = resolveFilePath(filePath);
          const fileContent = fs.readFileSync(resolvedPath, 'utf8');
          return await this.analyzeTextImpl(fileContent);
        } catch (error: any) {
          return {
            content: [{ 
              type: 'text' as const, 
              text: `ファイル読み込みエラー: ${error.message}`
            }],
            isError: true
          };
        }
      }
    );
  • Core handler logic for analyze_file tool: initializes tokenizer, performs morphological analysis with kuromoji, computes POS ratios, script types, vocabulary diversity, honorifics, and formats detailed analysis report.
      private async analyzeTextImpl(text: string) {
        try {
          // Make sure the tokenizer is initialized
          let tokenizer;
          try {
            tokenizer = await initializeTokenizer();
          } catch (error) {
            return {
              content: [{ 
                type: 'text' as const, 
                text: '形態素解析器の初期化に失敗しました。しばらく待ってから再試行してください。'
              }],
              isError: true
            };
          }
    
          // Split the text into sentences (delimited by 。 and other terminators)
          const sentences = text.split(/[。.!?！？]/g).filter(s => s.trim().length > 0);
          
          // Run morphological analysis
          const tokens = tokenizer.tokenize(text);
    
          // Basic analysis results
          const totalChars = text.replace(/[\s\n\r]/g, '').length;
          const totalSentences = sentences.length;
          const totalMorphemes = tokens.length;
    
          // Counts per part of speech
          const posCounts: Record<string, number> = {};
          const particleCounts: Record<string, number> = {};
          let totalParticles = 0;
          
          // Counts per character script
          const scriptCounts = {
            hiragana: 0,
            katakana: 0,
            kanji: 0,
            alphabet: 0,
            digit: 0,
            other: 0
          };
          
          // For tracking unique words
          const uniqueWords = new Set<string>();
          let katakanaWords = 0;
          let punctuationCount = 0;
          const honorificExpressions = ['です', 'ます', 'でした', 'ました', 'ございます', 'いただく', 'なさる', 'れる', 'られる', 'どうぞ', 'お', 'ご'];
          let honorificCount = 0;
    
          // Process each token
          tokens.forEach((token: any) => {
            // Part-of-speech count
            posCounts[token.pos] = (posCounts[token.pos] || 0) + 1;
            
            // Particle count
            if (token.pos === '助詞') {
              particleCounts[token.surface_form] = (particleCounts[token.surface_form] || 0) + 1;
              totalParticles++;
            }
            
            // Word count
            uniqueWords.add(token.basic_form);
            
            // Katakana-word count
            if (/^[\u30A0-\u30FF]+$/.test(token.surface_form)) {
              katakanaWords++;
            }
    
            // Punctuation count
            if (token.pos === '記号' && (token.pos_detail_1 === '句点' || token.pos_detail_1 === '読点')) {
              punctuationCount++;
            }
    
            // Honorific-expression count
            if (honorificExpressions.some(expr => token.surface_form.includes(expr) || token.basic_form.includes(expr))) {
              honorificCount++;
            }
          });
    
          // Count characters by script
          for (const char of text) {
            if (/[\u3040-\u309F]/.test(char)) {
              scriptCounts.hiragana++;
            } else if (/[\u30A0-\u30FF]/.test(char)) {
              scriptCounts.katakana++;
            } else if (/[\u4E00-\u9FAF]/.test(char)) {
              scriptCounts.kanji++;
            } else if (/[a-zA-Z]/.test(char)) {
              scriptCounts.alphabet++;
            } else if (/[0-9０-９]/.test(char)) {
              scriptCounts.digit++;
            } else if (!/\s/.test(char)) {
              scriptCounts.other++;
            }
          }
    
          // Compute each metric
          const totalNonSpaceChars = Object.values(scriptCounts).reduce((a, b) => a + b, 0);
          
          // Analysis results based on features.yml
          const analysisResults = {
            average_sentence_length: {
              name: '平均文長',
              value: totalSentences > 0 ? (totalChars / totalSentences).toFixed(2) : '0.00',
              unit: '文字/文',
              description: '一文の長さ。長すぎると読みにくくなる。'
            },
            average_morphemes_per_sentence: {
              name: '文あたりの形態素数',
              value: totalSentences > 0 ? (totalMorphemes / totalSentences).toFixed(2) : '0.00',
              unit: '形態素/文',
              description: '文の密度や構文の複雑さを表す。'
            },
            pos_ratio: {
              name: '品詞の割合',
              value: Object.entries(posCounts).map(([pos, count]) => {
                return `${pos}: ${((count / totalMorphemes) * 100).toFixed(2)}%`;
              }).join(', '),
              unit: '%',
              description: '名詞・動詞・形容詞などの使用バランスを分析。'
            },
            particle_ratio: {
              name: '助詞の割合',
              value: Object.entries(particleCounts)
                .sort((a, b) => b[1] - a[1])
                .slice(0, 10)
                .map(([particle, count]) => {
                  return `${particle}: ${((count / totalParticles) * 100).toFixed(2)}%`;
                }).join(', '),
              unit: '%',
              description: '主語・目的語などの構造分析や文の流れを判断。'
            },
            script_type_ratio: {
              name: '文字種の割合',
              value: Object.entries(scriptCounts).map(([type, count]) => {
                return `${type}: ${((count / totalNonSpaceChars) * 100).toFixed(2)}%`;
              }).join(', '),
              unit: '%',
              description: 'ひらがな・カタカナ・漢字・英数字の構成比率。'
            },
            vocabulary_diversity: {
              name: '語彙の多様性(タイプ/トークン比)',
              value: ((uniqueWords.size / totalMorphemes) * 100).toFixed(2),
              unit: '%',
              description: '語彙の豊かさや表現力の指標。'
            },
            katakana_word_ratio: {
              name: 'カタカナ語の割合',
              value: ((katakanaWords / totalMorphemes) * 100).toFixed(2),
              unit: '%',
              description: '外来語や専門用語の多さ、カジュアルさを示す。'
            },
            honorific_frequency: {
              name: '敬語の頻度',
              value: totalSentences > 0 ? (honorificCount / totalSentences).toFixed(2) : '0.00',
              unit: '回/文',
              description: '丁寧・フォーマルさを示す。'
            },
            punctuation_per_sentence: {
              name: '句読点の平均数',
              value: totalSentences > 0 ? (punctuationCount / totalSentences).toFixed(2) : '0.00',
              unit: '個/文',
              description: '文の区切りや読みやすさに影響。'
            }
          };
    
          // Format the results as text
          const resultText = `# テキスト分析結果

## 基本情報
- 総文字数: ${totalChars}文字
- 文の数: ${totalSentences}
- 総形態素数: ${totalMorphemes}

## 詳細分析
${Object.entries(analysisResults).map(([key, data]) => {
  return `### ${data.name} (${data.unit})
- 値: ${data.value}
- 説明: ${data.description}`;
}).join('\n\n')}
`;
    
          return {
            content: [{ 
              type: 'text' as const, 
              text: resultText
            }]
          };
        } catch (error: any) {
          return {
            content: [{ 
              type: 'text' as const, 
              text: `分析中にエラーが発生しました: ${error.message}`
            }],
            isError: true
          };
        }
      }
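For readers who want to sanity-check the handler's arithmetic outside the MCP server, the sentence-splitting and type/token-ratio steps can be reproduced in isolation. A minimal sketch — the helper names `splitSentences` and `typeTokenRatio` are illustrative, not part of the server:

```typescript
// Illustrative helpers mirroring two steps of analyzeTextImpl.

// Split text on Japanese and Western sentence terminators, dropping empties.
function splitSentences(text: string): string[] {
  return text.split(/[。.!?！？]/g).filter((s) => s.trim().length > 0);
}

// Type/token ratio: unique forms over total forms, as a percentage.
function typeTokenRatio(forms: string[]): number {
  if (forms.length === 0) return 0;
  return (new Set(forms).size / forms.length) * 100;
}
```

For example, `typeTokenRatio(["猫", "が", "好き", "が"])` yields 75, since three of the four forms are distinct — the same figure the handler would report as vocabulary diversity for that token list.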
  • Input schema using Zod: requires 'filePath' string parameter describing the file to analyze.
    { 
      filePath: z.string().describe('分析するファイルのパス(Windows形式かWSL/Linux形式の絶対パスを推奨)')
    },
  • Helper function to resolve file paths, supporting Windows absolute paths and WSL/Linux (/c/... ) formats, with existence checks.
    function resolveFilePath(filePath: string): string {
      // Convert a WSL/Linux-style path (/c/Users/...) to Windows form (C:\Users\...)
      if (filePath.match(/^\/[a-zA-Z]\//)) {
        const drive = filePath.charAt(1).toUpperCase();
        const windowsPath = `${drive}:${filePath.substring(2).replace(/\//g, '\\')}`;

        console.error(`WSL/Linux形式のパスをWindows形式に変換: ${filePath} -> ${windowsPath}`);

        if (fs.existsSync(windowsPath)) {
          console.error(`変換されたパスでファイルを発見: ${windowsPath}`);
          return windowsPath;
        }
      }

      // Handle an ordinary absolute path
      if (path.isAbsolute(filePath)) {
        if (fs.existsSync(filePath)) {
          console.error(`絶対パスでファイルを発見: ${filePath}`);
          return filePath;
        }

        // The absolute path does not exist, so fail
        throw new Error(`指定された絶対パス "${filePath}" が存在しません。パスが正しいか確認してください。` +
                        ` Windows形式(C:\\Users\\...)かWSL/Linux形式(/c/Users/...)で指定してください。`);
      }

      // For a relative path, resolve against the current working directory
      const cwdPath = path.resolve(process.cwd(), filePath);
      if (fs.existsSync(cwdPath)) {
        console.error(`カレントディレクトリでファイルを発見: ${cwdPath}`);
        return cwdPath;
      }

      // Not found anywhere
      throw new Error(`ファイル "${filePath}" が見つかりませんでした。絶対パスで指定してください。` +
                      ` Windows形式(C:\\Users\\...)かWSL/Linux形式(/c/Users/...)で指定可能です。`);
    }
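The drive-letter rewrite in `resolveFilePath` can be isolated from the filesystem checks. A minimal sketch, with a hypothetical `toWindowsPath` name and no `fs.existsSync` probing:

```typescript
// Convert a WSL/Linux-style path (/c/Users/...) to Windows form (C:\Users\...).
// Paths that don't match the /x/... pattern are returned unchanged.
function toWindowsPath(p: string): string {
  const m = p.match(/^\/([a-zA-Z])\//);
  if (m === null) return p;
  const drive = m[1].toUpperCase();
  return `${drive}:${p.substring(2).replace(/\//g, "\\")}`;
}
```

For instance, `toWindowsPath("/c/Users/alice/notes.txt")` returns `C:\Users\alice\notes.txt`, while a relative path passes through untouched.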
  • Helper to asynchronously initialize the kuromoji tokenizer instance, locating dictionary paths automatically.
    async function initializeTokenizer() {
      // Already initialized
      if (tokenizerInstance) {
        return tokenizerInstance;
      }

      // Initialization in progress: return the existing promise
      if (initializingPromise) {
        return initializingPromise;
      }

      console.error('形態素解析器の初期化を開始...');

      // Locate the dictionary path
      const dicPath = findDictionaryPath();
      console.error(`使用する辞書パス: ${dicPath}`);

      // Wrap the initialization in a promise
      initializingPromise = new Promise((resolve, reject) => {
        try {
          kuromoji.builder({ dicPath }).build((err, tokenizer) => {
            if (err) {
              console.error(`形態素解析器の初期化エラー: ${err.message || err}`);
              initializationError = err;
              initializingPromise = null; // reset so a later call can retry
              tokenizerReady = false;
              reject(err);
              return;
            }

            console.error('形態素解析器の初期化が完了しました');
            tokenizerInstance = tokenizer;
            tokenizerReady = true;
            resolve(tokenizer);
          });
        } catch (error) {
          console.error(`形態素解析器の初期化中に例外が発生: ${error.message || error}`);
          initializationError = error;
          initializingPromise = null; // reset so a later call can retry
          tokenizerReady = false;
          reject(error);
        }
      });

      return initializingPromise;
    }
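The initializer above follows a memoized-promise pattern: concurrent callers share one in-flight promise, and a failure clears the cache so the next call can retry. A generic sketch of that pattern, with an illustrative `once` helper not taken from the server:

```typescript
// Memoize an async initializer: all callers share one in-flight promise;
// on failure the cache is cleared so a subsequent call can retry.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = init().catch((err) => {
        pending = null; // reset so a later call retries
        throw err;
      });
    }
    return pending;
  };
}
```

Wrapping the kuromoji build step this way keeps the expensive dictionary load from running more than once per process while still allowing recovery from a transient failure.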
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions what the tool analyzes but doesn't describe how it behaves: whether it's read-only or modifies files, what permissions are needed, error handling, performance characteristics, or output format. For a file analysis tool with zero annotation coverage, this is a significant gap in behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise: two sentences that efficiently convey the tool's purpose and analysis scope. It is front-loaded with the core function and avoids unnecessary elaboration, though a little more structure separating the different aspects of analysis could help.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of file analysis (involving file I/O, linguistic processing) with no annotations and no output schema, the description is incomplete. It doesn't explain what the analysis returns, error conditions, file format requirements, or behavioral constraints. For a tool that presumably performs non-trivial linguistic analysis on files, more context is needed for effective agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'filePath' well documented in the schema. The description adds no parameter-specific information beyond what the schema already provides, so it earns the baseline score of 3: the schema does the heavy lifting, and the description does not compensate with additional semantic context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'detailed morphological analysis and linguistic feature analysis' of files, specifying it analyzes sentence complexity, part-of-speech ratios, and lexical diversity. This provides a specific verb ('analyze') and resource ('files'), though it doesn't explicitly differentiate from sibling tools like 'analyze_text' which might analyze text directly rather than files.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'analyze_text' or the counting tools. It doesn't mention prerequisites, file format requirements, or any context for choosing this tool over siblings, leaving the agent to infer usage based on the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Mistizz/mcp-JapaneseTextAnalyzer'
