Glama

MCP PII Tools

by czangyeob

mcp_process_text

Detect and anonymize personally identifiable information (PII) in text using GPT-4o-based detection with format-preserving encryption for structured identifiers.
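Format-preserving encryption (FPE) maps an identifier to a ciphertext with the same length and character set, so an encrypted phone number still looks like a phone number and can be decrypted with the key. The implementation excerpts on this page do not show how the server does this. The following is only a toy sketch of the general idea: a keyed Feistel network over the decimal digits of a value. It belongs to the same construction family as NIST FF1 but is not FF1 (no tweak, no minimum-domain checks, not audited). The names `fpe_digits`, `_feistel`, `_prf`, and the round count are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 10  # number of Feistel rounds (illustrative choice)
DIGITS = "0123456789"


def _prf(key: bytes, round_no: int, half: str, total_len: int) -> int:
    """Keyed round function: HMAC-SHA256 over (round, length, half), read as an integer."""
    msg = f"{round_no}|{total_len}|{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _feistel(digits: str, key: bytes, decrypt: bool) -> str:
    """Alternating Feistel network over a decimal string; output has the same length."""
    n = len(digits)
    if n < 2:
        raise ValueError("need at least two digits")
    u, v = n // 2, n - n // 2
    a, b = digits[:u], digits[u:]
    for i in (reversed(range(ROUNDS)) if decrypt else range(ROUNDS)):
        m = u if i % 2 == 0 else v  # width of the half updated in round i
        if decrypt:
            # Undo round i: A_i = (B_{i+1} - F(A_{i+1}, i)) mod 10^m, B_i = A_{i+1}
            c = (int(b) - _prf(key, i, a, n)) % 10**m
            a, b = str(c).zfill(m), a
        else:
            # Round i: A_{i+1} = B_i, B_{i+1} = (A_i + F(B_i, i)) mod 10^m
            c = (int(a) + _prf(key, i, b, n)) % 10**m
            a, b = b, str(c).zfill(m)
    return a + b


def fpe_digits(value: str, key: bytes, decrypt: bool = False) -> str:
    """Encrypt (or decrypt) only the ASCII digits of value; separators stay in place."""
    positions = [i for i, ch in enumerate(value) if ch in DIGITS]
    mapped = _feistel("".join(value[i] for i in positions), key, decrypt)
    out = list(value)
    for pos, ch in zip(positions, mapped):
        out[pos] = ch
    return "".join(out)


key = b"demo-key-do-not-use-in-production"
token = fpe_digits("010-1234-5678", key)
assert token[3] == "-" and token[8] == "-"
assert fpe_digits(token, key, decrypt=True) == "010-1234-5678"
```

Because only ASCII digits are transformed, separators such as `-` keep their positions, so format checks downstream still pass. Very short digit strings form tiny domains that are easy to brute-force, which is one reason standardized FPE modes impose minimum lengths.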

Instructions

MCP Tool: Text PII processing (detection + anonymization)

Args:
    text (str): Text to process
    
Returns:
    Dict[str, Any]: Processing result
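MCP clients invoke a tool by sending a JSON-RPC 2.0 `tools/call` request whose `params` hold the tool name and an `arguments` object matching the input schema. A minimal sketch of that request for this tool, assuming it is registered under its function name `mcp_process_text` (FastMCP's default for `@mcp.tool()` without an explicit name) and that the transport (stdio or HTTP) is handled elsewhere; `build_tools_call` is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def build_tools_call(text: str) -> str:
    """Serialize an MCP tools/call request for the mcp_process_text tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "mcp_process_text",
            "arguments": {"text": text},
        },
    }
    # ensure_ascii=False keeps non-ASCII (e.g. Korean) input readable in logs
    return json.dumps(request, ensure_ascii=False)


payload = build_tools_call("Call Jane Doe at 010-1234-5678.")
```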

Input Schema

Name      Required    Description    Default
text      Yes         -              -
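The input schema requires exactly one argument, a string named `text`. A hedged sketch of a client-side pre-check, using a hypothetical `check_arguments` helper that looks only at `required` and string `type` (a full validator such as the `jsonschema` package would cover the rest of JSON Schema):

```python
from typing import Any, Dict, List

# Hypothetical local copy of the tool's input schema
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"text": {"type": "string"}},
    "required": ["text"],
}


def check_arguments(args: Dict[str, Any], schema: Dict[str, Any] = INPUT_SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the arguments fit the schema.

    Only 'required' and string-typed properties are checked (a deliberate simplification).
    """
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in args and spec.get("type") == "string" and not isinstance(args[name], str):
            problems.append(f"{name} must be a string")
    return problems
```

An empty string passes this check; `process_text` then returns an empty, successful result without calling the model.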

Output Schema

Name      Required    Description    Default
result    Yes         -              -
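The listing exposes only a top-level `result`, but the `process_text` excerpt in the Implementation Reference shows which keys the returned dictionary carries. The sketch types that dictionary and adds a guard, assuming the `PIIItem` fields are exactly the five passed to its constructor in `detect_pii` (the real dataclass may define more); `ProcessTextResult`, `PIIItemDict`, and `safe_anonymized_text` are hypothetical names. The guard matters because, when an exception is raised, `process_text` copies the unmodified input into `anonymized_text`, so a caller that skips the `success` check could forward the very PII it meant to remove.

```python
from typing import Any, Dict, List, TypedDict


class PIIItemDict(TypedDict):
    # Fields inferred from the PIIItem(...) call in detect_pii; the real dataclass may have more
    type: str
    value: str
    confidence: float
    start_pos: int
    end_pos: int


class ProcessTextResult(TypedDict, total=False):
    success: bool
    error: str  # only present on failure
    original_text: str
    anonymized_text: str
    pii_items: List[PIIItemDict]
    count: int
    processing_time: float
    summary: Dict[str, Any]


def safe_anonymized_text(result: ProcessTextResult) -> str:
    """Return the anonymized text, refusing to pass raw input through on failure."""
    if not result.get("success", False):
        # On an exception, process_text puts the ORIGINAL text in anonymized_text;
        # a failed detection returns no anonymized_text key at all.
        raise RuntimeError(f"PII processing failed: {result.get('error', 'unknown error')}")
    return result["anonymized_text"]
```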

Implementation Reference

  • The primary handler function for the MCP tool 'mcp_process_text'. Decorated with @mcp.tool() for automatic registration with the FastMCP server. Creates a new MCPPIIProcessor on every call and delegates to its process_text method to detect PII and anonymize the input text.
    @mcp.tool()
    def mcp_process_text(text: str) -> Dict[str, Any]:
        """
        MCP Tool: Text PII processing (detection + anonymization)
        
        Args:
            text (str): Text to process
            
        Returns:
            Dict[str, Any]: Processing result
        """
        processor = MCPPIIProcessor()
        return processor.process_text(text)
  • JSON schema definition for the tool (keyed as "process_text" in the MCP tool metadata), specifying its single required input parameter (text: string).
    "process_text": {
        "name": "process_text", 
        "description": "Detects PII (personally identifiable information) in text and anonymizes it.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text to process"
                }
            },
            "required": ["text"]
        }
    },
  • Core helper method in the MCPPIIProcessor class that implements the PII processing logic: detects PII using MCPPIIDetector.detect_pii(), converts the returned dictionaries back to PIIItem objects, anonymizes the text with the detector's anonymize_text(), and returns structured results.
    def process_text(self, text: str) -> Dict[str, Any]:
        """
        Detect and process PII in text (for the MCP tool)
        
        Args:
            text (str): Text to process
            
        Returns:
            Dict[str, Any]: Response in MCP tool format
        """
        try:
            start_time = time.time()
            
            if not text:
                return {
                    "success": True,
                    "original_text": "",
                    "anonymized_text": "",
                    "pii_items": [],
                    "count": 0,
                    "processing_time": 0,
                    "summary": {}
                }
            
            # 1. Detect PII
            detection_result = self.detector.detect_pii(text)
            
            if not detection_result["success"]:
                return detection_result
            
            # Convert to PIIItem objects
            pii_items = [PIIItem(**item) for item in detection_result["pii_items"]]
            
            # 2. Anonymize
            anonymized_text = self.detector.anonymize_text(text, pii_items)
            
            processing_time = time.time() - start_time
            
            return {
                "success": True,
                "original_text": text,
                "anonymized_text": anonymized_text,
                "pii_items": [asdict(item) for item in pii_items],
                "count": len(pii_items),
                "processing_time": processing_time,
                "summary": detection_result["summary"]
            }
            
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "original_text": text,
                "anonymized_text": text,
                "pii_items": [],
                "count": 0,
                "processing_time": 0,
                "summary": {}
            }
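  • Key helper method in the MCPPIIDetector class for PII detection using the langextract library. Calls langextract through either a vLLM provider or the OpenAI provider (the default), extracts PII entities (names, emails, phone numbers, etc.), maps extraction classes to Korean labels, computes character positions, and returns structured results. Called by the processor.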
    def detect_pii(self, text: str) -> Dict[str, Any]:
        """
        Detect PII in text (for the MCP tool)
        
        Args:
            text (str): Text to analyze
            
        Returns:
            Dict[str, Any]: Response in MCP tool format
        """
        try:
            start_time = time.time()
            
            # Call langextract according to the configured provider
            if self.provider_type == "vllm":
                # Use the vLLM provider
                result = lx.extract(
                    text_or_documents=text,
                    prompt_description=self.prompt,
                    examples=self.examples,
                    model=self.provider,  # use the custom provider instance
                    use_schema_constraints=False
                )
            else:
                # Use the OpenAI provider (default)
                # Note: this sets OPENAI_BASE_URL for the whole process, not just this call
                os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
                result = lx.extract(
                    text_or_documents=text,
                    prompt_description=self.prompt,
                    examples=self.examples,
                    model_id=self.model_id,
                    api_key=self.api_key,
                    fence_output=True
                )
            
            # Convert the results into a list of PIIItem objects
            pii_items = []
            logger.info(f"Number of extractions detected: {len(result.extractions)}")
            
            for i, extraction in enumerate(result.extractions):
                logger.info(f"Extraction {i+1}: class='{extraction.extraction_class}', text='{extraction.extraction_text}'")
                
                # If char_interval is missing, locate the span in the text directly
                start_pos = 0
                end_pos = 0
                
                if extraction.char_interval:
                    start_pos = extraction.char_interval.start_pos
                    end_pos = extraction.char_interval.end_pos
                    logger.info(f"  Using char_interval: {start_pos}-{end_pos}")
                else:
                    # Search the text directly (exact match first)
                    search_text = extraction.extraction_text
                    start_pos = text.find(search_text)
                    
                    # Fall back to a case-insensitive search.
                    # Caveat: str.lower() can change string length for some characters
                    # (e.g. 'İ'), which would shift the offsets found here.
                    if start_pos == -1:
                        start_pos = text.lower().find(search_text.lower())
                    
                    if start_pos != -1:
                        end_pos = start_pos + len(search_text)
                        logger.info(f"  Found directly in text: {start_pos}-{end_pos}")
                        
                        # Check whether the matched text is identical to the extracted text
                        actual_found = text[start_pos:end_pos]
                        if actual_found != search_text:
                            logger.warning(f"  Case mismatch: found='{actual_found}', original='{search_text}'")
                    else:
                        start_pos = -1
                        end_pos = -1
                        logger.warning(f"  Not found in text: '{extraction.extraction_text}'")
                
                mapped_type = self._map_extraction_class(extraction.extraction_class)
                logger.info(f"  Mapped type: '{extraction.extraction_class}' -> '{mapped_type}'")
                
                pii_items.append(PIIItem(
                    type=mapped_type,
                    value=extraction.extraction_text,
                    confidence=0.9,  # langextract does not report a confidence, so use a fixed default
                    start_pos=start_pos,
                    end_pos=end_pos
                ))
            
            processing_time = time.time() - start_time
            
            return {
                "success": True,
                "pii_items": [asdict(item) for item in pii_items],
                "count": len(pii_items),
                "processing_time": processing_time,
                "summary": self._get_pii_summary(pii_items)
            }
            
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "pii_items": [],
                "count": 0,
                "processing_time": 0,
                "summary": {}
            }
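The case-insensitive fallback in `detect_pii` searches a lowercased copy of the text and then applies the resulting offset to the original string. That is only safe when lowercasing preserves length: in Python, `'İ'.lower()` yields two code points, so one such character before a match shifts the reported span by one. A sketch of an alternative, using a hypothetical `locate_span` helper, lets `re.search` with `re.IGNORECASE` scan the original string, so the offsets always index into `text` itself:

```python
import re
from typing import Tuple


def locate_span(text: str, needle: str) -> Tuple[int, int]:
    """Find needle in text: exact match first, then case-insensitive; (-1, -1) if absent."""
    if not needle:
        return -1, -1
    start = text.find(needle)
    if start != -1:
        return start, start + len(needle)
    # re.IGNORECASE matches against the original string, so the reported
    # offsets index into `text` directly (no lowercased copy is involved).
    match = re.search(re.escape(needle), text, flags=re.IGNORECASE)
    if match:
        return match.start(), match.end()
    return -1, -1
```

Unlike `str.find`, this returns `(-1, -1)` for an empty needle instead of `(0, 0)`, a deliberate choice so that an empty extraction is never treated as a located span.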
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions '탐지 + 익명화' (detection + anonymization), it doesn't specify what types of PII are detected, how anonymization is performed (masking, replacement, etc.), whether the operation is reversible, what permissions are required, or any rate limits. For a PII processing tool with zero annotation coverage, this leaves significant behavioral questions unanswered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
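The excerpts call `anonymize_text(text, pii_items)` without showing it, and the tool summary mentions format-preserving encryption for structured identifiers. For contrast, here is a minimal sketch of the simplest irreversible strategy, replacing each located span with a type placeholder. It assumes items in the dictionary shape `detect_pii` returns (`type`, `start_pos`, `end_pos`, with `-1` for spans that could not be located); `mask_spans` is a hypothetical helper, not the server's implementation.

```python
from typing import Dict, List, Sequence


def mask_spans(text: str, items: Sequence[Dict], placeholder: str = "[{type}]") -> str:
    """Replace each located PII span with a type placeholder (irreversible masking).

    Items with start_pos == -1 (not located) are skipped. When spans overlap, the
    earliest-starting one wins (the longest, on ties).
    """
    located = [it for it in items if 0 <= it["start_pos"] < it["end_pos"] <= len(text)]
    chosen: List[Dict] = []
    for it in sorted(located, key=lambda it: (it["start_pos"], -it["end_pos"])):
        if chosen and it["start_pos"] < chosen[-1]["end_pos"]:
            continue  # overlaps a span already chosen
        chosen.append(it)
    # Replace right to left so the offsets of earlier spans stay valid.
    out = text
    for it in reversed(chosen):
        out = out[: it["start_pos"]] + placeholder.format(type=it["type"]) + out[it["end_pos"]:]
    return out
```

Items that could not be located are skipped, so their values survive in the output; a stricter policy could fail closed instead.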

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections for the tool name, arguments, and returns. Each sentence serves a purpose: stating the function, documenting the parameter, and indicating the return type. However, the mixed Korean/English formatting could be slightly cleaner, and the 'MCP Tool:' prefix is somewhat redundant given the context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that this is a PII processing tool with no annotations but with an output schema (indicated by 'Has output schema: true'), the description is minimally adequate. It covers the basic purpose and parameter, and the output schema will handle return value documentation. However, for a sensitive operation like PII processing, more context about what constitutes PII, how anonymization works, and security considerations would be valuable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explicitly documents the single parameter 'text' with its type and purpose ('처리할 텍스트' meaning 'text to process'). With 0% schema description coverage, this adds crucial semantic meaning beyond the bare schema. However, it doesn't provide additional context about text length limits, supported languages, or formatting requirements that would be helpful for proper usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '텍스트 PII 처리 (탐지 + 익명화)' which translates to 'Text PII processing (detection + anonymization)'. This specifies the verb (process), resource (text), and scope (PII detection and anonymization). However, it doesn't explicitly distinguish this from sibling tools like 'mcp_detect_pii' or 'mcp_anonymize_text', which appear to offer separate detection or anonymization functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With sibling tools like 'mcp_detect_pii' and 'mcp_anonymize_text' available, there's no indication whether this tool should be used for combined detection+anonymization workflows, or how it differs from using those tools separately. No context about prerequisites, limitations, or appropriate scenarios is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/czangyeob/mcp-pii-tools'

If you have feedback or need assistance with the MCP directory API, please join our Discord server