mcp_process_text

Detect and anonymize personally identifiable information (PII) in text using GPT-4o-based detection with format-preserving encryption for structured identifiers.

Instructions

MCP Tool: text PII processing (detection + anonymization)

Args:
    text (str): Text to process

Returns:
    Dict[str, Any]: Processing result

Input Schema


Name	Required	Description	Default
`text`	Yes	Text to process

Implementation Reference

mcp_pii_tools.py:542-555 (handler)

The primary handler function for the MCP tool 'mcp_process_text'. It is decorated with @mcp.tool() so that the FastMCP server registers it automatically. The handler instantiates MCPPIIProcessor and delegates to its process_text method, which detects PII and anonymizes the input text.

@mcp.tool()
def mcp_process_text(text: str) -> Dict[str, Any]:
    """
    MCP Tool: text PII processing (detection + anonymization)

    Args:
        text (str): Text to process

    Returns:
        Dict[str, Any]: Processing result
    """
    processor = MCPPIIProcessor()
    return processor.process_text(text)

mcp_pii_tools.py:855-868 (schema)

JSON schema definition for the tool as used in MCP tool metadata. The entry is keyed and named 'process_text' (without the 'mcp_' prefix) and declares a single required input parameter, text, of type string.

"process_text": {
    "name": "process_text",
    "description": "Detects PII (personal information) in text and anonymizes it.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {
                "type": "string",
                "description": "Text to process"
            }
        },
        "required": ["text"]
    }
},
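
The schema above can be checked on the caller's side before the tool is invoked. The following is a minimal sketch, not part of mcp_pii_tools.py: the names PROCESS_TEXT_SCHEMA and validate_arguments are hypothetical, and the check covers only the two constraints the schema states (text is required and must be a string).

```python
from typing import Any, Dict, List

# Parameter schema copied from the listing above (description strings omitted).
PROCESS_TEXT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"text": {"type": "string"}},
    "required": ["text"],
}


def validate_arguments(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: List[str] = []
    # Every name listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    # Only the "string" type is checked, since it is the only one the schema uses.
    for name, spec in schema.get("properties", {}).items():
        if name in args and spec.get("type") == "string" and not isinstance(args[name], str):
            problems.append(f"parameter {name} must be a string")
    return problems
```

A full JSON Schema validator would cover more keywords; this hand-rolled check is only meant to make the two stated constraints concrete.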

mcp_pii_tools.py:420-479 (helper)

Core helper method in MCPPIIProcessor class that implements the PII processing logic: detects PII using MCPPIIDetector.detect_pii(), converts to PIIItem objects, anonymizes the text, and returns structured results.

def process_text(self, text: str) -> Dict[str, Any]:
    """
    Detect and process PII in text (for the MCP tool)

    Args:
        text (str): Text to process

    Returns:
        Dict[str, Any]: Response in MCP tool format
    """
    try:
        start_time = time.time()
        
        if not text:
            return {
                "success": True,
                "original_text": "",
                "anonymized_text": "",
                "pii_items": [],
                "count": 0,
                "processing_time": 0,
                "summary": {}
            }
        
        # 1. Detect PII
        detection_result = self.detector.detect_pii(text)

        if not detection_result["success"]:
            return detection_result

        # Convert to PIIItem objects
        pii_items = [PIIItem(**item) for item in detection_result["pii_items"]]

        # 2. Anonymize
        anonymized_text = self.detector.anonymize_text(text, pii_items)
        
        processing_time = time.time() - start_time
        
        return {
            "success": True,
            "original_text": text,
            "anonymized_text": anonymized_text,
            "pii_items": [asdict(item) for item in pii_items],
            "count": len(pii_items),
            "processing_time": processing_time,
            "summary": detection_result["summary"]
        }
        
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "original_text": text,
            "anonymized_text": text,
            "pii_items": [],
            "count": 0,
            "processing_time": 0,
            "summary": {}
        }
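
One consequence of the listing above is worth spelling out for callers. When an exception is caught, process_text returns "anonymized_text" equal to the unmodified input, and when detection fails it returns the detector's result, which has no text fields at all. A caller that reads "anonymized_text" without checking "success" could therefore pass raw PII downstream. The sketch below shows one way to guard against that; the helper name anonymized_or_raise is hypothetical and not part of the tool.

```python
from typing import Any, Dict


def anonymized_or_raise(result: Dict[str, Any]) -> str:
    """Return the anonymized text only if the tool reported success."""
    if not result.get("success", False):
        # On failure "anonymized_text" may be the raw input or missing entirely,
        # so it is never returned here.
        raise RuntimeError(result.get("error", "PII processing failed"))
    return result["anonymized_text"]
```

Raising is one possible policy; a caller could equally return an empty string or a fixed placeholder, as long as the unredacted text is not used.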

mcp_pii_tools.py:154-256 (helper)

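Key helper method in the MCPPIIDetector class that performs PII detection with the langextract library. It extracts PII entities (names, emails, phone numbers, etc.), maps extraction classes to Korean labels, computes character positions (falling back to a text search when langextract supplies no char_interval), assigns a fixed confidence of 0.9, and returns structured results. Called by the processor.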

def detect_pii(self, text: str) -> Dict[str, Any]:
    """
    Detect PII in text (for the MCP tool)

    Args:
        text (str): Text to analyze

    Returns:
        Dict[str, Any]: Response in MCP tool format
    """
    try:
        start_time = time.time()
        
        # Call langextract according to the configured provider
        if self.provider_type == "vllm":
            # Use the vLLM provider
            result = lx.extract(
                text_or_documents=text,
                prompt_description=self.prompt,
                examples=self.examples,
                model=self.provider,  # use the custom provider instance
                use_schema_constraints=False
            )
        else:
            # Use the OpenAI provider (default)
            os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
            result = lx.extract(
                text_or_documents=text,
                prompt_description=self.prompt,
                examples=self.examples,
                model_id=self.model_id,
                api_key=self.api_key,
                fence_output=True
            )
        
        # Convert the results into a list of PIIItem objects
        pii_items = []
        logger.info(f"Number of extractions detected: {len(result.extractions)}")

        for i, extraction in enumerate(result.extractions):
            logger.info(f"Extraction {i+1}: class='{extraction.extraction_class}', text='{extraction.extraction_text}'")

            # If there is no char_interval, locate the value directly in the text
            start_pos = 0
            end_pos = 0

            if extraction.char_interval:
                start_pos = extraction.char_interval.start_pos
                end_pos = extraction.char_interval.end_pos
                logger.info(f"  using char_interval: {start_pos}-{end_pos}")
            else:
                # Search the text directly: exact match first
                search_text = extraction.extraction_text
                start_pos = text.find(search_text)

                # Then retry case-insensitively
                if start_pos == -1:
                    start_pos = text.lower().find(search_text.lower())

                if start_pos != -1:
                    end_pos = start_pos + len(search_text)
                    logger.info(f"  found directly in text: {start_pos}-{end_pos}")

                    # Check whether the text actually found matches the extracted value
                    actual_found = text[start_pos:end_pos]
                    if actual_found != search_text:
                        logger.warning(f"  case mismatch: found='{actual_found}', original='{search_text}'")
                else:
                    # start_pos is already -1 here; mark the end as unknown too
                    end_pos = -1
                    logger.warning(f"  not found in text: '{extraction.extraction_text}'")

            mapped_type = self._map_extraction_class(extraction.extraction_class)
            logger.info(f"  mapped type: '{extraction.extraction_class}' -> '{mapped_type}'")

            pii_items.append(PIIItem(
                type=mapped_type,
                value=extraction.extraction_text,
                confidence=0.9,  # langextract does not provide a confidence, so a default is used
                start_pos=start_pos,
                end_pos=end_pos
            ))
        
        processing_time = time.time() - start_time
        
        return {
            "success": True,
            "pii_items": [asdict(item) for item in pii_items],
            "count": len(pii_items),
            "processing_time": processing_time,
            "summary": self._get_pii_summary(pii_items)
        }
        
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "pii_items": [],
            "count": 0,
            "processing_time": 0,
            "summary": {}
        }
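
The position fallback in detect_pii, and the way the resulting positions might later be consumed, can be isolated in a small sketch. Everything here is an assumption made for illustration: Span is a hypothetical stand-in for PIIItem reduced to three fields, locate_span mirrors the exact-then-case-insensitive search shown above, and mask_spans is not the tool's anonymize_text (whose source is not shown on this page) but a simple placeholder substitution.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for PIIItem, reduced to the fields used here."""
    type: str
    start_pos: int
    end_pos: int


def locate_span(text: str, needle: str) -> Tuple[int, int]:
    """Exact match first, then case-insensitive; (-1, -1) if not found."""
    start = text.find(needle)
    if start == -1:
        start = text.lower().find(needle.lower())
    if start == -1:
        return (-1, -1)
    return (start, start + len(needle))


def mask_spans(text: str, spans: List[Span]) -> str:
    """Replace each located span with a [TYPE] placeholder, right to left."""
    # Items that could not be located carry -1 positions and are skipped.
    located = [s for s in spans if s.start_pos >= 0 and s.end_pos > s.start_pos]
    # Working from the end keeps earlier offsets valid after each replacement.
    for span in sorted(located, key=lambda s: s.start_pos, reverse=True):
        text = text[:span.start_pos] + f"[{span.type}]" + text[span.end_pos:]
    return text
```

Two caveats apply to this sketch. Lowercasing can change the length of some Unicode strings, so offsets found through the case-insensitive path may not line up with the original text in every language. Overlapping spans are also not handled and would need to be merged before masking.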
