# mcp_process_text
Detect and anonymize personally identifiable information (PII) in text, using LLM-based detection via langextract (an OpenAI GPT-4o model by default, or a vLLM-hosted model) with format-preserving encryption for structured identifiers.
## Instructions
MCP Tool: text PII processing (detection + anonymization)

Args:
- text (str): the text to process

Returns:
- Dict[str, Any]: the processing result
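
A successful call returns the dictionary assembled by `MCPPIIProcessor.process_text` (quoted in the implementation reference below). The following sketch shows the field layout with made-up values; the Korean type labels are illustrative assumptions, and the exact anonymization format and `summary` shape depend on `anonymize_text` and `_get_pii_summary`, which are not shown in this section.

```python
# Illustrative response shape for a successful call; all values are
# invented for the example. Field names come from process_text (below).
example_response = {
    "success": True,
    "original_text": "Contact Hong Gildong at hong@example.com",
    "anonymized_text": "Contact [NAME] at [EMAIL]",  # placeholder masking; real format not shown here
    "pii_items": [
        {"type": "이름", "value": "Hong Gildong", "confidence": 0.9, "start_pos": 8, "end_pos": 20},
        {"type": "이메일", "value": "hong@example.com", "confidence": 0.9, "start_pos": 24, "end_pos": 40},
    ],
    "count": 2,
    "processing_time": 1.23,               # seconds
    "summary": {"이름": 1, "이메일": 1},     # assumed shape; produced by _get_pii_summary
}
```

On failure, `success` is `False`, `error` carries the exception message, and `anonymized_text` falls back to the original text.
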
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The text to process | |
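
With `text` as the only parameter, invoking the tool from an MCP client is a single-argument `tools/call` request. Below is a minimal sketch, expressed as a Python dict mirroring the JSON-RPC payload. Note that the handler is registered under the function name `mcp_process_text` while the metadata block below names it `process_text`, so use whichever name your server actually reports in its `tools/list` response.

```python
# Sketch of a JSON-RPC 2.0 tools/call request for this tool.
# The tool name here is an assumption; check the server's tools/list response.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "mcp_process_text",
        "arguments": {"text": "김철수의 연락처는 010-1234-5678 입니다."},
    },
}
```
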
## Implementation Reference
- mcp_pii_tools.py:542-555 (handler): The primary handler function for the MCP tool `mcp_process_text`. Decorated with `@mcp.tool()` for automatic registration with the FastMCP server. It instantiates `MCPPIIProcessor` and delegates to its `process_text` method to detect PII and anonymize the input text.

```python
@mcp.tool()
def mcp_process_text(text: str) -> Dict[str, Any]:
    """
    MCP Tool: 텍스트 PII 처리 (탐지 + 익명화)

    Args:
        text (str): 처리할 텍스트

    Returns:
        Dict[str, Any]: 처리 결과
    """
    processor = MCPPIIProcessor()
    return processor.process_text(text)
```
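
For a quick local smoke test you can mirror what the handler above does and call the processor class directly, without going through the MCP transport. A minimal sketch, assuming `mcp_pii_tools` is importable and the OpenAI or vLLM provider used by `MCPPIIDetector` is already configured (API key, base URL, etc.):

```python
# Direct smoke test of the processing pipeline (bypasses MCP entirely).
# Assumes provider credentials/configuration are already in place.
from mcp_pii_tools import MCPPIIProcessor

processor = MCPPIIProcessor()
result = processor.process_text("김철수의 이메일은 kim@example.com 입니다.")

if result["success"]:
    print("detected:", result["count"])
    print("anonymized:", result["anonymized_text"])
    for item in result["pii_items"]:
        print(f"- {item['type']}: {item['value']} ({item['start_pos']}-{item['end_pos']})")
else:
    print("error:", result["error"])
```
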
- mcp_pii_tools.py:855-868 (schema): JSON schema definition for the `mcp_process_text` tool, specifying its input parameters (`text`: string) as used in the MCP tool metadata.

```python
"process_text": {
    "name": "process_text",
    "description": "텍스트에서 PII(개인 정보) 를 탐지하고 익명화 처리합니다.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {
                "type": "string",
                "description": "처리할 텍스트"
            }
        },
        "required": ["text"]
    }
},
```
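
The `parameters` object above is plain JSON Schema, so a client can validate arguments before issuing a call. A small sketch using the third-party `jsonschema` package, which is not a dependency of this project and is shown only as an illustration:

```python
# Client-side argument validation against the published parameter schema.
# Uses the third-party `jsonschema` package purely for illustration.
from jsonschema import ValidationError, validate

PROCESS_TEXT_PARAMS = {
    "type": "object",
    "properties": {
        "text": {"type": "string", "description": "처리할 텍스트"},
    },
    "required": ["text"],
}

def check_arguments(arguments: dict) -> bool:
    """Return True if `arguments` satisfies the process_text schema."""
    try:
        validate(instance=arguments, schema=PROCESS_TEXT_PARAMS)
        return True
    except ValidationError as exc:
        print(f"invalid arguments: {exc.message}")
        return False

check_arguments({"text": "hello"})  # True
check_arguments({})                 # False: 'text' is required
```
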
- mcp_pii_tools.py:420-479 (helper): Core helper method in the MCPPIIProcessor class that implements the PII processing logic: detects PII using `MCPPIIDetector.detect_pii()`, converts the results to `PIIItem` objects, anonymizes the text, and returns structured results.

```python
def process_text(self, text: str) -> Dict[str, Any]:
    """
    텍스트에서 PII를 탐지하고 처리 (MCP Tool용)

    Args:
        text (str): 처리할 텍스트

    Returns:
        Dict[str, Any]: MCP Tool 응답 형식
    """
    try:
        start_time = time.time()

        if not text:
            return {
                "success": True,
                "original_text": "",
                "anonymized_text": "",
                "pii_items": [],
                "count": 0,
                "processing_time": 0,
                "summary": {}
            }

        # 1. PII 탐지
        detection_result = self.detector.detect_pii(text)
        if not detection_result["success"]:
            return detection_result

        # PIIItem 객체로 변환
        pii_items = [PIIItem(**item) for item in detection_result["pii_items"]]

        # 2. 익명화 처리
        anonymized_text = self.detector.anonymize_text(text, pii_items)

        processing_time = time.time() - start_time

        return {
            "success": True,
            "original_text": text,
            "anonymized_text": anonymized_text,
            "pii_items": [asdict(item) for item in pii_items],
            "count": len(pii_items),
            "processing_time": processing_time,
            "summary": detection_result["summary"]
        }

    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "original_text": text,
            "anonymized_text": text,
            "pii_items": [],
            "count": 0,
            "processing_time": 0,
            "summary": {}
        }
```
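
`process_text` rebuilds detection results with `PIIItem(**item)` and serializes them with `asdict(item)`, which implies `PIIItem` is a dataclass with the five fields passed in `detect_pii`. Its definition is not quoted in this section; the following is a hypothetical reconstruction consistent with that usage, not the actual source:

```python
# Hypothetical reconstruction of PIIItem, inferred from PIIItem(**item),
# asdict(item), and the keyword arguments used in detect_pii.
# The real definition in mcp_pii_tools.py may differ.
from dataclasses import dataclass

@dataclass
class PIIItem:
    type: str          # PII category, mapped to a Korean label by _map_extraction_class
    value: str         # the matched text span
    confidence: float  # fixed default of 0.9; langextract reports no confidence
    start_pos: int     # character offset in the original text (-1 if not found)
    end_pos: int       # exclusive end offset (-1 if not found)
```
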
- mcp_pii_tools.py:154-256 (helper): Key helper method in the MCPPIIDetector class for PII detection using the langextract library. It extracts PII entities (names, emails, phone numbers, etc.), maps extraction classes to Korean type labels, computes character positions, and returns structured results. Called by the processor.

```python
def detect_pii(self, text: str) -> Dict[str, Any]:
    """
    텍스트에서 PII를 탐지 (MCP Tool용)

    Args:
        text (str): 분석할 텍스트

    Returns:
        Dict[str, Any]: MCP Tool 응답 형식
    """
    try:
        start_time = time.time()

        # Provider에 따른 langextract 호출
        if self.provider_type == "vllm":
            # vLLM Provider 사용
            result = lx.extract(
                text_or_documents=text,
                prompt_description=self.prompt,
                examples=self.examples,
                model=self.provider,  # 커스텀 Provider 인스턴스 사용
                use_schema_constraints=False
            )
        else:
            # OpenAI Provider 사용 (기본)
            os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
            result = lx.extract(
                text_or_documents=text,
                prompt_description=self.prompt,
                examples=self.examples,
                model_id=self.model_id,
                api_key=self.api_key,
                fence_output=True
            )

        # 결과를 PIIItem 리스트로 변환
        pii_items = []
        logger.info(f"탐지된 extraction 수: {len(result.extractions)}")

        for i, extraction in enumerate(result.extractions):
            logger.info(f"Extraction {i+1}: class='{extraction.extraction_class}', text='{extraction.extraction_text}'")

            # char_interval이 없으면 텍스트에서 직접 위치 찾기
            start_pos = 0
            end_pos = 0

            if extraction.char_interval:
                start_pos = extraction.char_interval.start_pos
                end_pos = extraction.char_interval.end_pos
                logger.info(f" char_interval 사용: {start_pos}-{end_pos}")
            else:
                # 텍스트에서 직접 위치 찾기 (대소문자 구분 없이)
                search_text = extraction.extraction_text
                start_pos = text.find(search_text)

                # 대소문자 구분 없이 찾기
                if start_pos == -1:
                    start_pos = text.lower().find(search_text.lower())

                if start_pos != -1:
                    end_pos = start_pos + len(search_text)
                    logger.info(f" 텍스트에서 직접 찾음: {start_pos}-{end_pos}")

                    # 실제 찾은 텍스트와 원본이 일치하는지 확인
                    actual_found = text[start_pos:end_pos]
                    if actual_found != search_text:
                        logger.warning(f" 대소문자 차이: 찾은='{actual_found}', 원본='{search_text}'")
                else:
                    start_pos = -1
                    end_pos = -1
                    logger.warning(f" 텍스트에서 찾을 수 없음: '{extraction.extraction_text}'")

            mapped_type = self._map_extraction_class(extraction.extraction_class)
            logger.info(f" 매핑된 타입: '{extraction.extraction_class}' -> '{mapped_type}'")

            pii_items.append(PIIItem(
                type=mapped_type,
                value=extraction.extraction_text,
                confidence=0.9,  # langextract는 confidence를 제공하지 않으므로 기본값
                start_pos=start_pos,
                end_pos=end_pos
            ))

        processing_time = time.time() - start_time

        return {
            "success": True,
            "pii_items": [asdict(item) for item in pii_items],
            "count": len(pii_items),
            "processing_time": processing_time,
            "summary": self._get_pii_summary(pii_items)
        }

    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "pii_items": [],
            "count": 0,
            "processing_time": 0,
            "summary": {}
        }
```
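
The fallback branch in `detect_pii` (taken when langextract returns no `char_interval`) is easiest to follow in isolation: try an exact `str.find`, retry case-insensitively, and report `-1/-1` when the extracted text cannot be located. A self-contained sketch of that logic; `locate_span` is a hypothetical helper name used only for this illustration:

```python
# Standalone sketch of the position fallback used in detect_pii when no
# char_interval is available. `locate_span` is a hypothetical name.
from typing import Tuple

def locate_span(text: str, needle: str) -> Tuple[int, int]:
    """Return (start, end) of `needle` in `text`, or (-1, -1) if absent."""
    start = text.find(needle)                      # exact match first
    if start == -1:
        start = text.lower().find(needle.lower())  # then case-insensitive
    if start == -1:
        return -1, -1
    return start, start + len(needle)

print(locate_span("Email: John@Example.com", "john@example.com"))  # (7, 23)
print(locate_span("no PII here", "hong@example.com"))              # (-1, -1)
```

Note that a case-insensitive hit keeps the length of the searched string, so the returned span may differ in casing from the extraction text; `detect_pii` logs a warning in that situation rather than rejecting the match.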