MCP PII Tools

by czangyeob

mcp_anonymize_text

Anonymize personally identifiable information (PII) in text by replacing sensitive data with secure placeholders while maintaining text structure.

Instructions

MCP Tool: 텍스트 익명화 (Text anonymization)

Args:
    text (str): Original text
    pii_items (List[Dict[str, Any]]): PII items
    
Returns:
    str: Anonymized text

Input Schema

Name | Required | Description | Default
text | Yes | |
pii_items | Yes | |
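
As a rough illustration of the inputs, the sketch below builds a `pii_items` list whose keys follow the item properties listed in the tool schema (`type`, `value`, `confidence`, `start_pos`, `end_pos`). The sample text, the confidence default, and the `make_item` helper are hypothetical. Offsets are derived with `str.find`, so they are character offsets with an exclusive end, which matches the `text[start_pos:end_pos] == value` check in the implementation reference.

```python
from typing import Any, Dict, List


def make_item(text: str, value: str, pii_type: str, confidence: float = 1.0) -> Dict[str, Any]:
    """Hypothetical helper: locate `value` in `text` and build one PII item dict."""
    start = text.find(value)  # -1 if the value is absent
    end = start + len(value) if start != -1 else -1
    return {
        "type": pii_type,
        "value": value,
        "confidence": confidence,
        "start_pos": start,
        "end_pos": end,
    }


text = "Contact Kim at kim@example.com"
pii_items: List[Dict[str, Any]] = [
    make_item(text, "Kim", "이름"),                # "이름" = name
    make_item(text, "kim@example.com", "이메일"),  # "이메일" = email
]

# Argument payload shaped like the two required inputs of the tool.
arguments = {"text": text, "pii_items": pii_items}
```

The type strings are the Korean labels that the listed implementation compares against, so they are kept untranslated here.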

Output Schema

Name | Required | Description | Default
result | Yes | |

Implementation Reference

  • MCP tool handler function decorated with @mcp.tool(). Converts input pii_items to PIIItem objects and delegates to the detector's anonymize_text method for execution.
    @mcp.tool()
    def mcp_anonymize_text(text: str, pii_items: List[Dict[str, Any]]) -> str:
        """
        MCP Tool: 텍스트 익명화 (Text anonymization)
        
        Args:
            text (str): Original text
            pii_items (List[Dict[str, Any]]): PII items
            
        Returns:
            str: Anonymized text
        """
        detector = get_detector()
        pii_objects = [PIIItem(**item) for item in pii_items]
        return detector.anonymize_text(text, pii_objects)
  • Core implementation of the text anonymization logic in the MCPPIIDetector class. It validates PII item positions, sorts items by position, handles addresses as a special case, and replaces PII values with placeholders such as [이름] (name) and [전화번호] (phone number), trying an exact match first and a case-insensitive match as a fallback.
    def anonymize_text(self, text: str, pii_items: List[PIIItem]) -> str:
        """Anonymize PII in the text."""
        if not pii_items:
            return text
        
        logger.info(f"익명화 시작: 원본 텍스트 길이={len(text)}")
        logger.info(f"익명화할 PII 항목 수: {len(pii_items)}")
        
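        # Validate PII items before sorting them by position (the listed code does not deduplicate)
        valid_items = []
        for item in pii_items:
            if item.start_pos != -1 and item.end_pos != -1 and item.start_pos < len(text):
                # Check that the text at this position actually matches the value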
                actual_text = text[item.start_pos:item.end_pos]
                if actual_text == item.value:
                    valid_items.append(item)
                    logger.info(f"유효한 PII: '{item.value}' ({item.type}) at {item.start_pos}-{item.end_pos}")
                else:
                    # For addresses ("주소"), ignore the position mismatch and treat the item as valid
                    if item.type == "주소":
                        valid_items.append(item)
                        logger.info(f"주소 위치 불일치 무시: '{item.value}' ({item.type}) at {item.start_pos}-{item.end_pos}")
                    else:
                        logger.warning(f"위치 불일치: 예상='{item.value}', 실제='{actual_text}' at {item.start_pos}-{item.end_pos}")
            else:
                # Addresses ("주소") are processed even when their position is invalid
                if item.type == "주소":
                    valid_items.append(item)
                    logger.info(f"주소 위치 무효 무시: '{item.value}' ({item.type}) at {item.start_pos}-{item.end_pos}")
                else:
                    logger.warning(f"유효하지 않은 위치: '{item.value}' ({item.type}) at {item.start_pos}-{item.end_pos}")
        
        # Replace from the back (to avoid index shifts)
        sorted_items = sorted(valid_items, key=lambda x: x.start_pos, reverse=True)
        anonymized_text = text
        
        # Special handling for addresses: ignore the langextract position info and search directly
        address_items = [item for item in sorted_items if item.type == "주소"]
        other_items = [item for item in sorted_items if item.type != "주소"]
        
        # Process addresses first
        for item in address_items:
            logger.info(f"주소 특별 처리: '{item.value}' ({item.type})")
            
            # Anonymize according to PII type
            anonymized_value = "[주소]"
            
            # Search the text directly
            if item.value in anonymized_text:
                anonymized_text = anonymized_text.replace(item.value, anonymized_value, 1)
                logger.info(f"주소 직접 매치 익명화 완료: '{item.value}' -> '{anonymized_value}'")
            else:
                # Try a partial match
                keywords = [word for word in item.value.split() if len(word) > 1]
                for keyword in reversed(keywords):  # try from the last keyword first
                    if keyword in anonymized_text:
                        anonymized_text = anonymized_text.replace(keyword, anonymized_value, 1)
                        logger.info(f"주소 키워드 매치 익명화 완료: '{keyword}' -> '{anonymized_value}' (원본: '{item.value}')")
                        break
                else:
                    logger.warning(f"주소 '{item.value}'를 찾을 수 없음")
        
        # Process the remaining PII
        for item in other_items:
            logger.info(f"익명화 처리 중: '{item.value}' ({item.type}) at {item.start_pos}-{item.end_pos}")
            
            # Anonymize according to PII type
            if item.type == "이름":  # name
                anonymized_value = "[이름]"
            elif item.type == "전화번호":  # phone number
                anonymized_value = "[전화번호]"
            elif item.type == "이메일":  # email
                anonymized_value = "[이메일]"
            elif item.type == "주소":  # address
                anonymized_value = "[주소]"
            elif item.type == "여권번호":  # passport number
                anonymized_value = "[여권번호]"
            else:
                anonymized_value = f"[{item.type}]"
            
            # Replace in the text (using hardened string replacement)
            try:
                # Check whether the value is present in the current text
                if item.value in anonymized_text:
                    # Perform the string replacement
                    anonymized_text = anonymized_text.replace(item.value, anonymized_value, 1)  # replace only the first match
                    logger.info(f"익명화 완료: '{item.value}' -> '{anonymized_value}'")
                else:
                    # Search case-insensitively
                    import re
                    pattern = re.escape(item.value)
                    match = re.search(pattern, anonymized_text, re.IGNORECASE)
                    if match:
                        start, end = match.span()
                        anonymized_text = anonymized_text[:start] + anonymized_value + anonymized_text[end:]
                        logger.info(f"대소문자 무시 익명화 완료: '{item.value}' -> '{anonymized_value}'")
                    else:
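                        # Try a partial match (addresses only; not reached here because other_items excludes addresses)
                        if item.type == "주소" and len(item.value) > 3:
                            # Search using the last part of the address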
                            last_part = item.value.split()[-1] if ' ' in item.value else item.value[-3:]
                            if last_part in anonymized_text:
                                anonymized_text = anonymized_text.replace(last_part, anonymized_value, 1)
                                logger.info(f"부분 매치 익명화 완료: '{last_part}' -> '{anonymized_value}' (원본: '{item.value}')")
                            else:
                                # Stronger address matching: search with a regex
                                import re
                                # Convert the address pattern to a regex
                                address_pattern = re.escape(item.value).replace(r'\ ', r'\s+')
                                match = re.search(address_pattern, anonymized_text, re.IGNORECASE)
                                if match:
                                    start, end = match.span()
                                    anonymized_text = anonymized_text[:start] + anonymized_value + anonymized_text[end:]
                                    logger.info(f"정규식 매치 익명화 완료: '{item.value}' -> '{anonymized_value}'")
                                else:
                                    # Last attempt: search by the address's key keywords
                                    keywords = [word for word in item.value.split() if len(word) > 1]
                                    for keyword in reversed(keywords):  # try from the last keyword first
                                        if keyword in anonymized_text:
                                            anonymized_text = anonymized_text.replace(keyword, anonymized_value, 1)
                                            logger.info(f"키워드 매치 익명화 완료: '{keyword}' -> '{anonymized_value}' (원본: '{item.value}')")
                                            break
                                    else:
                                        logger.warning(f"텍스트에서 '{item.value}'를 찾을 수 없음")
                        else:
                            logger.warning(f"텍스트에서 '{item.value}'를 찾을 수 없음")
            except Exception as e:
                logger.error(f"익명화 실패: {e}, item: {item}")
        
        logger.info(f"익명화 완료: 결과 텍스트 길이={len(anonymized_text)}")
        return anonymized_text
  • Tool schema and metadata definition in the MCP_TOOLS dictionary, specifying input parameters (text and pii_items array of objects), descriptions, and required fields for the anonymize_text tool.
    "anonymize_text": {
        "name": "anonymize_text",
        "description": "PII(개인 정보) 항목들을 사용하여 텍스트를 익명화합니다.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "원본 텍스트"
                },
                "pii_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "type": {"type": "string"},
                            "value": {"type": "string"},
                            "confidence": {"type": "number"},
                            "start_pos": {"type": "number"},
                            "end_pos": {"type": "number"}
                        }
                    },
                    "description": "PII 항목들"
                }
            },
            "required": ["text", "pii_items"]
        }
    }
  • @mcp.tool() decorator registers the mcp_anonymize_text function as an MCP tool with FastMCP.
    @mcp.tool()
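
The comment in `anonymize_text` about replacing from the back so that indices do not shift can be illustrated with a small span-based sketch. This is not the server's code, which, as listed above, substitutes with `str.replace` on the first match rather than slicing by offset. `Span` and `replace_spans` are hypothetical names, and the sketch assumes the spans do not overlap.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # placeholder label, e.g. "이름" (name)


def replace_spans(text: str, spans: List[Span]) -> str:
    """Replace each span with "[label]", working right to left.

    Editing the rightmost span first leaves the offsets of every span
    to its left unchanged. Assumes spans do not overlap.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.label}]" + text[span.end:]
    return text


sample = "Kim: 010-1234-5678"
spans = [Span(0, 3, "이름"), Span(5, 18, "전화번호")]  # name, phone number
result = replace_spans(sample, spans)
```

Processing left to right instead would require adjusting every later offset by the length difference between each value and its placeholder.
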
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states the tool returns anonymized text but doesn't disclose what anonymization means (masking, replacement, removal), whether it's reversible, what happens to the original text, or any performance/rate limit considerations. For a PII handling tool with zero annotation coverage, this is a significant behavioral gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
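
One behavior visible in the implementation reference is that each PII item is substituted with `str.replace(..., 1)`, so a single item replaces only the first occurrence of its value. The snippet below is a minimal illustration with plain Python strings, not a call to the server; the sample text is made up.

```python
text = "Kim called. Later Kim emailed."
placeholder = "[이름]"  # "[name]"

# Mirrors the count=1 call in the listed implementation: first match only.
first_only = text.replace("Kim", placeholder, 1)

# Without the count argument, every occurrence is replaced.
all_matches = text.replace("Kim", placeholder)
```

Judging from the listed code alone, a value that appears several times needs one item per occurrence to be fully masked, and the returned string carries no mapping back to the original values.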

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise with a clear structure: tool name, args section, and returns section. There's no unnecessary verbosity. However, the bilingual presentation (Korean title with English description) creates minor cognitive overhead, and the description could be more front-loaded with purpose before parameter documentation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 2 parameters with 0% schema description coverage and no annotations, but an output schema exists, so the description is minimally adequate. Because of the output schema, the description does not need to explain return values, but it should do more to explain parameter semantics and behavioral context for a PII handling tool. It meets the bare minimum but leaves important questions unanswered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It lists parameters and types but adds minimal semantic meaning. 'PII items' is documented as a list of dictionaries but the description doesn't explain what keys/values are expected, what PII types are supported, or how the tool uses these items to anonymize the text. The description doesn't adequately compensate for the schema coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
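
To make the expected item shape concrete, here is a sketch of a record with the five keys from the tool schema, together with the position check that `anonymize_text` applies before accepting a non-address item. `PIIItemSketch` and `position_matches` are hypothetical stand-ins; the real `PIIItem` class is not shown in the listing and may differ.

```python
from dataclasses import dataclass


@dataclass
class PIIItemSketch:
    type: str          # PII category, e.g. "이름" (name) or "주소" (address)
    value: str         # exact substring to anonymize
    confidence: float  # detector confidence; not read by anonymize_text as listed
    start_pos: int     # inclusive offset into text, -1 if unknown
    end_pos: int       # exclusive offset into text, -1 if unknown


def position_matches(text: str, item: PIIItemSketch) -> bool:
    """Same test as the listed validation loop: offsets are set and the slice equals the value."""
    if item.start_pos == -1 or item.end_pos == -1 or item.start_pos >= len(text):
        return False
    return text[item.start_pos:item.end_pos] == item.value


text = "Kim lives in Seoul"
good = PIIItemSketch("이름", "Kim", 0.9, 0, 3)
bad = PIIItemSketch("이름", "Kim", 0.9, 4, 7)  # slice is "liv", not "Kim"
```

In the listed implementation, items that fail this check are dropped with a warning unless their type is "주소", in which case they are kept and located by string search instead.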

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool 'anonymizes text', which is a clear verb+resource combination, but it doesn't specify how this differs from sibling tools like mcp_process_text or mcp_batch_process. The Korean title '텍스트 익명화' translates to 'text anonymization', which restates the English name rather than adding clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus alternatives. With sibling tools like mcp_detect_pii, mcp_encrypt_text_pii, and mcp_process_text available, the description doesn't explain whether this tool should be used before/after detection, or how it differs from encryption tools. The description only documents parameters without usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/czangyeob/mcp-pii-tools'