
anonymize

Redact personally identifiable information in Czech legal texts using a multi-step NLP pipeline. Outputs include text, HTML, or CoNLL-U with deterministic placeholders for reproducibility.

Instructions

Production-grade pseudonymization of Czech legal texts (v0.6.0).

Pipeline (8 steps):

1. **Regex pre-pass** (`regex_pre_pass=True`): structured PII
   (phone number, company ID (IČO), birth number (RČ), reference number
   (č.j.), case file number (sp. zn.), e-mail, URL, postal code (PSČ),
   licence plate (SPZ), IBAN, tax ID (DIČ), ID card number (OP), data box ID)
   is anonymized **BEFORE** MasKIT runs so that it is not fragmented.
   The phone number "777 123 456" is anonymized **as a whole**, as a single
   block TELEFON1.

2. **Strict wrapper pre-pass** (`strict=True`): NameTag finds
   companies, authorities, and institutions that MasKIT omits or fragments,
   and anonymizes them via sentinels → FIRMA1, INSTITUCE1.

3. **MasKIT**: pseudonymization of the remaining PII (names, addresses, ...).

4. **Stop-list filter** (`stop_list_filter=True`): MasKIT occasionally
   replaces common words by mistake ("stát" → "UniAgentury", "sporu" →
   "Pardubic"). The wrapper detects this, restores the original, and adds
   a warning.

5. **Restore sentinels** → final placeholders (TELEFON1, FIRMA1, ...).

6. **Fragmentation warnings**: detection of known MasKIT problems.

7. **Type classification**: NameTag looks up the entity type for each replacement.

8. **Placeholder mode** (`placeholder_mode=True`): instead of MasKIT's random
   fake names (`Jan Novák`), use deterministic `OSOBA1`, `OSOBA2`, ...,
   `MESTO1`, `ULICE1`, ... With deduplication: "Jiří" occurring 15 times in
   the text → OSOBA1 15 times.
   **Reproducible** (same input → same output) and **auditable**.
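
The regex pre-pass (step 1) and the deterministic, deduplicated placeholders (step 8) can be illustrated with a minimal sketch. It is not the tool's implementation: the two patterns, the `PlaceholderRegistry` class, and the `regex_pre_pass` function are hypothetical, and the sketch writes placeholders directly instead of going through the intermediate sentinels described in steps 2 and 5.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the tool's real regexes are not shown in the description.
PATTERNS = {
    "TELEFON": re.compile(r"\b\d{3} \d{3} \d{3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class PlaceholderRegistry:
    """Hands out numbered placeholders such as OSOBA1, OSOBA2, TELEFON1."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._assigned = {}  # (entity type, original string) -> placeholder

    def get(self, entity_type, original):
        key = (entity_type, original)
        if key not in self._assigned:
            # First time this value is seen: take the next number for its type.
            self._counters[entity_type] += 1
            self._assigned[key] = f"{entity_type}{self._counters[entity_type]}"
        # Repeated values reuse the placeholder assigned earlier (deduplication).
        return self._assigned[key]


def regex_pre_pass(text, registry):
    """Replace each structured PII match as one whole block."""
    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda match, t=entity_type: registry.get(t, match.group(0)), text
        )
    return text


registry = PlaceholderRegistry()
masked = regex_pre_pass(
    "Volejte 777 123 456 nebo 777 123 456, jinak 608 111 222.", registry
)
print(masked)  # Volejte TELEFON1 nebo TELEFON1, jinak TELEFON2.
```

Because numbering depends only on the order in which values first appear, the same input always yields the same output, which is the reproducibility property step 8 describes.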

Args:
    text: Input text (Czech).
    output: Output format: ``txt`` (default), ``html``, ``conllu``.
    keep_mapping: When True, returns the mapping. **WARNING**: if the text
        is going to leave the confidential environment, turn the mapping off!
    classify_types: NameTag looks up the entity type. Default ``True``.
    strict: Wrapper pre-pass for companies and authorities. Default ``True``.
    placeholder_mode: ⭐ **NEW in v0.6.0**: deterministic placeholders
        instead of MasKIT fake names. For reproducibility and auditability.
    regex_pre_pass: Default ``True``. Structured PII is handled by regex BEFORE MasKIT.
    stop_list_filter: Default ``True``. Rolls back MasKIT false positives.

Returns:
    ``anonymized`` (clean text), ``raw`` (raw MasKIT output), ``replacements``
    (list with ``original``, ``placeholder``, ``type``, ``source``),
    ``warnings``, ``sources`` ({maskit, wrapper-regex, wrapper-strict,
    wrapper-placeholder}).
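
A short sketch of how a caller might consume the returned fields. The `result` dictionary below is a hypothetical response shaped after the Returns section; every value in it (including the `type` labels and the `raw` text) is invented for illustration, and the helper names are assumptions rather than part of the tool.

```python
from collections import Counter

# Hypothetical response; field names follow the Returns section, values are invented.
result = {
    "anonymized": "OSOBA1 volal na TELEFON1 ohledně FIRMA1.",
    "raw": "Petr Svoboda volal na TELEFON1 ohledně FIRMA1.",
    "replacements": [
        {"original": "Jiří Novák", "placeholder": "OSOBA1",
         "type": "person", "source": "wrapper-placeholder"},
        {"original": "777 123 456", "placeholder": "TELEFON1",
         "type": "phone", "source": "wrapper-regex"},
        {"original": "Acme s.r.o.", "placeholder": "FIRMA1",
         "type": "company", "source": "wrapper-strict"},
    ],
    "warnings": [],
}


def count_by_source(result):
    """Count replacements per pipeline stage that produced them."""
    return dict(Counter(item["source"] for item in result["replacements"]))


def leaked_originals(result):
    """Return originals that still appear verbatim in the anonymized text."""
    return [
        item["original"]
        for item in result["replacements"]
        if item["original"] in result["anonymized"]
    ]


def safe_to_share(result):
    """Keep only the anonymized text; the mapping holds the original PII."""
    return {"anonymized": result["anonymized"]}


print(count_by_source(result))
print(leaked_originals(result))
print(safe_to_share(result))
```

Dropping `replacements` before the text leaves the confidential environment mirrors the warning attached to `keep_mapping`: each entry pairs a placeholder with the original value.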

Input Schema

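| Name | Required | Description | Default |
| --- | --- | --- | --- |
| text | Yes | | |
| output | No | | txt |
| keep_mapping | No | | |
| classify_types | No | | |
| strict | No | | |
| placeholder_mode | No | | |
| regex_pre_pass | No | | |
| stop_list_filter | No | | |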
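
As a sketch of how a client might assemble a request for this tool, the snippet below builds an MCP `tools/call` JSON-RPC message and checks the arguments against the table above. The `build_call` helper is hypothetical, and treating the six optional flags as booleans is an assumption based on the `True` defaults in the description; the schema itself does not list types.

```python
import json

ALLOWED_OUTPUTS = {"txt", "html", "conllu"}
# Assumption: these optional parameters are booleans.
BOOLEAN_FLAGS = {
    "keep_mapping", "classify_types", "strict",
    "placeholder_mode", "regex_pre_pass", "stop_list_filter",
}


def build_call(text, output="txt", request_id=1, **flags):
    """Build a JSON-RPC 2.0 `tools/call` request for the `anonymize` tool."""
    if not isinstance(text, str) or not text:
        raise ValueError("text is required and must be a non-empty string")
    if output not in ALLOWED_OUTPUTS:
        raise ValueError(f"output must be one of {sorted(ALLOWED_OUTPUTS)}")
    unknown = set(flags) - BOOLEAN_FLAGS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "anonymize",
            "arguments": {"text": text, "output": output, **flags},
        },
    }


request = build_call(
    "Jiří Novák volal na 777 123 456.",
    placeholder_mode=True,
    keep_mapping=False,
)
print(json.dumps(request, ensure_ascii=False, indent=2))
```

Validating on the client side is optional, but it surfaces a misspelled flag or an unsupported output format before the request reaches the server.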

Output Schema


No arguments

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It details the 8-step pipeline, each parameter's effect (e.g., placeholder_mode for reproducibility, stop_list_filter for false positives), and return structure. This is highly transparent about the tool's behavior and side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with numbered steps and is front-loaded with the tool's purpose. Every sentence provides value, but the text is verbose and could be slightly more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, multi-step pipeline, return types), the description is remarkably complete. It explains pipeline stages, parameter interactions, output fields, and even version-specific features. The output schema is described in the Returns section, fulfilling completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates by explaining each parameter (text, output, keep_mapping, etc.) in plain language, including defaults and behavioral impact. This adds significant meaning beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'Production-grade pseudonymizace českých právních textů' (pseudonymization of Czech legal texts). It provides a specific verb (pseudonymize), resource (Czech legal texts), and detailed pipeline. This distinguishes it from sibling tools like analyze_morphology or translate_text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the pipeline and parameter behaviors, implying usage for Czech legal text anonymization. It mentions caveats such as turning off keep_mapping if the text leaves the confidential environment. However, it does not explicitly state when not to use this tool or name alternatives among its siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Buggy1111/anonymize-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.