lint_marks
Lint source mark texts before submission to detect empty strings, control characters, length issues, missing punctuation, and problematic tokens. Returns a verdict: ok, watch, or blocked.
Instructions
Pre-submission linter for source mark texts. ZERO API calls — runs locally in milliseconds. Catches issues that historically tripped the pipeline before you waste a Gemini call on them.
Checks per mark:
Empty / whitespace-only (validate-marks would reject anyway)
Control characters (need stripping)
Length above thresholds (cwseg fragmentation, Gemini context risk)
Missing sentence-final punctuation (alignment edge case)
Historically-tricky tokens for the source language (e.g. ZH 那 demonstrative-vs-relative; KO 일 day-vs-event)
CJK + embedded Latin words that confuse cwseg
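The first four checks are simple enough to sketch locally. The snippet below is only an illustration of the idea, not the tool's actual implementation: `lint_mark`, both length thresholds, the punctuation set, and the severity given to each finding are all assumptions, since the source does not state them.

```python
import unicodedata

# Hypothetical thresholds; the real tool's limits are not documented here.
WARN_LENGTH = 200
ERROR_LENGTH = 1000

# Assumed set of characters that may legitimately end a sentence,
# including closing quotes and brackets for Latin and CJK text.
SENTENCE_FINAL = set(".!?…。！？\"'”’」』)）")


def lint_mark(text: str) -> list[tuple[str, str]]:
    """Return (severity, message) findings for a single mark."""
    if not text.strip():
        # Nothing else is worth checking on an empty mark.
        return [("error", "empty or whitespace-only")]

    findings = []

    # Unicode category "Cc" covers C0/C1 control characters (tab, BEL, ...).
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        findings.append(("error", "contains control characters"))

    if len(text) > ERROR_LENGTH:
        findings.append(("error", f"length {len(text)} exceeds {ERROR_LENGTH}"))
    elif len(text) > WARN_LENGTH:
        findings.append(("warning", f"length {len(text)} exceeds {WARN_LENGTH}"))

    # Safe to index: the mark is known to be non-empty after stripping.
    if text.rstrip()[-1] not in SENTENCE_FINAL:
        findings.append(("warning", "missing sentence-final punctuation"))

    return findings
```

The language-specific checks (tricky tokens, embedded Latin in CJK) would need per-language word lists and are left out of this sketch.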
Returns: {summary: {markCount, warningCount, errorCount, verdict}, marks: [...]}
verdict: "ok" | "watch" (warnings only) | "blocked" (errors present)
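The verdict rule can be read as a function of the two counts. The following is a minimal sketch of that mapping; the per-mark `warnings` and `errors` keys are hypothetical, because the source leaves the shape of each `marks` entry unspecified.

```python
def summarize(marks: list[dict]) -> dict:
    """Build a summary dict from per-mark findings.

    Each entry in `marks` is assumed (hypothetically) to carry
    "warnings" and "errors" lists; missing keys count as empty.
    """
    warning_count = sum(len(m.get("warnings", [])) for m in marks)
    error_count = sum(len(m.get("errors", [])) for m in marks)

    if error_count > 0:
        verdict = "blocked"   # errors present
    elif warning_count > 0:
        verdict = "watch"     # warnings only
    else:
        verdict = "ok"

    return {
        "markCount": len(marks),
        "warningCount": warning_count,
        "errorCount": error_count,
        "verdict": verdict,
    }
```

Note that a single error is enough to block, regardless of how many marks are clean.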
Run this BEFORE create_chapter_from_marks. If verdict is "blocked", fix the source text. If "watch", spot-check the flagged marks after upload.
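As a rough illustration of that gating step, the helper below maps a returned result to the next action. `next_action` and its return labels are hypothetical names chosen for this sketch; only the `summary.verdict` field and its three values come from the description above.

```python
def next_action(result: dict) -> str:
    """Decide what to do after linting, based on the verdict."""
    verdict = result["summary"]["verdict"]
    if verdict == "blocked":
        # Errors present: fix the source text before uploading.
        return "fix_source"
    if verdict == "watch":
        # Warnings only: upload, then spot-check the flagged marks.
        return "upload_then_spot_check"
    if verdict == "ok":
        return "upload"
    raise ValueError(f"unexpected verdict: {verdict!r}")


# Example with a hand-written result in the documented shape.
example = {
    "summary": {"markCount": 3, "warningCount": 1, "errorCount": 0, "verdict": "watch"},
    "marks": [],
}
print(next_action(example))
```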
Args:
source_language: Language of the marks (EN/FR/ES/DE/IT/PT/ZH/JA/KO).
marks: List of source-language sentences. Same input you'd pass to create_chapter_from_marks.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
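| source_language | Yes | Language of the marks (EN/FR/ES/DE/IT/PT/ZH/JA/KO). | |
| marks | Yes | List of source-language sentences. Same input you'd pass to create_chapter_from_marks. | |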
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |