lint_section_v1

Validate markdown citations in academic writing by checking compliance with citation rules, ensuring proper attribution and formatting for evidence-based content.

Instructions

Validate section citation compliance.

Checks whether Agent-written markdown complies with the citation rules.

Args:
- pack_id: evidence pack ID
- markdown: the Agent-written markdown content
- require_citations_per_paragraph: whether every paragraph must contain citations; default False
- min_citations_per_paragraph: minimum number of citations per paragraph; default 1

Returns: passed, issues[], stats
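As a sketch of the citation syntax the linter parses (the `[[chunk:<chunk_id>]]` marker comes from the implementation below; the sample markdown is hypothetical):

```python
import re

# The linter recognizes citations of the form [[chunk:<chunk_id>]].
CITATION_PATTERN = r"\[\[chunk:(\d+)\]\]"

# Hypothetical Agent-written markdown.
markdown = (
    "## Findings\n\n"
    "Prior work reports a 12% improvement [[chunk:42]], replicated "
    "at scale [[chunk:7]] [[chunk:42]].\n"
)

cited = [int(m) for m in re.findall(CITATION_PATTERN, markdown)]
print(cited)               # every citation, duplicates included
print(sorted(set(cited)))  # unique chunk ids
```

Duplicates are kept in the first list because the stats fields (total_citations vs. unique_chunks_cited) count both.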

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| pack_id | Yes | | |
| markdown | Yes | | |
| require_citations_per_paragraph | No | | False |
| min_citations_per_paragraph | No | | 1 |

Output Schema

No arguments
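When `require_citations_per_paragraph` is enabled, the linter splits the markdown on blank lines, skips heading lines, and flags paragraphs below the minimum. A standalone sketch of that rule (the sample text is hypothetical):

```python
import re

CITATION_PATTERN = r"\[\[chunk:(\d+)\]\]"
MIN_PER_PARAGRAPH = 1  # corresponds to min_citations_per_paragraph

# Hypothetical section: one supported paragraph, one unsupported.
markdown = "# Results\n\nSupported claim [[chunk:3]].\n\nUnsupported claim."

low_density = []
paragraphs = [p.strip() for p in markdown.split("\n\n") if p.strip()]
for i, para in enumerate(paragraphs):
    if para.startswith("#"):
        continue  # headings are exempt from the density rule
    if len(re.findall(CITATION_PATTERN, para)) < MIN_PER_PARAGRAPH:
        low_density.append(i)

print(low_density)  # only the unsupported paragraph is flagged
```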

Implementation Reference

  • The main execution logic for the 'lint_section_v1' tool. This function validates the citations in the provided markdown against the chunks in the specified evidence pack (pack_id). It checks for valid chunk existence, pack membership, paragraph citation density (optional), and single-source dominance.
    @mcp.tool()
    def lint_section_v1(
        pack_id: int,
        markdown: str,
        require_citations_per_paragraph: bool = False,
        min_citations_per_paragraph: int = 1,
    ) -> dict[str, Any]:
        """验证章节引用合规
    
        检查 Agent 写作的 markdown 是否符合引用规则。
    
        Args:
            pack_id: 证据包 ID
            markdown: Agent 写作的 markdown 内容
            require_citations_per_paragraph: 是否要求每段有引用,默认 False
            min_citations_per_paragraph: 每段最少引用数,默认 1
    
        Returns:
            passed, issues[], stats
        """
        try:
            # Fetch all chunk_ids in the pack
            pack_chunks = query_all(
                "SELECT chunk_id FROM evidence_pack_items WHERE pack_id = %s",
                (pack_id,),
            )
            if not pack_chunks:
                return {"error": f"Pack not found or empty: {pack_id}"}
    
            valid_chunk_ids = {row["chunk_id"] for row in pack_chunks}
    
            # Parse citations in the markdown
            # Format: [[chunk:<chunk_id>]]
            citation_pattern = r"\[\[chunk:(\d+)\]\]"
            citations = re.findall(citation_pattern, markdown)
            cited_chunk_ids = [int(c) for c in citations]
    
            issues = []
            valid_citations = 0
            invalid_citations = 0
    
            # Check each citation
            for chunk_id in cited_chunk_ids:
                # Check that the chunk exists
                exists = query_one(
                    "SELECT chunk_id FROM chunks WHERE chunk_id = %s",
                    (chunk_id,),
                )
                if not exists:
                    issues.append({
                        "severity": "error",
                        "rule": "CHUNK_NOT_FOUND",
                        "chunk_id": chunk_id,
                        "message": f"Chunk {chunk_id} does not exist in database",
                        "suggestion": "Remove this citation or use a valid chunk_id from the evidence pack",
                    })
                    invalid_citations += 1
                    continue
    
                # Check that the chunk belongs to the pack
                if chunk_id not in valid_chunk_ids:
                    issues.append({
                        "severity": "error",
                        "rule": "CHUNK_OUT_OF_PACK",
                        "chunk_id": chunk_id,
                        "message": f"Chunk {chunk_id} is not in evidence pack {pack_id}",
                        "suggestion": "Only cite chunks from the provided evidence pack",
                    })
                    invalid_citations += 1
                    continue
    
                valid_citations += 1
    
            # Check paragraph citation density (optional)
            if require_citations_per_paragraph:
                # Split into paragraphs
                paragraphs = [p.strip() for p in markdown.split("\n\n") if p.strip()]
                for i, para in enumerate(paragraphs):
                    # Skip heading lines
                    if para.startswith("#"):
                        continue
                    # Count citations in this paragraph
                    para_citations = re.findall(citation_pattern, para)
                    if len(para_citations) < min_citations_per_paragraph:
                        issues.append({
                            "severity": "warning",
                            "rule": "LOW_PARAGRAPH_DENSITY",
                            "paragraph_index": i,
                            "message": f"Paragraph {i+1} has {len(para_citations)} citations (minimum: {min_citations_per_paragraph})",
                            "suggestion": f"Add at least {min_citations_per_paragraph - len(para_citations)} more citation(s) to this paragraph",
                        })
    
            # Check for single-source dominance
            if cited_chunk_ids:
                # Look up each chunk's doc_id
                chunk_docs = {}
                for chunk_id in set(cited_chunk_ids):
                    doc = query_one(
                        "SELECT doc_id FROM chunks WHERE chunk_id = %s",
                        (chunk_id,),
                    )
                    if doc:
                        chunk_docs[chunk_id] = doc["doc_id"]
    
                # Count citations per document
                doc_cite_counts: dict[str, int] = defaultdict(int)
                for chunk_id in cited_chunk_ids:
                    doc_id = chunk_docs.get(chunk_id)
                    if doc_id:
                        doc_cite_counts[doc_id] += 1
    
                total = len(cited_chunk_ids)
                for doc_id, count in doc_cite_counts.items():
                    if count / total > 0.5:
                        issues.append({
                            "severity": "warning",
                            "rule": "SINGLE_SOURCE_DOMINANT",
                            "doc_id": doc_id,
                            "message": f"Document {doc_id[:16]}... accounts for {count}/{total} ({count*100//total}%) of citations",
                            "suggestion": "Consider diversifying citations across multiple sources",
                        })
    
            # Determine pass/fail
            has_errors = any(issue["severity"] == "error" for issue in issues)
    
            return {
                "passed": not has_errors,
                "issues": issues,
                "stats": {
                    "total_citations": len(cited_chunk_ids),
                    "valid_citations": valid_citations,
                    "invalid_citations": invalid_citations,
                    "unique_chunks_cited": len(set(cited_chunk_ids)),
                    "pack_chunk_count": len(valid_chunk_ids),
                    "citation_coverage": valid_citations / len(valid_chunk_ids) if valid_chunk_ids else 0,
                },
            }
    
        except Exception as e:
            return {"error": str(e), "passed": False}
  • The call to register_review_tools(mcp) registers all review tools, including 'lint_section_v1', to the FastMCP server instance.
    register_review_tools(mcp)
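`register_review_tools` follows the common pattern of a module-level function that attaches decorated tool functions to a server instance. A minimal stdlib-only sketch of that registry pattern (the `ToolRegistry` class and its methods are hypothetical stand-ins, not the FastMCP API):

```python
from typing import Any, Callable

class ToolRegistry:
    """Hypothetical stand-in for an MCP server's tool registry."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.tools[fn.__name__] = fn  # register under the function name
            return fn
        return decorator

def register_review_tools(mcp: ToolRegistry) -> None:
    # Simplified tool body; the real implementation is shown above.
    @mcp.tool()
    def lint_section_v1(pack_id: int, markdown: str) -> dict[str, Any]:
        return {"passed": True, "issues": [], "stats": {}}

mcp = ToolRegistry()
register_review_tools(mcp)
print(sorted(mcp.tools))  # includes 'lint_section_v1'
```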
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. Its verbs ('验证'/'检查', verify/check) imply a read-only analysis, but it doesn't disclose behavioral traits such as whether the tool modifies content, requires specific permissions, is rate limited, or what 'issues[]' and 'stats' contain. The description adds minimal context beyond the basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core purpose first, followed by parameter details and return values. Every sentence earns its place, though the structure could be slightly more polished (e.g., integrating 'Args' and 'Returns' more seamlessly).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no annotations and 0% schema description coverage, but an output schema implied by 'Returns', the description is moderately complete. It covers purpose and parameters well but lacks behavioral context (e.g., error handling, performance). The output schema reduces the need to explain return values, but more operational detail would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides clear semantics for all four parameters in the 'Args' section, explaining what each represents (e.g., the evidence pack ID, whether every paragraph must contain citations), including defaults. This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('验证'/'检查', verify/check) and resources (section citation compliance in Agent-written markdown), and distinguishes it from siblings by focusing on citation compliance checking for markdown content. It's not a tautology of the name 'lint_section_v1'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (checking Agent-written markdown for citation rules) but doesn't explicitly state when to use this tool versus alternatives like 'lint_review_v1' or other writing/editing tools. No explicit exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
