raw_ingest

Ingest raw source documents into a knowledge base by adding local files, downloading from URLs, or importing Confluence pages and Jira issues.

Instructions

Ingest raw source documents into the knowledge base. Select mode to control the ingestion method:

  • add: Add a local file or content string (immutable, SHA-256 verified). Supports directory imports. Single images (<10MB) returned inline — you MUST immediately call wiki_write to describe them.

  • fetch: Download a file from a URL into raw/ (arXiv abstract URLs auto-converted to PDF). Single images returned inline — you MUST immediately call wiki_write to describe them.

  • import_confluence: Recursively import Confluence pages with attachments and hierarchy. Supports both Cloud (*.atlassian.net/wiki/...) and Server / Data Center ({host}/spaces/...). Defaults to reading the CONFLUENCE_API_TOKEN env var; pass auth_env to point at any other variable. Token format accepted: email:api-token (Cloud Basic), Bearer <pat> (explicit), or a bare PAT (Bearer prefix added automatically).

  • import_jira: Import a Jira issue with comments, attachments, and linked issues. Supports both Cloud and Server / Data Center; auto-falls-back to REST API v2 on older Server / DC. Defaults to reading the JIRA_API_TOKEN env var; pass auth_env to point at any other variable. Token format accepted: email:api-token (Cloud Basic), Bearer <pat> (explicit), or a bare PAT (Bearer prefix added automatically).
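The three token formats accepted by import_confluence and import_jira can be illustrated with a short Python sketch. This is not the server's actual code: the function name build_auth_header is hypothetical, and telling a Cloud token from a bare PAT by the presence of a colon is an assumption made here for illustration only.

```python
import base64


def build_auth_header(token: str) -> str:
    """Map one of the three documented token formats to an Authorization value.

    Hypothetical helper: the detection rules below are assumptions,
    not the tool's confirmed behavior.
    """
    token = token.strip()
    if token.lower().startswith("bearer "):
        # Explicit "Bearer <pat>": pass through unchanged.
        return token
    if ":" in token:
        # "email:api-token" (Cloud Basic): base64-encode the whole pair.
        encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
        return f"Basic {encoded}"
    # Bare PAT: add the Bearer prefix.
    return f"Bearer {token}"
```

Under these assumptions, a value such as user@example.com:secret would yield a Basic header, while a bare personal access token would yield a Bearer header.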

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| mode | Yes | Ingestion mode: add (local file/content), fetch (URL download), import_confluence (Confluence pages), import_jira (Jira issues) | |
| filename | No | [add] Filename in raw/ (e.g. 'paper.pdf'). For directory imports, becomes subdirectory prefix (e.g. 'my-docs'). | |
| content | No | [add] File content as string. Either content or source_path is required. | |
| source_path | No | [add] Absolute path to local file or directory to copy into raw/. If directory, all files imported recursively. Either content or source_path is required. | |
| source_url | No | [add/fetch] Original URL where the document was downloaded from | |
| description | No | [add/fetch] Brief description of what this source contains | |
| tags | No | [add/fetch] Tags for categorization | |
| auto_version | No | [add] If true and file already exists, create a versioned copy (e.g. report_v2.xlsx) instead of failing. Default: false. | |
| pattern | No | [add] File pattern filter for directory imports (e.g. '*.html', '*.{html,css}'). Ignored for single files. | |
| url | No | [fetch] URL to download from. arXiv abs URLs auto-converted to PDF links. [import_confluence] Confluence page URL. [import_jira] Jira issue URL. | |
| recursive | No | [import_confluence] Import child pages recursively (default: false) | |
| depth | No | [import_confluence] Max recursion depth (-1 = unlimited, default: 50 when recursive=true) | |
| auth_env | No | [import_confluence] Auth env var name (default: CONFLUENCE_API_TOKEN). [import_jira] Auth env var name (default: JIRA_API_TOKEN) | |
| include_comments | No | [import_jira] Include issue comments (default: true) | |
| include_attachments | No | [import_jira] Download attachments (default: true) | |
| include_links | No | [import_jira] Import linked issues (default: true) | |
| link_depth | No | [import_jira] Levels of linked issues to follow (default: 1) | |
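The bracketed mode tags in the schema suggest which parameters belong to which mode. The following Python sketch shows example argument payloads and a client-side consistency check built from those tags. The function check_arguments, the example paths and URLs, and the rule that non-add modes need url are all assumptions for illustration; the schema itself marks only mode as required, and the server may validate differently.

```python
# Parameters grouped by the [mode] tags in the schema descriptions.
MODE_PARAMS = {
    "add": {"filename", "content", "source_path", "source_url",
            "description", "tags", "auto_version", "pattern"},
    "fetch": {"url", "source_url", "description", "tags"},
    "import_confluence": {"url", "recursive", "depth", "auth_env"},
    "import_jira": {"url", "auth_env", "include_comments",
                    "include_attachments", "include_links", "link_depth"},
}


def check_arguments(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks consistent.

    Hypothetical client-side check, not the tool's own validation.
    """
    mode = args.get("mode")
    if mode not in MODE_PARAMS:
        return [f"unknown or missing mode: {mode!r}"]
    problems = []
    extra = set(args) - MODE_PARAMS[mode] - {"mode"}
    if extra:
        problems.append(f"parameters not tagged for {mode}: {sorted(extra)}")
    if mode == "add" and not ("content" in args or "source_path" in args):
        # Stated in the schema: either content or source_path is required.
        problems.append("add requires content or source_path")
    if mode != "add" and "url" not in args:
        # Assumption: the schema does not mark url as required.
        problems.append(f"{mode} needs url")
    return problems


# Illustrative payloads; the path and the Jira URL are made up.
add_example = {
    "mode": "add",
    "filename": "my-docs",
    "source_path": "/home/user/docs",
    "pattern": "*.html",
}
jira_example = {
    "mode": "import_jira",
    "url": "https://example.atlassian.net/browse/PROJ-1",
    "link_depth": 1,
}
```

Running check_arguments on a payload before calling the tool is one way to catch parameters that belong to a different mode, such as passing recursive together with mode fetch.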
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively covers key behaviors: image return and required follow-up action, arXiv URL conversion, directory import behavior, token format acceptance, and fallback to REST API v2 for older Jira Server. While it could mention rate limits or error handling, the level of detail is commendable for a complex tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structured as a leading sentence followed by one bullet per mode, which makes it scannable, and it front-loads the core purpose. It is long (four dense bullets), but the length is justified by the tool's complexity: every sentence contributes meaningful content and nothing is redundant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's high complexity (4 modes, 17 parameters, no output schema), the description provides thorough context: token handling for each service, image behavior, directory import patterns, and more. It lacks details about return values or error states, but the description is rich enough for an AI agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the input schema already describes all 17 parameters (100% coverage), the description adds significant value by grouping parameters by mode, explaining token formats, and clarifying behaviors like arXiv URL auto-conversion and auto_version semantics. This contextual information goes beyond mere parameter listing, aiding correct parameter selection.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Ingest raw source documents into the knowledge base.' It enumerates four distinct modes with specific verbs ('add', 'fetch', 'import_confluence', 'import_jira'), making it easy for an AI agent to understand the scope. The purpose is well-differentiated from sibling tools like `raw_list`, `raw_read`, and `knowledge_ingest`, which handle different aspects of raw or knowledge management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use each mode, including detailed conditions for image handling ('you MUST immediately call wiki_write'), token format for Confluence/Jira, and auto-versioning. However, it does not explicitly state when NOT to use this tool (e.g., when `knowledge_ingest` would be more appropriate). This minor omission prevents a higher score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/xinhuagu/agent-wiki'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.