
Amazon Bedrock Knowledge Base MCP Server

by r3-yamauchi

start_ingestion_job

Initiates asynchronous ingestion of documents from a data source into an Amazon Bedrock Knowledge Base for RAG applications.

Instructions

Starts a data ingestion job from a data source into a Knowledge Base.

The job runs asynchronously, ingesting the documents in the data source into the Knowledge Base. Progress can be checked with get_ingestion_job.

Args:
    knowledge_base_id: ID of the Knowledge Base
    data_source_id: ID of the data source

Returns:
    IngestionJobResponseDict: result of starting the ingestion job
        - ingestion_job_id: ID of the started ingestion job
        - status: job status (typically "STARTING" or "IN_PROGRESS")
        - statistics: statistics (optional; usually None at job start)

Raises:
    ValueError: if knowledge_base_id or data_source_id is empty

Note: Because the ingestion job runs asynchronously, this function returns immediately. To wait for completion, call get_ingestion_job periodically and check its status.
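The polling pattern described in the note can be sketched as follows. This is a minimal illustration, not code from the repository: the `get_ingestion_job` callable stands in for the MCP tool of the same name, and the terminal status names are assumptions based on the Bedrock ingestion-job lifecycle.

```python
import time

# Statuses at which polling can stop (assumed terminal states).
TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}

def wait_for_ingestion(get_ingestion_job, knowledge_base_id, data_source_id,
                       ingestion_job_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_ingestion_job until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        job = get_ingestion_job(knowledge_base_id, data_source_id, ingestion_job_id)
        if job["status"] in TERMINAL_STATES:
            return job["status"]
        time.sleep(poll_seconds)  # back off between status checks
    raise TimeoutError(f"Ingestion job {ingestion_job_id} did not finish in time")
```

A caller would pass the tool (or a thin wrapper around it) as `get_ingestion_job` and act on the returned final status.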

Input Schema

Name               Required
knowledge_base_id  Yes
data_source_id     Yes

Output Schema

Name               Required
status             Yes
statistics         Yes
ingestion_job_id   Yes
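The output schema above corresponds to a response dictionary roughly of this shape. This is a hedged sketch: the field names inside the statistics object are assumptions modeled on Bedrock's ingestion-job statistics, and the actual IngestionJobResponseDict in the repository may differ.

```python
from typing import Optional, TypedDict

class IngestionJobStatistics(TypedDict, total=False):
    # Field names are assumptions based on Bedrock's job statistics.
    numberOfDocumentsScanned: int
    numberOfNewDocumentsIndexed: int
    numberOfDocumentsFailed: int

class IngestionJobResponseDict(TypedDict):
    ingestion_job_id: str
    status: str  # typically "STARTING" or "IN_PROGRESS" right after start
    statistics: Optional[IngestionJobStatistics]  # usually None at job start
```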

Implementation Reference

  • Tool registration and handler for start_ingestion_job in main.py
    @mcp.tool()  # Expose as an MCP tool
    @handle_errors  # Apply the error-handling decorator
    def start_ingestion_job(
        knowledge_base_id: str, data_source_id: str
    ) -> IngestionJobResponseDict:
        """
        Starts a data ingestion job from a data source into a Knowledge Base.

        The job runs asynchronously, ingesting the documents in the data
        source into the Knowledge Base. Progress can be checked with
        `get_ingestion_job`.

        Args:
            knowledge_base_id: ID of the Knowledge Base
            data_source_id: ID of the data source

        Returns:
            IngestionJobResponseDict: result of starting the ingestion job
                - ingestion_job_id: ID of the started ingestion job
                - status: job status (typically "STARTING" or "IN_PROGRESS")
                - statistics: statistics (optional; usually None at job start)

        Raises:
            ValueError: if knowledge_base_id or data_source_id is empty

        Note:
            Because the ingestion job runs asynchronously, this function
            returns immediately. To wait for completion, call
            `get_ingestion_job` periodically and check its status.
        """
        # Validate inputs using the shared helper
        knowledge_base_id = validate_required_string(knowledge_base_id, "knowledge_base_id")
        data_source_id = validate_required_string(data_source_id, "data_source_id")

        # Start the ingestion job via the Bedrock client
        result = bedrock_client.start_ingestion_job(
            knowledge_base_id, data_source_id
        )
        return result
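The `validate_required_string` helper is not shown in this excerpt. A minimal sketch of what such a helper likely does follows; this is an assumption, and the repository's actual implementation may differ (for instance, it may normalize whitespace).

```python
def validate_required_string(value: str, name: str) -> str:
    """Return value if it is a non-empty string, else raise ValueError."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"{name} must be a non-empty string")
    return value
```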
  • Actual Bedrock API implementation of start_ingestion_job in BedrockKBClient class
    def start_ingestion_job(
        self, knowledge_base_id: str, data_source_id: str
    ) -> IngestionJobResponseDict:
        """
        Starts a data ingestion job from a data source into a Knowledge Base.

        The job runs asynchronously, ingesting the documents in the data
        source into the Knowledge Base. Ingestion may take some time.

        Args:
            knowledge_base_id: ID of the Knowledge Base
            data_source_id: ID of the data source

        Returns:
            IngestionJobResponseDict: result of starting the ingestion job
                - ingestion_job_id: ID of the started ingestion job
                - status: job status (typically "STARTING" or "IN_PROGRESS")
                - statistics: statistics (optional; usually None at job start)

        Raises:
            ClientError: if the AWS API call fails

        Note:
            Because the ingestion job runs asynchronously, this function
            returns immediately. Use `get_ingestion_job` to check progress.
        """
        try:
            # Call the AWS Bedrock API to start the ingestion job
            response = self.bedrock_agent.start_ingestion_job(
                knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
            )

            # Log the successful job start
            logger.info(
                f"Started ingestion job {response['ingestionJob']['ingestionJobId']} "
                f"for data source {data_source_id}"
            )

            # Shape the response before returning; include statistics so the
            # result matches the documented IngestionJobResponseDict
            job = response["ingestionJob"]
            return {
                "ingestion_job_id": job["ingestionJobId"],
                "status": job["status"],
                "statistics": job.get("statistics"),
            }
        except ClientError as e:
            logger.error(f"Error starting ingestion job: {e}")
            raise
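Throttling is not handled in the excerpt above; if retries were needed, a backoff wrapper along these lines could sit in front of the client call. This is a hypothetical sketch: the throttling check via `str(exc)` is a simplification, and real code would catch `botocore.exceptions.ClientError` and inspect `e.response["Error"]["Code"]`.

```python
import time

def start_with_backoff(start, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a zero-argument start callable on throttling, with exponential backoff."""
    for attempt in range(attempts):
        try:
            return start()
        except Exception as exc:  # real code: catch ClientError specifically
            if attempt == attempts - 1 or "Throttling" not in str(exc):
                raise  # out of attempts, or not a throttling error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```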
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It effectively discloses key behavioral traits: the job is asynchronous, returns immediately, and progress must be monitored via get_ingestion_job. It also mentions error conditions (ValueError for empty IDs). However, it lacks details on permissions, rate limits, or side effects (e.g., whether it overwrites existing data), leaving some gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with a clear purpose statement, followed by usage notes, parameter explanations, return values, and error handling. Each sentence adds value (e.g., explaining asynchronicity, monitoring, and error cases) without redundancy. The use of sections (Args, Returns, Raises, Note) enhances readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (asynchronous job initiation) and the presence of an output schema (which covers return values like ingestion_job_id, status, statistics), the description is complete. It explains the tool's purpose, usage, parameters, and behavioral aspects (asynchronicity, monitoring). No annotations are provided, but the description compensates adequately, making it self-sufficient for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It adds clear semantics for both parameters: knowledge_base_id is the ID of the Knowledge Base, and data_source_id is the ID of the data source. This goes beyond the schema's basic type definitions. However, it doesn't specify format constraints (e.g., UUID) or provide examples, which could improve clarity further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it starts a data ingestion job from a data source into a Knowledge Base. It specifies the exact action (start an ingestion job) and the resources involved (data source, Knowledge Base), distinguishing it from sibling tools such as get_ingestion_job (monitors progress) and create_data_source (creates resources).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when and how to use this tool. It states that the job runs asynchronously, recommends using get_ingestion_job to check progress, and warns that this function returns immediately without waiting for completion. It also mentions prerequisites (knowledge_base_id and data_source_id must not be empty) and raises ValueError if they are, helping users avoid errors.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
